
Journal of Beijing Normal University (Social Sciences) ›› 2023, Vol. 0 ›› Issue (5): 127-141.

• Digital Humanities Research •

书同文字与再造书契——论古籍数字化时代的字符统一与文本规范

李飞跃   

  1. 清华大学 人文学院,北京 100084
  • 出版日期:2023-09-25 发布日期:2023-10-23
  • About the author: LI Feiyue, Ph.D. in Literature, Associate Professor at the School of Humanities, Tsinghua University.
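  • Supported by:
    Major Project of the National Social Science Fund of China, “Analysis and Research of Classical Texts of Ancient Literature Based on Big Data Technology” (18ZDA238); Beijing Social Science Fund Young Academic Leaders Project, “A Preliminary Discussion of Computational Literature” (21DTR034).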

The Unification of Script and Writing and the Recreation of Books and Documents: The Character Unity and Text Standard in the Digital Age of Ancient Books

LI Feiyue   

  1. School of Humanities, Tsinghua University, Beijing 100084, China
  • Online: 2023-09-25 Published: 2023-10-23

摘要: 随着古籍的电子化与数据库应用,曾经停废的大量汉字被激活。字体字形多样、字际关系复杂和编码系统不一,严重阻碍了古籍文本的编辑、保存、呈现、转换、检索及深度利用。文本的电子化、规范化及标准化是古籍数字化的起点,也是数字设施建设和数字人文研究的基础。近代以来新旧字体、正俗字形与字符编码的三次系统性变更,决定了字符集与文本库建设只能以发布的各种国家标准为基础。纵观历史,汉字一直处在不断统一规范的进程中,汉文典籍的一致性让中华文明具有突出的统一性。创建统一字符集和标准文本库将是继秦朝“书同文字”之后的全新规范,也是汉字系统继从刻画到书写,又到数码形态的再次重置。“再造书契”有利于实现古籍数据的统一刻画、深度标引、交互整合和多功能开发,促进古籍文本结构化、知识体系化、平台智能化,推动古籍整理利用的转型升级。

关键词: 古籍数字化, 字符集, 文本库, 书同文

Abstract: With the digitization of ancient books and their use in databases, a large number of Chinese characters that had fallen out of use have been reactivated. The diversity of font styles and character forms, the complexity of relationships between characters, and inconsistent encoding systems severely hinder the editing, preservation, presentation, conversion, retrieval, and in-depth utilization of ancient texts. The digitization, standardization, and normalization of texts are the starting point for the digitization of ancient books and the foundation for digital infrastructure construction and digital humanities research. Three systematic changes since modern times, in new and old font styles, in standard and popular character forms, and in character encoding, mean that character sets and text databases can only be built on the national standards that have been issued. Chinese characters have been in a continuous process of unification and standardization, and the unification of script and writing is the mainstream trend of history. Creating a unified character set and a standard text database will establish a new standard, following the Qin dynasty's “unification of script and writing”. It also marks another reset of the Chinese character system, which moved from engraving to handwriting and now to digital form. “The recreation of books and documents” facilitates the unified depiction, in-depth indexing, interactive integration, and multifunctional development of ancient book data. It promotes the structuring of ancient texts, the systematization of knowledge, and the intelligence of platforms, and it drives the transformation and upgrading of the collation and utilization of ancient books.

Key words: the digitization of ancient books, character sets, text databases, the unification of script and writing
