Documentation, Information & Knowledge (图书情报知识), 2022, Vol. 39, Issue (1): 51-60. doi: 10.13366/j.dik.2022.01.051

• Books, Documents and Communication •

Automatic Reorganization of Historical Records from Time Dimension: From the Perspective of Digital Humanities

Zhang Qi (张琪), Wang Dongbo (王东波), Huang Shuiqing (黄水清), Li Bin (李斌), Meng Kai (孟凯), Deng Sanhong (邓三鸿)

  • Online: 2022-01-10 Published: 2022-03-19

Abstract:

[Purpose/Significance] From the perspective of digital humanities, this paper seeks a concrete technical approach to four problems in ancient Chinese time expressions: ellipsis, coreference, ambiguity, and vagueness. The goal is to let readers cross the genre barriers between biographical (纪传体), state-by-state (国别体), and event-centered (纪事本末体) histories and retrieve all the records of a given period from different history books. [Design/Methodology] After surveying the types and features of ancient Chinese time expressions, the paper proposes a method for automatically reorganizing the full text of history books along a timeline. The method first recognizes ancient Chinese time expressions and parses their semantics, then identifies event sentences and links each one to a time expression. Finally, the method is applied to reorganizing The Records of the Grand Historian (Shi Ji), a biographical history, and Discourses of the States (Guo Yu), a state-by-state history, to test its validity. [Findings/Conclusion] The proposed method can effectively reorganize biographical and state-by-state histories by time, and it achieves high accuracy while reducing manual annotation. [Originality/Value] Addressing the ambiguity and coreference of ancient Chinese time expressions, the paper proposes a complete method for automatically reorganizing the full text of history books by time and verifies its effectiveness through experiments.


Key words: Digital humanities, Historical records, Ancient Chinese temporal expression processing, Ancient Chinese time expression disambiguation, Event-time relation extraction