Documentation, Information & Knowledge, 2014, Vol. 0, Issue (4): 63-67. doi: 10.13366/j.dik.2014.04.063

• Special Topic Research •

Study on the Generation Algorithm and Implementation of the Theme Abstract in Scientific and Technical Literature Based on hLDA: Taking Papers in Power Industry as Example

Wang Qinghong, Wang Ping

Abstract:

With the arrival of the information-explosion era and the rapid growth in the number of scientific and technical papers, researchers place ever higher demands on extracting useful information from this literature efficiently. This paper proposes an algorithm that automatically generates topic abstracts for scientific and technical literature. The hLDA (hierarchical Latent Dirichlet Allocation) model is first used to build a topic model of a scientific-literature data set; candidate summary sentences are then selected and ranked with a sentence-scoring strategy that combines multiple factors, so that an abstract is generated automatically for each latent topic in the literature. For the experiments, we propose an abstract evaluation method based on topic coverage. The experimental results verify the effectiveness of the proposed topic abstract generation algorithm.

Key words: Scientific and technical literature, Topic abstract, Generation algorithm, hLDA
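The abstract names the steps of the pipeline (hLDA topic modeling, candidate-sentence selection, multi-factor sentence scoring, and topic-coverage evaluation) but not the concrete factors or weights. The Python sketch below illustrates one way these steps could fit together. It assumes the hLDA step has already produced, for a single topic, a word-to-probability map (`topic`). The three scoring factors (topic relevance, sentence position, sentence length), their weights, the Jaccard redundancy filter, and the probability-weighted coverage measure are all illustrative assumptions rather than the authors' method, and every function name is hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    position: int    # 0-based index of the sentence in its source paper
    doc_length: int  # number of sentences in that paper


def tokenize(text: str) -> list[str]:
    # Simplified English tokenizer; Chinese papers would need a word segmenter.
    return re.findall(r"[a-z0-9]+", text.lower())


def is_candidate(sent: Sentence, topic: dict[str, float], min_hits: int = 1) -> bool:
    # Assumed rule: candidates mention at least `min_hits` of the topic's words.
    return sum(t in topic for t in tokenize(sent.text)) >= min_hits


def score_sentence(sent: Sentence, topic: dict[str, float],
                   weights=(0.6, 0.25, 0.15), ideal_len: int = 20) -> float:
    # Weighted sum of three assumed factors; the weights are illustrative only.
    tokens = tokenize(sent.text)
    max_p = max(topic.values(), default=0.0)
    if not tokens or max_p <= 0:
        return 0.0
    # Factor 1: topic relevance, mean topic-word probability scaled into [0, 1].
    relevance = sum(topic.get(t, 0.0) for t in tokens) / (len(tokens) * max_p)
    # Factor 2: position, sentences near the start of a paper score higher.
    position = 1.0 - sent.position / max(sent.doc_length, 1)
    # Factor 3: length, 1.0 at ideal_len tokens and lower for shorter or longer ones.
    length = min(len(tokens), ideal_len) / max(len(tokens), ideal_len)
    w_rel, w_pos, w_len = weights
    return w_rel * relevance + w_pos * position + w_len * length


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_abstract(sentences: list[Sentence], topic: dict[str, float],
                   k: int = 3, max_overlap: float = 0.5) -> list[str]:
    # Rank candidates by score, then greedily keep up to k non-redundant ones.
    candidates = [s for s in sentences if is_candidate(s, topic)]
    ranked = sorted(candidates, key=lambda s: score_sentence(s, topic), reverse=True)
    chosen: list[set[str]] = []
    abstract: list[str] = []
    for s in ranked:
        if len(abstract) >= k:
            break
        toks = set(tokenize(s.text))
        # Skip sentences that mostly repeat one already in the abstract.
        if all(jaccard(toks, c) <= max_overlap for c in chosen):
            chosen.append(toks)
            abstract.append(s.text)
    return abstract


def topic_coverage(abstract: list[str], topic: dict[str, float], top_n: int = 10) -> float:
    # Assumed metric: share of the topic's top-n probability mass found in the abstract.
    top = sorted(topic, key=topic.get, reverse=True)[:top_n]
    covered = set(tokenize(" ".join(abstract)))
    total = sum(topic[w] for w in top)
    return sum(topic[w] for w in top if w in covered) / total if total else 0.0


# Toy usage: a made-up power-industry topic and four made-up sentences.
topic = {"transformer": 0.30, "fault": 0.25, "diagnosis": 0.20, "oil": 0.15, "gas": 0.10}
sentences = [
    Sentence("Transformer fault diagnosis relies on dissolved gas analysis of oil.", 0, 10),
    Sentence("The weather was mild during the field trial.", 1, 10),
    Sentence("Transformer fault diagnosis relies on dissolved gas analysis.", 5, 10),
    Sentence("We compare three diagnosis methods for oil-immersed transformers.", 2, 10),
]
abstract = topic_abstract(sentences, topic, k=2)
print(abstract)
print(topic_coverage(abstract, topic))
```

In the toy run, the off-topic weather sentence never becomes a candidate, and the third sentence is dropped because it nearly duplicates the first. A real implementation would replace the hand-written `topic` map with topic-word distributions taken from the paths of a fitted hLDA tree.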