Documentation, Information & Knowledge (图书情报知识) ›› 2017, Vol. 0 ›› Issue (1): 89-97. doi: 10.13366/j.dik.2017.01.089

• Intelligence, Information and Sharing •

Community Detection in Citation Networks Based on Node Content and Topological Structure

Xiao Xue (肖雪), Chen Yunwei (陈云伟), Deng Yong (邓勇)

  • Publication date: 2017-01-10 Release date: 2017-01-10

Community Detection Algorithm Based on Content and Topological Structure

  • Online: 2017-01-10 Published: 2017-01-10

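Abstract (translated from the Chinese):

Community detection in citation networks is an important text-mining method. To improve its accuracy, this paper proposes a community detection method that jointly considers the content and the topological-structure attributes of a citation network. The algorithm first computes node similarity with an improved cosine similarity measure, then reconstructs the network by combining the structural and content similarity of nodes. On this basis, it takes the similarity of each pair of adjacent nodes as the edge weight and applies the Louvain community detection method to the resulting weighted citation network. Experiments on real citation network datasets show that the proposed method improves the partitioning of citation networks.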

Key words: citation network, community detection, clustering, text mining, text similarity

Abstract:

The study of community discovery is of great value to text mining. To improve the accuracy of community detection in citation networks, this paper describes a new community discovery algorithm for literature based on content and topological structure. First, the paper establishes a vector space model to calculate the similarity of adjacent papers, and then reconstructs the citation network based on the similarity and linking relationships of vertices. On this basis, the similarity of adjacent papers is set as the weight of each link, and communities are detected with the Louvain community detection algorithm. Experiments show that the proposed algorithm is an effective way to improve the performance of community detection.

Key words: Citation networks, Community discovery, Clustering, Data mining, Text similarity
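
The edge-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the "improved" cosine similarity or how structural similarity is measured, so the sketch assumes plain cosine similarity over bag-of-words term counts, Jaccard overlap of neighbour sets as a stand-in for structural similarity, and a hypothetical mixing parameter `alpha`. The function names and the toy data are also illustrative assumptions.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Plain cosine similarity between bag-of-words term-count vectors.

    Assumption: the paper uses an "improved" variant that the abstract
    does not describe; this is the textbook formula.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(neighbors_a, neighbors_b):
    """Jaccard overlap of two neighbour sets (assumed structural measure)."""
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0


def weight_citation_edges(texts, citations, alpha=0.5):
    """Return {(citing, cited): weight} for every citation link.

    weight = alpha * content similarity + (1 - alpha) * structural similarity.
    `alpha` is a hypothetical parameter, not taken from the paper.
    """
    # Treat citations as undirected and include each node in its own
    # neighbour set, so two linked papers always share at least themselves.
    neighbors = {paper: {paper} for paper in texts}
    for citing, cited in citations:
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)

    weighted = {}
    for citing, cited in citations:
        content = cosine_similarity(texts[citing], texts[cited])
        structure = jaccard_similarity(neighbors[citing], neighbors[cited])
        weighted[(citing, cited)] = alpha * content + (1 - alpha) * structure
    return weighted


# Toy data (made up for illustration only).
texts = {
    "p1": "community detection citation network",
    "p2": "citation network community structure",
    "p3": "protein folding simulation",
}
citations = [("p1", "p2"), ("p2", "p3")]
weights = weight_citation_edges(texts, citations)
```

In this toy example the link between the two topically related papers receives a higher weight than the link to the unrelated one. The resulting weights could then be loaded into a graph library and passed to an existing Louvain implementation, for example NetworkX's `louvain_communities` with `weight="weight"`, to obtain the communities.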