Documentation, Information & Knowledge (图书情报知识), 2019, Vol. 0, Issue (3): 101-112. doi: 10.13366/j.dik.2019.03.101

Section: Intelligence, Information and Sharing

Status and Characteristics of Scientific Data Based on DataCite

罗鹏程 (Luo Pengcheng), 崔海媛 (Cui Haiyuan), 赵静茹 (Zhao Jingru)

Publication date: 2019-05-10   Online release date: 2019-05-10

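Abstract (Chinese version):

[Purpose/Significance] To analyze the characteristics of massive scientific data worldwide and provide a reference for the effective use and management of scientific data. [Design/Methodology] Metadata for 14,835,029 scientific data records were collected from DataCite. Using statistical analysis, social network analysis, and text analysis, the status and characteristics of scientific data were analyzed along the dimensions of time, space, topic, author, version, and usage. [Findings/Conclusion] Scientific data shows exponential growth; science and engineering data make up the majority, while humanities and social science data are rising rapidly; data centers are severely polarized; European and American countries hold the advantage in open data; the development of data centers in China lags behind scholars' needs; author collaboration differs significantly across disciplines; the number of dataset versions follows a power-law distribution; and open data sharing helps raise scholars' impact. [Originality/Value] The study examines the overall characteristics of existing massive scientific data in depth from multiple perspectives, summarizes the practical experience of leading data centers, and discusses development paths for scientific data management in China.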

Keywords: scientific data, status and characteristics, scientific data management, DataCite
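The abstract reports that scientific data is growing exponentially but does not state how that was established. One common check, shown here only as a minimal sketch and not as the authors' procedure, is an ordinary least-squares fit of the logarithm of yearly record counts against publication year: a roughly straight line on the log scale indicates exponential growth, and its slope b gives an annual growth rate of exp(b) - 1 and a doubling time of ln 2 / b. The function name `fit_exponential_growth` is hypothetical, and the yearly counts in the example are invented for illustration, not DataCite figures.

```python
import math


def fit_exponential_growth(years, counts):
    """Fit ln(count) = a + b * year by ordinary least squares (hypothetical helper).

    Returns (annual_growth_rate, doubling_time_years), where
    annual_growth_rate = exp(b) - 1. All counts must be positive
    and at least two distinct years are required.
    """
    if len(years) != len(counts) or len(years) < 2:
        raise ValueError("need at least two (year, count) pairs")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive to take logarithms")
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx  # slope of the log-linear fit: continuous growth rate per year
    growth_rate = math.exp(b) - 1.0
    # Only a positive slope yields a finite doubling time.
    doubling_time = math.log(2) / b if b > 0 else math.inf
    return growth_rate, doubling_time


# Hypothetical yearly record counts (not DataCite figures) that double each year.
rate, doubling = fit_exponential_growth(
    [2010, 2011, 2012, 2013, 2014], [100, 200, 400, 800, 1600]
)
print(f"growth rate ~ {rate:.2f} per year, doubling time ~ {doubling:.2f} years")
```

A good straight-line fit on the log scale is suggestive rather than conclusive; comparing it with alternatives such as a logistic curve would strengthen a claim of exponential growth.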

Abstract (English version):

[Purpose/Significance] This paper analyzes the characteristics of massive scientific data to provide a reference for the effective use and efficient management of scientific data. [Design/Methodology] Metadata for 14,835,029 scientific data records were collected from DataCite. Using statistical analysis, social network analysis, and text analysis, the status and characteristics of the collected data were examined along six dimensions: time, space, topic, author, version, and usage. [Findings/Conclusion] Scientific data is growing exponentially. Science and engineering data account for the majority, while humanities and social science data, though still a relatively small share, are rising rapidly. Scientific data centers are severely polarized. European and American countries hold the advantage in open data. The development of data centers in China has not kept pace with scholars' needs. Author collaboration patterns vary considerably across disciplines. The number of dataset versions follows a power-law distribution. Open data sharing can help raise scholars' impact. [Originality/Value] This study examines the characteristics of massive scientific data comprehensively and in depth from several perspectives, summarizes the practical experience of leading scientific data centers, and explores development paths for scientific data management in China.
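The abstract states that the number of dataset versions follows a power-law distribution but does not say how the exponent was estimated or the fit tested. As a hedged illustration only, the sketch below applies the closed-form approximation to the discrete power-law maximum-likelihood estimator given by Clauset, Shalizi and Newman (2009), alpha ≈ 1 + n / Σ ln(x_i / (x_min - 0.5)), over the n observations with x_i ≥ x_min. That paper reports the approximation is accurate mainly when x_min is moderately large (around 6 or more), so for version counts starting at 1 it is a rough estimate; supporting a power-law claim would also need the exact discrete MLE and a goodness-of-fit test. The function name `powerlaw_alpha_discrete` and the version counts are hypothetical.

```python
import math


def powerlaw_alpha_discrete(values, x_min=1):
    """Approximate MLE of a discrete power-law exponent (hypothetical helper).

    Uses the closed-form approximation from Clauset, Shalizi & Newman (2009):
        alpha ~= 1 + n / sum(ln(x_i / (x_min - 0.5)))
    computed over the n observations with x_i >= x_min.
    Returns (alpha, n).
    """
    if x_min < 1:
        raise ValueError("x_min must be at least 1 for count data")
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    # Every log term is positive because x_i >= x_min > x_min - 0.5.
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    alpha = 1.0 + len(tail) / log_sum
    return alpha, len(tail)


# Hypothetical version counts per dataset (illustrative only, not DataCite data).
versions = [1] * 80 + [2] * 12 + [3] * 4 + [5] * 2 + [9, 14]
alpha, n = powerlaw_alpha_discrete(versions, x_min=1)
print(f"alpha ~ {alpha:.2f} estimated from {n} datasets")
```

Observations below `x_min` are simply excluded, so the choice of `x_min` directly changes both the estimate and the number of datasets it rests on.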