Documentation, Information & Knowledge ›› 2023, Vol. 40 ›› Issue (5): 39-49. doi: 10.13366/j.dik.2023.05.039

• Special Topic: Digital Academic Infrastructure in the AI Era

Academic Literature Data Sets Towards the AI Era: Characteristics and Development Direction

ZHANG Tongyang1, WANG Chuhan2, YU Chao1, XU Jian1   

  1. School of Information Management, Sun Yat-sen University, Guangzhou, 510006;
    2. Department of Information Management, Beijing University, Beijing, 100091
  • Online: 2023-09-10 Published: 2023-10-22
  • Contact: Correspondence should be addressed to XU Jian, ORCID: 0000-0003-4886-4708
  • Supported by:
    This is an outcome of the project "Practice and Countermeasures of Talents as the First Resource in Guangzhou–Research on Matching the Enterprise Technical Demands and Talents in Universities" (2023GZYB04), supported by a grant from the 2023 General Program of the Philosophy and Social Sciences Development of Guangzhou during the 14th Five-Year Plan period.

Abstract: [Purpose/Significance] Driven by the development and application of artificial intelligence technologies, academic literature data sets play an increasingly important role in scientometrics. They have evolved from traditional data collections oriented toward information supply into knowledge resources that support knowledge discovery and the construction of relationship networks. Their functions have improved in every respect, which in turn supports deeper and broader scientometrics research. [Design/Methodology] The article uses journal articles published in Scientometrics from 2016 to 2020 as the data source. By analyzing the data set usage records in these studies, it summarizes the overall use of data sets and explores the relationship between how often a data set is used and the number of documents it contains. For typical, frequently used data sets, the article analyzes their characteristics, examines the impact of artificial intelligence technology on data sets, and considers directions for their future construction. [Findings/Conclusion] The usage frequency of data sets is positively correlated, to some degree, with their collection volume; a single study tends to draw on multiple data sets; and scientometrics research is becoming increasingly closely tied to artificial intelligence technology. [Originality/Value] By analyzing the salient features of data sets used in scientometrics papers, this study summarizes the patterns in the construction and development of data sets in recent years, providing a reference for selecting data sets in scientometrics research.
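As an illustration of how a relationship between usage frequency and collection volume might be checked, the following is a minimal sketch using Spearman rank correlation. The abstract does not name the statistical method the authors used, so the choice of Spearman correlation is an assumption here, and the numbers in `usage_counts` and `collection_sizes` are hypothetical placeholders, not values from the article.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder data (NOT from the article):
# number of papers using each data set, and each data set's record count.
usage_counts = [120, 45, 30, 8, 5]
collection_sizes = [9.0e7, 8.0e7, 2.0e8, 3.0e6, 1.0e6]

rho = spearman(usage_counts, collection_sizes)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is used in this sketch because collection sizes can differ by orders of magnitude, which would let a few very large data sets dominate a linear correlation; a value of `rho` above zero would be consistent with the positive relationship the abstract reports.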

Key words: Data sets, Artificial intelligence, Scientometrics, Feature analysis