Academic Literature Data Sets Towards the AI Era: Characteristics and Development Direction

doi:10.13366/j.dik.2023.05.039

Documentation, Information & Knowledge, 2023, Vol. 40, Issue 5: 39-49

• Special Topic: Digital Scholarly Infrastructure in the AI Era •


ZHANG Tongyang¹, WANG Chuhan², YU Chao¹, XU Jian¹

1. School of Information Management, Sun Yat-sen University, Guangzhou, 510006;
2. Department of Information Management, Peking University, Beijing, 100091

Publication date: 2023-09-10  Release date: 2023-10-22
Corresponding author: XU Jian (ORCID: 0000-0003-4886-4708), PhD, Professor; research interests: scientometrics, web data mining, online user sentiment analysis, interdisciplinary communication analysis; Email: issxj@mail.sysu.edu.cn.
About the authors: ZHANG Tongyang (ORCID: 0000-0003-3538-8343), PhD candidate; research interests: scientometrics, semantic analysis, interdisciplinary communication analysis; Email: zhangty65@mail2.sysu.edu.cn. WANG Chuhan (ORCID: 0000-0001-9880-5613), PhD candidate; research interests: scientometrics, knowledge computing, researcher evaluation; Email: wangchuhan@stu.pku.edu.cn. YU Chao (ORCID: 0000-0002-3195-1357), PhD candidate; research interests: scientometrics, data mining; Email: yuch25@mail3.sysu.edu.cn.
Funding:
This is an outcome of the project "Practice and Countermeasures of Talents as the First Resource in Guangzhou: Research on Matching the Enterprise Technical Demands and Talents in Universities" (2023GZYB04), supported by a grant from the 2023 General Program of the Philosophy and Social Sciences Development of Guangzhou during the 14th Five-Year Plan period.

Abstract

[Purpose/Significance] Driven by the successive development and application of artificial intelligence technologies, academic literature datasets play an increasingly important role in scientometric research. Having evolved from traditional data collections oriented toward information supply into knowledge resources that support knowledge discovery and the construction of relationship networks, datasets have developed rapidly in every function, which in turn supports expanding the depth and breadth of scientometric research. [Design/Methodology] Using the articles published in the journal Scientometrics from 2016 to 2020 as the data source, this article analyzes the records of dataset usage in scientometric research to summarize overall dataset use and to examine the relationship between a dataset's usage popularity and the number of documents it contains. It then analyzes the characteristics of frequently used, typical datasets, explores the impact of artificial intelligence technology on dataset work, and outlines directions for future dataset construction. [Findings/Conclusion] The usage frequency of a dataset is positively correlated, to a certain degree, with the number of papers it indexes; a single study tends to use several datasets in combination; and scientometric research based on academic literature datasets is becoming increasingly closely tied to artificial intelligence technology. [Originality/Value] By analyzing the characteristics of the datasets used in scientometrics-related papers, this study summarizes the patterns of dataset construction and development in recent years and offers a reference for selecting datasets for scientometric research.

Keywords: Data sets, Artificial intelligence, Scientometrics, Feature analysis
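The abstract reports two quantitative patterns: a positive correlation between how often a dataset is used and how many papers it indexes, and frequent co-use of several datasets within one study. The statistic the authors used is not stated here, so the following is only a minimal sketch under stated assumptions: it uses Spearman's rank correlation (rank-based, so it does not assume a linear relationship) and a hypothetical input format that maps each paper ID to the datasets it reports. The function names `average_ranks`, `spearman` and `usage_stats`, and all dataset names and numbers in the demo, are invented for illustration and are not the study's data.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sxx * syy) ** 0.5


def usage_stats(papers: Dict[str, List[str]]) -> Tuple[Counter, float]:
    """Count papers per dataset and the share of papers using 2+ datasets.

    `papers` is a hypothetical format: paper ID -> datasets the paper reports.
    A dataset listed twice by the same paper is counted once for that paper.
    """
    usage = Counter(ds for used in papers.values() for ds in set(used))
    multi = sum(1 for used in papers.values() if len(set(used)) >= 2)
    return usage, multi / len(papers)


if __name__ == "__main__":
    # Toy, invented records (not the study's data): paper ID -> datasets used.
    papers = {"p1": ["DS_A", "DS_B"], "p2": ["DS_A"],
              "p3": ["DS_A", "DS_C"], "p4": ["DS_B"]}
    # Toy collection sizes (number of indexed papers) for each dataset.
    size = {"DS_A": 900, "DS_B": 500, "DS_C": 100}
    usage, multi_share = usage_stats(papers)
    names = sorted(size)
    rho = spearman([usage[n] for n in names], [size[n] for n in names])
    print(dict(usage), multi_share, rho)
```

In the toy demo, usage counts (3, 2, 1) and collection sizes (900, 500, 100) have the same ordering, so rho is 1.0, and two of the four papers use more than one dataset, giving a share of 0.5. On Python 3.12 and later, `statistics.correlation(x, y, method="ranked")` computes the same rank correlation and could replace the hand-written `spearman`.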

ZHANG Tongyang, WANG Chuhan, YU Chao, XU Jian. Academic Literature Data Sets Towards the AI Era: Characteristics and Development Direction[J]. Documentation, Information & Knowledge, 2023, 40(5): 39-49.

