Documentation, Information & Knowledge (图书情报知识) ›› 2023, Vol. 40 ›› Issue (4): 32-40. doi: 10.13366/j.dik.2023.04.032

• Academic Focus (1) · Governance of Artificial Intelligence Generated Content (AIGC) •

信息质量视角下AIGC虚假信息问题及根源分析

莫祖英, 盘大清, 刘欢, 赵悦名   

  1. 郑州航空工业管理学院信息管理学院,郑州,450046
  • 出版日期:2023-07-10 发布日期:2023-08-16
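  • Corresponding author: MO Zuying (ORCID: 0000-0003-0661-9333), PhD, Professor; research interests: information quality, false information quality; Email: mozuying611@163.com.
  • About the authors: PAN Daqing (ORCID: 0000-0003-0712-4957), master's student; research interest: false information identification; Email: qcountpan@163.com. LIU Huan (ORCID: 0009-0006-9571-2984), master's student; research interests: false information governance, user behavior; Email: 979243359@qq.com. ZHAO Yueming (ORCID: 0009-0000-5784-208X), master's student; research interest: false information governance; Email: 1023200761@qq.com.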
  • 基金资助:
    本文系国家社科基金项目“社交媒体情境下网络虚假信息传播行为干预研究”(21BTQ049)的研究成果之一。

Analysis on AIGC False Information Problem and Root Cause from the Perspective of Information Quality

MO Zuying, PAN Daqing, LIU Huan, ZHAO Yueming

  1. School of Information Management, Zhengzhou University of Aeronautics, Zhengzhou, 450046
  • Online: 2023-07-10 Published: 2023-08-16
  • Contact: Correspondence should be addressed to MO Zuying, Email: mozuying611@163.com, ORCID: 0000-0003-0661-9333
  • Supported by:
    This is an outcome of the project "Research on the Intervention of Online False Information Dissemination Behavior in the Context of Social Media" (21BTQ049), supported by the National Social Science Fund of China.

摘要: [目的/意义] 探讨AIGC中存在的各种虚假信息类型及其特征,对理解虚假信息产生的根源、减少AIGC中虚假信息的生成具有积极作用。[研究设计/方法] 采用数据测试实验方法,立足于信息质量视角,通过采集AIGC系统一手的测试数据和收集二手的AIGC虚假信息来剖析AIGC虚假信息类型及特征;以人工智能语言模型的信息生成过程为着力点,探析AIGC中虚假信息生成的根源。[结论/发现] AIGC虚假信息主要包括事实性虚假和幻觉性虚假两种类型,事实性虚假信息主要集中在数据错误、作者作品错误、客观事实错误、编程代码错误、机器翻译错误五个方面,而幻觉性虚假信息主要集中在虚假新闻事件、虚假学术信息、虚假健康信息和偏见与歧视方面;AIGC虚假信息产生的根源与大规模语言模型、预训练数据集和人工标注三个要素有关。[创新/价值] 采用了数据测试实验方法,并辅以二手数据的收集,全面分析了各种AIGC虚假信息的类型,并根据生成机理与表现形式将其划分为事实性虚假信息和幻觉性虚假信息,为AIGC虚假信息的进一步研究提供了理论基础。

关键词: 人工智能生成内容(AIGC), 虚假信息, 信息质量, 事实性虚假信息, 幻觉性虚假信息, 根源分析

Abstract: [Purpose/Significance] This paper analyzes the types and characteristics of false information in AIGC, which helps in understanding the root causes of such false information and in reducing its generation. [Design/Methodology] The study adopts a data testing experiment. From the perspective of information quality, it analyzes the types and characteristics of AIGC false information using first-hand test data collected from AIGC systems and second-hand examples of AIGC false information. Focusing on the information generation process of artificial intelligence language models, it then explores the root causes of false information generation in AIGC. [Findings/Conclusion] False information in AIGC falls mainly into two types: factual false information and hallucinatory false information. Factual false information centers on five kinds of error: data errors, author and work attribution errors, errors of objective fact, programming code errors, and machine translation errors. Hallucinatory false information centers on fake news events, false academic information, false health information, and bias and discrimination. The root causes of AIGC false information relate to three factors: large-scale language models, pre-training datasets, and human annotation. [Originality/Value] The study employs a data testing experiment, supplemented by second-hand data, to analyze the types of AIGC false information comprehensively, and divides them into factual and hallucinatory false information according to generation mechanism and manifestation, providing a theoretical foundation for further research on false information in AIGC.

Key words: Artificial intelligence generated content (AIGC), False information, Information quality, Factual false information, Hallucinatory false information, Root cause analysis