Identification Methods of Artificial Intelligence Generated Content in Online Communities

doi:10.13366/j.dik.2024.02.028

Documentation, Information & Knowledge, 2024, Vol. 41, Issue (2): 28-38, 149.

Academic Focus · Governance of Artificial Intelligence Generated Content (AIGC)

DENG Shengli, WANG Fan, WANG Haowei

School of Information Management, Wuhan University, Wuhan, 430072

Publication date: 2024-03-10  Release date: 2024-05-14
Corresponding author: WANG Fan (ORCID: 0000-0003-0100-0320), PhD candidate; research interests: generative artificial intelligence, machine learning. E-mail: 1161252028@qq.com
About the authors: DENG Shengli (ORCID: 0000-0001-7489-4439), PhD, professor; research interests: user information behavior, humanistic artificial intelligence. E-mail: victorydc@sina.com. WANG Haowei (ORCID: 0000-0002-5085-6574), master's student; research interests: generative artificial intelligence. E-mail: 2018301040137@whu.edu.cn.
Supported by:
This paper is an outcome of the National Natural Science Foundation of China project "Research on Evaluation and Prediction of User Contribution Behavior in Online Knowledge Community from the Perspective of Information Ecological Chain" (71974149) and the National Social Science Foundation of China Major Project "Research on Reconstruction and Application of Information Service System Driven by Humanistic Artificial Intelligence" (22&ZD324).

Abstract

Abstract: [Purpose/Significance] Generative artificial intelligence causes a degree of AI information pollution in online communities. Studying multiple AIGC identification methods is important for guarding against the negative impact of rapidly evolving generative artificial intelligence. [Design/Methodology] This paper first constructs the HAC dataset from multiple online community platforms, drawn mainly from 54 major topic categories on Sina Weibo; it contains 100,873 pieces of information written by humans and by generative artificial intelligence. It then examines whether 6 mainstream deep learning methods and 7 machine learning methods can identify whether information in online communities was written by a human or by generative artificial intelligence. Finally, it proposes the BEM-RCNN method to further improve the precision of AIGC identification. [Findings/Conclusion] The constructed dataset shows that generative artificial intelligence has a strong capacity for "human-like expression" and can imitate humans posting and replying on social media platforms. Experimental results show that the proposed method reaches an accuracy of 96.4% and can effectively identify whether content in an online community was written by a human or by AI. It outperforms 13 other mainstream methods, including BERT, ERNIE, and TextRNN, in precision, recall, F1-score, and accuracy, which verifies its performance advantage. The exploratory experiments also show that current mainstream machine learning methods, although less accurate than the proposed method, can still identify part of the AIGC. [Originality/Value] This paper uses multiple methods to identify AIGC on social media, helping guard against the information pollution that generative artificial intelligence causes on social media platforms.

Keywords: Generative artificial intelligence, Artificial Intelligence Generated Content (AIGC), Online communities, Machine learning, AI information pollution
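
The abstract compares methods by precision, recall, F1-score, and accuracy. As a minimal sketch of how these four metrics are typically computed for a binary human-versus-AI classifier, the snippet below assumes label 1 marks AI-generated text and label 0 marks human-written text. The function name `binary_metrics` and the toy labels are illustrative assumptions; they are not taken from the paper or the HAC dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, and accuracy for binary labels.

    Assumption for illustration: `positive` (1) means "AI-generated",
    any other label means "human-written".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy example: eight posts with made-up gold labels and predictions.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(gold, pred))
```

Reporting all four values matters when the two classes are imbalanced, since accuracy alone can look high even if most AI-generated posts are missed.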

DENG Shengli, WANG Fan, WANG Haowei. Identification Methods of Artificial Intelligence Generated Content in Online Communities[J]. Documentation, Information & Knowledge, 2024, 41(2): 28-38,149.

