Documentation, Information & Knowledge ›› 2024, Vol. 41 ›› Issue (2): 28-38,149. doi: 10.13366/j.dik.2024.02.028

• Academic Focus: Artificial Intelligence Generated Content (AIGC) Governance

Identification Methods of Artificial Intelligence Generated Content in Online Communities

DENG Shengli, WANG Fan, WANG Haowei   

  1. School of Information Management, Wuhan University, Wuhan, 430072
  • Online: 2024-03-10 Published: 2024-05-14
  • Contact: Correspondence should be addressed to WANG Fan, E-mail: 1161252028@qq.com, ORCID: 0000-0003-0100-0320
  • Supported by:
    This is an outcome of the project "Research on Evaluation and Prediction of User Contribution Behavior in Online Knowledge Community from the Perspective of Information Ecological Chain" (71974149), supported by the National Natural Science Foundation of China, and the Major Project "Research on Reconstruction and Application of Information Service System Driven by Humanistic Artificial Intelligence" (22&ZD324), supported by the National Social Science Foundation of China.

Abstract: [Purpose/Significance] Generative artificial intelligence can cause AI information pollution in online communities. The AIGC identification methods studied in this paper help guard against the negative impact of rapidly evolving generative artificial intelligence. [Design/Methodology] This paper first constructs the HAC data set from multiple online community platforms, covering the 54 major topic categories of Sina Weibo and containing 100,873 pieces of information written by humans and generated by artificial intelligence. It then examines whether six mainstream deep learning methods and seven machine learning methods can identify whether information in an online community was written by a human or generated by artificial intelligence. Finally, the BEM-RCNN method is proposed to further improve the precision of AIGC recognition. [Findings/Conclusion] The constructed data set shows that generative artificial intelligence has a strong "human-like expression" and can imitate humans when posting and replying on social media platforms. The experimental results show that the proposed method reaches an accuracy of 96.4% and can reliably identify whether content in an online community was written by a human or by AI. It outperforms the 13 other mainstream methods, such as BERT, ERNIE, and TextRNN, in precision, recall, F1-value, and accuracy, which verifies its performance advantage. The exploratory experiments also show that, although the current mainstream machine learning methods are less accurate than the proposed method, they are adequate for some AIGC recognition tasks. [Originality/Value] This paper applies multiple methods to identify AIGC on social media, helping to prevent the information pollution that generative artificial intelligence causes on social media platforms.
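The four evaluation measures named in the abstract (precision, recall, F1-value, and accuracy) can be illustrated with a minimal Python sketch. It assumes a binary task in which "ai" is the positive class; the function name `evaluate`, the label strings, and the toy predictions are hypothetical, are not drawn from the HAC data set, and do not reproduce BEM-RCNN or any other method in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def evaluate(y_true: Sequence[str], y_pred: Sequence[str],
             positive: str = "ai") -> BinaryMetrics:
    """Compute precision, recall, F1 and accuracy for a binary labelling task.

    `positive` is the label treated as the positive class (assumed here to be
    AI-generated text); every other label counts as negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return BinaryMetrics(precision, recall, f1, accuracy)


# Toy labels for illustration only (not taken from the paper's data).
gold = ["ai", "ai", "human", "human", "ai"]
predicted = ["ai", "human", "human", "ai", "ai"]
metrics = evaluate(gold, predicted)
print(metrics)
```

Reporting all four measures together, as the abstract does, matters because accuracy alone can look strong when one class dominates the data, while precision and recall show how the errors are split between false alarms and missed AI-generated posts.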

Key words: Generative artificial intelligence, Artificial Intelligence Generated Content (AIGC), Online communities, Machine learning, AI information pollution