Documentation, Information & Knowledge ›› 2025, Vol. 42 ›› Issue (1): 89-100. doi: 10.13366/j.dik.2025.01.089

• Academic Focus (2): Reasonable Use of Training Data for Generative Artificial Intelligence

Legal Responses of Copyright Law to Generative Artificial Intelligence Training Data: Solutions to Copyright Compliance

DAI Wenyi, XIAO Dongmei   

  1. School of Intellectual Property, Xiangtan University, Xiangtan, 411105
  • Online: 2025-01-10 Published: 2025-03-19
  • Contact: Correspondence should be addressed to XIAO Dongmei, Email: 86650210@qq.com, ORCID: 0000-0001-7611-2058
  • Supported by:
    This work is an outcome of the Major Project "Research on Modernization of Industrial Intellectual Property Risk Governance under the Overall National Security Concept" (21&ZD204), supported by the National Social Science Foundation of China.

Abstract: [Purpose/Significance] The copyright risks posed by training data for generative artificial intelligence (AI) have drawn widespread attention. A copyright compliance scheme grounded in an in-depth analysis of the specific risks is needed to guide the parties that bear compliance obligations. [Design/Methodology] Based on China's copyright system, this paper applies a two-step test, first "whether the use falls within the scope of copyright control" and then "whether it constitutes fair use", to analyze usage behaviors and copyright risks across three stages: data input, model training, and content output. [Findings/Conclusion] The findings indicate that the model training stage carries no risk of copyright infringement. The data input stage, however, carries a risk of infringing the right of reproduction, and the content output stage carries risks of infringing the rights of reproduction, adaptation, protection of the integrity of works, dissemination via information networks, and broadcasting. Service providers who also act as model trainers must therefore develop copyright compliance schemes for both public-domain works and works still within the term of copyright protection. Pure service providers need to prevent risks arising in the model development stage from carrying over, and to ensure copyright compliance in the service provision stage. [Originality/Value] This paper analyzes in detail the copyright risks associated with generative AI training data and proposes solutions for ensuring copyright compliance, offering guidance to enterprises and institutions that train models and provide services to external users.

Keywords: Generative artificial intelligence, Training data, Usage behavior, Copyright compliance