Documentation, Information & Knowledge ›› 2025, Vol. 42 ›› Issue (1): 78-88. doi: 10.13366/j.dik.2025.01.078

• Academic Focus (2): Reasonable Use of Training Data for Generative Artificial Intelligence •

Copyright Law Response to Generative Artificial Intelligence Training Data: Is It Necessary to Set Fair Use Rules?

WEI Yuanshan   

  1. School of Law, Guangdong University of Foreign Studies, Guangzhou, 510006
  • Online: 2025-01-10 Published: 2025-03-19
  • Contact: Correspondence should be addressed to WEI Yuanshan, Email: wys_vicotry@163.com, ORCID: 0000-0002-7474-9423.
  • Supported by:
    This is an outcome of the Major Project "Research on Modernization of Industrial Intellectual Property Risk Governance under the Overall National Security Concept" (21&ZD204), supported by the National Social Science Foundation of China.

Abstract: [Purpose/Significance] Training data for Generative Artificial Intelligence (GenAI) contains many works that are still under copyright protection. Clarifying whether fair use rules should be set for machine learning helps resolve the copyright law disputes over GenAI training data. [Design/Methodology] This study examines expressive and non-expressive machine learning from a typological perspective and determines whether fair use rules should be established according to the logic that "unauthorized use of works to train GenAI constitutes infringement, following licensed use rules hinders technological progress, and other simplified authorization mechanisms are not applicable". [Findings/Conclusion] The input and training stages of non-expressive machine learning amount to "use on work"; because such use does not constitute infringement, there is no need to set fair use rules for it. The output stage of expressive machine learning, referred to as "use of work", should be discussed separately for the different scenarios before and after GenAI is opened to the public. Prior to public release, the output of GenAI is mainly used to verify how well the model has been trained, which can be considered fair use. Once GenAI is made available to the public, however, output that alters the expression of the original work beyond the scope of adaptation is considered proper use; otherwise, it constitutes infringement. Since infringement at the output stage does not hinder the development of GenAI technology, there is no need to set specific fair use rules for it. [Originality/Value] This study differs from previous research in both its analytical method and its conclusions. It is instructive for AI practitioners and legal professionals exploring the copyright law issues of GenAI training data, and it enriches and deepens existing research.

Keywords: Generative Artificial Intelligence (GenAI), Machine learning, Training data, Copyright, Fair use, Use of work