图书情报知识 ›› 2025, Vol. 42 ›› Issue (1): 78-88. doi: 10.13366/j.dik.2025.01.078

• Academic Focus (2) · Fair Use of Generative Artificial Intelligence Training Data •

生成式人工智能训练数据的著作权法因应:确需设置合理使用规则吗?

魏远山   

  1. 广东外语外贸大学法学院,广州,510006
  • 出版日期:2025-01-10 发布日期:2025-03-19
  • 通讯作者: 魏远山(ORCID:0000-0002-7474-9423),博士,讲师,研究方向:知识产权法、数据法,Email:wys_victory@163.com。
  • 基金资助:
    本文系国家社会科学基金重大项目“总体国家安全观下产业知识产权风险治理现代化研究”(21&ZD204)的研究成果之一。

Copyright Law Response to Generative Artificial Intelligence Training Data: Is It Necessary to Set Fair Use Rules?

WEI Yuanshan   

  1. School of Law, Guangdong University of Foreign Studies, Guangzhou, 510006
  • Online:2025-01-10 Published:2025-03-19
  • Contact: Correspondence should be addressed to WEI Yuanshan, Email: wys_victory@163.com, ORCID: 0000-0002-7474-9423.
  • Supported by:
    This work is an outcome of the Major Project "Research on Modernization of Industrial Intellectual Property Risk Governance under the Overall National Security Concept" (21&ZD204), supported by the National Social Science Foundation of China.

摘要: [目的/意义]生成式人工智能(GenAI)训练数据包含大量尚处保护期的作品,明确是否应为机器学习设置合理使用规则,有助于化解GenAI训练数据的著作权法争议。[研究设计/方法]以类型化视角审视表达型和非表达型机器学习,以是否符合“未经许可利用作品训练GenAI构成侵权→遵循授权使用规则阻碍技术进步→其他简化授权机制无法适用”的逻辑来确定是否设置合理使用规则。[结论/发现]作为非表达型机器学习的输入和训练阶段是“非作品性使用”,因不构成侵权自然无为其设置合理使用规则的必要;作为表达型机器学习的输出阶段是“作品性使用”,但因GenAI向公众开放前后所处场景有异,应作类型化分析。在未向公众开放时,GenAI输出结果主要用于验证模型训练情况,可被定性为合理使用;在向公众开放后,若输出结果对作品表达改动幅度超越改编行为范畴则属正当使用,反之则构成侵权。因输出阶段构成侵权不会阻碍GenAI技术发展,故无需为其设置合理使用规则。[创新/价值]与既有研究不同的分析方法和研究结论,对AI从业者和法律工作者探讨GenAI训练数据的著作权法问题具有启示作用,也对丰富和深化现有研究有益。

关键词: 生成式人工智能, 机器学习, 训练数据, 著作权, 合理使用, 作品性使用

Abstract: [Purpose/Significance] Training data for Generative Artificial Intelligence (GenAI) contains a large number of works that are still under copyright protection. Clarifying whether a fair use rule should be established for machine learning helps resolve the copyright disputes surrounding GenAI training data. [Design/Methodology] This study examines expressive and non-expressive machine learning from a typological perspective and determines whether a fair use rule should be established by testing whether the following chain of reasoning holds: "unauthorized use of works to train GenAI constitutes infringement → following the licensed-use rule hinders technological progress → other simplified licensing mechanisms are not applicable". [Findings/Conclusion] The input and training stages, as non-expressive machine learning, are "non-work use"; because they do not constitute infringement, there is no need to establish a fair use rule for them. The output stage, as expressive machine learning, is "use of work", but it should be analyzed by type because the scenarios before and after GenAI is opened to the public differ. Before public release, GenAI outputs are mainly used to verify how well the model has been trained, and can be characterized as fair use. After public release, an output that alters the expression of a work to a degree beyond the scope of adaptation is legitimate use; otherwise, it constitutes infringement. Since a finding of infringement at the output stage does not hinder the development of GenAI technology, there is no need to establish a fair use rule for it. [Originality/Value] The analytical method and conclusions differ from those of existing research; they are instructive for AI practitioners and legal professionals exploring the copyright issues of GenAI training data, and help enrich and deepen existing research.

Keywords: Generative Artificial Intelligence (GenAI), Machine learning, Training data, Copyright, Fair use, Use of work