Documentation, Information & Knowledge, 2025, Vol. 42, Issue (1): 89-100. doi: 10.13366/j.dik.2025.01.089

Academic Focus (2): Fair Use of Generative Artificial Intelligence Training Data

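Legal Responses of Copyright Law to Generative Artificial Intelligence Training Data: Solutions to Copyright Compliance

DAI Wenyi, XIAO Dongmei

  1. School of Intellectual Property, Xiangtan University, Xiangtan, 411105
  • Publication date: 2025-01-10  Release date: 2025-03-19
  • Corresponding author: XIAO Dongmei (ORCID: 0000-0001-7611-2058), PhD, Professor; research interests: intellectual property law, data law; Email: 86650210@qq.com.
  • About the author: DAI Wenyi (ORCID: 0000-0002-6913-2998), PhD candidate; research interests: intellectual property law, data law; Email: 785960194@qq.com.
  • Funding:
    This article is one of the outcomes of the Major Project "Research on Modernization of Industrial Intellectual Property Risk Governance under the Overall National Security Concept" (21&ZD204), supported by the National Social Science Foundation of China.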

Abstract: [Purpose/Significance] The copyright risks posed by generative artificial intelligence (AI) training data are currently attracting widespread attention. A copyright compliance scheme grounded in an in-depth analysis of the specific risks is needed to guide the relevant obligated parties. [Design/Methodology] Drawing on China's copyright regime and applying a two-step test, first asking whether a use falls within the scope of copyright control and then whether it constitutes fair use, this paper analyzes the uses of works, and the copyright risks they carry, at the data input, model training and content output stages. [Findings/Conclusion] The model training stage carries no risk of copyright infringement. The data input stage carries a risk of infringing the right of reproduction. The content output stage carries risks of infringing, among others, the rights of reproduction, adaptation, integrity, communication through information networks and broadcasting. Accordingly, service providers who also train their own models must ensure copyright compliance both when using works in the public domain and when using works still within their term of copyright protection. Pure service providers must guard against risks carried over from the model development stage and ensure copyright compliance when providing services. [Originality/Value] This paper analyzes in detail the copyright risks associated with generative AI training data and proposes a scheme for copyright compliance, offering guidance to enterprises, institutions and other entities that train models and provide services to the public.

Keywords: Generative artificial intelligence, Training data, Usage behavior, Copyright compliance