Copyright Law Response to Generative Artificial Intelligence Training Data: Is It Necessary to Set Fair Use Rules?

doi:10.13366/j.dik.2025.01.078

Documentation, Information & Knowledge, 2025, Vol. 42, Issue (1): 78-88.

• Academic Focus (2) · Fair Use of Generative Artificial Intelligence Training Data •

WEI Yuanshan

School of Law, Guangdong University of Foreign Studies, Guangzhou, 510006

Online: 2025-01-10 Published: 2025-03-19
Contact: Correspondence should be addressed to WEI Yuanshan (ORCID: 0000-0002-7474-9423), PhD, lecturer; research interests: intellectual property law and data law. Email: wys_victory@163.com.
Supported by:
This article is an outcome of the Major Project "Research on Modernization of Industrial Intellectual Property Risk Governance under the Overall National Security Concept" (21&ZD204), supported by the National Social Science Foundation of China.

Abstract

Abstract: [Purpose/Significance] The training data for Generative Artificial Intelligence (GenAI) contains a large number of works that are still under copyright protection. Clarifying whether fair use rules should be set for machine learning helps resolve the copyright law disputes over GenAI training data. [Design/Methodology] This study examines expressive and non-expressive machine learning from a typological perspective. It determines whether fair use rules should be set by testing whether the following chain of logic holds: "unauthorized use of works to train GenAI constitutes infringement → following licensed-use rules hinders technological progress → other simplified licensing mechanisms are not applicable". [Findings/Conclusion] The input and training stages, as non-expressive machine learning, are "non-work use"; because they do not constitute infringement, there is naturally no need to set fair use rules for them. The output stage, as expressive machine learning, is "use of work", but because the scenarios before and after GenAI is opened to the public differ, it calls for a typological analysis. Before GenAI is opened to the public, its outputs are mainly used to verify how model training is progressing and can be characterized as fair use. After it is opened to the public, an output whose changes to the expression of a work go beyond the scope of adaptation is a legitimate use; otherwise it constitutes infringement. Because infringement at the output stage does not hinder the development of GenAI technology, there is no need to set fair use rules for it. [Originality/Value] The analytical method and conclusions differ from those of existing research. They are instructive for AI practitioners and legal professionals exploring the copyright law issues of GenAI training data, and they help enrich and deepen existing research.

Keywords: Generative Artificial Intelligence (GenAI), Machine learning, Training data, Copyright, Fair use, Use of work

WEI Yuanshan. Copyright Law Response to Generative Artificial Intelligence Training Data: Is It Necessary to Set Fair Use Rules?[J]. Documentation, Information & Knowledge, 2025, 42(1): 78-88.

