The New Copyright Challenge in the AI Era: Does AI Creation Infringe Copyright?
In the field of digital creativity, artificial intelligence (AI), and generative AI in particular, has ushered in a new era that challenges the boundaries of traditional copyright law.
Contrary to popular belief, the way an AI model learns is not comparable to the reproduction or imitation typically carried out by infringers. Instead, machine learning is closer to human learning: through exposure to existing works, the model picks up the logic and sequencing of colors, language, and pattern arrangement. It then uses this knowledge to create new works that are distinct from, and not substantially similar to, the learning materials.
This article aims to delve into the complexity of AI machine learning and clarify its legal implications by citing recent important cases to correct common misconceptions.
How does AI generate images and text?
The basic principle of generative AI is to create new content by learning from data and information. Typical copyright infringement involves acts such as “distribution,” “reproduction,” and “adaptation” of a work. AI training, by contrast, extracts general patterns from existing works and data rather than storing the works for redistribution.
Image Generation
1. Data Analysis and Pattern Learning:
An AI model used for image creation begins by analyzing a vast amount of image data. This includes identifying the objects in each image and capturing deeper elements such as brush textures, color gradients, lighting, and spatial relationships. For example, a mature model that has learned to create landscape paintings first recognizes these separate elements and then applies brushstrokes, color blending techniques, and the interplay of light and shadow in its own output.
2. Feature Extraction:
Convolutional neural networks (CNNs) can extract specific features from images, recognizing and separating elements such as edges, shapes, and textures. This “feature extraction” is crucial for AI models to capture different artistic styles, brushstrokes, and techniques.
3. Generation of New Works:
Once AI has learned specific techniques and art styles through feature extraction and data analysis, it can generate new images. One well-known approach is the Generative Adversarial Network (GAN), which consists of a generator and a discriminator that compete and iterate until the generator produces images resembling the style and features of the training data (often copyrighted works). Many current image generators, including Stable Diffusion, instead use diffusion models, which learn to turn random noise into an image step by step. In either case, the generated images are generally not substantially similar to any individual training image when compared side by side.
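To make the feature extraction step (item 2 above) concrete, the following minimal Python sketch slides a single hand-picked filter over a tiny grayscale grid. It is an illustration under simplifying assumptions, not a description of how any particular product works: a real CNN learns thousands of filters from data, whereas the Sobel kernel, the `convolve2d` helper, and the 5x5 `IMAGE` here are chosen by hand for the example.

```python
# Minimal sketch of convolutional feature extraction.
# Assumption: one fixed, hand-picked kernel instead of filters learned by a CNN.

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Sobel kernel: responds to left-to-right brightness changes (vertical edges).
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Toy 5x5 grayscale "image": dark on the left (0), bright on the right (9).
IMAGE = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

edges = convolve2d(IMAGE, SOBEL_X)
for row in edges:
    print(row)
```

Each row of `edges` comes out as [0, 36, 36]: the filter stays silent over the flat dark region and responds where dark meets bright. The feature map records where an edge lies, not the pixel values themselves.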
Text Generation
1. Data Acquisition and Language Model Building:
For text generation, AI models like ChatGPT absorb a large amount of text data from various sources such as books, articles, website content, and even conversation records. AI constructs a language model that understands grammar and infers context from the text data.
2. Language Prediction:
A classic and simple language prediction model is the n-gram, which estimates the probability that a word or phrase follows the few words before it. This is enough to capture idiomatic expressions and local patterns such as subject-verb agreement. However, because it only sees a short window of text, the n-gram model is limited in handling more complex text generation tasks.
3. Encoding and Text Understanding:
For complex tasks like extending context and generating entire texts, the n-gram model falls short, as it can only predict from limited context without modeling the semantic meaning of the text. In contrast, the Transformer model converts text into vectors (input embedding), adds positional encoding to carry information about the order of words, and then uses self-attention to relate every word to every other word in the context, which lets it model the meaning of the whole passage.
4. Text Generation:
Once the encoder has processed the text, the decoder generates new text based on the learned features. (Models such as ChatGPT use a decoder-only variant of this design, but the principle is the same.) This process effectively captures long-distance dependencies between words. These characteristics allow the Transformer model to generate text that is coherent and creative, grounded in the semantic patterns of what it has learned while remaining logically consistent and original in content.
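The n-gram idea from item 2 above can be sketched in a few lines of Python. This is a minimal bigram (n = 2) example under stated assumptions: the one-sentence corpus and the helper names `train_bigram` and `next_word_probs` are invented for illustration, and real systems train on vastly larger text collections.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, how often each other word directly follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, word):
    """Turn the follower counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # word never seen with a follower, so there is no prediction
    return {w: n / total for w, n in followers.items()}

# Hypothetical toy corpus, used only for this example.
CORPUS = "the cat sat on the mat the cat ate the fish"

model = train_bigram(CORPUS)
probs = next_word_probs(model, "the")
print(probs)
```

In this toy corpus, "the" is followed by "cat" twice and by "mat" and "fish" once each, so the model assigns "cat" a probability of 0.5. The model stores word-pair statistics and can only look one word back, which is the limitation described above.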
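The embedding, positional encoding, and self-attention steps described in item 3 above can be sketched as follows. This is a deliberately stripped-down illustration, not a working language model: the `EMBEDDINGS` values are made up, there is a single attention head, and the learned query, key, and value projections of a real Transformer are omitted (the input vectors are used directly for all three).

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position vectors: sine on even dimensions, cosine on odd ones."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Each output vector is a weighted mix of every token's vector.
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights

# Hypothetical 4-dimensional embeddings, invented for this example.
EMBEDDINGS = {
    "dog": [1.0, 0.0, 1.0, 0.0],
    "bites": [0.0, 1.0, 0.0, 1.0],
    "man": [1.0, 1.0, 0.0, 0.0],
}

def encode(tokens):
    """Look up each token's embedding and add its positional encoding."""
    pe = positional_encoding(len(tokens), 4)
    return [[e + p for e, p in zip(EMBEDDINGS[t], pe[i])] for i, t in enumerate(tokens)]

out, attn = self_attention(encode(["dog", "bites", "man"]))
for token, row in zip(["dog", "bites", "man"], attn):
    print(token, [round(w, 3) for w in row])
```

Each row of `attn` sums to 1 and shows how much one token draws on every token in the sentence, including distant ones. Because of the positional encoding, "dog" at the start of a sentence and "dog" at the end receive different input vectors, which is how word order enters the model.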
How is AI generation different from copyright infringement?
The principles of image and text generation described above show that the way AI generates content differs significantly from the infringement patterns defined in copyright law. The difference is clearest in the following points:
1. Creative Nature of AI:
Generative AI does not simply “copy” or “reproduce” the data it learns from (existing works). Instead, it learns the underlying logic, structure, and style of texts and images from a vast amount of data and uses these elements to create new works with novelty. For example, in image generation, AI may learn from existing artworks, but the resulting image is not a replica or reproduction of any specific existing work. It is a new creation that recombines and reinterprets what the model has learned.
2. Legal Interpretation:
From a legal perspective, there is a significant distinction between AI-generated content and human reproduction. The fundamental concept of copyright law is to “protect the expression of ideas, not the ideas, concepts, or systems themselves.”
As mentioned above, an AI model learns the underlying logic, structure, artistic style, brushstrokes, and so on from its training data (original works); its output is not intended to “reproduce” or “replicate” the expression of that training data.
The way AI generates works challenges the boundaries of traditional copyright infringement. In “Andersen v. Stability AI Ltd.,” for example, a central question is whether training Stable Diffusion on copyrighted images, and the images it then generates, infringe the copyrights of the artists whose works were used.
3. Transformation and Fair Use:
When discussing whether AI-generated works constitute infringement, the question of transformative use arises: whether the new work adds further expression to the original or even gives it a new meaning. At this point, the possibility of “fair use” comes into play.
Whether fair use applies depends on the AI’s ability to create works that are significantly different from the originals. Currently, DALL-E, for example, does not let users have the AI adapt existing works, precisely to avoid such legal disputes.
Controversy over using AI to adapt existing works reached its peak with the recent global phenomenon of “Palworld,” whose developer was accused of using generative AI to adapt multiple Pokémon designs, including fusions of them. In Thomson Reuters v. Ross Intelligence, the court examined in depth whether using Westlaw’s editorial headnotes to train a legal research AI tool qualified as “fair use.” The court ultimately rejected the fair use defense on the facts of that case in a February 2025 ruling, which shows that the outcome depends heavily on how the copyrighted material is used and what the resulting tool does.
The Impact of Generative AI on Copyright Law
The process by which AI generates content, as shown in the image and text examples above, is a creative form that differs from direct copying or reproduction. Understanding this distinction is crucial for comprehending why AI’s learning and generation methods differ from copyright infringement patterns.
As AI continues to evolve, existing copyright laws and their interpretations must be revised and developed accordingly. The forms of creation that copyright law has to address keep changing as AI technology advances.
The distinction between “AI learning and generation patterns” and “copyright infringement patterns,” and the comparison between AI learning and “human learning of ideas and concepts,” are not merely matters of definition. They also involve deep questions of legislative logic and the ethics of creativity.
With the advancement of AI technology, the existing legal framework will inevitably undergo revisions. However, the direction of corresponding legal amendments depends on how lawmakers balance the “AI innovation potential” and “protection of original works.”
Therefore, the next time you encounter a debate on whether generative AI constitutes copyright infringement, remember that it is a clash between two values and refrain from hastily concluding that “generative AI infringes on the copyright of the original works.”
Opinion articles present diverse views and do not represent the position of “WEB3+.”