AI Training or Copyright Infringement: Drawing Parallels between US and Indian Perspective
Introduction
The old adage, often attributed to Martin Luther, “When God builds a church, the devil builds a chapel,” finds a new echo in the age of Artificial Intelligence (AI). AI is not just a technological leap; it is a seismic shift in how creativity and originality are consumed and repurposed. As AI powers everything from legal research and journalism to music composition and entertainment, it thrives on vast datasets that often include copyrighted content. Yet behind this technological marvel lies a quiet erosion of originality, especially when models are trained on copyrighted works without a license, blurring the lines between creativity and infringement.
Recently, this tension reached a legal flashpoint in the US case of Thomson Reuters Enterprise Centre GmbH & West Publishing Corp. v. Ross Intelligence Inc.,1 wherein the Hon’ble United States District Court for the District of Delaware held that Ross infringed copyright by using material derived from Westlaw’s editorial headnotes to train its AI search engine, and rejected Ross’s fair use defence. The ruling signals a paradigm shift, setting boundaries on how far AI can go in consuming copyrighted materials without authorization.
India, too, stands at this legal crossroads. As AI adoption accelerates and legal frameworks are still adapting, similar questions loom large: when creativity feeds on protected works, where should the law draw the line? However, before delving into the case, it is essential to understand how AI models are trained on datasets.
AI Training and Copyright: The Legal Crossroads
Artificial Intelligence (AI) models rely heavily on enormous datasets, many of which include copyrighted content. Developers often train AI models on these datasets without obtaining permission from the copyright holders. This practice has sparked growing demands for greater transparency, especially the disclosure of the data sources used in AI training.2
The use of copyrighted content in machine learning
Most of the material used to train generative AI models is protected by copyright law, and its use in AI training raises several legal issues. Typically, AI models do not store the data they train on in the way a file is saved to a hard drive. Instead, they convert the information into abstract numerical patterns. Because of this, many scholars argue that simply training a model does not amount to copyright infringement. But this view is complicated by "memorization", the ability of AI models to reproduce exact or nearly exact portions of their training data. If a model can output part of a work it was trained on, a court might consider that the work is effectively “contained” in the model, even if not saved in a traditional sense.3
Crucially, in order to train a model, the data must be copied at least once during the process. This act of reproduction is, on its face, a potential violation of copyright law unless the rightsholder has given permission. However, obtaining that permission is extremely difficult. It is not just about the cost of licenses; the process of tracking down every rightsholder and negotiating terms would be practically impossible. To get around this, most AI companies have simply used copyrighted material without asking. This unauthorized copying is at the heart of many of the legal cases currently being brought against AI developers.4
Thomson Reuters v. Ross Intelligence: A Turning Point
Background
Thomson Reuters owns Westlaw, one of the largest legal research platforms, which operates on a paid subscription model. In addition to primary legal materials, Westlaw includes editorial content and annotations, such as headnotes that summarize key legal points and case holdings. The platform organizes its content by using a structured, numerical taxonomy known as the Key Number System. Thomson Reuters owns copyrights in Westlaw’s copyrightable material.
Ross, a new competitor to Westlaw, developed a legal research search engine powered by artificial intelligence. To train its AI tool, Ross needed a large dataset of legal questions and answers. Initially, Ross sought to license content from Westlaw, but Thomson Reuters declined the license request as Ross was a competitor.
To obtain training data, Ross partnered with a company called LegalEase, which provided it with approximately 25,000 “Bulk Memos.” These memos were compilations of legal questions along with both good and bad answers. LegalEase had instructed the contributing lawyers to use Westlaw headnotes as a reference when creating the questions, but advised them not to copy the headnotes directly.
Ross used the Bulk Memos to train its AI search engine. In other words, Ross’s product was developed using materials derived from Westlaw headnotes. When Thomson Reuters discovered this, it filed a lawsuit against Ross for copyright infringement.
Decision by the Hon’ble Court
Originality of Headnotes and Key Number System
1. The headnotes are original
The Hon’ble Court was of the opinion that while judicial opinions themselves are not protected by copyright, headnotes involve editorial judgment of summarizing, selecting, and organizing legal points, which adds originality. The court found that the headnotes qualify both as protected compilations and as individual, copyrightable works and held that even headnotes using verbatim text from opinions reflect creative choices.
2. The Key Number System is original
The court determined that there is no real dispute about the originality of Westlaw’s Key Number System. Even though much of its organization was done by a computer and the topics follow common legal subjects, the system still meets the basic level of creativity required for copyright.
Ross’s defences to copyright infringement fail
1. Innocent Infringement: Ross contended that any infringement was innocent. Both parties agreed that innocent infringement does not eliminate liability but may reduce damages. That reduction, however, is not available where the copyrighted work carries a proper copyright notice, as Westlaw’s headnotes did.
2. Copyright Misuse: Ross claimed that Thomson Reuters misused its copyright. This kind of misuse usually means using copyright to unfairly hurt competition or the public interest. However, there was no proof that Thomson Reuters abused its copyrights to block competitors.
3. Merger Defence: The Hon’ble Court rejected Ross’s merger defence, which argued that the ideas were so closely tied to their expression that they could not be copyrighted.
4. Scenes à faire: The Hon’ble Court also dismissed the scenes à faire defence, which covers common elements that naturally arise from the subject matter. The court held that nothing about judicial opinions requires them to be condensed into Westlaw’s headnotes or organized by the Key Number System.
Fair Use Analysis by the Hon’ble Court
Fair use is a four-factor test under U.S. copyright law, and the court methodically applied each factor to Ross’s conduct.
1st factor (purpose and character of the use): Although Ross claimed that the headnotes were used only at an intermediate stage for AI training and not exposed to the end user, the court emphasized that the use was commercial and not transformative. This factor favored Thomson Reuters.
2nd factor (nature of the copyrighted work): The headnotes and Key Number System were only minimally creative. While this was enough for copyright protection, it also meant the material was not highly expressive, so this factor slightly favored Ross.
3rd factor (amount and substantiality of the portion used): Because Ross’s end product delivered judicial opinions, not headnotes, to users, the court found that the public did not gain direct access to the copyrighted material. The heart of the work was thus not exposed to the market, and this factor favored Ross.
4th factor (effect of the use on the potential market): Ross’s use competed directly in the same market of legal research tools. More importantly, the court found that Ross’s use could also harm Thomson Reuters’ potential derivative markets, whether for selling the headnotes or for using them to train legal AI tools. This factor favored Thomson Reuters.
India’s Copyright Landscape and AI Training
As AI increasingly becomes the engine behind everything from legal research and journalism to music and creative writing, India’s legal framework is being tested. While Indian copyright law offers fair-dealing exceptions under Section 52 of the Copyright Act, 1957, the Hon’ble courts have yet to explicitly assess whether AI training on copyrighted materials, especially for commercial tools, falls within these provisions.
In one of the earlier landmark judgments of Eastern Book Company & Ors vs D.B. Modak & Anr,5 the Appellants, Eastern Book Company and its associates, published their well-known law report, Supreme Court Cases (SCC). The Appellants had copy-edited the judgments using a dedicated team of associates. The editing process had included adding paragraph numbers, cross-references, standardized formatting, and verification of facts and citations. These enhancements were aimed at making the judgments more accessible, user-friendly, and research-oriented for legal professionals and scholars. Additionally, the Appellants had also prepared headnotes for each judgment.
According to the Appellants, crafting these headnotes and enhancing the raw judgments had required a high level of skill, labour, and expertise. Based on these contributions, the Appellants claimed that the final version of SCC represented their own original literary work, qualifying for protection under Sections 13 and 14 of the Copyright Act, 1957.
In 2004, Respondent 1 and Respondent 2 launched CD-ROM-based legal software. The Appellants alleged that these software products had copied substantial portions of SCC, including not just the judgments but also the structure, style, and editorial elements. They claimed that the Respondents had reproduced SCC’s selection and arrangement of cases, copy-edited text, paragraph and footnote numbering, and cross-references, essentially lifting the content verbatim for use in their own products. According to the Appellants, this amounted to a direct violation of their intellectual property rights, as the Respondents had exploited their editorial labour and original contributions without permission.
The Hon’ble Supreme Court, while determining the “standards of originality,” stated that the work in question had to be a product of the author’s skill and judgement, and that the exercise of skill and judgement should not be so trivial as to amount to a purely mechanical exercise. The Court eventually held that creating the additional elements had required legal knowledge, skill, and judgement on the author’s part, and the SCC version of the judgments published by EBC was therefore held to be copyrightable.
More recently, in late 2024, in the case of ANI Media Pvt Ltd v OpenAI Inc,6 ANI filed India’s first major AI copyright suit against OpenAI, alleging unauthorized use of its news content to train ChatGPT’s models.
ANI sues OpenAI for Copyright Infringement
Background
ANI Media Pvt Ltd (Plaintiff) approached the Hon’ble Delhi High Court and filed India’s first AI copyright infringement lawsuit against OpenAI (Defendant).
In its suit, the Plaintiff claims that the Defendant is using its copyrighted news content to train its AI model ChatGPT for commercial purposes, without any authorization or consent from the Plaintiff. The Plaintiff further alleges that ChatGPT reproduces its original content in responses to user queries, amounting to copyright infringement, and that ChatGPT sometimes provides false information while citing the Plaintiff as the source. According to the Plaintiff, this misattribution could damage its reputation and contribute to the spread of misinformation, potentially leading to public disorder.7
Issues Framed
The following issues have been framed before the Hon’ble Court for adjudication –
(i) Whether storing copyrighted data for training ChatGPT amounts to copyright infringement;
(ii) Whether generating user responses using copyrighted data constitutes infringement;
(iii) Whether this use falls under ‘fair use’ as per Section 52 of the Copyright Act; and
(iv) Whether Indian courts have jurisdiction over this matter, given that OpenAI’s servers are based abroad.
While the matter is currently pending before the Hon’ble Court, this is not the first time that issues concerning AI training datasets and copyright infringement have come under judicial scrutiny. Cases such as Thomson Reuters Enterprise Centre GmbH & West Publishing Corp. v. Ross Intelligence Inc. in the United States and ANI Media Pvt Ltd v. OpenAI Inc. in India mark significant milestones in this evolving legal landscape.
These cases join a growing list of high-profile lawsuits globally, including The New York Times v. OpenAI and Microsoft, and Silverman et al. v. Meta Platforms Inc, where courts are being called upon to define the boundaries of fair use, originality, and liability when copyrighted materials are used in AI training datasets.
Conclusion
As legal systems worldwide grapple with these questions, the Ross Intelligence judgment stands out as a detailed and reasoned precedent. By closely examining the use of copyrighted editorial content in AI model training and applying traditional copyright principles to a novel technological context, the decision in Thomson Reuters Enterprise Centre GmbH & West Publishing Corp. v. Ross Intelligence Inc. offers a foundational perspective. It may serve as a reference point for courts and policymakers globally as they seek to strike a balance between innovation in AI and the protection of intellectual property rights.
2. Adam Buick, Copyright and AI training data—transparency to the rescue?, Journal of Intellectual Property Law & Practice, Volume 20, Issue 3, March 2025, Pages 182–192
https://academic.oup.com/jiplp/issue/20/3?login=false
3. Ibid.
4. Ibid.
5. AIR 2008 SUPREME COURT 809
6. (CS COMM 1028/2024)
7. https://www.barandbench.com/news/delhi-high-court-issues-summons-openai-ani-alleges-copyright-violation-chatgpt


