Damages Without Loss: Applying the User Principle to Large-Scale AI Text Ingestion

Post by Tahir Khan
December 9, 2025

Abstract

The user principle has long served as a compensatory mechanism in situations where a copyright claimant cannot establish traditional, causation-based financial loss. As generative AI systems increasingly train on vast corpora of copyrighted texts, this doctrinal tool has become central to contemporary litigation. This article traces the roots of the user principle, analyses leading authorities, and provides an expanded examination of its application to AI training practices. It argues that the courts are likely to adapt the doctrine into a structured, standardised, and potentially industry-wide valuation framework capable of addressing mass, data-driven infringements.

1. Introduction

The assessment of copyright damages in English law has historically required a nuanced balance between traditional compensatory principles and the evidential realities of infringement. Where the claimant cannot demonstrate quantifiable economic loss, but the defendant has nevertheless appropriated protected works, the courts have relied on the compensatory user principle (negotiating damages), derived from the reasoning in Wrotham Park Estate Co Ltd v Parkside Homes Ltd [1974] 1 WLR 798.

Under this principle, damages are assessed by determining what a reasonable licensee would have paid a reasonable licensor immediately before the infringement occurred. It enables courts to price the value of the lost opportunity to license the rights. Crucially, it is not restitutionary; it does not seek to strip profits but to compensate for the inherent economic value of controlled exploitation.

The rapid rise of large language models (LLMs) and generative AI has brought the user principle to the forefront of copyright litigation. AI developers gain commercial value from large-scale ingestion and tokenisation of copyrighted works, yet authors struggle to prove lost sales or quantifiable harm. In this environment, the user principle becomes both practical and principled.

This article examines the doctrinal foundations of the user principle, its development in jurisprudence, and most significantly, its contemporary application to AI training in a data-driven technological ecosystem.

2. Doctrinal Foundations of the User Principle

2.1 Concept and Rationale

The user principle compensates the rightsholder by awarding damages equivalent to the fee that would have been agreed in a hypothetical negotiation. It focuses on the objective economic value of the use, not on any actual loss suffered. It is therefore applicable where the defendant utilises a work in a way that has economic value but leaves no direct evidence of displacement of sales.

The user principle differs from damages based on proven actual loss (as in General Tire & Rubber Co v Firestone Tyre & Rubber Co Ltd [1976] RPC 197) and from an account of profits under CDPA s 96(2), yet it remains strictly compensatory. It also differs from additional damages under CDPA s 97(2), which address flagrant or reckless infringement.

2.2 Statutory Basis and Judicial Discretion

While not expressly stated in statute, the user principle falls comfortably within the general entitlement to damages under CDPA s 96(2), which makes available to the copyright owner all such relief by way of damages as is available for the infringement of any other property right. Judicial development has therefore shaped the doctrine, refining it into a sophisticated valuation tool.

3. Development of the User Principle Through Case Law

3.1 Blayney v Clogau St David’s Gold Mines Ltd [2002] EWCA Civ 1007

The Court of Appeal confirmed the cross-applicability of user-principle valuation across the spectrum of intellectual property rights, emphasising the economic logic behind hypothetical-licence analysis.

3.2 Henderson (Jodie Aysha) v All Around the World Recordings Ltd [2014] EWHC 3087 (IPEC)

Here, the court established detailed valuation criteria including market comparators, royalty benchmarks, and exploitation patterns, now widely relied upon in entertainment and media disputes.

3.3 Absolute Lofts South West Ltd v Artisan Home Improvements Ltd [2015] EWHC 2608 (IPEC)

The court emphasised real-world market evidence, such as stock-image pricing and available licensing options, reinforcing the importance of objectively measurable benchmarks. Conduct-based adjustments and the potential for additional damages under s 97(2) were also highlighted.

3.4 Reformation Publishing Co Ltd v Cruiseco Ltd [2018] EWHC 2761 (Ch)

Nugee J offered one of the clearest expositions of hypothetical-licence construction, considering territorial reach, duration, and exploitation rights. This structured approach is particularly relevant for AI cases involving varied types of copying (e.g., reproduction, storage, tokenisation, transformation).

4. Constructing the Hypothetical Licence

Courts typically consider:

Scope of use (full reproduction, extract use, storage, ingestion, embedding in AI systems).
Market comparators (collective licences, industry royalties, stock-content pricing).
Availability of lawful alternatives (particularly where alternatives are costly or impossible).
Defendant’s conduct, which may inform the “high end” of a reasonable licensing range and justify additional damages under CDPA s 97(2).

This flexible, evidence-based approach provides the scaffolding for application in AI training disputes.

5. Expanded Analysis: User Principle Damages in the Context of AI Training

5.1 The Technological Nature of AI Copying

Modern generative AI requires large-scale ingestion and tokenisation of copyrighted works. These acts typically involve:

Full-text reproduction in training datasets.
Tokenisation and storage of text in persistent data structures.
Embedding representations that may retain expressive elements.
Repeated reproduction during model updates, fine-tuning, and inference processes.
Redistribution of datasets among research collaborators or commercial partners.

Each of these processes may constitute reproduction under UK copyright law.

5.2 Why Traditional Damages Are Inadequate in AI Cases

Most authors cannot show:

reduced sales,
reduced readership, or
specific commercial displacement.

AI training is usually non-consumptive: the defendant does not commercially distribute the text. Yet the economic benefit to the developer is substantial: training data contributes to model performance, product competitiveness, and ultimately revenue.

The traditional causation model, therefore, collapses.

5.3 The Fit Between AI Training and the User Principle

The user principle is precisely tailored for situations where:

infringement has clear economic value,
but actual economic loss is hard to quantify.

Courts can therefore:

identify the act of copying (full ingestion of a book, article, or dataset).
determine the commercial value of that act within the AI development pipeline.
construct a hypothetical licence covering the precise scope of use (one-time ingestion? repeated training? worldwide use? commercial exploitation? derivative model rights?).

This transforms an otherwise intangible harm into a clearly quantifiable one.

5.4 Key Factors Courts Are Likely to Consider

Drawing on authorities such as Henderson and Reformation Publishing, courts may assess:

Volume of ingestion (how many works, how many times, across how many training cycles).
Role of the work in model training (core data vs peripheral).
Market value of comparable licences, e.g., collective licensing bodies, dataset licensing, text-mining fees.
Availability of licensed alternatives, such as curated datasets.
Commercial scale of the AI model (consumer-facing vs research-focused).
Intensity and recurrence of copying, as models often undergo multiple retraining phases.

5.5 Toward Structured or Formulaic Damages

Given the scale of datasets, often hundreds of thousands of works, courts may adopt:

per-work valuation bands,
tiered fees based on work length,
genre-specific market rates,
collective or statutory licensing analogies,
class-wide allocation schemes like those used in mass IP litigation.

This could culminate in a standardised “AI Training Tariff,” particularly if endorsed by the Copyright Tribunal or future legislation.
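
As a rough illustration of how per-work bands, length tiers, and genre rates might combine, the following Python sketch computes a fee for each ingested work and sums the results. Every figure, tier boundary, genre multiplier, and name in it (LENGTH_TIERS, GENRE_MULTIPLIER, Work, tariff_fee, total_award) is a hypothetical placeholder, not a market rate or a method drawn from any authority. Scaling the fee by the number of training cycles is likewise only one assumed way of reflecting recurrence of copying.

```python
from dataclasses import dataclass

# Hypothetical length tiers: (upper word-count bound, base fee in GBP).
# Placeholder figures for illustration only, not market rates.
LENGTH_TIERS = [
    (10_000, 5.0),          # short works such as articles
    (100_000, 25.0),        # mid-length works such as most books
    (float("inf"), 40.0),   # very long works
]

# Hypothetical genre multipliers, also placeholders.
GENRE_MULTIPLIER = {"fiction": 1.0, "academic": 1.5, "news": 0.8}


@dataclass
class Work:
    title: str
    word_count: int
    genre: str


def tariff_fee(work: Work, training_cycles: int = 1) -> float:
    """Return an illustrative fee for one work under the assumed tariff."""
    # Pick the first tier whose upper bound covers the work's length.
    base = next(fee for bound, fee in LENGTH_TIERS if work.word_count <= bound)
    # Unknown genres fall back to a neutral multiplier of 1.0.
    multiplier = GENRE_MULTIPLIER.get(work.genre, 1.0)
    # Assumption: each training cycle counts as a further licensed use.
    return base * multiplier * training_cycles


def total_award(works: list[Work], training_cycles: int = 1) -> float:
    """Sum the illustrative per-work fees across a corpus."""
    return sum(tariff_fee(w, training_cycles) for w in works)
```

A court or tribunal would of course set any such bands on evidence; the point of the sketch is only that a tariff reduces each work to a small number of measurable inputs, which is what makes assessment workable across hundreds of thousands of works.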

5.6 Consequences for AI Developers

AI developers face increasing exposure to:

large, aggregated damages.
additional damages under CDPA s 97(2) for reckless disregard.
potential injunctions preventing further model training.
compulsory adoption of licensing frameworks.

The user principle therefore becomes not only a remedial mechanism but a regulatory force shaping industry practice.

6. Distribution Across Multiple Authors

Where thousands of works are ingested, the courts may employ:

administrative distribution schemes,
collective management body oversight,
pro-rata allocations, or
claims-resolution frameworks like group litigation orders.

These mechanisms reflect existing approaches in collective licensing and digital-rights management.
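
A pro-rata allocation of the kind listed above can be sketched in a few lines of Python. The sketch assumes a single fund expressed in whole pence and a weight per claimant (for example, number of works or total word count); the function name pro_rata_allocation, the choice of weights, and the largest-remainder rule for leftover pence are illustrative assumptions, not features of any existing scheme.

```python
def pro_rata_allocation(fund_pence: int, weights: dict[str, int]) -> dict[str, int]:
    """Split a fund among claimants in proportion to their weights.

    Amounts are whole pence so that the shares always sum to the fund.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Each claimant first receives the rounded-down proportional share.
    shares = {name: fund_pence * w // total for name, w in weights.items()}

    # Any pence lost to rounding go to the largest fractional remainders.
    leftover = fund_pence - sum(shares.values())
    by_remainder = sorted(
        weights,
        key=lambda name: (fund_pence * weights[name]) % total,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares
```

Working in integer pence avoids the rounding drift that floating-point shares would accumulate across thousands of claimants, which matters where a scheme must account for every penny of a fixed award.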

7. Conclusion

The user principle has matured into a central pillar of copyright damages in the United Kingdom. In the age of AI training, its compensatory logic is more relevant than ever. It offers a principled method for valuing mass, low-visibility copying where traditional proof of loss breaks down. As AI litigation expands, English courts are likely to refine the doctrine further, potentially developing structured or tariff-based licensing models that could redefine the economics of AI development.

The result may be a hybrid system where judicial reasoning, collective licensing, and emerging AI regulation converge, anchored fundamentally by the compensatory rationale of the user principle.
