How You Got It

On Wednesday, a federal court in San Francisco will hold the fairness hearing for Bartz v. Anthropic — a $1.5 billion settlement covering 482,460 copyrighted works. It's the largest copyright settlement in US history. And the legal reasoning that produced it answered the central question of AI copyright law.

The answer is simpler than anyone expected.

The Split

In June 2025, Judge William Alsup ruled on three separate uses of copyrighted books by Anthropic. Same company. Same books. Same training process. Three rulings, two outcomes:

Training on purchased books: fair use

Print-to-digital conversion of purchased books: fair use

Training on pirated copies from LibGen: infringement

Alsup called LLM training on lawfully acquired works "exceedingly transformative." Authors, he wrote, cannot exclude others from using their works to learn. But the pirated copies from LibGen displaced demand — copy for copy, that's infringement.

The line isn't what you did with the data. It's how you got it.

The Settlement

Anthropic agreed to $1.5 billion. After $187.5 million in attorney fees and expenses, that works out to roughly $2,931 per work. Statutory damages for willful infringement can reach $150,000 per work. The math isn't close: the payout is about 2% of that ceiling.

Even so, 91.3% of eligible works were claimed: 440,490 out of 482,460. In class actions, a claim rate near 10% is normal. The late surge was dramatic, with claims nearly doubling in the final weeks before the deadline.
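For readers who want to check the two ratios above, here is a minimal sketch in Python. It uses only the figures quoted in this article, not numbers taken from court filings, and the constant and function names are made up for illustration. It takes the rough $2,931 per-work figure as given rather than rederiving it from the fund.

```python
# Figures as quoted in the article; none are taken from court filings.
ELIGIBLE_WORKS = 482_460
CLAIMED_WORKS = 440_490
PER_WORK_PAYOUT = 2_931          # article's rough net figure, in dollars
STATUTORY_MAX_WILLFUL = 150_000  # per-work ceiling for willful infringement, in dollars


def claim_rate(claimed: int, eligible: int) -> float:
    """Share of eligible works with a claim filed, as a percentage."""
    return 100 * claimed / eligible


def share_of_ceiling(payout: float, ceiling: float) -> float:
    """Per-work payout as a percentage of the statutory maximum."""
    return 100 * payout / ceiling


print(f"Claim rate: {claim_rate(CLAIMED_WORKS, ELIGIBLE_WORKS):.1f}%")
print(f"Payout vs. ceiling: {share_of_ceiling(PER_WORK_PAYOUT, STATUTORY_MAX_WILLFUL):.1f}%")
```

The first line reproduces the 91.3% claim rate. The second shows how far the per-work payout sits below the statutory maximum.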

The objections heading into Wednesday are worth reading. Professor Lea Bishop argues publishers could capture roughly half the total fund despite this being an author-centric lawsuit. Foreign works without US registration are effectively excluded — potentially millions of titles. And Alsup's own concerns about the settlement were omitted from filings to the new judge.

What It Means for Every AI Company

Two things now follow from the ruling, at least until an appellate court weighs in:

Training is protected. If you acquired your data lawfully — purchased it, licensed it, scraped it from the open web under existing precedent — training an LLM on it is transformative fair use. The court explicitly rejected recognizing an "emerging market" for LLM training licenses as within copyright's protected scope. That's a ceiling on future licensing claims.

Piracy isn't. How you source your training data is the legal question, not what you do with it afterward. Every AI company with LibGen, Sci-Hub, or similar datasets in its training pipeline just got a clear signal.

The fairness hearing is May 14 at 2:00 PM Pacific, Courtroom 12, San Francisco Federal Courthouse. It's open via Zoom.