DNYUZ

Anthropic cut up millions of used books to train Claude — and downloaded over 7 million pirated ones too, a judge said

June 25, 2025
Anthropic spent “many millions of dollars” buying used print books, then stripped off the bindings, cut the pages, and scanned them into digital files.

VCG/VCG via Getty Images

To build AI chatbot Claude, Anthropic “destructively scanned” millions of copyrighted books, wrote a judge on Monday.

Ruling in a closely watched AI copyright case, Judge William Alsup of the Northern District of California analyzed how Anthropic sourced data for model training, including from digital and physical books.

Companies like Anthropic require vast amounts of input to develop their large language models, so they’ve tapped sources from social media posts to videos to books. Authors, artists, publishers, and other groups contend that the use of their work for training amounts to theft.

Alsup detailed Anthropic's training process with books: the OpenAI rival spent "many millions of dollars" buying used print books. The company or its vendors then stripped off the bindings, cut the pages, and scanned them into digital files.

Alsup wrote that millions of original books were then discarded, and the digital versions stored in an internal “research library.”

The judge also wrote that Anthropic, which is backed by Amazon and Alphabet, downloaded more than 7 million pirated books to train Claude.

Alsup wrote that Anthropic cofounder Ben Mann downloaded "at least 5 million copies of books from Library Genesis" in 2021, fully aware that the material was pirated. A year later, the company "downloaded at least 2 million copies of books from the Pirate Library Mirror," again knowing they were pirated.

Alsup wrote that Anthropic preferred to “steal” books to “avoid ‘legal/practice/business slog,’ as cofounder and CEO Dario Amodei put it.”

Last year, a trio of authors sued Anthropic in a class-action lawsuit, saying that the company used pirated versions of their books without permission or compensation to train its large language models.

Judge says training Claude on books was fair use, but piracy wasn’t

Alsup ruled that Anthropic’s use of copyrighted books to train its AI models was “exceedingly transformative” and qualified as fair use, a legal doctrine that allows certain uses of copyrighted works without the copyright owner’s permission.

“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different,” he wrote.

The company’s decision to digitize millions of print books it had purchased fell under fair use, Alsup wrote.

“All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies,” he wrote.

An Anthropic spokesperson said that the company is pleased with Alsup’s ruling on using books to train LLMs.

The spokesperson said in a statement that this approach is “consistent with copyright’s purpose in enabling creativity and fostering scientific progress.”

But Alsup drew a firm line when it came to piracy.

“Anthropic had no entitlement to use pirated copies for its central library,” Alsup wrote. “Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.”

Judge Alsup's ruling that training AI models on copyrighted books is fair use is one of the first of its kind.

His decision comes amid a wave of lawsuits from artists, filmmakers, authors, and news outlets against major AI players like OpenAI.

While creators say training AI models on their copyrighted work without permission infringes on their rights, AI execs argue they haven’t violated copyright laws because the training falls under fair use.

Earlier this month, Disney sued AI image generator Midjourney, saying the tech company ripped off famous characters in properties ranging from “Star Wars” to “The Simpsons.”

The post Anthropic cut up millions of used books to train Claude — and downloaded over 7 million pirated ones too, a judge said appeared first on Business Insider.

Copyright © 2025.

No Result
View All Result
  • Home
  • News
    • U.S.
    • World
    • Politics
    • Opinion
    • Business
    • Crime
    • Education
    • Environment
    • Science
  • Entertainment
    • Culture
    • Gaming
    • Music
    • Movie
    • Sports
    • Television
    • Theater
  • Tech
    • Apps
    • Autos
    • Gear
    • Mobile
    • Startup
  • Lifestyle
    • Arts
    • Fashion
    • Food
    • Health
    • Travel

Copyright © 2025.