A US judge rules Anthropic's use of copyrighted books to train AI was fair use, but its storage of pirated books in a central library for training LLMs was not
Blake Brittain / Reuters : Anthropic wins key US ruling on AI training in authors' copyright lawsuit
Jason Koebler / 404 Media : Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not
Rashi Shrivastava / Forbes : The Prompt: A Copyright Win For Anthropic
Blake Brittain / Reuters : Microsoft sued by authors over use of books in AI training
Simon Willison / Simon Willison's Weblog : Anthropic wins a major fair use victory for AI — but it's still in trouble for stealing books. …
David Braue / Information Age : Training genAI on copyright materials is fair use, judge rules
Ben Thompson / Stratechery : Training AI is Fair Use, Product Protection Versus LLM Liability, Piracy and Competition
US District Court : Plaintiffs v. ANTHROPIC
Andrew Jeong / Washington Post : Federal court says copyrighted books are fair use for AI training
Ashley Belanger / Ars Technica : Key fair use ruling clarifies when books can be used for AI training
Ed Nawotka / Publishers Weekly : Federal Judge Rules AI Training Is Fair Use in Anthropic Copyright Case
Jess Kinghorn / PC Gamer : US judge rules that Anthropic's use of copyrighted content to train AI was fair use, but pirating books is step too far
Business Today : Judge rules Anthropic's AI training with books is fair use but storing pirated copies violates copyright
Lee Chong Ming / Business Insider : Anthropic cut up millions of used books to train Claude — and downloaded over 7 million pirated ones too, a judge said
Iain Thomson / The Register : LLMs can hoover up data from books, judge rules
Chris Cooke / CMU : First major ruling on AI and fair use goes against the copyright industries, though with a silver lining relating to pirated training content
Stuart Dredge / Music Ally : AI firm Anthropic wins a ‘fair use’ victory in books case
Glenn Chapman / Tech Xplore : US judge backs using copyrighted books to train AI
Rishaj Upadhyay / Windows Report : Anthropic Wins Landmark Case as Judge Rules AI Book Training Is Fair Use
Skye Jacobs / TechSpot : Court says AI training on books is fair use but Anthropic must face trial over pirated copies
Matt O'Brien / Associated Press : Anthropic wins ruling on AI training in copyright lawsuit but must face trial on pirated books
Shelly Palmer : Anthropic's AI Training Deemed Fair Use; Piracy Claims Proceed to Trial
Meg Tanaka / Wall Street Journal : Anthropic Lands Partial Victory in AI Case Set to Shape Future Rulings
Anna Washenko / Engadget : Judge rules Anthropic's AI training on copyrighted materials is fair use
Vyom Ramani / Digit : Fair use vs copyright: Anthropic's case and its impact on AI training
Bloomberg Law : Mixed Anthropic Ruling Builds Roadmap for Generative AI Fair Use
Vismaya V / Decrypt : Anthropic Scores Partial Victory in Copyright Case Over AI Training Data
Ben Golin / JURISTnews : US federal judge issues landmark ruling on AI copyright law
Theodore McKenzie / 80 Level : US Court Declares Training AI Models on Books Without Author Permission is “Fair Use”
Markus Kasanmascheff / WinBuzzer : Anthropic Wins Landmark AI Copyright Ruling, But Faces High-Stakes Piracy Trial
Kamya Pandey / MEDIANAMA : US Court Finds ‘Fair Use’ in Anthropic Training AI Models with Purchased Books, Not the Pirated Ones
Chris Stokel-Walker / Fast Company : Anthropic's AI copyright ‘win’ is more complicated than it looks
Rocket Drew / The Information : Anthropic's Use of Books as Training Data Is Fair Use, Says Court
Angela Yang / NBC News : Federal judge rules copyrighted books are fair use for AI training
Jean Leon / Android Headlines : Anthropic's Court Win Could Change the AI Landscape Forever
Kate Knibbs / Wired : Anthropic Scores a Landmark AI Copyright Win—but Will Face Trial Over Piracy Claims
Jay Barmann / SFist : Federal Judge In SF Rules That AI Company Anthropic Did Not Violate Copyright Law In Training Its Chatbot
Mike Wheatley / SiliconANGLE : Judge sides with Anthropic in landmark AI copyright case, but orders it to go on trial over piracy claims
Al Jazeera : US judge allows company to train AI using copyrighted literary materials
Rob Pegoraro / PCMag : Judge: It's Fair Use to Train AI on Books You Bought, But Not Ones You Pirated
Vishnu Kaimal / The American Bazaar : Judge rules Anthropic's use of books for AI training doesn't violate copyright law
Eileen McDermott / IPWatchdog.com : Judge Calls Anthropic's Training of LLMs with Authors' Works ‘Quintessentially Transformative’ But Gives No Pass on Piracy
Jon Keegan / Sherwood News : Judge rules Anthropic training on books it purchased was “fair use,” but not for the ones it stole
David Gerard / Pivot to AI : Anthropic AI wins broad fair use for training! — but not on pirated books
Adam Levine / Barron's Online : A Federal Judge's Ruling in Anthropic Case Is a Major Win for AI Companies
Tom Chivers / Semafor : Judge sides with Anthropic on using books to train models
Kristian Stout / Truth on the Market : Bartz v. Anthropic: Mapping Fair-Use Boundaries in the Age of Generative AI
Matthias Bastian / The Decoder : Anthropic won a fair use hearing that could end up being a defeat
nextbigwhat : Anthropic AI Wins Copyright Battle, Faces Trial for Piracy Claims
AppleInsider : Courts say AI training on copyrighted material is legal
Ian Stark / UPI : Judge rules Anthropic's use of books to train AI model is fair use
Jeremy Gray / PetaPixel : Federal Judge Gives AI Companies a Landmark ‘Fair Use’ Victory
Sharon Goldman / Fortune : A federal judge says training AI on copyrighted works is ‘fair use,’ but casts doubt on use of pirated materials
Annelise Levy / Bloomberg Law : AI Training Is Fair Use, Judge Rules in Anthropic Copyright Suit

Bluesky:
Dare Obasanjo / @carnage4life : A judge has ruled that Anthropic training it's AI on copyrighted books is fair use if they paid for the books. This is analogous to a human buying a book, reading it and learning from it. — This judgement doesn't cover training AI on pirated books. — Despite that this is a significant decision.
Erin Fogg / @criminalerin : The headline part of the ruling sucks but this buried at the end part is good: "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages," the judge wrote.
Nilay Patel / @reckless : Court rules Anthropic training an LLM on books is fair use... if they bought the books. But Anthropic also pirated a lot of books, and now faces a second, potentially major, damages trial for stealing them
Eriq Gardner / @eriqgardner : 🚨BREAKING: Federal judge concludes that using copyrighted works to train generative A.I. is transformative and ultimately a fair use. (Nevertheless, Anthropic can't beat the lawsuit because it pirated books for another purpose too.) First of kind ruling. www.documentcloud.org/documents/ 25...
Jason Sanford / @jasonsanford : Despite the headline, mixed ruling by a Federal judge in the Anthropic AI lawsuit. Judge ruled using books to train the company's AI didn't violate copyright laws bc it fell under fair use by being “exceedingly transformative.” I disagree with this interpretation. 1/ — www.reuters.com/legal/litiga...
Joseph Cox / @josephcox : New from 404 Media: A judge ruled Anthropic likely violated copyright law when it pirated authors' books to create a giant dataset and “forever” library, but training its AI on those books without authors' permission constitutes fair use under copyright law www.404media.co/judge-rules- ...
Jason Koebler / @jasonkoebler : A judge ruled Anthropic training AI on books without permission was legal, but pirating the books was not. This is ultimately a very damaging decision but there are many other cases working their way through the legal system that have worse facts — www.404media.co/judge-rules- ...
Alejandra Caraballo / @esqueer.net : Most of the AI companies used far more than lawfully acquired copies of books so this could present a major issue going forward. We'll need to see how lawsuits involving scraped material off the internet play out to see how AI companies will fair under copyright law.
Alejandra Caraballo / @esqueer.net : This presents a massive liability for AI companies still who often scraped millions of documents, videos, images etc. without authorization. So the precedent this sets here in this lawsuit is that when lawfully acquired, training AI models on copyrighted material is fair use.
Alejandra Caraballo / @esqueer.net : Major decision out of the Northern District of California. AI training on copyrighted material is fair use under the copyright act. This will have major implications for the AI industry and artists having their work stolen by AI companies. — storage.courtlistener.com/recap/ gov.us...
Nilay Patel / @reckless : Alsup is a very smart and very sharp judge, and his order is eminently readable. You should read it! www.documentcloud.org/documents/ 25...
Nilay Patel / @reckless : If you are running away saying this case definitively rules the training an LLM is fair use... you're going to make some big and potentially very expensive mistakes. Ruling very specifically does not reach *outputs* which feels important to future cases - and very clearly comes down hard on piracy
James Grimmelmann / @jtlg : This decision will almost certainly be appealed, of course, but it may be a good bellwether for where these lawsuits are going in general. If this pattern holds, then AI training will typically be fair use, but companies will need to turn square corners in acquiring their training data. /end
James Grimmelmann / @jtlg : The big unanswered question (because it wasn't presented here) is whether web scraping is more like scanning books (fair use) or like downloading “pirated” books (not fair use). — (I put “pirated” in quotes because the distinction could come under pressure in future cases about training datasets.)
James Grimmelmann / @jtlg : Judge Alsup has the first true opinion on fair use for generative AI in Bartz v. Anthropic. He holds that AI training is fair use, and so is buying books to scan them, but that downloading pirated copies of books for an internal training-data database is not fair use. 🧵

Mastodon:
@mttaggart@infosec.exchange : I can't get over the nature of the book collection and ingestion that was deemed “fair use” in the Anthropic case. — So they bought up books by the million, ripped them up, scanned them, and discarded the originals. And the resultant digital copies were fair use. …
Felix Stalder / @festal@tldr.nettime.org : I'm still remembering the Draconian rulings the courts handed down against non-commercial “file sharers” in the 00s. — So I cannot but marvel at how flexible they are towards the interests of capital. — Altman's “right to learn” argument was quickly picked up, setting a potentially major precedent. …
Mignon Fogarty / @grammargirl@zirk.us : A federal judge has ruled that Anthropic's use of copyrighted works to train AI is transformative and fair use. — An important caveat is that in this particular case, the authors conceded that “training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public.” …

Threads:
Bilawal Sidhu / @bilawal.ai : Anthropic bought millions of books to scan for Claude. Makes you wonder — have AI companies been quietly purchasing Blu-ray Discs by the truckload to rip visual datasets too? Maybe it's easier to exploit a legal gray area with physical media than scrape YouTube against its ToS.
Marc Love / @marcslove : Of course, this is still complicated for Anthropic and other LLM companies. The judge strongly implies that the act of keeping a pirated copy or copying a digital book from the library and storing it anywhere constitutes copyright infringement. “Statutory damages are usually between $750 and $30,000 per work, as determined by the court. …

X:
Sauers / @sauers_ : Anthropic purchased millions of physical print books to digitally scan them for Claude
Antonio García Martínez / @antoniogm : The sound of a dozen startups pivoting.
Kevin Roose / @kevinroose : How do you even buy millions of used print books? Can you call up the Strand and go “yes hello I'd like two million books”
David Sacks / @davidsacks : Positive ruling for AI. There must be a fair use concept for training data or models would be crippled. China is going to train on all the data regardless, so without fair use, the U.S. would lose the AI race.
Andrew Curran / @andrewcurran_ : From the ruling: 'Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them - but to turn a hard corner and create something different.'
Adam Eisgrau / @adameisgrau : BIG AND BREAKING: @Anthropic wins its Motion for Summary Judgment on fair use grounds, Judge Alsup rules, but a trial will follow on potential damages for the use of “pirated” material from the internet Details to follow, but here's the literal bottom line:
Ed Newton-Rex / @ednewtonrex : ...2. It will likely be appealed, and will probably go to higher courts. I think the judge mischaracterizes the effect of the copying on the market for & value of the original, and I suspect this will be the subject of more debate. This decision is unlikely to be the end of the story. So there are good aspects - in particular, it looks like many AI companies will be determined to be infringing copyright on a massive scale due to their use of masses of pirated works.
Andrew Curran / @andrewcurran_ : A federal judge has ruled that Anthropic's use of books to train Claude falls under fair use, and is legal under U.S. copyright law.
Matthew Ball / @ballmatthew : U.S. District Court finds Anthropic's use of copyright books (i.e. w/o express consent/agreement/deal) to train LLMs is “transformative” ("among the most transformative in our lifetime") thus justified as fair use + “in light of the purposes of copyright” https://www.documentcloud.org/ ...
Eriq Gardner / @eriqgardner : 🚨BREAKING: Federal judge concludes that using copyrighted works to train generative A.I. is transformative and ultimately a fair use. (Nevertheless, Anthropic can't beat the lawsuit because it pirated books for another purpose too.) First of kind ruling. https://www.documentcloud.org/ ...
Neil Turkewitz / @neilturkewitz : Actually, that's not apparently what the court ruled. I haven't read the decision, but according to the article: “Alsup also said, however, that Anthropic's storage of the authors' books in a ‘central library’ violated their copyrights & was not fair use.” So not so fast!
@kimmonismus : This could be a landmark ruling: A court rules that anthropic models trained with licensed books fall under “fair use” and may be used. This is a major and significant ruling for the training of AI models - and at the same time, to be honest, a challenge for creative authors.
Mat Dryhurst / @matdryhurst : Big, and goes even further than I expected Judge determines the use of books for model training is transformative and constitutes fair use so long as outputs are not infringing Case is going to trial over the use of pirated books, which is obviously illegal and expected
@xlr8harder : Sounds like a huge win for fair use. Just should have bought the books first instead of stealing them.
Steven Sinofsky / @stevesi : Full ruling here. Expect future cases to spend more energy on substitute for the original. Not sure this is a final word or they made the best case. https://storage.courtlistener.com/ ...
Morgan / @morqon : anthropic wins a major judgement on fair use, can train models on purchased books, but goes to trial for storing pirated works - training is “spectacularly transformative” - memorisation of style and content for statistical modelling is analogous to a person learning from
Roope Rainisto / @rainisto : Really fascinating fair use analysis and summary judgment in favour of Anthropic (for nerds like me interested in the legal argument) - tens of pages of arguments about each of the four factors of fair use.

LinkedIn:
Emil Protalinski : Training an LLM on copyrighted books is fair use. — That's if you bought the books, according to a US judge who ruled in a case between Anthropic and authors of copyrighted books. …
Chris Heatherly : This is a terrible ruling and must be overturned. No one can honestly argue that existing law was written contemplating AI. …

Forums:
Hacker News : A federal judge sides with Anthropic in lawsuit over training AI on books
Hacker News : Anthropic bags fair use win but faces trial for using pirated works
r/vfx : US District Court rules AI training is Fair Use
r/Filmmakers : Anthropic wins key US ruling on AI training in authors' copyright lawsuit
r/samharris : Northern District of California Judge Rules Training AI on Existing Copyrighted Works is Fair Use
r/antiai : Anthropic wins key US ruling on AI training in authors' copyright lawsuit
r/aiwars : Anthropic wins key court case - training on books is fair use.
r/technology : Judge rules Anthropic did not violate authors' copyrights with AI book training
r/Piracy : Anthropic wins key US ruling on AI training in authors copyright lawsuit
r/books : Anthropic wins key US ruling on AI training in authors' copyright lawsuit
r/ClaudeAI : A federal judge in San Francisco ruled late Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law
r/aiwars : Anthropic wins key ruling on AI in authors' copyright lawsuit
r/singularity : A federal judge has ruled that Anthropic's use of books to train Claude falls under fair use, and is legal under U.S. copyright law
r/technology : Anthropic wins key ruling on AI in authors' copyright lawsuit
r/OpenAI : Anthropic wins key ruling on AI in authors' copyright lawsuit
r/ArtistHate : Northern District of California Court judges Anthropic's training on millions of books Fair Use, the act of piracy itself is a subject for another trial.
Msmash / Slashdot : Anthropic Bags Key ‘Fair Use’ Win For AI Platforms, But Faces Trial Over Damages For Millions of Pirated Works