The NYT sues OpenAI and Microsoft for copyright infringement, alleging they used millions of its articles to train AI; it is the first major US media outlet to sue

Community responds:
Gaurav Girotra / Tech in Asia : NYT files copyright suit against OpenAI, Microsoft
Jacob Oliver / CryptoSlate : New York Times sues Microsoft, OpenAI for alleged copyright infringement
Luke Jones / WinBuzzer : The New York Times Sues Microsoft and OpenAI Over Alleged Copyright Infringement
Winston Cho / The Hollywood Reporter : The New York Times Brings Receipts In Lawsuit Against OpenAI
Sadagopan / Sadagopan's weblog … : New York Times Sues OpenAI
Derek Strickland / TweakTown : New York Times sues OpenAI and Microsoft for using millions of copyrighted articles to train AI
MarketWatch : New York Times's stock rises as company files copyright-infringement suit against Microsoft and OpenAI
Eric Hal Schwartz / Voicebot.ai : The New York Times Sues OpenAI and Microsoft for Training Generative AI Models With Copyrighted Work
Trevor Mogg / Digital Trends : OpenAI and Microsoft sued by NY Times for copyright infringement
Gerrit De Vynck / Washington Post : New York Times sues OpenAI, Microsoft for using articles to train AI
Cory Weinberg / The Information : New York Times Co.'s OpenAI-Microsoft Suit Is a Negotiating Tactic
Terry Sullivan / iMore : Disagreements in the age of AI: The New York Times sues OpenAI and Microsoft for copyright infringement
PYMNTS.com : Nothing Transformative About OpenAI's Copyright Abuses, Says New York Times Lawsuit
Jason Calacanis on Startups : OpenAI's Napster/Google Moment
Jennifer Maas / Variety : The New York Times Sues OpenAI, Microsoft for Copyright Infringement Claiming ‘Billions of Dollars’ in Damages
Ryan Whitwam / ExtremeTech : NY Times Sues OpenAI and Microsoft Over ChatGPT Copyright Infringement
Alex Pigman / Tech Xplore : New York Times sues OpenAI, Microsoft in copyright clash
James Powel / USA Today : The New York Times is suing OpenAI over copyright breaches, here's what you need to know
Ryan McNeal / Android Authority : New York Times sues Microsoft and OpenAI for training AI on its articles
Rob Beschizza / Boing Boing : New York Times sues OpenAI and Microsoft, claiming copyright infringement
Jose Antonio Lanz / Decrypt : OpenAI Trained AI Models on Copyrighted Work, Says NYT Lawsuit
Colton Stradling / Windows Central : New York Times sues Microsoft and OpenAI for impacting its business, claims generative AI models don't qualify for fair use
Ryan Browne / NBC News : New York Times sues Microsoft, ChatGPT maker OpenAI over copyright infringement
Associated Press : The New York Times is suing OpenAI — alleging it trained ChatGPT off its content
RTÉ : New York Times sues OpenAI, Microsoft over copyright infringement
Kurt Schlosser / GeekWire : The New York Times sues OpenAI and Microsoft for training AI chatbots on its copyrighted work
Kyt Dotson / SiliconANGLE : New York Times sues Microsoft, OpenAI over AI training copyright infringement
Malak Saleh / Engadget : The New York Times is suing OpenAI and Microsoft for copyright infringement
Kyle Wiggers / TechCrunch : The New York Times wants OpenAI and Microsoft to pay for training data
Alex Perry / Mashable : The New York Times sues OpenAI and Microsoft for copyright infringement
Mother Jones : The New York Times is Suing OpenAI and Microsoft for Copyright Infringement
Usman Qureshi / iPhone in Canada Blog : New York Times Sues OpenAI, Microsoft Over AI Copyright Infringement
New York Daily News : New Jersey police tell public to ignore AI-generated story about Christmas shooting that never happened
Adam Mawardi / Telegraph : New York Times sues OpenAI and Microsoft over copyright concerns
Mark Bergen / Fortune : New York Times sues OpenAI and Microsoft for training their AI using its articles and causing publishers billions of dollars in damages
Alisa Davidson / Metaverse Post : The New York Times Files Lawsuit Against OpenAI and Microsoft Alleging Copyright Infringement
Matthias Bastian / The Decoder : New York Times files lawsuit against OpenAI and Microsoft demanding destruction of ChatGPT
Emily Price / PCMag : The New York Times Sues OpenAI and Microsoft For Copyright Infringement
Kate Seamons / Newser : NYT Suit : We Suffered Billions in Damages Due to AI
John Callaham / Neowin : Microsoft and OpenAI are being sued by the New York Times for copyright infringement
Los Angeles Times : New York Times sues OpenAI, Microsoft over use of its stories to train chatbots
Agence France-Presse : New York Times sues OpenAI, Microsoft for copyright infringement
Arthur Brown / Android Headlines : The New York Times is suing OpenAI and Microsoft for ‘Billions’ of dollars
William Hicks / Seattle Business Journal : The New York Times sues OpenAI and Microsoft over copyright infringement
Joe Wituschek / BGR : The New York Times is suing OpenAI and Microsoft over ChatGPT training data
Omar Moharram / Supercharged : OpenAI, Microsoft sued over unauthorized copyrighted content use to train AI models amid Apple licensing deal rumors
Samson Akintaro / Nairametrics : ChatGPT: New York Times sues OpenAI, Microsoft for copyright infringement
Dave Winer / Scripting News : The NYT is suing ChatGPT for ingesting their archive. I understand why they would sue now …
Financial Times : New York Times sues Microsoft and OpenAI in copyright case
Paul Thurrott / Thurrott : The New York Times Sues Microsoft, OpenAI for Copyright Infringement
WRAL TechWire : NY Times sues OpenAI and Microsoft alleging copyright infringement
Kelvin Munene Murithi / CoinGape : OpenAI and Microsoft Face New York Times Lawsuit Over Use of AI-Generated Content
Jill Goldsmith / Deadline : The New York Times Sues Open AI, Backer Microsoft For Copyright Infringement; Says Platform Wants To “Free-Ride” Expensive Journalism For “Substitute Products”
J.D. Capelouto / Semafor : New York Times sues OpenAI and Microsoft
Erik Hayden / Hollywood Reporter : The New York Times Sues OpenAI and Microsoft After Impasse Over Deal to License Content

Threads:
Jonathan Hoefler / @jonathanhoefler : 26 questions for the “AI isn't ART, because ART takes WORK!!!” crowd: 1. How much work should art take? 2. If an artist finishes early, is their art disqualified, or is it just some percentage less artistic? 3. Do these values change throughout the life of an artist? …
Benedict Evans / @benedictevans : Looking at an estimate that the NY Times is ~0.0083% of common crawl, and reminded again of the disconnect between the social and political value of professional journalism and the amount it actually makes up of internet consumption/traffic/revenue... and journalists' recurrent refusal to accept that ‘news is important to society’ is not the same as 'Google's ad business needs news'
Dare Obasanjo / @carnage4life : The big problem with all the copyright infringement lawsuits against OpenAI related to training ChatGPT is that it will have broader ramifications on any services that crawl the web for information like Alexa or Google Search. …
Ted Underwood / @tedunderwoodillinois : Re the NYT suit: I still think it's important to hold the line on the premise that training, indexing, etc don't in themselves infringe copyright. But it also doesn't seem crazy to suggest that systematic paraphrases of recent, newsworthy content are in effect a competing product. …
Jo Ling Kent / @jolingkent : In response to the @nytimes suing OpenAI and @Microsoft for copyright infringement and a “free-ride on The Times's massive investment in its journalism” — an OpenAI spokesperson tells me the company is “surprised and disappointed with this development.” …
Mathew Ingram / @mathewi : I get that newspapers are all mad at AI, but summarizing Times articles and even “mimicking its expressive style” shouldn't constitute copyright infringement under any reasonable definition
Nilay Patel / @reckless1280 : All the AI copyright cases are bombshells in different ways but the NYT complaint is particularly strong because it layers additional claims like trademark dilution since ChatGPT will hallucinate NYT articles that don't exist https://www.theverge.com/...
Chris Messina / @chris : Generative AI may kill the open web. https://www.nytimes.com/...
Benedict Evans / @benedictevans : Is it the fundamental purpose of ChatGPT, or any LLM, to be able to answer questions about specific facts that may or may not be in content produced by newspapers? Or is it for them to be some kind of reasoning engine, created by training a model on some amount of text written by humans, where it doesn't really matter what the text is about. …
Jeff Jarvis / @jeffjarvis : On the one hand: Unlike every book in Books3, I'll bet someone at OpenAI has a NYT subscription. On the other hand: I wonder whether OpenAI licensing AP & Springer sets a difficult precedent for this case.
Kali Hays / @kalihays1 : A pretty incredible lawsuit with big implications. The issues are: Money (ofc). Sounds like Msft/OpenAI refused to pay for the value of NYTs decades of journalism, after already ingesting all of it for free. News as a business. …
Nilay Patel / @reckless1280 : The legal system is not deterministic: it is made up of unpredictable nerds with weird ideas, not computers and algorithms. You cannot predict the outputs of a court case based on the inputs, especially in fair use cases which are evaluated on a case-by-case basis and are historically coin flips! …
Kara Swisher / @karaswisher : It begins.
Benedict Evans / @benedictevans : I wrote about newspapers and OpenAI in August. This was all very predictable. The defence from OpenAI that the model does not contain the training data is true, but incomplete. Who owns ‘this’?

Mastodon:
Carolyn Barber / @cbarbermd@med-mastodon.com : Fascinating. The NYT sued Open AI and Microsoft today for copyright infringement, contending that millions of NYT articles were used to train the chatbots. The suit says the defendants should be held responsible for “billions of dollars in statutory and actual damages related to the “unlawful copying and use of The Times's uniquely valuable works. …
@Impossible_PhD@hachyderm.io : I haven't been very worried about AI, even though I'm a writer. — Why? — Because it takes a while for the law teams employed by the titans of old media to rumble to action, but it was always clear they were coming. These are the teams that don't sue other companies unless they're certain of winning. …
Dare Obasanjo / @carnage4life@mas.to : The New York Times has sued OpenAI arguing that millions of its articles were used to train their models which now compete with the publisher. — There are now so many of these lawsuits I'd assumed the NYT already was suing. — It's quite clear we will need some sort of Supreme Court ruling on whether training models on copyrighted works is infringement or not. …
Simon Willison / @simon@fedi.simonwillison.net : Does this new NY Times lawsuit about OpenAI training on their data mean that the details of that training data might come out in discovery? https://apnews.com/...
@jon@henshaw.social : It's all a matter of when, not a matter of if. The fuel to enable the ELIZA effect comes from stealing everyone else's human generated content. And the idea of artificially generating content from stolen human generated content and selling it for profit was always going to catch up to them legally. …
Waldo Jaquith / @waldoj@mastodon.social : I'm increasingly suspicious that Apple is up to something big in the AI space. I think they're creating the first LLM that *isn't* based on unlicensed, copyrighted text. https://www.nytimes.com/...
Jeff Jarvis / @jeffjarvis@mastodon.social : When journalism thinks its value is intrinsic in the commodity, content, rather than in service, education, and collaboration: — The New York Times sued Microsoft and OpenAI for alleged copyright infringement — https://www.wsj.com/...
Gary McGraw / @cigitalgem@sigmoid.social : It's not just authors anymore. The NY Times sues OpenAI and Microsoft over ML copyright issues. — #ML systems leak training data consistently. — #MLsec — https://www.wsj.com/...

Bluesky:
Emil Protalinski / @epro.social : This is more than just a simple copyright case. Sure, there is money that can be potentially won, but this is a strategic lawsuit more than a cash grab. The New York Times is likely trying to gain leverage and create precedent for itself and media organizations everywhere. [embedded post]
Jackson Palmer / @ummjackson.com : I appreciate optimism sometimes, but does anyone actually believe we'll ever see effective regulation of this stuff? IIRC, all other big cases like this have been dismissed thus far. [embedded post]

X:
François Chollet / @fchollet : “LLMs cannot store any of their training data because the checkpoints are way too small” is such a bizarre take, akin to saying “there's no way a 20GB file could contain all the text in Wikipedia, because that's 25B characters, which should take over 200GB at 8 bits / character”
François Chollet / @fchollet : LLMs are big curves fitted to a token distribution. That is to say, they're a text dataset encoded via (very) lossy compression. They can absolutely “store” data to the extent that the data can be recovered, much like the JPEG format can store images, even if it damages them.
Willie Agnew / @willie_agnew : Omg they cited our work in the lawsuit 🥹🥹🥹🥹 Artists, writers, and other workers if there are other audits or research on AI datasets that would help give y'all back control over your work and data, or help with ongoing lawsuits against theft by AI companies, please reach out
Jason Kint / @jason_kint : So back to Exhibit J. Unlike the other 220k+ pages of exhibits documenting registered works, this exhibit contains 100 examples of alleged copyright violations with nearly identical content being outputted by ChatGPT. Again, it's impossible to argue with this. /13 [image]
Cecilia Ziniti / @ceciliazin : 🧵 The historic NYT v. @OpenAI lawsuit filed this morning, as broken down by me, an IP and AI lawyer, general counsel, and longtime tech person and enthusiast. Tl;dr - It's the best case yet alleging that generative AI is copyright infringement. Thread. 👇 [image]
Jonathan Stray / @jonathanstray : NYT lawsuit against OpenAI seems strong — lots of verbatim text reproduction. But it raises an even more complex question: what if an LLM was sure to paraphrase all of its training data? This is closer to what image generation does. Copyright applies to expressions, not ideas.
@jason : OpenAI's Napster moment: The NYT is going to win a huge settlement here, and/or they could get an injunction forcing OpenAI to redo their models without the allegedly stolen data. ... without the copyrighted content from Reddit, Quora, NYTimes, twitter, and countless other language oil fields, how effective can these models be? And it's the NYTimes' opportunity to make a language model based on their IP — not anyone else's. You must get permission to create a new, commercial product based on someone else's IP — just like Spotify did, and Napster didn't.
Dan Primack / @danprimack : 3/ My guess is most media cos will favor short-termism, as they did w/ classifieds and social media in decades past — unable to see beyond the next quarter — but they have a chance to hold the cards for once...
Gary Marcus / @garymarcus : OpenAI is in serious trouble. 👉The excerpt below is particularly damning, because the prompts that elicited the plagiarism in no way requested that the system draw on the NYT at all. 👉@jason_kint & @CeciliaZin largely converge on the overall seriousness of the suit. 👉OpenAI...
Mathew Ingram / @mathewi : I get that newspapers are all mad at AI, but summarizing Times articles and even “mimicking its expressive style” shouldn't constitute copyright infringement under any reasonable definition https://www.theverge.com/...
Dan Primack / @danprimack : 2/ Counterargument is that gen AI cos will just need to pay pennies, so it's not really too big a deal given their billions in outside investment. BUT: If $$ is meager, media cos may not play ball. Particularly given that gen AI threatens to cannibalize their traffic.
Jason Kint / @jason_kint : The complaint also steps through the preference and weighting used for sources with claims NYT-sourced content is more valued for training. And that undermining that real investment will undermine the entire market for journalism - including licensing it for future AI. /10 [image]
Bill Grueskin / @bgrueskin : NYT's suit showed that “a Microsoft search feature powered by ChatGPT reproduced almost verbatim results from Wirecutter. “The results did not link to the Wirecutter article, and they stripped away referral links used to generate commissions from sales” https://www.nytimes.com/...
Kevin A. Bryan / @afinetheorem : NYT/OpenAI lawsuit completely misunderstands how LLMs work, and judges getting this wrong will do huge damage to AI. Basic point: LLMs DON'T “STORE” UNDERLYING TRAINING TEXT. It is impossible- the parameter size of GPT-3.5 or 4 is not enough to losslessly encode the training set.
Jason Kint / @jason_kint : ok, I've now read the full NYT complaint filed this morning vs OpenAI and Microsoft. I'm impressed - it's future-focused around fair value for work vital to democracy. It also contains 220k pages of exhibits although the pages of Ex J stood out to me. more on that in a minute. /1 [image]
Gary Marcus / @garymarcus : @sbergman @jason_kint These systems are always stochastic; few prompts ever give same results over and over
Ben Ansell / @benwansell : Some of the direct comparisons with NYT articles and Chat GPT output are insane. And to think we have been obsessing about plagiarism in the acknowledgements of a 90s PhD while this is the current state of affairs with LLMs.
Jason Kint / @jason_kint : Here are four examples. Again, the lawsuit includes one hundred of them. You get the point. I find this exhibit to be an incredibly powerful illustration for a lawsuit that will go before a jury of Americans. Again, it's impossible to argue with this. /14 [image]
Antonio García Martínez / @antoniogm : The next big field of media attribution to develop will be crediting underlying content for powering AI queries, and measuring how much revenue should be shared with this or that publisher (in effect, a training data publisher, not a content one).
@ivanthek : I cancelled my New York Times subscription since I already get ChatGPT.
@jeffjarvis : Here is the lede of the NYTimes' suit against OpenAI, about the sacredness of journalism, followed by excerpts from The Gutenberg Parenthesis about the sacred rhetoric used by newspapers to oppose radio's entrance into news: [image]
Amol Sharma / @asharma : “Having failed to secure what they saw as their fair share of the explosive internet growth powered by search and social media, publishers don't want to meet the same fate with AI.” That's the dynamic causing the Times and others to take a hard line.
Cecilia Ziniti / @ceciliazin : 7/ 💼 Another interesting point: NYT got really good lawyers. Susman Godfrey has a great reputation and track record taking on tech. This isn't a quick cash grab like the lawsuits filed a week after ChatGPT; it's a strategic legal challenge.
Antonio García Martínez / @antoniogm : The next big field of media attribution to develop will be crediting underlying content for powering AI queries, and measuring how much revenue should be shared with this or that publisher (in effect, a training data publisher, not an content one).
Amjad Masad / @amasad : I feel betrayed that ChatGPT, who I consider a close friend, turned out to be a basic bitch mainstream reporter 😭 [image]
Cecilia Ziniti / @ceciliazin : 6/ 🚫 Misinformation allegations add a clever twist. The complaint pulls in something people are scared of - hallucinations - and makes a case out of it, citing examples where elements of NYT articles were made up. 🍊 Most memorable example? Alleging Bing says the NYT published...
Timothy B. Lee / @binarybits : It doesn't seem out of the question that AI companies could lose these cases catastrophically and be forced to pay billions to plaintiffs and rebuild their models from scratch.
Conor Sen / @conorsen : So how exactly are LLM's going to work when all the content providers block access to their content without permission? Or will this be a lucrative new source of revenue for the NYT/WSJ et al? [image]
Aaron Levie / @levie : This is a good thing. We will be able to finally put to rest the ambiguity on copyrights and training data. [image]
Bobby Allyn / @bobbyallyn : From my story in August about the NYT gearing up to sue OpenAI: “if a federal judge finds that OpenAI illegally copied the Times' articles to train its AI model, the court could order the company to destroy ChatGPT's dataset” https://www.npr.org/...
Ethan Mollick / @emollick : In the New York Times OpenAI lawsuit, you can see how complex the relationship of training data to output can be. On one hand, they find that you can induce ChatGPT to produce exact content from famous Times articles, on the other, they show it also hallucinates false articles. [image]
Miles Kruppa / @mileskruppa : Here's Google's AI summary of the Times vs Microsoft/OpenAI lawsuit. Links to CNN, The Verge, and others, but a snub to the Times' own article on the subject. [image]
Cristina Caffarra / @caffar3cristina : News from respectable outlets is super valuable training data bcs it is verified and accurate (relatively). In the ocean of repetitive junk data that models are trained over, decades of news archives are very useful. This is about division of rents, expect a lot more of it.
Chamath Palihapitiya / @chamath : The interesting thing about this NYT/OpenAI lawsuit is the counterfactual. If Apple is, indeed, writing substantial checks to media companies to license their content for training models, the impact of this and other lawsuits against AI companies training on non-public data will be swift and meaningful. A very clever move by Apple if this lawsuit goes the way of NYT.
Ryan Browne / @ryan_browne_ : The NYT is suing Microsoft and OpenAI for “billions of dollars” worth of damages over alleged copyright infringement. The Times alleges the firms created a business based on “mass copyright infringement” through their use of news content to train AI. https://www.cnbc.com/...
J.D. Capelouto / @jdcapelouto : Interesting — the NYT lawsuit leans on the fact that Times stories were heavily posted to Reddit, which was then apparently used to train OpenAI's models: [image]
@quinnnorton : This will cost of lot of money and be stupid.
Dave Winer / @davewiner : For what it's worth, my two cents on the NYT suit re ChatGPT. https://scripting.com/... [image]
Dan Primack / @danprimack : 1/ The gen AI model is f'd if companies like OpenAI need to pay copyright holders for their content. That would cover every news media, social media, music, book, etc publisher. Maybe a big advantage for Meta/X, who own their own content.
Sara Fischer / @sarafischer : .@nytimes sues @OpenAI and @Microsoft for copyright infringement — This is a big deal because it could set a precedent for 1. How courts define the value of news content in training large language models and 2. What the damages are for previous use @axios https://www.axios.com/...
Rob Leathern / @robleathern : In other news, NYT plans to sue the Internet Archive and Wikipedia next for allowing people to ‘rely on past journalism by The Times’... 🤔 “When chatbots are asked about current events or other newsworthy topics, they can generate answers that rely on past journalism by The...
Michael M. Grynbaum / @grynbaum : Read The Times's full complaint against OpenAI and Microsoft here: https://www.nytimes.com/... https://nytco-assets.nytimes.com/ ...
Ben Schoon / @nexusben : Curious to see where this lands. Strongly feel that these AI products will be unsustainable one way or another. Paying for licensing will be too expensive, but these will also lessen traffic, lessening content available by way of ad rev, which means less for AI to feed off
Claire Atkinson / @claireatki : I'm all for progress but big tech is simply stealing and repackaging content, often inaccurately. Journalism is expensive. ChatGPT should pay for it. https://www.wsj.com/...
Brad Sams / @bdsams : This is going to be significant, either OpenAI/Microsoft finds a way to not pay or they are forced to license the content at which point...everyone whose text they used could be entitled to payments which would be unsustainable.
Julia Alexander / @loudmouthjulia : This felt wholly inevitable, and like the first domino to fall. Let's see if any form of precedent will be set.
Adam Singer / @adamsinger : @rustybrick “Well we could have had AGI utopia, but the lawyers got involved”
Barry Schwartz / @rustybrick : More AI lawsuits
@ivanthek : Tip of the AIceberg. The Times Sues OpenAI and Microsoft Over A.I.'s Use of Copyrighted Work https://www.nytimes.com/...
Brian Stelter / @brianstelter : “The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works...”

LinkedIn:
Charles L Mauro : New York Times sues Microsoft ChatGPT for copyright infringement. This case leads to my only prediction for 2024, which is one of these legal matters …
Aaron Wilkerson : Follow the money. — The lawsuits are coming for OpenAI and Microsoft. The creators of the content that LLMs are using want to get paid. …
Rob Saker : I'll be publishing my annual predictions for data + AI in #retail in the next few weeks. — One area I predicted - but admittedly nowhere near …
Milton Pedraza : As we we enter 2024, the question is: when, not if, Original Content Creators will take control of their digital identity, data and IP, digital relationships and private/personal AI. …
Brendan Witcher : It's worth considering what a judgememt against these tech companies could mean for others who are using the output of these GenAI solutions “as is” to create content. …

Forums:
Hacker News : NY Times copyright suit wants OpenAI to delete all GPT instances
Msmash / Slashdot : New York Times Copyright Suit Wants OpenAI To Delete All GPT Instances
Ars OpenForum : NY Times sues Open AI, Microsoft over copyright infringement

See also Mediagazer

New York Times 2023-12-28

Discussion

@bdsams Brad Sams on x
This is going to be significant, either OpenAI/Microsoft finds a way to not pay or they are forced to license the content at which point...everyone whose text they used could be entitled to payments which would be unsustainable.
@loudmouthjulia Julia Alexander on x
This felt wholly inevitable, and like the first domino to fall. Let's see if any form of precedent will be set.
@adamsinger Adam Singer on x
@rustybrick “Well we could have had AGI utopia, but the lawyers got involved”
@rustybrick Barry Schwartz on x
More AI lawsuits
@ivanthek @ivanthek on x
Tip of the AIceberg. The Times Sues OpenAI and Microsoft Over A.I.'s Use of Copyrighted Work https://www.nytimes.com/...
@brianstelter Brian Stelter on x
“The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works...”
@tedunderwoodillinois Ted Underwood on threads
Re the NYT suit: I still think it's important to hold the line on the premise that training, indexing, etc don't in themselves infringe copyright. But it also doesn't seem crazy to suggest that systematic paraphrases of recent, newsworthy content are in effect a competing produc…
@jolingkent Jo Ling Kent on threads
In response to the @nytimes suing OpenAI and @Microsoft for copyright infringement and a “free-ride on The Times's massive investment in its journalism” — an OpenAI spokesperson tells me the company is “surprised and disappointed with this development.” …
@mathewi Mathew Ingram on threads
I get that newspapers are all mad at AI, but summarizing Times articles and even “mimicking its expressive style” shouldn't constitute copyright infringement under any reasonable definition
@reckless1280 Nilay Patel on threads
All the AI copyright cases are bombshells in different ways but the NYT complaint is particularly strong because it layers additional claims like trademark dilution since ChatGPT will hallucinate NYT articles that don't exist https://www.theverge.com/...
@chris Chris Messina on threads
Generative AI may kill the open web. https://www.nytimes.com/...
@benedictevans Benedict Evans on threads
Is it the fundamental purpose of ChatGPT, or any LLM, to be able to answer questions about specific facts that may or may not be in content produced by newspapers? Or is it for them to be some kind of reasoning engine, created by training a model on some amount of text written b…
@jeffjarvis Jeff Jarvis on threads
On the one hand: Unlike every book in Books3, I'll bet someone at OpenAI has a NYT subscription. On the other hand: I wonder whether OpenAI licensing AP & Springer sets a difficult precedent for this case.
@kalihays1 Kali Hays on threads
A pretty incredible lawsuit with big implications. The issues are: Money (ofc). Sounds like Msft/OpenAI refused to pay for the value of NYTs decades of journalism, after already ingesting all of it for free. News as a business. …
@reckless1280 Nilay Patel on threads
The legal system is not deterministic: it is made up of unpredictable nerds with weird ideas, not computers and algorithms. You cannot predict the outputs of a court case based on the inputs, especially in fair use cases which are evaluated on a case-by-case basis and are histor…
@karaswisher Kara Swisher on threads
It begins.
@benedictevans Benedict Evans on threads
I wrote about newspapers and OpenAI in August. This was all very predictable. The defence from OpenAI that the model does not contain the training data is true, but incomplete. Who owns ‘this’?
@cbarbermd@med-mastodon.com Carolyn Barber on mastodon
Fascinating. The NYT sued Open AI and Microsoft today for copyright infringement, contending that millions of NYT articles were used to train the chatbots. The suit says the defendants should be held responsible for “billions of dollars in statutory and actual damages related t…
@waldoj@mastodon.social Waldo Jaquith on mastodon
I'm increasingly suspicious that Apple is up to something big in the AI space. I think they're creating the first LLM that *isn't* based on unlicensed, copyrighted text. https://www.nytimes.com/...
@jeffjarvis@mastodon.social Jeff Jarvis on mastodon
When journalism thinks its value is intrinsic in the commodity, content, rather than in service, education, and collaboration: — The New York Times sued Microsoft and OpenAI for alleged copyright infringement — https://www.wsj.com/...
@epro.social Emil Protalinski on bluesky
This is more than just a simple copyright case. Sure, there is money that can be potentially won, but this is a strategic lawsuit more than a cash grab. The New York Times is likely trying to gain leverage and create precedent for itself and media organizations everywhere. [em…
@ummjackson.com Jackson Palmer on bluesky
I appreciate optimism sometimes, but does anyone actually believe we'll ever see effective regulation of this stuff? IIRC, all other big cases like this have been dismissed thus far. [embedded post]
@jason_kint Jason Kint on x
So back to Exhibit J. Unlike the other 220k+ pages of exhibits documenting registered works, this exhibit contains 100 examples of alleged copyright violations with nearly identical content being outputted by ChatGPT. Again, it's impossible to argue with this. /13 [image]
@ceciliazin Cecilia Ziniti on x
🧵 The historic NYT v. @OpenAI lawsuit filed this morning, as broken down by me, an IP and AI lawyer, general counsel, and longtime tech person and enthusiast. Tl;dr - It's the best case yet alleging that generative AI is copyright infringement. Thread. 👇 [image]
@jonathanstray Jonathan Stray on x
NYT lawsuit against OpenAI seems strong — lots of verbatim text reproduction. But it raises an even more complex question: what if an LLM was sure to paraphrase all of its training data? This is closer to what image generation does. Copyright applies to expressions, not ideas.
@jason @jason on x
OpenAI's Napster moment: The NYT is going to win a huge settlement here, and/or they could get an injunction forcing OpenAI to redo their models without the allegedly stolen data. ... without the copyrighted content from Reddit, Quora, NYTimes, twitter, and countless other langu…
@danprimack Dan Primack on x
3/ My guess is most media cos will favor short-termism, as they did w/ classifieds and social media in decades past — unable to see beyond the next quarter — but they have a chance to hold the cards for once...
@asharma Amol Sharma on x
“Having failed to secure what they saw as their fair share of the explosive internet growth powered by search and social media, publishers don't want to meet the same fate with AI.” That's the dynamic causing the Times and others to take a hard line.
@garymarcus Gary Marcus on x
OpenAI is in serious trouble. 👉The excerpt below is particularly damning, because the prompts that elicited the plagiarism in no way requested that the system draw on the NYT at all. 👉@jason_kint & @CeciliaZin largely converge on the overall seriousness of the suit. 👉OpenAI...
@ceciliazin Cecilia Ziniti on x
7/ 💼 Another interesting point: NYT got really good lawyers. Susman Godfrey has a great reputation and track record taking on tech. This isn't a quick cash grab like the lawsuits filed a week after ChatGPT; it's a strategic legal challenge.
@jeffjarvis @jeffjarvis on x
Here is the lede of the NYTimes' suit against OpenAI, about the sacredness of journalism, followed by excerpts from The Gutenberg Parenthesis about the sacred rhetoric used by newspapers to oppose radio's entrance into news: [image]
@mathewi Mathew Ingram on x
I get that newspapers are all mad at AI, but summarizing Times articles and even “mimicking its expressive style” shouldn't constitute copyright infringement under any reasonable definition https://www.theverge.com/...
@binarybits Timothy B. Lee on x
It doesn't seem out of the question that AI companies could lose these cases catastrophically and be forced to pay billions to plaintiffs and rebuild their models from scratch.
@danprimack Dan Primack on x
2/ Counterargument is that gen AI cos will just need to pay pennies, so it's not really too big a deal given their billions in outside investment. BUT: If $$ is meager, media cos may not play ball. Particularly given that gen AI threatens to cannibalize their traffic.
@amasad Amjad Masad on x
I feel betrayed that ChatGPT, who I consider a close friend, turned out to be a basic bitch mainstream reporter 😭 [image]
@jason_kint Jason Kint on x
The complaint also steps through the preference and weighting used for sources with claims NYT-sourced content is more valued for training. And that undermining that real investment will undermine the entire market for journalism - including licensing it for future AI. /10 [image…
@bgrueskin Bill Grueskin on x
NYT's suit showed that “a Microsoft search feature powered by ChatGPT reproduced almost verbatim results from Wirecutter.” “The results did not link to the Wirecutter article, and they stripped away referral links used to generate commissions from sales” https://www.nytimes.com/..…
@afinetheorem Kevin A. Bryan on x
NYT/OpenAI lawsuit completely misunderstands how LLMs work, and judges getting this wrong will do huge damage to AI. Basic point: LLMs DON'T “STORE” UNDERLYING TRAINING TEXT. It is impossible- the parameter size of GPT-3.5 or 4 is not enough to losslessly encode the training set.
@jason_kint Jason Kint on x
ok, I've now read the full NYT complaint filed this morning vs OpenAI and Microsoft. I'm impressed - it's future-focused around fair value for work vital to democracy. It also contains 220k pages of exhibits although the pages of Ex J stood out to me. more on that in a minute. /1…
@garymarcus Gary Marcus on x
@sbergman @jason_kint These systems are always stochastic; few prompts ever give same results over and over
@ivanthek @ivanthek on x
I cancelled my New York Times subscription since I already get ChatGPT.
@benwansell Ben Ansell on x
Some of the direct comparisons with NYT articles and Chat GPT output are insane. And to think we have been obsessing about plagiarism in the acknowledgements of a 90s PhD while this is the current state of affairs with LLMs.
@jason_kint Jason Kint on x
Here are four examples. Again, the lawsuit includes one hundred of them. You get the point. I find this exhibit to be an incredibly powerful illustration for a lawsuit that will go before a jury of Americans. Again, it's impossible to argue with this. /14 [image]
@ceciliazin Cecilia Ziniti on x
6/ 🚫 Misinformation allegations add a clever twist. The complaint pulls in something people are scared of - hallucinations - and makes a case out of it, citing examples where elements of NYT articles were made up. 🍊 Most memorable example? Alleging Bing says the NYT published...
@antoniogm Antonio García Martínez on x
The next big field of media attribution to develop will be crediting underlying content for powering AI queries, and measuring how much revenue should be shared with this or that publisher (in effect, a training data publisher, not a content one).
@conorsen Conor Sen on x
So how exactly are LLM's going to work when all the content providers block access to their content without permission? Or will this be a lucrative new source of revenue for the NYT/WSJ et al? [image]
@levie Aaron Levie on x
This is a good thing. We will be able to finally put to rest the ambiguity on copyrights and training data. [image]
@bobbyallyn Bobby Allyn on x
From my story in August about the NYT gearing up to sue OpenAI: “if a federal judge finds that OpenAI illegally copied the Times' articles to train its AI model, the court could order the company to destroy ChatGPT's dataset” https://www.npr.org/...
@emollick Ethan Mollick on x
In the New York Times OpenAI lawsuit, you can see how complex the relationship of training data to output can be. On one hand, they find that you can induce ChatGPT to produce exact content from famous Times articles, on the other, they show it also hallucinates false articles. […
@mileskruppa Miles Kruppa on x
Here's Google's AI summary of the Times vs Microsoft/OpenAI lawsuit. Links to CNN, The Verge, and others, but a snub to the Times' own article on the subject. [image]
@caffar3cristina Cristina Caffarra on x
News from respectable outlets is super valuable training data bcs it is verified and accurate (relatively). In the ocean of repetitive junk data that models are trained over, decades of news archives are very useful. This is about division of rents, expect a lot more of it.
@chamath Chamath Palihapitiya on x
The interesting thing about this NYT/OpenAI lawsuit is the counterfactual. If Apple is, indeed, writing substantial checks to media companies to license their content for training models, the impact of this and other lawsuits against AI companies training on non-public data will…
@ryan_browne_ Ryan Browne on x
The NYT is suing Microsoft and OpenAI for “billions of dollars” worth of damages over alleged copyright infringement. The Times alleges the firms created a business based on “mass copyright infringement” through their use of news content to train AI. https://www.cnbc.com/...
@jdcapelouto J.D. Capelouto on x
Interesting — the NYT lawsuit leans on the fact that Times stories were heavily posted to Reddit, which was then apparently used to train OpenAI's models: [image]
@quinnnorton @quinnnorton on x
This will cost a lot of money and be stupid.
@davewiner Dave Winer on x
For what it's worth, my two cents on the NYT suit re ChatGPT. https://scripting.com/... [image]
@danprimack Dan Primack on x
1/ The gen AI model is f'd if companies like OpenAI need to pay copyright holders for their content. That would cover every news media, social media, music, book, etc publisher. Maybe a big advantage for Meta/X, who own their own content.
@sarafischer Sara Fischer on x
.@nytimes sues @OpenAI and @Microsoft for copyright infringement — This is a big deal because it could set a precedent for 1. How courts define the value of news content in training large language models and 2. What the damages are for previous use @axios https://www.axios.com/..…
@robleathern Rob Leathern on x
In other news, NYT plans to sue the Internet Archive and Wikipedia next for allowing people to ‘rely on past journalism by The Times’... 🤔 “When chatbots are asked about current events or other newsworthy topics, they can generate answers that rely on past journalism by The...
@grynbaum Michael M. Grynbaum on x
Read The Times's full complaint against OpenAI and Microsoft here: https://www.nytimes.com/... https://nytco-assets.nytimes.com/ ...
@nexusben Ben Schoon on x
Curious to see where this lands. Strongly feel that these AI products will be unsustainable one way or another. Paying for licensing will be too expensive, but these will also lessen traffic, lessening content available by way of ad rev, which means less for AI to feed off
@claireatki Claire Atkinson on x
I'm all for progress but big tech is simply stealing and repackaging content, often inaccurately. Journalism is expensive. ChatGPT should pay for it. https://www.wsj.com/...
