In federal court, lawyers for Microsoft and OpenAI defended the scraping of news stories to train LLMs and urged the dismissal of news outlets' copyright claims
I really have no idea how those arguments landed.
Kate Knibbs / @knibbs: Daily News & NYT lawyer now speaking. Pushing back on the idea that the publishers should've known about the ingestion prior to the statute of limitations cut-off
Kate Knibbs / @knibbs: Different NYT lawyer now. Is saying that LLMs don't actually learn “facts” as Microsoft asserted. “It only absorbs the expression of the fact, it never learns the underlying fact. These are just statistical models”
Kate Knibbs / @knibbs: Judge: “My confusion is that I read that the models are always learning” (also, no shade to the court. this shit is hard to understand as a lay person!)
Kate Knibbs / @knibbs: Listening in to today's hearing in NYT v. Microsoft/OpenAI. The judge is currently going over how the training process works with the NYT lawyer. “The information is stored in packets?”
X:
Marty Swant / @martyswant: Judge: What's the injury? Lieberman: “You're leaving people open for massive copyright infringement without the ability to trace it...It's like it causes the alarm system in your house to go down.”
Jason Kint / @jason_kint: Attended oral arguments in NYT+ v OpenAI/Microsoft this morning @ SDNY. It was good to finally hear some back and forth - majority of claims imho will clearly survive OpenAI's attempts to dismiss - but all of OpenAI's PR offensive and talking points were on full display. /1
Marty Swant / @martyswant: Judge just asked OpenAI's lawyer if GPT models have NYT content in their database. Lawyer said it doesn't, but then clarified it doesn't have a database but instead relies on weights. I wonder how NYT's lawyers will respond to that.
Marty Swant / @martyswant: NYT's lawyer Ian Crosby said LLM outputs are just “the expression of the words and facts, not the underlying facts in the words and the models.” “They're not next-generation search engines, but answer engines,” Crosby said, adding that they're not substitutional.
Marty Swant / @martyswant: Just joined the NYT v OpenAI oral arguments. Every time someone joins the audio feed, it announces the person w/ each person's voice & name...Awkward and distracting!
Jason Kint / @jason_kint: A convo on “memorization” also interesting.. discovery here may be enlightening. I say this based on Facebook discovery unsealed last night showing how it balanced “memorization rate” with risk acknowledging the issue it creates. Don't miss this. /9 https://x.com/...
Marty Swant / @martyswant: Lieberman: When copyright management information (CMI) is retracted, outputs will be “either the verbatim language or a summary without the attribution of the New York Times or the Daily News.”
John Legere / @johnlegere: Publications like the New York Times are going after OpenAI about their use of articles to teach their AI. Will this mean more companies will go after not only OpenAI but other companies using AI too? https://www.npr.org/...
Forums:
r/technology: ‘The New York Times’ takes OpenAI to court. ChatGPT's future could be on the line