OpenAI says it has seen some evidence that DeepSeek used “distillation” to train its open-source competitor by using outputs from OpenAI's proprietary models

White House AI tsar David Sacks raises possibility of alleged intellectual property theft

Financial Times 2025-01-29

Discussion

@tomwarren.co.uk Tom Warren on bluesky
OpenAI scraped the internet and copyrighted material, and now it's suddenly concerned about plagiarism ¯\_(ツ)_/¯ [embedded post]
@mmasnick Mike Masnick on bluesky
So, look. I'm sure I'm in the minority here on Bluesky in believing that training AI systems isn't copyright infringement. — But, also. Dude. — There's no way OpenAI can make this argument without looking very, very silly. [embedded post]
@mims Christopher Mims on bluesky
I wonder if this is covered by OpenAI's arguments in the various suits by content creators against the company and its ilk — I mean I could see lawyers trying to split hairs on the matter but it seems entirely defensible given how every LLM came to be [embedded post]
@stevefaktor.com Steve Faktor on bluesky
Turns out DeepSeek is selling irony. [embedded post]
@pelops Joseph on bluesky
I love the idea that we should care about DeepSeek stealing OpenAI's labor. It's like you can hear a billion people collectively saying “How do you like it??”. [embedded post]
@ancatdubh@mastodon.ie Tommy Kavanagh on mastodon
A bit of a non story but curious all the same that those who've whined about being taken to task for using others [copyrighted] materials should now be whining that others are allegedly using their work. — “The maker of ChatGPT, OpenAI, has complained that rivals, including tho…
@markchen90 Mark Chen on x
Congrats to DeepSeek on producing an o1-level reasoning model! Their research paper demonstrates that they've independently found some of the core ideas that we did on our way to o1.
@jukanlosreve @jukanlosreve on x
Just in: OpenAI secures evidence that DeepSeek used its proprietary model for training • The Financial Times reports that DeepSeek used OpenAI's proprietary model to train its own open-source competitor. • OpenAI stated that it has obtained evidence that DeepSeek employed
@stevesi Steven Sinofsky on x
Intellectual property rights? Isn't this just a “transformative use case” of OpenAI IP? 🤔 Everyone is for open until it is their work. [image]
@ananayarora @ananayarora on x
Deepseek has *removed* the DNS record to their OpenAI private proxy as of today after this gained attention. Earlier, this was pointing to an IP on Huawei Cloud but now the subdomain doesn't resolve at all. [image]
@can @can on x
People, especially publishers, will rightfully make fun of OpenAI accusing DeepSeek of what it did to them. I think though there's something fundamental we should internalize: zero marginal cost of data transmission comes for all of us.
@edzitron Ed Zitron on x
I'm so sorry I can't stop laughing. OpenAI, the company built on stealing literally the entire internet, is crying because DeepSeek may have trained on the outputs from ChatGPT. They're crying their eyes out. What a bunch of hypocritical little babies. https://www.ft.com/... [ima…
@gfodor @gfodor on x
Ok so apparently the plan now is to use the fancy term “distill” to lock people into this dumb narrative that including some raw OpenAI API slop in your training set (if they did that) is some kind of profound IP theft. So embarrassing for the US. Make it stop.
@nicoleperlroth Nicole Perlroth on x
It would be shocking if DeepSeek *didnt* scrape from OpenAI and Llama. But OpenAI scraped the earth, stealing IP for its models, and can't exactly pretend to be an ethical actor in return. Trying to roll this back is like trying to remove the pee from the pool.
@wzihanw Zihan Wang on x
Wow. So why didn't you open-source it?
@mayazi Maya Zehavi on x
So OpenAI gets a green pass to violate copyright but other models using distillation is a no-go if they surpass the OpenAI model? Talk about having it both ways to create a U.S. big tech moat https://www.ft.com/...
@dan_jeffries1 Daniel Jeffries on x
As we are learning DeepSeek is one of the most sophisticated psyops of all time. Here's how it went down: 1) Release the model open source. 2) Include highly detailed papers for all other people to replicate your work. 3) Create a novel SOTA RL algo that uses less memory
@quixoteknight @quixoteknight on x
Palmer I love you and your work the paper replicates
@rasmus_kleis Rasmus Kleis Nielsen on x
OpenAI, currently battling allegations of its own copyright infringement, “says it has evidence China's DeepSeek used its model to train competitor” (As w/other such discussions, beyond the law, look out for industry-politics coalitions-here OpenAI-Sacks) https://www.ft.com/...
@cristinacriddle Cristina Criddle on x
SCOOP: OpenAI tells the FT it has evidence that DeepSeek used its propriety models to train its open-source competitor. Its terms of service state output from its AI models cannot be used to “develop models that compete with OpenAI” (w/@EleanorOlcott) https://www.ft.com/...
@gallabytes @gallabytes on x
did you know: the best way to spread chinese propaganda & undermine the american economy is to upload preprints to arxiv, release the results open source under a permissive license, then wait for the forbes readers to throw a tantrum.
@emollick Ethan Mollick on x
The most unnerving part of the DeepSeek reaction online has been seeing folks take it as a sign that AI capability growth is not real It signals the opposite, large improvements are possible, and is almost certain to kick off an acceleration in AI development through competition
@zeyuanallenzhu @zeyuanallenzhu on x
Totally disagree. DeepSeek has >= 4 IOI gold medalists from team China (each = multiple IOI golds in other countries) and many national golds. Rumors say they have 100 people, so 4% IOI gold rate (or 15+% if accounting for competitiveness in China) beats most companies in the US.
@kortizart Karla Ortiz on x
Wait so DeepSeek took from Open Ai without credit, consent or compensation???? W i l d! Now where have I heard this problem before 🤔 🤔🤔😒 https://www.ft.com/...
@max_paperclips Shannon Sands on x
It was just the number for the run, they say that IN THE PAPER IF ANYONE WOULD BOTHER TO READ IT
@palmerluckey Palmer Luckey on x
DeepSeek is legitimately impressive, but the level of hysteria is an indictment of so many. The $5M number is bogus. It is pushed by a Chinese hedge fund to slow investment in American AI startups, service their own shorts against American titans like Nvidia, and hide sanction
@eleanorolcott Eleanor Olcott on x
OpenAI has accused DeepSeek of exploiting the fruits of its expensive labour to improve its models. I can't help but be reminded of when the company was accused of doing something rather similar... [image]
@angelazhanghk Angela Zhang on x
It's ironic that OpenAI, which has faced numerous lawsuits for failing to properly compensate content creators, is now accusing DeepSeek of IP violations. https://www.ft.com/... via @ft
@marietjeschaake Marietje Schaake on x
Breaking: OpenAI finds new love for intellectual property rights ↘️ https://www.ft.com/...
@daveg David Galbraith on x
Copying from the thing that copied the copyright. Either building a model by reverse engineering it is legal, or building the model in the first place, from copyright data isn't. Can't have it both ways. https://www.ft.com/...
@jxmnop Jack Morris on x
another incredible thing about deepseek: all the american AI labs compete to hire the top PhD researchers - but deepseek didn't compete deepseek researchers aren't top PhDs. most are not even PhDs
@nearcyan Near on x
Why did DeepSeek go so viral? tl;dr: class resentment, anger, and especially schadenfreude. very little actual app usage in comparison to the above. [image]
@triskweline Henning Koch on x
I mean it's practically public domain.
@denisewu Denise Wu on x
Mystery solved, why DeepSeek keep calling itself OpenAI. [image]
@yacinemtb Kache on x
it's not that people want deepseek to win it's that they want openAI to lose
@aidan_regan Aidan Regan on x
The irony. OpenAI built their entire business model around stealing information from the world wide web (a massive land grab) and monetising it. They are now complaining that others use their model to replicate something similar. https://www.ft.com/...
@deliprao @deliprao on x
Closed source work makes it impossible to trust these claims. As far as science is concerned, the attribution of techniques in R1 should be solely to deepseek authors unless OpenAI open up their code base with verifiable commit history.
@markchen90 Mark Chen on x
However, I think the external response has been somewhat overblown, especially in narratives around cost. One implication of having two paradigms (pre-training and reasoning) is that we can optimize for a capability over two axes instead of one, which leads to lower costs.
@wassielawyer @wassielawyer on x
Every Chinese company has three sets of accounts. The books you show the Americans, the books you show the CCP and the books you show the investors. The real book probably doesn't exist.
@paulskallas LindyMan on x
DeepSeek is a Chatgpt killer, not a Claude killer. Claude has a personality. It is crafted to be more intimate. Different experience DeepSeek does the same info dump on you as Chatgpt but better. For Chatgpt to survive, they need to focus on personality. Make it fun. Fun is
r/news r on reddit
OpenAI says Chinese rivals using its work for their AI apps
r/nottheonion r on reddit
OpenAI says Chinese rivals using its work for their AI apps
r/China r on reddit
OpenAI says it has evidence China's DeepSeek used its model to train competitor
r/neoliberal r on reddit
OpenAI says it has evidence China's DeepSeek used its model to train competitor
r/technology r on reddit
OpenAI says it has evidence China's DeepSeek used its model to train competitor
r/LeopardsAteMyFace r on reddit
OpenAI says it has evidence China's DeepSeek used its model to train competitor
r/aiwars r on reddit
OpenAI says it has evidence China's DeepSeek used its model to train competitor
r/artificial r on reddit
OpenAI says it has evidence China's DeepSeek used its model to train competitor
r/ChatGPT r on reddit
OpenAI says it has evidence China's DeepSeek used its model to train competitor
r/OpenAI r on reddit
OpenAI says it has evidence China's DeepSeek used its model to train competitor
@davemcc David McConnell on bluesky
Oh the irony. Mr “We need to redefine copyright” is suddenly unhappy about IP being stolen. [embedded post]
@quinnypig.com Corey Quinn on bluesky
Real defenders of intellectual property rights, those two. [embedded post]
@kevinroose Kevin Roose on x
must suck to have someone train AI models on your data without permission, wonder what that's like
@newley Newley Purnell on x
Microsoft's security researchers in the fall observed individuals they believe may be linked to DeepSeek exfiltrating a large amount of data using the OpenAI application programming interface, or API, said the people, who asked not to be identified because matter is confidential.
@sawyermerritt Sawyer Merritt on x
NEWS: Microsoft and OpenAI are investigating whether data output from OpenAI's technology was obtained in an unauthorized manner by a group linked to Chinese artificial intelligence startup DeepSeek. https://www.bloomberg.com/...
@mrbcyber Michael Ron Bowling on x
China has been stealing US tech for decades, from routers to nuclear weapons. So no one should be surprised about DeepSeek. https://www.bloomberg.com/... via @technology
@benitoz Ben Pouladian on x
PLOT THICKENS Microsoft and OpenAI in the fall observed people they believed linked to DeepSeek scraping a LARGE amount of data. How do you like them apples 😂 $nvda $msft DeepFraud [image]
@shiringhaffary Shirin Ghaffary on x
NEW from @dinabass and me: Microsoft and OpenAI are investigating whether a group linked to DeepSeek obtained data output from OpenAI's tech in an unauthorized manner, per sources. https://www.bloomberg.com/...
r/LocalLLaMA r on reddit
Microsoft Probing If DeepSeek-Linked Group Improperly Obtained OpenAI Data
r/technology r on reddit
Microsoft and OpenAI Probing If DeepSeek-Linked Group Improperly Obtained OpenAI Data
r/wallstreetbets r on reddit
Microsoft and OpenAI Probing If DeepSeek-Linked Group Improperly Obtained OpenAI Data
@dbsmasher.com @dbsmasher.com on bluesky
Open AI trained its models on copyrighted data and they didn't feel any qualms. — The world's tiniest violin really. [embedded post]
@quinnypig.com Corey Quinn on bluesky
Company that ripped off the entire internet incensed that the shoe is suddenly on the other foot: [embedded post]
@brettfavaro Brett Favaro on bluesky
*makes giant plagiarism engine* — Hey don't copy me you copycat! [embedded post]
@daledoback3 Dale Doback on bluesky
So, wait, they got Pied Pipered in real life?
@brianstorms Brian Dear on bluesky
I so very much . . . do not give a flying F what's causing David Sacks a sad.
@micah541 Micah Warren on bluesky
Isn't the entire fucking point of AI to distill what others have created? Sounds like they did it better than any of your companies, David. [embedded post]
@karlbode.com Karl Bode on bluesky
they're gonna spend a few weeks throwing excuses at the wall to explain why they were out-innovated by china (again) and ultimately land on xenophobic fear mongering (again)
@jetjocko Adam Rogers on bluesky
What I've read so far, the “evidence” is that they can't figure out how else DeepSeek could've done it, which, hrm. — Also, *if* someone stole OpenAI's data, call the police, using copyrighted material without paying its creators is a crime, good point. — techcrunch.com/2025/…
@rtm223.me Richard Meredith on bluesky
Oh, wow. Apparently using other peoples' IP to train an AI model is theft now, is it? Weird how that changed so very quickly 🤔 — techcrunch.com/2025/01/28/d...
@billbennett@mastodon.nz Bill Bennett on mastodon
The irony of a US-owned intellectual property stealing technology accusing Chinese-owned intellection property stealing technology of stealing American IP. You could not make this stuff up. — https://www.reuters.com/...
@tsarnick @tsarnick on x
Asked if China's DeepSeek stole American IP, AI Czar David Sacks says it looks like a technique called distillation was used where a student model can “suck the knowledge” out of the parent model and there is evidence that DeepSeek distilled knowledge from OpenAI's models, which …
@menhguin Minh Nhat Nguyen on x
after seeing finance people attempt to explain evil Chinese concepts like open source, distillation and GPU training, i no longer think the average person can adapt to AI
@yacinemtb Kache on x
noooo deepseek is a psyop nooo muh tik tok muh ccp mu- *company trains an AI to refuse your requests when you want smut* *company makes it lock down and say “policy, you are a bad person”* *ceo goes to biden and then lobbies to make it illegal to train ai so he can maintain a
@yacinemtb Kache on x
student model suck the knowledge out of the parent model until i distill
@autismcapital @autismcapital on x
🚨NEW: David Sacks, AI and Crypto Czar (@DavidSacks) speaks about DeepSeek and the implications for National Security on FOX News [video]
@alexandr_wang Alexandr Wang on x
What does DeepSeek R1 & v3 mean for LLM data? Contrary to some lazy takes I've seen, DeepSeek R1 was trained on a shit ton of human-generated data - in fact, the DeepSeek models are setting records for the disclosed amount of post-training data for open-source models...
@gfodor @gfodor on x
make it stop, please, i can't take anymore
@aravsrinivas Aravind Srinivas on x
There's a lot of misconception that China “just cloned” the outputs of openai. This is far from true and reflects incomplete understanding of how these models are trained in the first place. DeepSeek R1 has figured out RL finetuning...
r/technology r on reddit
White House “looking into” national security implications of DeepSeek's AI
