How DeepSeek outpaced OpenAI at a fraction of the cost: open-source weights, pure reinforcement learning with no supervised fine-tuning in DeepSeek-R1-Zero, and R1 built on top of it
DeepSeek R1's Monday release has sent shockwaves through the AI community, disrupting assumptions about what's required to achieve cutting-edge AI performance.
Matt Marshall / VentureBeat
Related Coverage
- Chinese AI chatbot DeepSeek sparks market turmoil BBC
- Why Is DeepSeek Sinking Nvidia Stock? Forbes
- China's DeepSeek AI rattles global tech markets Capital Brief
- A new AI assistant from China has Silicon Valley talking NBC News
- How the buzz around Chinese AI model DeepSeek sparked a massive Nasdaq sell-off CNBC
- China's AI Startup DeepSeek Will Make Advertisers Rethink How They Spend MediaPost
- Chinese AI startup DeepSeek is threatening Nvidia's AI dominance Fortune
- China's DeepSeek just dropped a free challenger to OpenAI's o1 - here's how to use it on your PC The Register
- 'AI's Sputnik moment': China-based DeepSeek's open-source models may be a real threat to the dominance of OpenAI, Meta, and Nvidia PC Gamer
- DeepSeek-R1: The Open-Source AI Challenging ChatGPT Search Engine Journal
- The AI Cost Curve Just Collapsed Again Tomasz Tunguz
- Chip Stocks Tumble After China's DeepSeek AI Models Raise Doubts Over U.S. Tech Dominance Wall Street Journal
- Why DeepSeek is hitting tech stocks hard, including Nvidia's Mashable
- Why is DeepSeek AI suddenly so popular? BGR
- DeepSeek unveils new R1 open-source AI model Verdict
- What is DeepSeek and how is it different to other AI models? The Times
- DeepSeek R1 is wildly overhyped, but you can try it at home Pivot to AI
- DeepSeek R1 is the Chinese AI model disrupting OpenAI and Anthropic — what you need to know Tom's Guide
- How to run DeepSeek's AI model on PC for free NewsBytes
- AI startup DeepSeek rivals OpenAI models using far fewer resources, shocks AI industry TechSpot
- China's DeepSeek AI Moves the Capital of Tech from Palo Alto to Hangzhou LewRockwell
- Open-Source DeepSeek R1 LLM Matches OpenAI's o1 for a Fraction of the Cost TechEBlog
Discussion
-
@deedydas
Deedy
on x
DeepSeek R1 isn't just “25x cheaper than GPT o1”... It is better than the unreleased OpenAI o3 at the same cost at coding on Codeforces and ARC-AGI! [image]
-
@natfriedman
Nat Friedman
on x
The deepseek team is obviously really good. China is full of talented engineers. Every other take is cope. Sorry.
-
@morganb
Morgan Brown
on x
7/ The results are mind-blowing:
- Training cost: $100M → $5M
- GPUs needed: 100,000 → 2,000
- API costs: 95% cheaper
- Can run on gaming GPUs instead of data center hardware
-
@0xkarmatic
Karma
on x
The visible chains of thought in DeepSeek R1 make it so easy to prompt, as you can clearly tell when your instructions were ambiguous. A missed opportunity from OpenAI not to make their CoTs visible. Now that the genie is out of the bottle and we have a reproduction of an o1-like…
-
@morganb
Morgan Brown
on x
🧵 Finally had a chance to dig into DeepSeek's r1... Let me break down why DeepSeek's AI innovations are blowing people's minds (and possibly threatening Nvidia's $2T market cap) in simple terms...
-
@morganb
Morgan Brown
on x
2/ DeepSeek just showed up and said “LOL what if we did this for $5M instead?” And they didn't just talk - they actually DID it. Their models match or beat GPT-4 and Claude on many tasks. The AI world is (as my teenagers say) shook.
-
@itsolelehmann
Ole Lehmann
on x
DeepSeek is a 100x more based name than ChatGPT or Claude
-
@emollick
Ethan Mollick
on x
I think the market will adjust to any per token cost decrease brought on by DeepSeek quite quickly. Costs for GPT-4 level intelligence dropped by 1000x in the last 18 months. A 95% price drop in reasoning models seems not to be something that will break the labs.
-
@beeple
@beeple
on x
DEEPSEEK v. OPENAI [image]
-
@ananayarora
@ananayarora
on x
DeepSeek has had a private proxy to OpenAI at least until 2024-08-10. Its existence hints that they probably didn't pay the regular API pricing to OpenAI and instead used a fleet of bots to query ChatGPT during training [image]
-
@morganb
Morgan Brown
on x
8/ “But wait,” you might say, “there must be a catch!” That's the wild part - it's all open source. Anyone can check their work. The code is public. The technical papers explain everything. It's not magic, just incredibly clever engineering.
-
@teknium1
@teknium1
on x
It's crazy that DeepSeek's direct API seemingly has no rate limits of any kind
-
@emostaque
Emad
on x
A simpler way to see that DeepSeek weren't lying about 50k H100s or training costs for V3/R1: we have the model, and it's a 671B-parameter Mixture of Experts with 37B active. We know that spec takes 2-3M GPU-hours to train. Models get worse with more compute after a certain point! https://www.harmdevries.com…
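The GPU-hours figure above can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming DeepSeek-V3's reported ~2.788M H800 GPU-hours for the final training run and an assumed $2/GPU-hour rental rate (both figures are assumptions not taken from this thread):

```python
# Back-of-envelope check of the "$5M" training-cost claim.
gpu_hours = 2.788e6          # reported H800 GPU-hours for the final V3 run (assumption)
rate_per_gpu_hour = 2.00     # assumed cloud rental rate in USD per GPU-hour
cost = gpu_hours * rate_per_gpu_hour
print(f"estimated compute cost: ${cost / 1e6:.1f}M")
```

At these rates the final run lands in the mid-single-digit millions, which is consistent with the ~$5M number circulating, though it excludes research, failed runs, staff, and hardware ownership costs.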
-
@arithmoquine
Henry
on x
i've made over 200,000 requests to the deepseek api in the last few hours. zero ratelimiting, and the whole thing cost me like 50 cents. bless the CCP, openai could never
-
@snowmaker
Jared Friedman
on x
Lots of hot takes on whether it's possible that DeepSeek made training 45x more efficient, but @doodlestein wrote a very clear explanation of how they did it. Once someone breaks it down, it's not hard to understand. Rough summary: * Use 8-bit instead of 32-bit floating point
-
@rakyll
Jaana Dogan
on x
DeepSeek codebases are clean and well authored. I learned a lot just reading their work over the weekend. You cannot deny that they are raising the bar, and I wish we focused on quality instead of short-sighted incremental work.
-
@morganb
Morgan Brown
on x
6/ Traditional models? All 1.8 trillion parameters active ALL THE TIME. DeepSeek? 671B total but only 37B active at once. It's like having a huge team but only calling in the experts you actually need for each task.
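The "only call in the experts you need" idea is top-k gated Mixture-of-Experts routing. A minimal sketch in numpy (dimensions, expert count, and the gating matrix here are illustrative, not DeepSeek's actual configuration):

```python
import numpy as np

def topk_gate(x, W_gate, k=8):
    """Route one token to its top-k experts (sketch of MoE routing).

    Only the k selected experts run a forward pass for this token,
    so most expert parameters stay idle on any given step.
    """
    logits = x @ W_gate                       # one gating logit per expert
    chosen = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    weights = np.exp(logits[chosen] - logits[chosen].max())
    weights /= weights.sum()                  # softmax over the chosen k only
    return chosen, weights

rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 64, 8
x = rng.normal(size=d_model)
W_gate = rng.normal(size=(d_model, n_experts))
experts, weights = topk_gate(x, W_gate, k)
print(f"active experts: {k}/{n_experts} -> {k / n_experts:.0%} of expert params touched")
```

Scaled up, the same mechanism is how a 671B-parameter model can activate only ~37B parameters (about 5.5%) per token.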
-
@theshortbear
@theshortbear
on x
DeepSeek seems to have created a panic moment within the biggest companies and it should alarm investors. Costs: 2,048 Nvidia H800 GPUs: $40-50 million · Training: $5 million If all it takes to beat OpenAI is a maximum of $55 million, the industry is becoming commoditized way f…
-
@firstadopter
Tae Kim
on x
It's silly town on here right now as engagement farmers compare apples and oranges but I'll just cite Bernstein (Bernstein is right): “Did DeepSeek really build OpenAI for $5 million? Of course not” “a fundamental misunderstanding over the “$5M” number” “categorically false”
-
@pitdesi
Sheel Mohnot
on x
So many viral tweets comparing OpenAI's $6.6B raised to <$10M from DeepSeek 🤦🏽♂️ People are so dumb.
-
@morganb
Morgan Brown
on x
3/ How? They rethought everything from the ground up. Traditional AI is like writing every number with 32 decimal places. DeepSeek was like “what if we just used 8? It's still accurate enough!” Boom - 75% less memory needed.
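The "8 instead of 32" point is about numeric precision, not decimal places: storing weights in 8 bits instead of 32-bit floats cuts storage by 4x, i.e. 75% less memory. A minimal sketch using int8 quantization with a per-tensor scale (DeepSeek's actual training uses hardware FP8 formats; int8 here just illustrates the memory arithmetic):

```python
import numpy as np

# Quantize fp32 weights to 8 bits with a per-tensor scale.
w = np.random.default_rng(0).normal(size=1024).astype(np.float32)
scale = np.abs(w).max() / 127.0
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = w_q.astype(np.float32) * scale        # dequantize for use in compute

saved = 1 - w_q.nbytes / w.nbytes             # 4 bytes -> 1 byte per weight
err = np.abs(w - w_hat).max()                 # worst-case rounding error
print(f"memory saved: {saved:.0%}, max error: {err:.4f} (scale: {scale:.4f})")
```

The rounding error stays below one quantization step, which is the "still accurate enough" trade-off the tweet describes.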
-
@wordgrammer
@wordgrammer
on x
Okay. Thanks for the nerd snipe guys. I spent the day learning exactly how DeepSeek trained at 1/30 the price, instead of working on my pitch deck. The tl;dr to everything, according to their papers:
-
@morganb
Morgan Brown
on x
1/ First, some context: Right now, training top AI models is INSANELY expensive. OpenAI, Anthropic, etc. spend $100M+ just on compute. They need massive data centers with thousands of $40K GPUs. It's like needing a whole power plant to run a factory.