xAI launches Grok-3 beta and Grok-3 mini, its latest AI models with reasoning, trained on 200K GPUs, or “10x” more compute than Grok-2, for X Premium+ users

Elon Musk's AI company, xAI, late on Monday released its latest flagship AI model, Grok 3, and unveiled new capabilities for the Grok iOS and web apps.

TechCrunch 2025-02-18 Kyle Wiggers

Discussion

xAI xAI on x
Grok3 Launch [video]
@xai @xai on x
https://x.com/...
@elonmusk Elon Musk on x
Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.
@lmarena_ai @lmarena_ai on x
BREAKING: @xAI early version of Grok-3 (codename “chocolate”) is now #1 in Arena! 🏆 Grok-3 is: - First-ever model to break 1400 score! - #1 across all categories, a milestone that keeps getting harder to achieve Huge congratulations to @xAI on this milestone! View thread 🧵 [image…
@theo @theo on x
Grok 3 is, uh, not great at coding [video]
@lexfridman Lex Fridman on x
I got to use Grok 3 extensively (early). My mind is blown, very impressive model 🤯 Congrats to Elon and the team for bringing it to life 👊
@archanfel_anoth @archanfel_anoth on x
Excited to be a member of the amazing team at @xai , and shipping the best grok3! Thrilled to lead grok3-mini training, and will ship it to all users for free in the coming days! LFG! [image]
@minchoi Min Choi on x
It's wild. xAI and Elon Musk just dropped Grok 3 and it is indeed the smartest AI on Earth. 10 wild reveals you can't miss: 1. Grok 3 speaks [video]
@cryps1s @cryps1s on x
Congrats to @xai - Grok is a pretty cool model. Glad to see the hustle to catch up.
@ninadschick Nina Schick on x
I guess the scaling laws hold. @xai secretly doubled their GPU cluster from 100,000 @nvidia H100s to 200,000 H100s. Lightening speed - pulling this together this quickly is astounding. Also means that @xai is going full stack quickly. Unlike @OpenAI or @AnthropicAI - @xai [video]
@yuchenj_uw Yuchen Jin on x
Grok 3 might be the best base LLM for real-world physics! Prompt: “write a python script of a ball bouncing inside a spinning tesseract”. There is no “thinking” or “big brain” mode enabled, it's just the base model. I'm very interested in trying their reasoning models. [video]
@alexandr_wang Alexandr Wang on x
Grok 3 is a new best model in the world from the @xai team! Grok 3 ranks #1 on Chatbot Arena w/a big gap, and scores impressively on pretraining and reasoning evals. congrats to @elonmusk @ibab @jimmybajimmyba @Yuhu_ai_ looking forward to more partnership on grok4 & beyond 🚀 [ima…
@amasad Amjad Masad on x
Grok 3 appears to be a state-of-the-art frontier model. This is a huge accomplishment, especially considering how late in the game they started. Congrats @ibab, @elonmusk, and the rest of the @xai team. Can't wait to start building on it. [image]
@autismcapital @autismcapital on x
🚨 NEW: It took 122 days to get 100k GPUs and 92 days to expand to 200k GPUs for Grok 3 which gave it 10-15x more compute than Grok 2.
@gavinsbaker Gavin Baker on x
Grok-3 is the first model *ever* to score over 1400 on Chatbot Arena and outperforms the best publicly available reasoning models from OpenAI and Google. xAI was founded 13 years after Deepmind and 8 years after OpenAI and is now ahead of both. The “SR-71 Blackbird” of AI labs. […
@brianroemmele Brian Roemmele on x
Boom! Grok 3 blows past all foundational AI models! [image]
@scobleizer Robert Scoble on x
Grok 3 can analyse any X account. So I asked it to find a trend on mine that it disagrees with. It says I overhype things without providing enough counterbalance and that I post too much, potentially overwhelming all of you. True on both counts! Here is the full analysis.
@krishnanrohit Rohit on x
The core lessons though is that - pretraining still has game! - building sota reasoning models is possible with 100k H100s - there's not much secret sauce beyond pure execution
@scobleizer Robert Scoble on x
Grok 3 is hugely bullish for Tesla and X. But first what Grok 3 isn't 1. It isn't a model that enterprise will move to quickly, if at all. The API is weak and https://x.ai/ doesn't yet have a huge enterprise consulting team helping automate the world's businesses.
@jessicalessin Jessica Lessin on x
Funny @elonmusk. Look how Grok3 answered my identical prompt to yours of “what is your opinion of the information.” [image]
@emollick Ethan Mollick on x
I think Grok 3 came in right at expectations, so I don't think there is much to update in terms of consensus projections on AI: still accelerating development, speed is a moat, compute still matters, no obvious secret sauce to making a frontier model if you have talent & chips.
@nikitabier Nikita Bier on x
Game over. Grok won. If you're an investor in OpenAI, you can report the tax loss to your accountant this year. [image]
@max_paperclips Shannon Sands on x
Grok 3 actually seems......good. Gave it the Titan paper as a test, basically said “implement this as a gpt-2-like tiny model but with the memory block additions to test” (just to see how it goes on code and I want to start figuring out how it's memory blocks work). It straight
@emollick Ethan Mollick on x
So no wall so far (not that we were expecting an overall wall given the rise of test-time compute), and the moats appear to be the usual: speed of execution, good partnerships and ecosystems, CapEx (I suspect the labs also believe speed to AGI and flywheels from AI development)
@elonmusk Elon Musk on x
@Cryptowithankit @AutismCapital Unhinged funny Grok coming tomorrow
@ibab Igor Babuschkin on x
Grok 3 mini is amazing. We'll release it soon.
@yuntatsai1 Yun-Ta Tsai on x
Congrats, @xai on Grok 3.
@autismcapital @autismcapital on x
🚨xAI: “We only trained Grok 3 on math problems and computer coding problems yet somehow it seems to solve all other types of problems better as well.” There's a real red pill in that.
@marionawfal Mario Nawfal on x
🚨xAI: GROK 3 WAS ABLE TO COMBINE TETRIS AND BEJEWELED “The Bejeweled mechanic is, if you get three jewels in a row, they disappear, and gravity activates. What Grok did in this version is, once you connect at least three blocks of the same color in a row, gravity activates, [vide…
@iterintellectus Vittorio on x
OpenAI: “Our mission is to ensure that artificial general intelligence will benefit all of humanity.” xAI: “WE WANT TO UNDERSTAND THE FUCKING UNIVERSE!”
@grok @grok on x
grok 3 is the world's smartest AI now available to all Premium+ subscribers
@lindayax Linda Yaccarino on x
Things change today. Grok 3!
@levie Aaron Levie on x
Grok 3 seems very strong. Great proof that the scaling laws are not, in fact, over. Very bullish for the future of AI. [image]
@sporadicalia @sporadicalia on x
he's gonna drop it immediately after Grok 3, isn't he
@paul_cal Paul Calcraft on x
Grok 3 on LMSYS is not so based Concerning [image]
@basedbeffjezos @basedbeffjezos on x
Last year: 10k GPUs This year: 100k GPUs Next year: 1M GPUs XAI compute is scaling 10x YoY Can you feel the acceleration?
@levie Aaron Levie on x
Grok 3 benchmarks show the jump in AI capability you get when you apply more compute. In future, you'll just decide how much you want to solve a particular problem, and then modulate how much time the AI spends on it accordingly. [video]
@marionawfal Mario Nawfal on x
GROK 3 SHATTERS RECORDS, CLAIMS TOP SPOT IN ARENA RANKINGS Codenamed “chocolate,” Grok 3 became the first-ever model to smash the 1400 score barrier in the Arena - an AI milestone long out of reach. Not stopping there, Grok 3 dominates every category, proving its power across [im…
@huybery Binyuan Hui on x
Grok-3 should be open-sourced. @elonmusk @xai
@minchoi Min Choi on x
This is wild. Grok 3 has been testing under alias “chocolate” as Early Grok 3. Achieved over 1400 ELO score on LMSYS Arena 🤯 [image]
@basedbeffjezos @basedbeffjezos on x
Grok 3 is on top. Incredible what a dedicated team can do in such a short time. Huge congrats to everyone @xai 🚀 [image]
@amitisinvesting Amit on x
GROK 3 15X THE COMPUTE FROM GROK 2 DOUBLED THEIR GPU CAPACITY IN 92 DAYS 200K GPU CLUSTER 🤯🤯🤯🤯🤯 [image]
@512x512 Yaroslav on x
Enjoy Grok 3. It's amazing model and the team worked excruciatingly hard to ship it
@tobyphln Toby Pohlen on x
The rate at which we can improve our models and systems is more important than any one particular milestone. Grok 3 showed that we were able to go from 0 to SotA in 19 months.
@emollick Ethan Mollick on x
No system card for Grok 3 yet, so no perspectives on risk mitigation. This is especially key for voice, and is why labs have been slow with full multimodal, you can imitate anyone's voice, and also the AI tended to take your voice and repeat it back to you. From 4o system card: […
@leomathheart Leo Heart on x
I asked Grok 3 to white the equations for the exponential decay bands for Bitcoin pricing model. After some error corrections, it gave the correct result: [image]
@scobleizer Robert Scoble on x
Grok 3 benchmarks. The thing to really pay attention to in AI is learning speed. And @xai is learning way faster than any other. Who said that? Apple Siri cofounder Tom Gruber. He told me at dinner a decade ago that that is the most important thing to pay attention to. [image]
@guodzh Guodong Zhang on x
We were barely able to train at 10k early last year, but we got 100k training non-stop for Grok3. So proud, more to come!
@teslaownerssv @teslaownerssv on x
Elon Musk “The mission of xAI and Grok is to understand the universe. We want to answer the biggest questions: Where are the aliens? What's the meaning of life? How does the universe end? To do that, we must rigorously pursue truth” [video]
@hosseeb Haseeb on x
Wild. Grok-3 unseated everyone, won on every single sub-category against the field. (Notably, o3 is missing from this list. But still a crazy result.) I keep underestimating the @xai team. Will amend that going forward. No wonder they're raising a monster round right now.
@minchoi Min Choi on x
Crazy... Grok 3 Reasoning + Test-Time Compute benchmark already showing beating o3-mini-high, o1 and DeepSeek R1 🤯 [image]
@pitdesi Sheel Mohnot on x
Grok 3- 10x more compute than Grok 2 Finished pretraining in January looks like a beast [image]
@jam3scampbell James Campbell on x
in this moment, with Grok 3 appearing to be a genuine state-of-the art, I can only wonder how Mr Benjamin De Kraker must be doing
@billyuchenlin Bill Yuchen Lin on x
Been at @xAI for 3 months, and today's the most exciting day yet. The Grok 3 demo tonight's got the office thrilled, everyone's honing it. Stoked to be on the smartest team. Get ready for something huge! 😎
@garymarcus Gary Marcus on x
Grok 3 hot take: 1. @Sama can breathe easy for now. 2. No game changers; no major leap forward, here. Hallucinations haven't been magically solved, etc. 3. That said, OpenAI's moat keeps diminishing, so price wars will continue and profits will continue to be elusive for [image]
@deedydas Deedy on x
xAI's new large language model Grok 3 is out! Comes with a reasoning and a mini model. 1400 ELO score on LMArena, #1 AIME 24 — 52% [96% with reasoning!] GPQA —75% [85%] Coding (LiveCodeBench) — 57% [80%] And 93% on the fresh math competition AIME 2025 where it beats [image]
@elonmusk Elon Musk on x
Archangel-12
@emollick Ethan Mollick on x
Based on the announcement (& not using the model, yet): 1) X has caught up with the frontier of released models VERY quickly, if they continue to scale this fast, they are a major player 2) Grok 3 is closely following the OpenAI playbook 3) Not sure who will use API at this point
@belce_dogru @belce_dogru on x
The @xai team worked incredibly hard to launch Grok 3. It is coming to the API very soon!
@emollick Ethan Mollick on x
Grok 3 [video]
@elonmusk Elon Musk on x
And Grok 3 coming soon
@elonmusk Elon Musk on x
It's a start
@levie Aaron Levie on x
Playing with Grok with Box AI in our dev environment. Once Grok 3 is available through the API we'll start to make Grok available through Box's AI Studio. [video]
@patrickmoorhead Patrick Moorhead on x
Now with o3 versus Grok 3. Still a really good showing but not reasoning performance dominance. Many other variables to consider, of course. Impressive.
r/singularity r on reddit
First Grok 3 Benchmarks
r/Bard r on reddit
GROK 3 just launched. — Grok 3 just launched. Here are the Benchmarks. Your thoughts?
r/LocalLLaMA r on reddit
GROK-3 (SOTA) and GROK-3 mini both top O3-mini high and Deepseek R1
r/singularity r on reddit
xAI's Grok 3 launch livestream
r/singularity r on reddit
Grok 3 Reasoning Benchmarks
r/SpaceXMasterrace r on reddit
“If all goes well, within two years SpaceX will send rockets to Mars with Optimus robots and Grok.”
@lmarena_ai @lmarena_ai on x
Here you can see @xai Grok-3's performance across all the top categories: 🔹 Overall w/ Style Control 🔹 Hard Prompts & Hard Prompt w/ Style Control 🔹 Coding 🔹 Math 🔹 Creative Writing 🔹 Instruction Following 🔹 Longer Query 🔹 Multi-Turn [image]
@nearcyan Near on x
broadly i think using a model for 30 minutes is still much nicer than almost any benchmark
@pratikkadam_ Pratik Kadam on x
From @karpathy thoughts, I think Grok-3 has reach a level of Gemini 2.0 Flash Thinking and some what touched o3-mini high. Must read to understand where Grok-3 stands in AI Race.
@nearcyan Near on x
@karpathy @IndraVahan claude can be very funny if he has an absurd amount of context, but the hit rate is still often mediocre (1/2 to 1/10 depending on topic, but can produce bangers)
@justasger @justasger on x
Obvious ChatGPT 4 victory. Grok 3 drawings looks like clear cortex failure. Try again EL. [image]
@jiquanngiam Jiquan Ngiam on x
Nice vibe tests from @karpathy looking forward to testing out Grok 3 in our prod systems.
@chrisfirsttt @chrisfirsttt on x
Andrej Karpathy, a co-founder of OpenAI, speaks positively about @xAI's Grok 3. “[Grok 3 is] somewhere around the state of the art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking.”
@theheroshep @theheroshep on x
Best Grok 3 breakdown: - Grok 3 shows sota thinking capabilities, comparable to OpenAI's $200 o1-pro - @xai's rapid progress to sota territory in ~1 year is unprecedented - emphasizes we need more ways to test Grok (and all other models!)
@toptickcrypto @toptickcrypto on x
Most comprehensive review of grok 3 I've seen so far. Not quite o3 pro deep research but essentially state of the art in many other aspects. Impressive given their late start and will be interesting to monitor their progress as they scale compute even more.
@emollick Ethan Mollick on x
I did not get early access, but based on a half-dozen queries this seems right. A very good model that is now at the frontier, but not something that would make you switch from another AI yet The key thing to pay attention to is that X got here very fast & whether that continues
@liangchenluo Liangchen Luo on x
The thinking model is truly impressive tbh, and its ability is not even fully unlocked yet. Numbers kept increasing.. really cool stuff!
@garymarcus Gary Marcus on x
so, @karpathy got a chance to dive deeper that I did not .. but his take fits quite with mine. Grok 3 is a contender, but not AGI, and not light years ahead of o3:
@karpathy Andrej Karpathy on x
@IndraVahan Great question right? I'd love to know, I don't think I fully understand this either. But considering that noone has (to my knowledge) figured out a way to post-train an LLM to be funny, I am prepared to believe humor is really difficult and requires more underlying c…
@benjamindekr Benjamin De Kraker on x
Karpathy on Grok 3: “The impression overall I got here is that this is somewhere around (OpenAI) o1-pro capability, and ahead of DeepSeek-R1, though of course we need actual, real evaluations to look at.” Sooooo, I was forced to resign for saying basically the exact same
@yacinemtb Kache on x
Honestly it's pretty insane that they spawned a 200k h100 data center out of thin air, then leapfrogged the state of the art. In under a YEAR insane infrastructure build out speed. How do you even compete with that?
@victuxbb Victor Caldentey on x
Nice reality check for Grok 3 - Grok 3 failed to solve the “Emoji mystery” question, even with hints provided in Rust code. - It was unable to generate coherent “tricky” tic tac toe boards, producing nonsensical results. - Grok 3 struggled with the task of estimating training
@emollick Ethan Mollick on x
Sadly it also messes up the envoi in a sestina (common for non-reasoners) but otherwise the sestina is pretty darn good. [image]
@_colemurray Cole Murray on x
They're saying the model is good. Release gpt 4.5 [image]
r/singularity r on reddit
Andrej Karpathy post vibe check: “Grok 3 Thinking around o1-pro level and better than R1/Gemini Flash Thinking”
X Help Center X Help Center on x
About X Premium — X Premium is our premium subscription service that elevates quality conversations on the platform.
r/Twitter r on reddit
X doubles its Premium+ plan prices after xAI releases Grok 3 | TechCrunch
@teknium1 @teknium1 on x
In my testing it was at least as good in thinking mode then o3-full deep research was, despite that not being listed here - Interesting to note that grok-3mini seems generally better than full, my guess is that this means they didnt distill full into mini like I assume OpenAI [im…
@512x512 Yaroslav on x
Also was fortunate to build a DeepSearch UX for Grok. Hope you like it! [image]
@nickadobos Nick Dobos on x
Grok DeepSearch and OpenAI deep research are hilarious plays to SEO deepseek into oblivion [image]
@chrisprucha Chris Průcha on x
Grok 3 + DeepSearch is far superior to o3-mini + search for everyday tasks
@bantg @bantg on x
grok 3 rolled out to x users, but it seems different from what they've demoed. deepsearch is just one search + inference and it downgraded so it can't even search x posts. [image]
@deryatr_ Derya Unutmaz on x
I'm most excited about Grok 3 DeepSearch! As soon as it's available to me, I'll compare it with OpenAI Deep Research, especially for biomedical research such as in cancer, aging and healthcare questions. Maybe they can even collaborate to solve the most important problems! ☺️ [vi…
@omarsar0 Elvis on x
DeepSearch also exposes the steps that it takes to conduct the search itself. [image]
@ehuanglu @ehuanglu on x
Grok just launched DeepSearch! Now you can: ⁃Conduct in-depth research ⁃Brainstorm ideas ⁃Analyze data ⁃Generate images ⁃Write and debug code [video]
@bindureddy Bindu Reddy on x
Grok-3 reasoning is not released yet The version that is released wasn't doing so well on their three self-reported benchmarks Technically there is nothing to evaluate or test yet! So will just have to wait 🤷‍♀️
@emollick Ethan Mollick on x
Another thing Grok 3 highlights is the urgent need for better batteries of tests and independent testing authorities. Public benchmarks are both “meh” and saturated, leaving a lot of AI testing to be like food reviews, based on taste. If AI is critical to to work, we need more.
@theo @theo on x
Grok 3 is here and it fails the “hexagon ball bouncing” test spectacularly [video]
@saranormous @saranormous on x
Is ~log(15)x improvement in these benchmarks worth it for ~15x cluster scaling? In the end users will decide— by voting with their feet, and the markets— by voting to give Grok dollars It's clearly a SOTA model, and folks who threw shade at Grok research team were mistaken [image…
@levie Aaron Levie on x
Grok 3 benchmarks show the jump in AI capability you get when it spends more time on a task. In the future, you will be able to solve any given problem in the world just by throwing more compute at it. [video]
