mattshumer_ · TEXXR

I've been testing GPT-5.4 for the last week. In short, it is the best model in the world, by far. It's so good that it's the first model that makes the “which model should I use?” conversation feel almost over. The biggest surprise: I barely use Pro anymore! ... For the first time, 5.4's standard version, with heavy thinking, just broke that habit. Even in standard mode, GPT-5.4 is better than previous models in Pro mode... crazy!

2026-03-06 View on X

The Verge

OpenAI launches GPT-5.4, saying it is its “most capable and efficient frontier model for professional work” and its first with native computer use capabilities

The latest model comes with native computer use capabilities, allowing it to take on jobs across your device and applications.

View original

I've been testing GPT-5.4 for the last week. In short, it is the best model in the world, by far. It's so good that it's the first model that makes the “which model should I use?” conversation feel almost over. The biggest surprise: I barely use Pro anymore! If you know me, [image]

2026-03-05 View on X

The Verge

OpenAI launches GPT-5.4, saying it is its “most capable and efficient frontier model for professional work” and its first with native computer use capabilities

The latest model comes with native computer use capabilities, allowing it to take on jobs across your device and applications.

View original

Every time someone asks me what's going on with AI, I give them the safe answer. Because the real one sounds insane. I'm done holding back. I wrote what I wish I could sit down and tell everyone I care about. Send it to someone who needs to read it. https://x.com/...

2026-02-11 View on X

The Decoder

OpenAI updates ChatGPT's deep research tool with GPT-5.2, a full-screen report view, an option to focus research on specific websites, and search interruption

The feature now runs on the new GPT-5.2 model, as OpenAI announced on X. A key addition is that users can connect apps to ChatGPT and—potentially very useful—search specific websit...

View original

Something Big Is Happening

2026-02-11 View on X

The Decoder

OpenAI updates ChatGPT's deep research tool with GPT-5.2, a full-screen report view, an option to focus research on specific websites, and search interruption

The feature now runs on the new GPT-5.2 model, as OpenAI announced on X. A key addition is that users can connect apps to ChatGPT and—potentially very useful—search specific websit...

View original

Every time someone asks me what's going on with AI, I give them the safe answer. Because the real one sounds insane. I'm done holding back. I wrote what I wish I could sit down and tell everyone I care about. Send it to someone who needs to read it. https://x.com/...

2026-02-11 View on X

Sources

Q&A with Fidji Simo on ChatGPT ads, OpenAI's efforts to ship a new model soon to end Sam Altman's Code Red, Anthropic's Super Bowl ads, Sora, Codex, and more

How ads in ChatGPT will work, what will end the Code Red, those Anthropic attack ads, working with Sam Altman, and much more...

View original

Every time someone asks me what's going on with AI, I give them the safe answer. Because the real one sounds insane. I'm done holding back. I wrote what I wish I could sit down and tell everyone I care about. Send it to someone who needs to read it. https://x.com/...

2026-02-11 View on X

Matt Shumer

GPT-5.3-Codex and Claude Opus 4.6 can meaningfully contribute to the improvement of AI models, a sign of what's coming for most knowledge work within five years

Think back to February 2020. — If you were paying close attention, you might have noticed a few people talking about a virus spreading overseas.

View original

Something Big Is Happening

2026-02-11 View on X

Matt Shumer

GPT-5.3-Codex and Claude Opus 4.6 can meaningfully contribute to the improvement of AI models, a sign of what's coming for most knowledge work within five years

Think back to February 2020. — If you were paying close attention, you might have noticed a few people talking about a virus spreading overseas.

View original

Something Big Is Happening

2026-02-11 View on X

Sources

Q&A with Fidji Simo on ChatGPT ads, OpenAI's efforts to ship a new model soon to end Sam Altman's Code Red, Anthropic's Super Bowl ads, Sora, Codex, and more

How ads in ChatGPT will work, what will end the Code Red, those Anthropic attack ads, working with Sam Altman, and much more...

View original

Uhhh... I don't think Genie 3 is supposed to be generating Fortnite gameplay [video]

2026-01-31 View on X

Reuters

Shares of gaming companies plunge after Google released Project Genie, which lets users create interactive worlds; Unity closed down 24.2% and Take-Two 7.9%

Shares of videogame companies fell sharply in afternoon trading on Friday after Alphabet's Google (GOOGL.O) rolled …

View original

Uhhh... I don't think Genie 3 is supposed to be generating Fortnite gameplay [video]

2026-01-30 View on X

Reuters

Shares of gaming companies plunge after Google released Project Genie, which lets users create interactive worlds; Unity closed down 24.2% and Take-Two 7.9%

Shares of videogame companies fell sharply in afternoon trading on Friday after Alphabet's Google (GOOGL.O) rolled …

View original

I've been using GPT-5.2 Pro for two weeks now. It's the best model in the world. It thinks for over an hour on hard problems. And it nails tasks no other model can touch. I can't live without it. Here's my GPT-5.2 Pro deep dive: https://shumer.dev/...

2025-12-12 View on X

OpenAI

OpenAI says GPT‑5.2 Thinking beats or ties industry professionals on 70.9% of GDPval knowledge work tasks, delivering outputs at >11x the speed and <1% the cost

OpenAI eyes January exit from “code red” John Werner / Forbes : The Wonder And The Promise Of GPT 5.2 Is Here Benj Edwards / Ars Technica : OpenAI releases GPT-5.2 after “code red”...

View original

Claude Opus 4.5 looks really great (by the numbers, at least) I don't get early access to @AnthropicAI models unfortunately, so I don't have a review to share today, but I'll absolutely be testing it and sharing my findings in the coming days! [image]

2025-11-25 View on X

Anthropic

Anthropic launches Claude Opus 4.5, saying it is “the best model in the world for coding, agents, and computer use” and “meaningfully better at everyday tasks”

Our newest model, Claude Opus 4.5, is available today. It's intelligent, efficient …

View original

We need a new way to express AI costs... $/token doesn't make much sense anymore. Maybe a benchmark that tries to give a sense of the cost to run an average workload?

2025-11-25 View on X

Simon Willison's Weblog

Anthropic prices Claude Opus 4.5 at $5/1M input and $25/1M output tokens, much cheaper than Opus 4.1 at $15/$75 but still pricier than GPT-5.1 and Gemini 3 Pro

Opus 4.5 was responsible for most of the work across 20 commits, 39 files changed, 2,022 additions and 1,173 deletions in a two day period. … Forums: r/BetterOffline : Claude Opus ...

View original

We need a new way to express AI costs... $/token doesn't make much sense anymore. Maybe a benchmark that tries to give a sense of the cost to run an average workload?

2025-11-25 View on X

Anthropic

Anthropic launches Claude Opus 4.5, saying it is “the best model in the world for coding, agents, and computer use” and “meaningfully better at everyday tasks”

Our newest model, Claude Opus 4.5, is available today. It's intelligent, efficient …

View original

I've had access to Gemini 3 since November 13th. Since then, I've used it as my daily-driver, pushing it to its limits. Here's my review of Gemini 3: https://shumer.dev/...

2025-11-19 View on X

matt shumer

Gemini 3 hands-on: a fundamental improvement on daily use, extremely fast, Antigravity IDE is a powerful launch product, and its personality is terse and direct

Gemini 3 is a fundamental improvement on daily use, not just on benchmarks. It feels more consistent and less “spiky” than previous models.

View original

I've had access to Gemini 3 since November 13th. Since then, I've used it as my daily-driver, pushing it to its limits. Here's my review of Gemini 3: https://shumer.dev/...

2025-11-18 View on X

The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

View original

Yeah, AI progress is totally, definitely stalling... Look at MathArena Apex. GPT-5.1 scored 1%. Gemini 3 scored 23%. That is a >20x jump on one of the hardest reasoning tasks we have. But sure, keep your head in the sand... [image]

2025-11-18 View on X

The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

View original

The intelligence of Gemini 3 Deep Think looks to be off-the-charts [image]

2025-11-18 View on X

The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

View original

Yeah, AI progress is totally, definitely stalling... Look at MathArena Apex. GPT-5.1 scored 1%. Gemini 3 scored 23%. That is a >20x jump on one of the hardest reasoning tasks we have. But sure, keep your head in the sand... [image]

2025-11-18 View on X

9to5Google

Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond

Google today announced Gemini 3 with the goal of bringing “any idea to life.” The first model available in this family …

View original

I've been testing GPT-5.1 for a few days. My quick notes: - creative writing style is a LOT better - it's much faster than GPT-5 (with similar intelligence) for most prompts - the personality is WAY better (but can still sometimes be annoying) - it's great in Codex!

2025-11-13 View on X

OpenAI

OpenAI rolls out GPT-5.1 Instant, which is “warmer by default”, and GPT-5.1 Thinking, which is “easier to understand and faster”, starting with paid subscribers

We're upgrading GPT‑5 while making it easier to customize ChatGPT. Starting to roll out today to everyone, beginning with paid users.

View original