VOICE ARCHIVE

Matt Shumer

@mattshumer_
68 posts
2026-03-06
I've been testing GPT-5.4 for the last week.  In short, it is the best model in the world, by far.  It's so good that it's the first model that makes the “which model should I use?” conversation feel almost over.  The biggest surprise: I barely use Pro anymore! ...  For the first time, 5.4's standard version, with heavy thinking, just broke that habit.  Even in standard mode, GPT-5.4 is better than previous models in Pro mode... crazy!
2026-03-06 View on X
The Verge

OpenAI launches GPT-5.4, saying it is its “most capable and efficient frontier model for professional work” and its first with native computer use capabilities

The latest model comes with native computer use capabilities, allowing it to take on jobs across your device and applications.

2026-02-11
Every time someone asks me what's going on with AI, I give them the safe answer. Because the real one sounds insane. I'm done holding back. I wrote what I wish I could sit down and tell everyone I care about. Send it to someone who needs to read it. https://x.com/...
2026-02-11 View on X
The Decoder

OpenAI updates ChatGPT's deep research tool with GPT-5.2, a full-screen report view, an option to focus research on specific websites, and search interruption

The feature now runs on the new GPT-5.2 model, as OpenAI announced on X. A key addition is that users can connect apps to ChatGPT and—potentially very useful—search specific websit...

Something Big Is Happening
2026-02-11 View on X
The Decoder

OpenAI updates ChatGPT's deep research tool with GPT-5.2, a full-screen report view, an option to focus research on specific websites, and search interruption

The feature now runs on the new GPT-5.2 model, as OpenAI announced on X. A key addition is that users can connect apps to ChatGPT and—potentially very useful—search specific websit...

Every time someone asks me what's going on with AI, I give them the safe answer. Because the real one sounds insane. I'm done holding back. I wrote what I wish I could sit down and tell everyone I care about. Send it to someone who needs to read it. https://x.com/...
2026-02-11 View on X
Sources

Q&A with Fidji Simo on ChatGPT ads, OpenAI's efforts to ship a new model soon to end Sam Altman's Code Red, Anthropic's Super Bowl ads, Sora, Codex, and more

How ads in ChatGPT will work, what will end the Code Red, those Anthropic attack ads, working with Sam Altman, and much more...

Every time someone asks me what's going on with AI, I give them the safe answer. Because the real one sounds insane. I'm done holding back. I wrote what I wish I could sit down and tell everyone I care about. Send it to someone who needs to read it. https://x.com/...
2026-02-11 View on X
Matt Shumer

GPT-5.3-Codex and Claude Opus 4.6 can meaningfully contribute to the improvement of AI models, a sign of what's coming for most knowledge work within five years

Think back to February 2020.  —  If you were paying close attention, you might have noticed a few people talking about a virus spreading overseas.

Something Big Is Happening
2026-02-11 View on X
Matt Shumer

GPT-5.3-Codex and Claude Opus 4.6 can meaningfully contribute to the improvement of AI models, a sign of what's coming for most knowledge work within five years

Think back to February 2020.  —  If you were paying close attention, you might have noticed a few people talking about a virus spreading overseas.

Something Big Is Happening
2026-02-11 View on X
Sources

Q&A with Fidji Simo on ChatGPT ads, OpenAI's efforts to ship a new model soon to end Sam Altman's Code Red, Anthropic's Super Bowl ads, Sora, Codex, and more

How ads in ChatGPT will work, what will end the Code Red, those Anthropic attack ads, working with Sam Altman, and much more...

2026-01-31
Uhhh... I don't think Genie 3 is supposed to be generating Fortnite gameplay [video]
2026-01-31 View on X
Reuters

Shares of gaming companies plunge after Google released Project Genie, which lets users create interactive worlds; Unity closed down 24.2% and Take-Two 7.9%

Shares of videogame companies fell sharply in afternoon trading on Friday after Alphabet's Google (GOOGL.O) rolled …

2025-12-12
I've been using GPT-5.2 Pro for two weeks now. It's the best model in the world. It thinks for over an hour on hard problems. And it nails tasks no other model can touch. I can't live without it. Here's my GPT-5.2 Pro deep dive: https://shumer.dev/...
2025-12-12 View on X
OpenAI

OpenAI says GPT‑5.2 Thinking beats or ties industry professionals on 70.9% of GDPval knowledge work tasks, delivering outputs at >11x the speed and <1% the cost

Related coverage: OpenAI eyes January exit from “code red”; John Werner / Forbes: The Wonder And The Promise Of GPT 5.2 Is Here; Benj Edwards / Ars Technica: OpenAI releases GPT-5.2 after “code red”...

2025-11-25
Claude Opus 4.5 looks really great (by the numbers, at least). I don't get early access to @AnthropicAI models unfortunately, so I don't have a review to share today, but I'll absolutely be testing it and sharing my findings in the coming days! [image]
2025-11-25 View on X
Anthropic

Anthropic launches Claude Opus 4.5, saying it is “the best model in the world for coding, agents, and computer use” and “meaningfully better at everyday tasks”

Our newest model, Claude Opus 4.5, is available today.  It's intelligent, efficient …

We need a new way to express AI costs... $/token doesn't make much sense anymore. Maybe a benchmark that tries to give a sense of the cost to run an average workload?
2025-11-25 View on X
Simon Willison's Weblog

Anthropic prices Claude Opus 4.5 at $5/1M input and $25/1M output tokens, much cheaper than Opus 4.1 at $15/$75 but still pricier than GPT-5.1 and Gemini 3 Pro

Opus 4.5 was responsible for most of the work across 20 commits, 39 files changed, 2,022 additions and 1,173 deletions in a two-day period. …

We need a new way to express AI costs... $/token doesn't make much sense anymore. Maybe a benchmark that tries to give a sense of the cost to run an average workload?
2025-11-25 View on X
Anthropic

Anthropic launches Claude Opus 4.5, saying it is “the best model in the world for coding, agents, and computer use” and “meaningfully better at everyday tasks”

Our newest model, Claude Opus 4.5, is available today.  It's intelligent, efficient …

2025-11-19
I've had access to Gemini 3 since November 13th. Since then, I've used it as my daily-driver, pushing it to its limits. Here's my review of Gemini 3: https://shumer.dev/...
2025-11-19 View on X
Matt Shumer

Gemini 3 hands-on: a fundamental improvement on daily use, extremely fast, Antigravity IDE is a powerful launch product, and its personality is terse and direct

Gemini 3 is a fundamental improvement on daily use, not just on benchmarks.  It feels more consistent and less “spiky” than previous models.

2025-11-18
I've had access to Gemini 3 since November 13th. Since then, I've used it as my daily-driver, pushing it to its limits. Here's my review of Gemini 3: https://shumer.dev/...
2025-11-18 View on X
The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

Yeah, AI progress is totally, definitely stalling... Look at MathArena Apex. GPT-5.1 scored 1%. Gemini 3 scored 23%. That is a >20x jump on one of the hardest reasoning tasks we have. But sure, keep your head in the sand... [image]
2025-11-18 View on X
The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

The intelligence of Gemini 3 Deep Think looks to be off-the-charts [image]
2025-11-18 View on X
The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

Yeah, AI progress is totally, definitely stalling... Look at MathArena Apex. GPT-5.1 scored 1%. Gemini 3 scored 23%. That is a >20x jump on one of the hardest reasoning tasks we have. But sure, keep your head in the sand... [image]
2025-11-18 View on X
9to5Google

Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond

Google today announced Gemini 3 with the goal of bringing “any idea to life.”  The first model available in this family …

2025-11-13
I've been testing GPT-5.1 for a few days. My quick notes:
- creative writing style is a LOT better
- it's much faster than GPT-5 (with similar intelligence) for most prompts
- the personality is WAY better (but can still sometimes be annoying)
- it's great in Codex!
2025-11-13 View on X
OpenAI

OpenAI rolls out GPT-5.1 Instant, which is “warmer by default”, and GPT-5.1 Thinking, which is “easier to understand and faster”, starting with paid subscribers

We're upgrading GPT‑5 while making it easier to customize ChatGPT.  Starting to roll out today to everyone, beginning with paid users.