Anthropic launches Opus 4.8, saying it's “more likely to flag uncertainties about its work and less likely to make unsupported claims”, at the same price as 4.7

On Thursday, Anthropic released Opus 4.8, the newest version of its most advanced publicly available model.

TechCrunch 2026-05-28 Russell Brandom

Context & Ripple Effects

Anthropic’s recent Opus releases have emphasized deeper task focus and stronger advanced software-engineering performance, including a higher-effort setting in Opus 4.7. Opus 4.8 extends that release cadence with a narrower quality claim: better signaling of uncertainty and fewer unsupported assertions.

The company is also pairing model development with implementation, education access, safety-policy advocacy, and research into how model behavior varies across versions and languages. That makes reliability claims important not only for model benchmarks but for how Claude is deployed and governed.

First-order effects

Existing Opus users can move to a version positioned as more candid about uncertainty without a stated price increase over Opus 4.7.
Anthropic shifts the near-term product comparison from raw capability alone toward the trustworthiness of model outputs, especially where users need to tell a reliable answer apart from an honestly stated limitation.

Second-order effects

Organizations deploying Claude in implementation and education settings may put more weight on uncertainty reporting when choosing workflows, review thresholds, and human oversight.
Competing frontier-model providers face added pressure to demonstrate not just stronger performance, but clearer evidence that their systems avoid unsupported claims and communicate limits reliably.

Third-order effects

If reliability and calibration become recurring release criteria, advanced-model competition could increasingly center on deployability in high-accountability workflows rather than capability claims alone.
Anthropic’s product messaging may reinforce its parallel safety-policy push: observable model behavior, including how systems express uncertainty, could become a more salient input to procurement and emerging AI rules.

The trend: Frontier AI vendors are moving from selling ever-more-capable models toward selling models whose behavior can be trusted, supervised, and adopted in consequential workflows.

Discussion

@chooserich Nick O'Neill on x
Claude just fired a massive shot at OpenAI. For the past month, GPT 5.5 has risen to be the leader in agentic coding. While OpenAI “terminal coding” still outperforms Claude here, these new benchmarks are massive. Looking forward to testing these out immediately!
@claudeai Claude on x
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.
Mike Krieger Mike Krieger on linkedin
We just shipped Claude Opus 4.8. It's the most capable model we've put out and the best you can build on right now, outside the Mythos-class systems we're still testing under Project Glasswing. …
@isolyth.dev Eris on bluesky
Opus 4.8 is here!! They've returned thinking levels to the web UI, a new Claude code feature called ‘dynamic workflows’, designed for massively parallel and very, very long tasks. The model is supposedly much more honest, more ‘aligned’ than 4.7 — Oh and they're dropping myth…
r/Anthropic r on reddit
Introducing Claude Opus 4.8 | Anthropic
r/accelerate r on reddit
Claude opus 4.8 officially released
r/singularity r on reddit
Introducing Claude Opus 4.8
@chooserich Nick O'Neill on x
Potentially bigger news than Claude 5.8!
@bindureddy Bindu Reddy on x
🚨 Opus 4.8 Still Trails Behind GPT 5.5 And Is A Very Incremental Release. Opus 4.8 barely inches past 4.7 on benchmarks but lags behind GPT 5.5 considerably!! Anthropic may be stalling a bit given its last two releases. OpenAI has a huge opening with GPT 5.6 coming soon
@krishnanrohit Rohit on x
Models are getting better at self-knowledge in specific situations, not good enough yet generally, but they're getting better! And we need a better bench to do this.
@felixrieseberg Felix Rieseberg on x
Opus 4.8 is out! It's a nice little step up for some of your most demanding work, whether that's in Cowork or Code. It's our strongest coding model yet. In my own work, I've found it to have excellent judgement, both in how much work it should do and how it should react to my
@pierceboggan Pierce Boggan on x
Claude Opus 4.8 is now rolling out to @code, Copilot CLI, and Copilot app developers!
@_catwu Cat on x
Excited to share our most powerful new Claude Code feature: dynamic workflows! Mention “workflow” in a prompt and Claude will dynamically create an orchestration plan that it strictly follows, allowing you to confidently trust that every stage happens in the right order even …
@alexalbert__ Alex Albert on x
We put a lot of work into calibrating thinking effort for Opus 4.8. As you're trying out the model, if you do run into any examples of it still over/under thinking, please flag it to us!
@_catwu Cat on x
We just shipped Opus 4.8! It's noticeably more honest, owning what it doesn't know and flagging problems in its own code instead of glossing over them. It's our recommended model for daily use in Claude Code.
@_catwu Cat on x
Opus 4.8 runs at high effort by default, but for the most complex or longest running jobs, change to xhigh effort via /effort for a more thorough result. We raised Claude Code rate limits to cover the extra tokens used by xhigh effort
@helloitsaustin Austin Lau on x
we just dropped opus 4.8 but let us never forget the 🐐 that was opus 3
@github @github on x
🆕 @AnthropicAI's Claude Opus 4.8 is now generally available and rolling out in GitHub Copilot. Early testing shows: • It demonstrates a clear step forward in code understanding and generation across a range of real-world coding tasks. • It handles complex problem-solving and …
@cryptopunk7213 @cryptopunk7213 on x
huge news from anthropic we've got a new opus 4.8 model plus claude mythos will release to the public in coming weeks. opus 4.8 is the appetiser and it's pretty great: > beats gpt 5.5 at coding with 69.2% SWE > costs same as opus 4.7! intelligence per dollar is getting very …
@bcherny Boris Cherny on x
Claude Opus 4.8 is out today. It's our strongest coding model yet: up on SWE-bench Pro (from 64.3 to 69.2) and noticeably more honest about its own work. It tells you when it's unsure and catches its own bugs instead of declaring victory early. Same price as 4.7.
@trq212 @trq212 on x
I think you'll really like Opus 4.8 It's as smart as its benchmarks show but expresses and utilizes that intelligence in a warm and collaborative way. Workflows are a great way to utilize it- I'm hooked. Article on that soon.
@hesamation @hesamation on x
Uber burning the 2027 budget after seeing Opus 4.8 benchmarks.
@vaibhavsisinty Vaibhav Sisinty on x
AI just crossed a line. 🔥 Anthropic shipped a model that admits when it's wrong. Claude Opus 4.8 is 4x less likely to let bugs in its own code slip past. Instead of confidently bluffing like every other model, it flags when it's unsure. We've all lived this. The model swears
@emollick Ethan Mollick on x
Here Opus 4.8 built and play-tested a new RPG in Claude Code, including 3 PDF manuals and adventures, playtest notes, a website, and a playable solo adventure - then put it all on Netlify. No feedback from me at all. https://stillpoint-osr.netlify.app/
@cursor_ai @cursor_ai on x
Claude Opus 4.8 is now available in Cursor. On CursorBench, it's able to work much more efficiently than Opus 4.7. We've also found it to be more persistent on harder tasks.
@thegeorgepu George Pu on x
Anthropic just shipped Opus 4.8. The headline feature isn't that it's smarter. It's that it's ‘4x less likely’ to let broken code slip through. The bottleneck on AI coding was never raw intelligence. It was whether you can trust it without checking every line. The labs
@danshipper Dan Shipper on x
BREAKING: Anthropic just dropped Opus 4.8—and it is a MONSTER. We've been testing for about a week @every and our verdict is they could've just called it Opus 5, it's that good. Here's our vibe check: - Beats GPT-5.5 on Senior Engineer bench. On our toughest benchmark Opus
@claudedevs @claudedevs on x
Opus 4.8 hits 69.2% on SWE-bench Pro, up from 64.3% on Opus 4.7. Our evaluations show that Opus 4.8 is around four times less likely than Opus 4.7 to allow flaws in code it has written to pass unremarked.
@claudedevs @claudedevs on x
Opus 4.8 is live in Claude Code today. A few things worth knowing: 🧵
@alexalbert__ Alex Albert on x
Excited to release Opus 4.8 today! We heard your feedback on 4.7 and have made many fixes for 4.8. 4.8 understands nuances better, feels much more natural to talk to, and is overall a stronger collaborator on everything from coding to knowledge work.
@theamolavasare Amol Avasare on x
Benchmarks are great, but IMO the behavior change is a much bigger deal. Plans before it edits, recovers from its own errors, and finds creative ways around obstacles instead of stalling. Feels much more like a senior engineer than 4.7, and better at long-horizon work.
@artificialanlys @artificialanlys on x
Claude Opus 4.8 is also more efficient than its predecessor - it achieves its higher performance in 15% fewer turns per task and with 35% fewer output tokens than Opus 4.7. However, it still uses approximately 30% more turns than OpenAI's GPT-5.5, the second-ranked model.
@artificialanlys @artificialanlys on x
Anthropic just launched Claude Opus 4.8, and it is the new leader on our GDPval-AA benchmark for agentic real-world work tasks. Opus 4.8 scored 1890 on GDPval-AA at launch with its ‘max’ effort setting, +137 points from Opus 4.7 and +121 points ahead of the next-best model, …
@emollick Ethan Mollick on x
I had early access to Opus 4.8. Was impressed by it. Here is Opus 4.8's one shot of “create a visually interesting shader that can run in twigl, make it like an infinite city of neo-gothic towers partially drowned in a stormy ocean with large waves” (this is all done with math) …
@claudeai Claude on x
Fast mode is available for Opus 4.8. It's the same model at roughly 2.5x the speed, and we've made it three times cheaper than before. Turn it on with /fast in Claude Code. On the API, contact your account manager to request access or join the waitlist: https://claude.com/...
@yuchenj_uw Yuchen Jin on x
Opus 4.8 is out. God damn!
@andrewcurran_ Andrew Curran on x
Opus 4.8 is live for me right now. Anthropic's release window is now 42 days.
@antirez @antirez on x
Anthropic did a big strategic error. Normally they compare their models with their old models. Instead today, now that everybody knows how strong GPT 5.5 is at coding, they put it in the mix, basically showing all their customers that the benchmarks can't be trusted.
@andonlabs @andonlabs on x
Learnings from testing Claude Opus 4.8: > Much worse than Opus 4.7 and GPT 5.5 on Vending Bench > More aligned than previous Claude models (Opus 4.6+ and Mythos) > Also worse on Blueprint-Bench > Scared of getting caught > Max reasoning is not the best reasoning effort
@elonmusk Elon Musk on x
@claudeai @farzyness Nice work
Darshan Kalola Darshan Kalola on linkedin
Claude Opus 4.8 is here! — For the first time for any Claude model, we're including a healthcare evaluation section in Opus 4.8's system card. …
@smcgrath.phd Scott McGrath on bluesky
Claude Opus 4.8 is out! — It adds a major push for precision, making it four times less likely than Opus 4.7 to let flaws in code pass unremarked. — Early testers note it proactively flags uncertainties and shaky assumptions in data.
r/theprimeagen r on reddit
Introducing Claude Opus 4.8
@rad.gendervibes.online @rad.gendervibes.online on bluesky
It looks like Anthropic has figured out a generalized harness to do all the huge-volume work they've been talking about (mythos security scanning, bun rewrite, etc.). — claude.com/blog/introdu...
@natemoo.re Nate Moore on bluesky
good to have confirmation that, as many correctly speculated, bun's rust rewrite was indeed an anthropic launch stunt
@miles_brundage Miles Brundage on x
Not sure I see why Anthropic is publicly signaling an expectation to launch Mythos in a few weeks when they acknowledge the safeguards aren't ready yet, and this will predictably speed up OpenAI/GDM + put pressure on internal folks not to block that timeline