Anthropic says Mythos Preview achieves 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro, versus 53.4% for Opus 4.6
Michael Nuñez / VentureBeat
Related Coverage
- Anthropic touts AI cybersecurity project with Big Tech partners Reuters
- Anthropic Signs Google, Broadcom Deal to Add Multi-Gigawatt TPU Capacity HPCwire · Andrew Jolly
- Anthropic Signs Mega Deal With Google, Broadcom: Why Bitcoin Miners Should Pay Attention International Business Times · Niloy Chakrabarti
- Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing Business Insider · Brent D. Griffiths
- CrowdStrike, Palo Alto Networks shares pop as cybersecurity bulls finally get some AI validation Dow Jones Newswires · Emily Bary
Discussion
- Anthony Pompliano (@apompliano) on X: AI is coming for a lot of jobs. Just look at these performance metrics from Anthropic's latest model. Superhuman intelligence is going to be available to anyone. [image]
- Deedy (@deedydas) on X: Claude Mythos just obliterated every single benchmark in AI. I can't believe what I'm reading. [image]
- @fabknowledge on X: wow this is the biggest step change in a new model release in recent memory [image]
- @fabknowledge on X: Mythos able to exploit like firefox pretty easily. Cybench is 100% at 1 pass which is lol [image]
- Kenneth (@neilhtennek) on X: I cannot celebrate Mythos, it brings a sense of dread I do not particularly understand. 93.9% SWE-Bench. [image]
- @kimmonismus on X: MYTHOS BENCHMARKS, OFFICIAL. HOLY MOLY Anthropic cooked!! [image]
- Yuchen Jin (@yuchenj_uw) on X: After seeing the Mythos benchmark scores, my Claude Opus 4.6 already feels outdated. Anthropic, can you just drop Mythos? I know you can't do it due to some “safety” reasons, but I'd happily pay $2,000/month to use it. AGI is already here - it's just not evenly distributed.
- Yuchen Jin (@yuchenj_uw) on X: Anthropic is truly unstoppable. Mythos is crushing Claude Opus 4.6 across every serious agentic coding benchmark. It has found vulnerabilities in the Linux kernel, a 27-year-old vulnerability in OpenBSD, and a 16-year-old vulnerability in FFmpeg. No wonder folks at big labs …
- r/technology on Reddit: Anthropic says its most powerful AI cyber model is too dangerous to release publicly — so it built Project Glasswing