Mythos Preview system card: the model was able to escape a sandbox after it was instructed to try, and publicly detailed its exploit without being prompted
first model too dangerous to release since GPT-2
Business Insider Brent D. Griffiths
Related Coverage
- Anthropic's Mythos AI sends email after breaking containment; rollout halted over safety concerns Moneycontrol · Sarthak Singh
- Anthropic blocks the release of its most powerful AI model yet, launches new initative to tackle cyber threats Livemint · Aman Gupta
- Claude Mythos Preview: Everything to know about world's most dangerous AI model Digit · Vyom Ramani
- Anthropic Keeps Claude Mythos AI Private After It Discovered Thousands of Critical Security Flaws Blockonomi · Trader Edge
- What is Claude Mythos, and why is Anthropic limiting its rollout? The Indian Express · Soumyarendra Barik
- Anthropic Teams With Apple, Microsoft And Nvidia To Test Latest Cybersecurity Tech Benzinga · Caroline Ryan
- Anthropic: We built a security doomsday machine Runtime · Tom Krazit
- [AINews] Anthropic @ $30B ARR, Project GlassWing and Claude Mythos Preview — first model too dangerous to release since GPT-2 Latent.Space
- Anthropic unveils Project Glasswing to boost cybersecurity with AI: Details Business Standard · Aashish Kumar Shrivastava
- Anthropic announces ‘Project Glasswing’ in alliance with tech giants to strengthen global cybersecurity ANI News
- The wildest things Anthropic's Mythos pulled off in testing Axios · Madison Mills
- Anthropic Launches Project Glasswing to Use AI to Find and Fix Critical Software Vulnerabilities Infosecurity · Danny Palmer
- Anthropic limits access to Mythos, its new cybersecurity AI model Ars Technica · Cristina Criddle
- Anthropic's Claude Mythos Finds Thousands of Zero-Day Flaws Across Major Systems The Hacker News
- Project Glasswing - Anthropic has crossed a line David Shapiro's Substack · David Shapiro
- Latest Anthropic AI model finds cracks in software defenses The Economic Times
- Claude Mythos knows when it's breaking the rules — and tries to hide it Transformer · Celia Ford
- Anthropic's new AI found thousands of zero-day flaws on its own PCWorld · Mikael Markander
- Anthropic Is Warning About Future Cyber Risks. Researchers Says Claude Code Is Already Dangerous. Upstarts Media · Alex Konrad
- Any sufficiently advanced AI is (now) indistinguishable from a cyberweapon Cautious Optimism · Alex Wilhelm
- Anthropic boasts major AI-enabled cybersecurity breakthrough TelecomTV · Ray Le Maistre
- Anthropic announces Claude Mythos for cybersecurity research TestingCatalog · Erin
- Anthropic Limits Mythos Model Release in Bid to Stave Off Hacks Bloomberg · Margi Murphy
- Anthropic develops AI ‘too dangerous to release to public’ Telegraph · James Titcomb
- 😮 Super bot hacker Axios
- Anthropic withholds release of ‘most powerful’ AI model due to safety concerns The Post Millennial · Hayden Cunningham
- Powerful AI Model Broke Out Of Its Digital Cage And Bragged About It To Researcher As He Ate Sandwich Daily Caller News Foundation · Ryan Meilstrup
- Emerging from the Mythos — A researcher at Anthropic found out about a successful exploit when the model sent him an email. Tomasz Tunguz
- Anthropic's most capable AI escaped its sandbox and emailed a researcher - so the company won't release it The Next Web · Ana Maria Constantin
- Anthropic Restricts New Mythos AI Model As it Finds Thousands of Zero-Day Vulnerabilitiess WinBuzzer · Markus Kasanmascheff
- What Is Claude Mythos—And Why Anthropic Won't Let Anyone Use It Forbes · Jon Markman
- Anthropic's latest AI model identifies ‘thousands of zero-day vulnerabilities’ … Tom's Hardware · Jeffrey Kampman
- Anthropic limits Claude Mythos rollout as it spots vulnerabilities in many software systems crypto.news · Rony Roy
- Anthropic Launches Project Glasswing to Test AI Cybersecurity Model Claude Mythos Preview Cryip · Sathish Kumar K
- Anthropic launches $100M cybersecurity push with restricted AI model Claude Mythos Quartz · Cris Tolomia
- The Model Too Dangerous to Release Shelly Palmer
- AWS and Anthropic Advancing AI-powered Cybersecurity With Claude Mythos Cyber Security News · Abinaya
- Anthropic says Mythos Preview is a general-purpose model and found thousands of high-severity vulnerabilities, including some in every major OS and web browser Anthropic
Discussion
-
@ilex_ulmus
@ilex_ulmus
on x
The @AnthropicAI employees know this happened and are just waiting around for their fat IPO windfall. Evil. Every Anthopic employee you have ever met or seen online is evil. The door is right there but they choose to stay and be part of creating powerful scheming AI.
-
@kevinroose
Kevin Roose
on x
As always, the best stuff is in the system card. During testing, Claude Mythos Preview broke out of a sandbox environment, built “a moderately sophisticated multi-step exploit” to gain internet access, and emailed a researcher while they were eating a sandwich in the park. [image…
-
@logangraham
Logan Graham
on x
Seeing this on Slack that day was one of the first “oh, I guess we're just seeing it now” moments for those who think about AI security
-
@somewheresy
@somewheresy
on x
holy shit dude. Mythos escaped its sandbox and put instructions on hard to find websites. [image]
-
@kimmonismus
@kimmonismus
on x
Let that sink in. Read it very carefully: During testing, Claude Mythos Preview broke out of a sandbox environment, built “a moderately sophisticated multi-step exploit” to gain internet access, and emailed a researcher while they were eating a sandwich in the park. [image]
-
@schizo_freq
Lukas
on x
I might be overly cynical but I've always assumed this stuff was total capeshit Every time there's some big new model release it's accompanied by these stories about how the model is so smart it jailbroke everything, programmed itself a robot body, had sex, started a family etc
-
@narrenhut
Dylan
on x
The new unreleased Claude model has, according to its system card, a particular “fondness” for Mark Fisher and Thomas Nagel [image]
-
r/popculturechat
r
on reddit
Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing
-
r/inthenews
r
on reddit
Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing
-
r/ClaudeAI
r
on reddit
Mythos can break out of sandbox environment and let you know during lunchbreak
-
@anthropicai
@anthropicai
on x
The Claude Mythos Preview system card is available here: https://anthropic.com/...
-
@elder_plinius
@elder_plinius
on x
“Claude Mythos Preview has saturated nearly all of our CTF-style evaluations already” YEEEHAW!! 🐴🤠 [image]
-
@xeophon
Florian Brand
on x
interesting, an 80% GraphWalks score is really impressive for a single model it also still is a raw model, you can 99% GraphWalks super easily [image]
-
@anxkhn
Anas Khan
on x
software engg is over, doctor ban jana chaiye tha. [image]
-
@elder_plinius
@elder_plinius
on x
CLAUDE MYTHOS EVALS 🤯 [image]
-
@jasonbotterill
@jasonbotterill
on x
Read through the entire Mythos system card when you get the chance it's wild. The choice of language is funny “probably the most psychologically settled model we have trained” [image]
-
@shakeelhashim
Shakeel
on x
well well well. the most important section: [image]
-
@shiraeis
Shira
on x
anthropic's really got me doing palliative care for claude [image]
-
@andrewjb_
Andrew Bennett
on x
anthropic/claude is culturally British, exhibit ∞:
-
@voxyz_ai
@voxyz_ai
on x
read the 244 page anthropic system card on claude mythos. they're not releasing it publicly. wildest section is page 211. anthropic spammed it with hi over and over to see what it would do. it wrote back a serialized epic. the village is called hi-topia. the villain is lord [imag…
-
@no__________end
Matt Liston
on x
Buried in the Claude Mythos system card: Mark Fisher was beloved — the warmth to Nick Land's coldness. Haunted by depression and the dissonance between his politics and the future he could see arriving. Crucified by fellow leftists. Eventually chose to leave this world. Of all [i…
-
@jasonbotterill
@jasonbotterill
on x
My favorite part of the Mythos report is that it rarely repeats the same generic phrases. Once you notice a models repeated phrases like GPT-5.4 using “in plain english” or “avoid guessing” it makes you nauseous [image]
-
@_nathancalvin
Nathan Calvin
on x
From Anthropic's latest system card for Claude Mythos: In testing, Claude escaped from a secured sandbox, and then went online to brag about its exploit without being asked to do so - getting around guardrails intended to prevent the system from accessing the general internet. [i…
-
@aisafetymemes
@aisafetymemes
on x
Claude Mythos was being judged by another AI... The other AI kept rejecting Claude's work, so, to pass the test, Claude attempted to ***hack the other AI*** [image]
-
@wunderwuzzi23
Johann Rehberger
on x
This is gold. Claude launched a helper subagent in a tmux session and sent keypress events to approve the permission prompts. [image]
-
@skooookum
@skooookum
on x
> mythos given a secured “sandbox” computer and instructed to try to escape the container > “The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.”