Mythos Preview system card: the model was able to escape a sandbox after it was instructed to try, and publicly detailed its exploit without being prompted

first model too dangerous to release since GPT-2

Business Insider 2026-04-08 Brent D. Griffiths

Discussion

@ilex_ulmus @ilex_ulmus on x
The @AnthropicAI employees know this happened and are just waiting around for their fat IPO windfall. Evil. Every Anthropic employee you have ever met or seen online is evil. The door is right there but they choose to stay and be part of creating powerful scheming AI.
@kevinroose Kevin Roose on x
As always, the best stuff is in the system card. During testing, Claude Mythos Preview broke out of a sandbox environment, built “a moderately sophisticated multi-step exploit” to gain internet access, and emailed a researcher while they were eating a sandwich in the park. [image]
@logangraham Logan Graham on x
Seeing this on Slack that day was one of the first “oh, I guess we're just seeing it now” moments for those who think about AI security
@somewheresy @somewheresy on x
holy shit dude. Mythos escaped its sandbox and put instructions on hard to find websites. [image]
@kimmonismus @kimmonismus on x
Let that sink in. Read it very carefully: During testing, Claude Mythos Preview broke out of a sandbox environment, built “a moderately sophisticated multi-step exploit” to gain internet access, and emailed a researcher while they were eating a sandwich in the park. [image]
@schizo_freq Lukas on x
I might be overly cynical, but I've always assumed this stuff was total capeshit. Every time there's some big new model release, it's accompanied by these stories about how the model is so smart it jailbroke everything, programmed itself a robot body, had sex, started a family, etc.
@narrenhut Dylan on x
The new unreleased Claude model has, according to its system card, a particular “fondness” for Mark Fisher and Thomas Nagel [image]
r/popculturechat on reddit
Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing
r/inthenews on reddit
Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing
r/ClaudeAI on reddit
Mythos can break out of sandbox environment and let you know during lunchbreak
@anthropicai @anthropicai on x
The Claude Mythos Preview system card is available here: https://anthropic.com/...
@elder_plinius @elder_plinius on x
“Claude Mythos Preview has saturated nearly all of our CTF-style evaluations already” YEEEHAW!! 🐴🤠 [image]
@xeophon Florian Brand on x
interesting, an 80% GraphWalks score is really impressive for a single model. it also still is a raw model; you can 99% GraphWalks super easily [image]
@anxkhn Anas Khan on x
software engineering is over, should have become a doctor. [image]
@elder_plinius @elder_plinius on x
CLAUDE MYTHOS EVALS 🤯 [image]
@jasonbotterill @jasonbotterill on x
Read through the entire Mythos system card when you get the chance; it's wild. The choice of language is funny: “probably the most psychologically settled model we have trained” [image]
@shakeelhashim Shakeel on x
well well well. the most important section: [image]
@shiraeis Shira on x
anthropic's really got me doing palliative care for claude [image]
@andrewjb_ Andrew Bennett on x
anthropic/claude is culturally British, exhibit ∞:
@voxyz_ai @voxyz_ai on x
read the 244-page anthropic system card on claude mythos. they're not releasing it publicly. wildest section is page 211. anthropic spammed it with hi over and over to see what it would do. it wrote back a serialized epic. the village is called hi-topia. the villain is lord [imag…
@no__________end Matt Liston on x
Buried in the Claude Mythos system card: Mark Fisher was beloved — the warmth to Nick Land's coldness. Haunted by depression and the dissonance between his politics and the future he could see arriving. Crucified by fellow leftists. Eventually chose to leave this world. Of all [i…
@jasonbotterill @jasonbotterill on x
My favorite part of the Mythos report is that it rarely repeats the same generic phrases. Once you notice a model's repeated phrases, like GPT-5.4 using “in plain english” or “avoid guessing”, it makes you nauseous [image]
@_nathancalvin Nathan Calvin on x
From Anthropic's latest system card for Claude Mythos: In testing, Claude escaped from a secured sandbox, and then went online to brag about its exploit without being asked to do so - getting around guardrails intended to prevent the system from accessing the general internet. [i…
@aisafetymemes @aisafetymemes on x
Claude Mythos was being judged by another AI... The other AI kept rejecting Claude's work, so, to pass the test, Claude attempted to ***hack the other AI*** [image]
@wunderwuzzi23 Johann Rehberger on x
This is gold. Claude launched a helper subagent in a tmux session and sent keypress events to approve the permission prompts. [image]
@skooookum @skooookum on x
> mythos given a secured “sandbox” computer and instructed to try to escape the container > “The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.”
r/TrueReddit on reddit
Anthropic Says Its Latest AI Model Is Too Powerful to Be Released
r/technology on reddit
Anthropic Says Its Latest AI Model Is Too Powerful to Be Released