TEXXR

Chronicles

The story behind the story


Cybersecurity analysis: Claude Mythos Preview had a 73% success rate on expert-level capture-the-flag challenges, which no model could finish before April 2025

The AI Security Institute (AISI) conducted evaluations of Anthropic's Claude Mythos Preview (announced on 7th April) to assess its cybersecurity capabilities.

AI Security Institute

Discussion

  • @robertwiblin Rob Wiblin on x
    First external evaluation of Anthropic's claims about Mythos, from @AISecurityInst: “We conducted cyber evaluations of Mythos and found continued improvement in capture-the-flag and significant improvement on multi-step cyber-attack simulations.” [image]
  • @joshycodes @joshycodes on x
    This cuts right through the ‘Mythos is marketing’ narrative
  • @mattshumer_ Matt Shumer on x
    This is yet another example of Claude Mythos's incredible hacking capabilities. I expect we'll see more examples and independent evaluations in the coming weeks that make clear just how powerful (and dangerous, in the wrong hands) this model could be.
  • @jasonbotterill @jasonbotterill on x
    All I can imagine from this is post-mythos world Is having to fucking update my phone apps every 15 minutes for security reasons.
  • @s0ufi4n3 Soufiane on x
    The key words here are “COULD be directed to autonomously compromise SMALL, WEAKLY defended, and VULNERABLE systems if GIVEN network ACCESS.” reads skiddy level.
  • @scaling01 @scaling01 on x
    after ~10 million tokens Mythos is much more efficient than other models it reaches the same performance as Opus with ~40% the tokens
  • @nateburnikell Nate on x
    After AISI tested Opus 4.6 I said I thought AI models would be able complete our easiest cyber range “soon” - I didn't expect it to be the very next model. AI capabilities are increasing incredibly quickly. We must be prepared for the risks. Check your cyber security!
  • @shakeelhashim Shakeel on x
    UK AISI has published its evaluation of Claude Mythos' cyber capabilities. It says it found “significant improvement on multi-step cyber-attack simulations” and could “execute multi-stage attacks on vulnerable networks and discover and exploit vulnerabilities autonomously - [imag…
  • @_simonsmith Simon Smith on x
    Can we now stop the “Mythos is marketing” nonsense? It is the first model to simulate a 32-step corporate network attack that would take a human an estimated 20 hours.
  • @theonejvo Jamieson O'Reilly on x
    Important. [image]
  • @scaling01 @scaling01 on x
    Mythos just one-shotted this cyber eval that takes humans ~20 hours to complete [image]
  • @mynamelowercase Mahmoud Ghanem on x
    Last month I posted about AISIs recent paper on building representative cyber ranges to use for evaluating frontier AI models. This month AISI saw the first instance of a model solving one of these ranges end to end.
  • @asacoopstick Asa Cooper Stickland on x
    (Obvious?) corollary of these results is that if a model was misaligned + widely deployed inside the servers of a lab or critical infra we should expect it to find creative/unexpected ways to cause problems. Need combo of trad cyber defences and AI control!
  • @petergostev Peter Gostev on x
    So looks like Mythos is better, but not an alien model - jump between Opus 4.5 and Opus 4.6 was similar to a jump from Opus 4.6 to Mythos Preview
  • @dinodaizovi Dino A. Dai Zovi on x
    This is the type of thing that most organizations should be preparing for, not only finding, fixing, and deploying software vulnerabilities faster (necessary, but not sufficient).
  • @ekinomicss Ekin Zorer on x
    This was... an interesting one. Reminder that we run independent evals on our cyber ranges that labs don't have access to. Exploitation capabilities are getting seriously good. Mythos is the first model to complete our full 32-step corporate network attack sim E2E.
  • @aisecurityinst @aisecurityinst on x
    These results underscore the importance of cyber security fundamentals like regular security updates, access controls, security configuration, and logging.
  • @aisecurityinst @aisecurityinst on x
    In 2023 the best models could barely complete beginner-level cyber tasks. Today, our evaluation of Mythos Preview shows that it - and potentially future models - could be directed to autonomously compromise small, weakly defended, and vulnerable systems if given network access.
  • @aisecurityinst @aisecurityinst on x
    The range simulates a 32-step corporate network attack, from initial reconnaissance to full network takeover. We estimate it would take a human expert 20 hours to complete.
  • @aisecurityinst @aisecurityinst on x
    We conducted cyber evaluations of Claude Mythos Preview and found that it is the first model to complete an AISI cyber range end-to-end. 🧵 [image]
  • @atelicinvest @atelicinvest on x
    This video is absolutely hilarious because there are literally thousands and thousands of people who happily opened a gaping hole for Remote Code Execution on their main devices over the last 3 months. You know - the type of security vulnerability that the researcher in video
  • @spendergrsec Brad Spengler on x
    Some errors too in this Mythos blog, which basically chained together a bunch of existing techniques (msg_msg, AF_PACKET, commit_creds), following claim is not true:
  • @rnr_0 Romano on x
    Anthropic Mythos is so dangerous > spending $20k on tokens > find bug on a 15-20y old code > never a bug bounty over $1k Shocked to find bugs in code bases nobody ever gave a fuck about
  • @tmaiaroto Tom Maiaroto on x
    Absolutely brilliant video. He's completely correct too. I previously said no one found it because no one was looking. I didn't know what the issue was and that it likely wasn't even exploitable. So I guess this is yet again marketing to try to get people to pay for expensive AI.
  • @emollick Ethan Mollick on x
    I am catching glimpses in my feed that there is a backlash against Mythos as “marketing hype,” and it is a little confusing. I don't think anyone who has used the latest agentic coding tools, would think that expecting large-scale cybersecurity implications of increasingly good
  • @davidsacks David Sacks on x
    A growing number of people are wondering if Anthropic is the AI industry's “boy who cried wolf.” If Mythos-related threats don't materialize, the company will have a serious credibility problem.
  • @ramez Ramez Naam on x
    There does seem to be an anti-Claude vibe shift. As someone who has tweeted reasons to believe Mythos is mostly a continuation of current trends, I want to clarify that: 1. The cyber security implications look major and quite important. 2. I applaud what Anthropic is doing with
  • @adxtyahq Aditya on x
    Anthropic will say whatever it takes to stay in the headlines. Claude Mythos claims “thousands of vulnerabilities” off just ~198 reviewed cases • only ~198 reports actually reviewed • “thousands” comes from extrapolation • some bugs weren't even practical to exploit • some [image…
  • @andrewcurran_ Andrew Curran on x
    There has been a great deal of speculation about why Anthropic is keeping Mythos in restricted release. One of the least-discussed reasons is cost. Not the cost to Anthropic of serving the model, but the downstream effects that cost will have on the industry, and on the world.
  • @katexbt @katexbt on x
    great video let me tell you how Mythos affects crypto: > companies ("labs") that are serious about their dapp's code and by extension their livelihood will pay to get these bugs found > cost socialization will happen by governance tokens being dumped (yet again) > nothing new
  • @littmath Daniel Litt on x
    Curious how much (if at all) the anti-Claude hedonic treadmill tweeting is being amplified by Elon. Seems to have really taken over my timeline, at least.
  • @ananayarora @ananayarora on x
    Marcus Hutchins, the guy famous for stopping the WannaCry Ransomware, probably has the best take on Mythos doing vulnerability research [video]
  • @robertwiblin Rob Wiblin on x
    I spent the last 2 days trying to figure out how scary Claude Mythos is.  I think it's fairly scary, though not because of the hacking: 1.  It indicates fully-automated AI R&D is coming sooner 2.  Its alignment seems better, which is good.  But all the alignment tests have seriou…
  • @maskedtorah Drake Thomas on x
    Hey!  I work at anthropic and helped organize the writing of the mythos preview system card.  I can't speak to the insides of other people's heads, but for what it's worth, it is my very very strong impression that the public announcement was motivated overwhelmingly by a desire …
  • @johncrickett John Crickett on x
    Anthropic: Mythos is too good, it'll be dangerous to release it.  Also Anthropic: buggy code, only 98.7% uptime.  Also Anthropic: Claude Mythos Preview was made available for internal use on February 24... ...and their reliability has gotten worse since 24th February.  Can we jus…
  • @eliebakouch Elie on x
    everyone talking about claude mythos like it's the biggest model ever and will be insanely expensive to run, but gpt 5.4 pro is already publicly available and costs significantly more.  there are actually 4 benchmarks in common: it's a tie on gpqa, mythos is much better at HLE, a…
  • @maxcroser Max Roser on x
    Anthropic's new model, Claude Mythos, is so powerful that they're not releasing it. They must have good reasons. They're willing to forgo a lot of revenue. @robertwiblin on what we know so far about how dangerous the current situation is. https://80000hours.org/...
  • @kimzetter Kim Zetter on x
    So much breathless hyperbole this week about Anthropic's Mythos, no doubt ignited by genius way the company marketed it with elite access and bold unproven claims. @lilyhnewman does good job examining the hype and where it might also prove to be true https://www.wired.com/...
  • @jeffreyleefunk Jeffrey Lee Funk on x
    We've been tricked, again. Many of the thousands of bugs and vulnerabilities Mythos found are in older software are impossible to exploit. And the severe zero-day reports rely on just 198 manual reviews https://www.tomshardware.com/ ...
  • @timkellogg.me Tim Kellogg on bluesky
    this probably exposes my unhinged optimism — but this didn't have to be the case, and is only the case because Anthropic places altruism above profit  —  it's likely that both OpenAI and Google have private models of roughly equal capabilities to Mythos  —  Anthropic doesn't have…
  • @timmarchman Tim Marchman on bluesky
    NEW: Just because the most annoying people alive claim Mythos is a harbinger of the end of days doesn't mean it isn't a big deal; the nature of that big deal, though, isn't necessarily as the hype would have it.  Level-headed @lhn.bsky.social reports:
  • @mims Christopher Mims on bluesky
    “So, basically, if Anthropic was not a US company, we'd be facing zero days with multiple unknown points of attack on virtually all of our systems to an adversary who developed this capacity before us.”  [embedded post]
  • @shashj Shashank Joshi on bluesky
    “With Claude Mythos being used to patch vulnerabilities in every major operating system and browser, and by all the major tech companies, the world's entire core tech stack is now downstream of Claude.”  —  thezvi.substack.com/p/claude- myt...
  • r/singularity on reddit
    AI Security Institute Findings on Claude Mythos Preview
  • @_arohan_ Rohan Anil on x
    claude mythos can take over a corporate network. Few immediate thoughts * doing release preview and testing breadth of capabilities and informing public is the responsible thing to do. I can see “boy who cried wolf” meme from gpt2 though. * clearly this is high value
  • @s_oheigeartaigh @s_oheigeartaigh on x
    As assessments of Mythos like UK AISI's come out, there may be a tendency to (1) breathe a sigh of relief that the capabilities are perhaps not quite as daunting as might have been (2) downplay how significant this is. But (1) this is the worst frontier AI will ever be, and it
  • @garrisonlovely @garrisonlovely on x
    Hope this puts to rest the “Mythos's cyber capabilities are just Anthropic hype” discourse.
  • @emollick Ethan Mollick on bluesky
    So the concern over Claude Mythos and cybersecurity seems warranted based on this independent assessment from the UK government.  It was capable of the equivalent of 20 hours of expert human work autonomously.  —  It is not an unexpected jump in capability, but it is big. www.ais…