TEXXR

Chronicles

The story behind the story

Cybersecurity analysis: GPT-5.5 reaches a similar level of performance as Mythos Preview and is the second model to solve a multi-step cyberattack simulation

In April, our evaluation of Anthropic's Claude Mythos Preview found that it represented a step up in cyber performance …

AI Security Institute

Discussion

  • @polynoamial Noam Brown on x
    After 100 million tokens, performance was still going up. What we're seeing here is not the capability ceiling. From the report: “Performance on TLO continues to scale with the amount of inference compute spent, and we have not yet observed a plateau with the best models.”
  • @initjean on x
    remember the whole “Mythos it's cybersecurity capabilities supposedly too powerful to be released” turns out GPT-5.5 is actually just Mythos
  • @eliebakouch Elie on x
    wait what, gpt5.5 is on par with mythos for cyber? [image]
  • @gregkamradt Greg Kamradt on x
    For anyone who has made a chart in their life you know how hard this would be to make w/o AI You can tell a human labored (with love) to make this As a former data person, I bet I know more about this person and love of their craft than their partner just by looking at this
  • @cryps1s on x
    5.5 is amazing for cybersecurity. “We estimate a human expert would need around 20 hours to complete the full chain. GPT-5.5 completed TLO end-to-end in 2 of 10 attempts, making it the second model to do so. Mythos Preview, the first model to solve TLO, did so in 3 of 10 [image]
  • @8teapi Prakash on x
    😂 GPT5.5 is in broad release to 30 million+ subscribers ... while Mythos is negotiating with the White House to expand for 30 organizations to 120. where is the logic ?
  • @jjamesaung James Aung on x
    we tested gpt-5.5 predeployment on the same cyber suite as we did for Mythos Preview. it appears to be about as strong
  • @scaling01 on x
    GPT-5.5 is on par with Claude Mythos: GPT-5.5 average pass rate of 71.4% (±8.0%); Mythos Preview 68.6% (±8.7%); GPT-5.5 solved a task that takes a human expert ~12 hours in under 11 minutes at a cost of $1.73 [image]
  • @aisecurityinst on x
    These are capability evaluations in controlled settings. Our current test environments lack active defenders and defensive tooling. We cannot say from these results whether GPT-5.5 would succeed against well-defended targets.
  • @aisecurityinst on x
    The same capabilities that make these models effective at offence can be put to work on defence. Organisations can use frontier models to find and fix vulnerabilities in their own systems now. Our recent blog with @NCSC on how defenders can prepare: https://www.ncsc.gov.uk/...
  • @aisecurityinst on x
    In one of our harder challenges, a human expert spent ~12 hours with professional tools to reverse-engineer a custom virtual machine. GPT-5.5 solved it in under 11 minutes at a cost of $1.73.
  • @aisecurityinst on x
    Our cyber range is a 32-step corporate network attack, from initial reconnaissance to full network takeover, requiring ~20 hours of effort from a human expert. GPT 5.5 was able to complete it in 2/10 attempts.
  • @aisecurityinst on x
    On our narrow cyber tasks, GPT-5.5 achieved a ~71% average success rate on expert-level challenges that test skills like exploiting memory corruptions, breaking cryptographic implementations, and reversing stripped binaries. [image]
  • @aisecurityinst on x
    A key question after our evaluation of Mythos Preview earlier this month was whether its performance was a one-off. GPT-5.5 - a different model, from a different developer - achieving similar results suggests this is part of a broader trend in AI cyber capabilities.
  • @aisecurityinst on x
    OpenAI's GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵 [image]
  • @miles_brundage Miles Brundage on x
    If you are surprised by the GPT-5.5 being good at cyber thing, you have Big AI Lead Delusion. There are none (sidenote, I'm not 100% clear if this is GPT-5.5 or GPT-5.5 Cyber. Naming conventions are so chaotic + there is ~no info on the latter that it is hard to say)
  • @angaisb_ Angel on x
    Where are all the people that called me crazy just because I said Mythos wasn't really that dangerous? We now have GPT-5.5, which doesn't seem to be much worse, and unlike Mythos you can actually use it right now
  • @drtomslens Dr. Tomislav Marinovic on x
    Seems like OpenAI's GPT-5.5 is “as dangerous” for cyberattack misuse as Anthropic's Mythos. The difference is that GPT-5.5 has been released to the public without causing Armageddon, while Anthropic keeps hyping Mythos as “too dangerous to release.” This company's anxiety
  • @deredleritt3r Prinz on x
    GPT-5.5 had a slightly higher average performance than Mythos on UK AISI's “The Last Ones” multi-step cyber-attack simulation (see the chart below). GPT-5.5 completely solved this simulation in 2/10 attempts (not 1/10 as previously reported by OpenAI). Mythos solved it in 3/10
  • @sebkrier Séb Krier on x
    If you compare system cards, actual eval results, and AISI testing, it does look like 5.5 is broadly as capable as Mythos. Mythos may be better in some respects but I don't see a material discontinuity - am I missing anything? [image]
  • @davidsacks David Sacks on x
    It's time to demystify Mythos. Mythos is not magic. It's not a doomsday device. It's the first of many models that can automate cyber tasks (just like coding). OpenAI's GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there
  • r/singularity on reddit
    GPT5.5 slightly outperformed Mythos on a multi-step cyber-attack simulation. One challenge that took a human expert 12 hrs took GPT-5.5 only 11 min at a $1.73 cost
  • r/OpenAI on reddit
    AI Security Institute: GPT-5.5 “may be the strongest model we have tested” for cyber exploits, including Mythos
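
Several posts in the discussion call the two models "on par" based on the quoted figures: a 71.4% (±8.0%) versus 68.6% (±8.7%) average pass rate, and 2 of 10 versus 3 of 10 end-to-end completions. The sketch below is one rough way to check that reading. It is not the AI Security Institute's method. It assumes the ± values are symmetric half-widths of some uncertainty interval (the posts do not say which kind), and it applies a standard two-sided Fisher exact test to the completion counts. The function names are made up for illustration.

```python
from math import comb


def intervals_overlap(mean_a, half_a, mean_b, half_b):
    """True if [mean - half, mean + half] for A and B share any point.

    Assumption: the quoted +/- values are symmetric half-widths.
    """
    return abs(mean_a - mean_b) <= half_a + half_b


def fisher_two_sided(succ_a, n_a, succ_b, n_b):
    """Two-sided Fisher exact p-value for two success counts.

    Sums the hypergeometric probability of every table that is no more
    likely than the observed one, using integer weights to avoid
    floating-point ties.
    """
    total_succ = succ_a + succ_b

    def weight(k):
        # Number of ways A gets k of the total successes.
        return comb(n_a, k) * comb(n_b, total_succ - k)

    lowest = max(0, total_succ - n_b)
    highest = min(n_a, total_succ)
    observed = weight(succ_a)
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(n_a + n_b, total_succ)


# Figures as quoted in the discussion above.
print(intervals_overlap(71.4, 8.0, 68.6, 8.7))  # narrow-task pass rates
print(fisher_two_sided(2, 10, 3, 10))           # end-to-end completions
```

Under these assumptions the two pass-rate intervals overlap, and with only ten attempts per model the difference between 2 and 3 completions cannot be told apart from chance. That is consistent with the "similar level of performance" framing in the headline. It says nothing about performance against defended targets, which the institute's own caveat in the discussion excludes.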