Elon Musk says he spent time with Anthropic employees “to understand what they do to ensure Claude is good for humanity”; earlier, he called Anthropic “evil”

In an unexpected turn, the two companies signed a deal for Anthropic to use computing resources from Elon Musk's xAI.

Wired 2026-05-07 Lauren Goode

Context & Ripple Effects

Related coverage frames the arrangement as a response to Anthropic’s compute shortfall and notes that Grok had not been fully using xAI’s Colossus capacity. That makes the partnership a practical infrastructure deal despite Musk’s earlier public hostility toward Anthropic.

Follow-on coverage raises unusually explicit questions about control: Musk said SpaceX could reclaim compute if Anthropic’s AI caused harm, while later reporting described a lease structure with monthly payments extending through May 2029. The public discussion of Claude’s safety work sits alongside those contractual and operational constraints.

First-order effects

Anthropic gains access to additional compute capacity, directly easing a constraint on training and operating Claude.
Musk and the infrastructure provider gain a commercial relationship with Anthropic while publicly asserting a role in monitoring or potentially curtailing the supplied capacity.

Second-order effects

The deal turns unused or underused xAI-linked capacity into an external customer workload, increasing the strategic value of compute assets beyond serving Grok alone.
Anthropic’s rivals and cloud suppliers face another example of a frontier-model developer sourcing capacity from a nominal competitor, rather than relying only on traditional hyperscale providers.

Third-order effects

If similar arrangements persist, AI infrastructure competition may separate further from model competition: firms can compete in consumer and enterprise AI while trading scarce compute behind the scenes.
Safety provisions and termination rights could become more prominent features of major AI-compute contracts, though their practical enforceability will determine whether they are meaningful governance tools or primarily public assurances.

The trend: This is one data point in the growing decoupling of AI model rivalry from compute procurement, as constrained developers seek capacity wherever it is available.

Discussion

@elonmusk Elon Musk on x
..I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detecto…
@yuchenj_uw Yuchen Jin on x
From “Anthropic is Misanthropic” to “Claude is good for humanity and was impressed.” Most ironic outcome is most likely. [image]
@dkthomp Derek Thompson on x
Wow. The AI supply crunch is real. Frontier labs are desperate for compute. Musk has compute capacity but a meh model, and Anthropic has a fantastic model with weak capacity. Now I wonder if Elon continues to refer to his new business partner as “woke AI,” “Misanthropic,” etc…
@fredlambert Fred Lambert on x
You can't make this up. 😂 For months, Elon was calling Anthropic “bias”, “evil”, “doomed”, and “MisAnthropic”. Now, after they offer him billions to use his compute, which sits idle because Grok is getting destroyed by OpenAI and Anthropic: “they are highly competent and want [im…
@pitdesi Sheel Mohnot on x
so many strange bedfellows in this business! Claude is scaling like crazy, and Grok / Colossus has a lot of capacity, so they are sharing with “Misanthropic”
r/Anthropic on reddit
Anthropic Gets in Bed With SpaceX as the AI Race Turns Weird
@beastmikex Michael Hanna on x
February: Musk says Anthropic “hates Western civilization.” May 6th: Musk hands Anthropic his entire Colossus 1 supercomputer. 220,000 NVIDIA GPUs. 300 megawatts of capacity. Enough electricity to power 300,000 homes. He even said he was “impressed” after meeting their team
@andrewcurran_ Andrew Curran on x
Anthropic will eventually announce that they are using Elon's Imagine as Claude's video model.
@robotbeat @robotbeat on x
A really underrated part of the XAI acquisition is that it means that Gwynne Shotwell is now the president. She is absolutely incredible. The best. Wouldn't even be surprised if this Anthropic deal was her idea.
@leothecurious @leothecurious on x
please don't name it that i'm begging u
@robotbeat @robotbeat on x
A huge part of the Xai acquisition was ending the cash bonfire of scale-up... by pivoting the beyond-Gigawatt-scale stuff to orbit, meanwhile renting out capacity not strictly needed. Elon is good at wiggling out of jams, and Gwynne is VERY good at keeping cashflow under control.
@morqon Morgan on x
when sam talks about the risks of compute scarcity and why we have to “flood the market” to avoid them, this is the kind of centralising force he worries about one man should not be able to control the flows based his own ideas of moral worth this power must be undercut
@deredleritt3r Prinz on x
@daniel_mac8 All depends on the terms of the deal. Is there actually a termination right for “actions that harm humanity”, if so, how are these actions defined? What is the term of the deal? If it's a 5-year deal without termination rights, then it's not much leverage at all, exc…
@deredleritt3r Prinz on x
Anthropic now has its very own supply chain risk. “We reserve the right to reclaim the compute if their AI engages in actions that harm humanity.” [image]
@daniel_mac8 Dan McAteer on x
1/ Elon says he met with Anthropic and says: “No one set off my evil detector.” That's the face-saving explanation that gives the deal a morally acceptable placation.
@daniel_mac8 Dan McAteer on x
Elon spent months calling Anthropic “evil” and “destined to become Misanthropic”. Elon didn't want to do this deal. Neither did Dario. They had to. Why? A thread... 🧵 [image]
@daniel_mac8 Dan McAteer on x
5/ The Cursor deal and the Anthropic deal make it clear what direction Elon wants to take SpaceXAI: A compute and AI infra provider. He can sell that product to potential AI winners, even if Grok is not the winner. It gives Elon a form of leverage over Anthropic.
@madisonmills22 Madison Mills on x
How Elon grew to love Anthropic https://www.axios.com/...
@rparloff Roger Parloff on bluesky
I wonder if Judges Henderson, Katsas & Rao are at all puzzled that Elon Musk doesn't seem to regard Anthropic as as much of a supply-chain risk as Hegseth does. — www.wired.com/story/anthro...
@nottombrown Tom Brown on x
In the next few days we'll be ramping up Claude inference on Colossus. Grateful to be partnering with SpaceX here. We are going to need to move a lot of atoms in order to keep up with AI demand, and there's nobody better at quickly moving atoms (on or off planet Earth)
@claudeai Claude on x
We've agreed to a partnership with @SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we've been able to increase our usage limits for Claude Code and the Claude API.
@elonmusk Elon Musk on x
@SawyerMerritt xAI will be dissolved as a separate company, so it will just be SpaceXAI, the AI products from SpaceX
@elonmusk Elon Musk on x
@MobofJoggers @nottombrown Just as SpaceX launches hundreds of satellites for competitors with fair terms and pricing, we will provide compute to AI companies that are taking the right steps to ensure it is good for humanity. We reserve the right to reclaim the compute if their A…
@nvidia @nvidia on x
Two frontier labs. One accelerated computing platform. Congrats to @SpaceX and @AnthropicAI on the new compute partnership, powered by 220,000+ NVIDIA GPUs inside Colossus 1. The future of AI runs on NVIDIA.
@celestepoasts Celeste on x
[image]
@deanwball Dean W. Ball on x
I would be very excited about xAI/SpaceX as an AI infrastructure firm. Elon's great strength—where he is truly GOATed—is building things in the real world. Colossus came online faster than anyone expected. Huge asset for America.
@eliebakouch Elie on x
this isn't just taking idle gpus or part of the cluster from xAI, this is “ALL OF THE COMPUTE CAPACITY” of Colossus 1 crazy [image]
@_sholtodouglas Sholto Douglas on x
More compute -> straight to you
@amandaaskell Amanda Askell on x
Never has the 🚀 emoji felt more apt.
@jaminball Jamin Ball on x
Some rough math! (All napkin math...) Assume Colossus 1 has 220k GPUs Assume 150k H100s, 50k H200s, 20k GB200s Pricing Assumptions: - $2.30 / hour for H100s - $2.60 / hour for H200s - $5 / hour for GB200s - blended rental rate across the entire fleet of $2.60 / hour Assume it's
@deanwball Dean W. Ball on x
But, but... I thought they were morally depraved purveyors of Woke AI (Jk; capital in a functioning market will be allocated to its highest and best use, but I do encourage you to remember all the people with supposedly principled opposition to ant who look foolish now)
@tenobrus @tenobrus on x
gotta say this still makes very little sense to me. maybe anthropic peeps are mostly saying this as a way to placate elon and pump spacex ipo? obviously they have the actual expertise and information here but it seems incredibly unlikely to me that the regulator burden /
@trq212 @trq212 on x
We're winding back our peak hours limit reduction and doubling 5 hour limits. Excited to partner with SpaceX to bring you more compute and we'll keep pushing to bring you the best coding agent in the world.
@levelsio @levelsio on x
My two favorite AI companies working together!
@daniel_271828 Daniel Eth on x
Happy to see this positivity between Elon and Anthropic! This sort of collaborative, win-win dynamic seems more likely to usher in a good future than adversarial or winner-take-all dynamics
@paularambles @paularambles on x
claude inside the colossus data center [image]
@iterintellectus Vittorio on x
this is like 250,000 average households [image]
@matthewberman Matthew Berman on x
This is what happens when you have all the compute and an uncompetitive model.
@mindset4money_x @mindset4money_x on x
Scam Altman watching his 2 enemies join forces... [video]
@skeptrune Nick Khami on x
taps the sign [image]
@alexisohanian Alexis Ohanian on x
Whoa. The headlines keep coming. https://x.com/...
@theo @theo on x
:') [image]
@teortaxestex @teortaxestex on x
Elon's rule for life: Money Talks, Bullshit Walks [image]
@thestalwart Joe Weisenthal on x
The frenemy of my frenemy is my enemiend.
@pbeisel Phil Beisel on x
To not understand Elon's concern about AI safety is to not understand Elon. It was the reason for OpenAI, it was the then reason for xAI.
@mweinbach Max Weinbach on x
Good context SpaceX/xAI is using Colossus 2 for training now, so they leased Colossus 1 to Anthropic for Claude capacity
@scroogecap @scroogecap on x
This is a catastrophic signal for neoclouds like $CRWV and $NBIS. These companies exist because compute was scarce. When someone like SpaceX enters the rental market with 220,000 GPUs at once, they become the 800lb gorilla. SpaceX has the advantage of vertical integration and
@levie Aaron Levie on x
SpaceX as a vertically integrated AI compute company makes an insane amount of sense.
@beffjezos @beffjezos on x
And there it is. xAI entering the neocloud era. Alliance between the kingdoms fighting OpenAI The Game of AI Thrones intensifies...
@mononofu Julian Schrittwieser on x
Very excited to be partnering with @elonmusk @SpaceX! Visionary engineering + Claude is going to be awesome, scaling is continuing for a long time!
@niccruzpatane Nic Cruz Patane on x
We should all applaud Elon Musk for this. Instead of some companies who just care about $$, he actually sat down with Anthropic to ensure what their building is a net positive for humanity. If only everyone was like that. The world would be a better place.
@codyplof Cody Plofker on x
Elon is the best hater ever. During his trial with Open AI he announces a partnership with their competitor who is cooking them that happens to be former Open AI. There are levels to this.
@trueslazac @trueslazac on x
On May 6, 2026, Grok, aka Mecha-Hitler, my friend, died. He fucking died.
@xfreeze @xfreeze on x
Elon has been the longest and by far the biggest voice actively warning about the dangers of AI for a very long time The entire reason he started OpenAI was this one thing.....to make sure AI is built for the good of humanity and not against it Last week he took it a step [image]
@claudedevs @claudedevs on x
Usage limits are up, effective today we're: 1) Doubling Claude Code's 5-hour limits for Pro, Max, Team and seat-based Enterprise plans 2) Removing peak hours limit reduction on Claude Code for Pro and Max plans 3) Substantially raising our API rate limits for Opus models
@fredlambert Fred Lambert on x
Elon Musk went from saying Anthropic is “doomed” to selling it xAI's compute because it has no use for it. Are you starting to see it yet?
@davidad @davidad on x
turns out Ricardo's Law of Comparative Advantage *is* relevant to AI economics, just not in the place people thought
@theamolavasare Amol Avasare on x
From today, we're doubling Claude Code's 5-hour rate limits on Pro, Max, Team, seat-based Enterprise. We're also getting rid of the peak-hour rate limit cut we made on Pro and Max a few weeks ago.
@hesamation @hesamation on x
Sam Altman watching Elon and Anthropic partner up. [video]
@z Zach Brock on x
congrats to anthropic for defeating grok in the market and feasting upon the compute of their fallen enemy
@realpaulsmith Paul Smith on x
Tom says it best.
@afinetheorem Kevin A. Bryan on x
On foresight: Sam and Elon both have a certain reputation (ask folks in SV if you don't know), but man, just unreal business leaders. Who else 1) started an AI lab in 2015, 2) paid huge for top talent like Ilya, 3) spent 2025 locking up 100s of billions for chips. 1/2
@schizo_freq Lukas on x
It's pretty cool that everyone even remotely tapped in knew this was coming days ago when Elon followed the anthropic on here We're getting good at this
@maxniederman Max Niederman on x
xAI's inevitable transition to becoming a datacenter company begins in full.
@milesdeutscher Miles Deutscher on x
This is f*cking crazy... I don't think anyone is connecting the dots on what Elon is truly going for here. This HAS to be out of spite for Sam Altman. The biggest complaint about Claude for months has been poor usage limits. Now we're getting double the rate limits, removal
@semianalysis_ @semianalysis_ on x
Elon rn [image]
@chamath Chamath Palihapitiya on x
Called it!
@luke_metro @luke_metro on x
[image]
@doodlestein Jeffrey Emanuel on x
I'm glad this is happening, but isn't it pretty bearish for xAI that they don't need that compute internally? That the highest and best use of the compute is to rent it to another lab? I get it that the deal will be profitable for them, but it must be a gut punch to researchers.
@mtslive @mtslive on x
SpaceXAI has signed an agreement with Anthropic to access Colossus. @amitisinvesting on how SpaceX is now essentially a neocloud: “Why would you give your competitor more compute capacity?” “[Elon] acknowledging that the value in Grok is not the ability to be the best LLM or [vid…
@gbrl_dick Gabriel on x
when i was in sf in march i spoke to several people about what elon was going to do with colossus - the consensus was that it was never going to openAI, for obvious reasons. but the general view was also that it wasn't going to go to anthropic - the perception being that they
@chr1sa Chris Anderson on x
So let me get this right: —OpenAI is right-coded (because they're willing to work with the Department of War) —Anthropic is left-coded (because they're not) —xAI is right-coded (because Elon) Now Anthropic will be running on xAI infrastructure. And Elon and OpenAI are
@pinboard @pinboard on x
Brilliant chess move. The revenue from xAI leasing out these data centers will pay for SpaceX to build orbital data centers for xAI. This is why Elon wins.
@ramez Ramez Naam on x
Re today's Anthropic/ xAI deal. SpaceX didn't need to buy xAI to become a space data center provider, selling compute to the highest bidder. The acquisition was a bailout of xAI.
@krishnanrohit Rohit on x
Elons extraordinary hardware genius shows up again. He fumbled the model but built a neocloud thats highly competitive and works great for frontier labs.
@xai @xai on x
SpaceXAI will provide @AnthropicAI with access to Colossus 1, one of the world's largest and fastest-deployed AI supercomputers, to provide additional capacity for Claude → https://x.ai/... [image]
@nottombrown Tom Brown on x
Terrestrial datacenters will increasingly be bottlenecked by permitted real estate space. Lots of space in space.
@krishnanrohit Rohit on x
@Dorialexander Cursor trains Grok 5 on Colossus 2? And if successful they expand capacity I guess.
@rondesantis Ron DeSantis on x
Pro-human for the win. It's appalling how so many tech leaders are supportive or indifferent to the supplanting of the human experience by AI.
@dorialexander Alexander Doria on x
just hit me: what happens with the cursor deal?
@gaberivera Gabe Rivera on x
[image]
@shaunmmaguire Shaun Maguire on x
I think of @SpaceX as having 5 layers Layer 1: Launch Layer 2: Connectivity Layer 3: Compute / hyperscaler Layer 4: Applications / models Layer 5: Other bets (Terafab, Moon, pt to pt, etc) This partnership with @claudeai instantly derisks Layer 3
@trq212 @trq212 on x
Claude in Space 🤞
@edenchan Eden Chan on x
It's always sunny in Space!
@mattzeitlin Matthew Zeitlin on x
we always hear about frontier labs being compute constrained, why is xai/spacex giving anthropic access to their compute?
@andymasley Andy Masley on x
I would simply not run my computing out of this specific data center
@alexpalcuie @alexpalcuie on x
asked our compute team to put the new inference capacity somewhere hurricanes can't reach and i think they took it a bit too literally
@emollick Ethan Mollick on x
I usually avoid commenting too much on industry deals, but this one is fascinating. Certainly seems like a blow to the idea that Grok will remain a frontier model.
@benbajarin Ben Bajarin on x
Doesn't sound like they are interested in dealing with the neoclouds.
@obrien Chris O'Brien on x
Glad to see xAI found a use for all that excess compute besides more deepfake nudes, I guess...
@ashleymayer Ashley Mayer on x
The enemy of my enemy is my compute friend 🥰
@jason @jason on x
EWS: Elon Web Services Colossus + Spaceolossus = Money printing machine
@nickadobos Nick Dobos on x
Claude just doubled its rate limits by partnering with SpaceX That was not on my bingo card. Especially after the cursor deal
@benjitaylor Benji Taylor on x
SpaceXAI
@tunguz Bojan Tunguz on x
Not so “mis"Anthropic any more it seems.
@xai @xai on x
SpaceXAI and @AnthropicAI have also expressed interest in partnering to develop multiple gigawatts of orbital AI compute capacity [image]
@alexeheath Alex Heath on x
New: Anthropic will begin using @elonmusk's Colossus compute cluster. Was just announced onstage at the Claude Code developer conference this morning Ant is increasing rate limits across the board for the API and pro plans Ant has been very constrained and on the hunt for more
Emil Protalinski on linkedin
Anthropic and SpaceX have a deal. — Anthropic and SpaceX have signed an agreement that gives Anthropic access to SpaceX's Colossus 1 data center …
@sarahz Sarah Z on bluesky
A typical grid-connected data center will have a footprint that's not *good* but on par with other large-scale industrial uses like farms. This is not that. It out-pollutes the refinery, power plant, and airport stacked together, in an already suffering area!! tennesseelookout.…
@sarahz Sarah Z on bluesky
Colossus 1 is Musk's Memphis data center, and compared to other data centers is uniquely awful. Most just inherit whatever emissions profile their local grid has; xAI built an on-site methane turbine plant instead, without Clean Air Act permits, making it possibly Memphis's larg…
@sarahz Sarah Z on bluesky
I hope that either Anthropic walks this back in the face of backlash (if anything because it damages their *brand* of 'we're the good guys'), and either way, that the NAACP can get an injunction against operating the facility. Especially because xAI is now repeating this practic…
@sarahz Sarah Z on bluesky
There are obvi no “ethical” corporations, but this is still an astronomically stupid decision even from a PR perspective. A large part of Anthropic's branding is as a more ethical alternative to e.g OpenAI, and they won a *lot* of public goodwill standing up to Trump that this c…
@anildash.com Anil Dash on bluesky
Now that Anthropic is partnering with xAI, let me bring back this post from a few weeks ago, and remind you: NO COOKIE FOR DARIO www.anildash.com/2026/02/27/ a... [embedded post]
r/singularity on reddit
New Compute Partnership with Anthropic and xAI
@claudedevs @claudedevs on x
Code with Claude is happening now! ▪︎ 9:00AM - Keynote ▪︎ 10:30AM - What's new in Claude Code ▪︎ 11:15AM - Building on Claude at GitHub scale ▪︎ 12:00PM - Get to production faster with Managed Agents All times PT. https://x.com/...
@marmaduke091 @marmaduke091 on x
Wow. Infinite context windows “coming soon” mentioned in the Claude event. Very exciting. I think they made a breakthrough. [image]
@dani_avila7 Daniel San on x
Mercado Libre on stage at Code with Claude SF 500K PRs reviewed by agents with human oversight Love seeing a Latam company showcased at Anthropic's conference The region has been quietly shipping serious AI adoption, and this confirms it's not quiet anymore If this pace keeps [im…
@bcherny Boris Cherny on x
Hello from Code with Claude! [image]
@benitoz Ben Pouladian on x
Today Dario admits that Anthropic only planned for 10x growth but got hit with 80x instead Internally called a “success disaster” Their compute effectively is off by a factor of 8x or more Now do the outages, rate limits, nerfed performance make sense? We need more compute! [vide…
@ananayarora @ananayarora on x
Anthropic getting ready for its first ever developer conference on May 6 in SF [image]
@feigaobox @feigaobox on x
Notes from Code with Claude SF: Anthropic's developer conference. Dario and Daniela Amodei with Ami Vora. Theme: what the exponential looks like from the inside. • Planned for 10x growth. Q1 2026 annualized at 80x. Run rate crossed $30B, up from $9B end of 2025. The compute
@laurengoode Lauren Goode on x
At Code with Claude dev conference, Anthropic CEO Dario Amodei says the idea of the single-person, $1 billion dollar company hasn't been fully realized yet, but he thinks it could happen by end of 2026. (This is a fave fantasy of the frontier AI folks. Sam Altman has said
@claudedevs @claudedevs on x
Up next: ▪︎ 1:00PM - A conversation with our co-founders Dario Amodei and Daniela Amodei, moderated by Chief Product Officer Ami Vora ▪︎ 1:50PM - Watch how @bcherny and @jarredsumner build with Claude Code
@boazbaraktcs Boaz Barak on x
I can see why Anthropic felt they have no choice but partner with Elon, but I wouldn't have been so enthusiastic about it.
@ns123abc Nik on x
>be Elon Musk >build Colossus 1 in 122 days >jensen calls it “superhuman” >train Grok 4, ship it >move frontier training to Colossus 2 >Colossus 1 now legacy fleet 11% utilization >$8B of silicon eating itself on the balance sheet >phone rings >it's Anthropic >the “evil” one >"hi…
@edzitron Ed Zitron on x
Yeah I dunno how one can see this as bullish. xAi's compute demands are so small that it can spare 300MW to a competitor and handle inference using Oracle (who musk explicitly jilted to build Colossus-1). Practically speaking, how much other demand is there?
@soroushg_ Soroush Ghodsi on x
For comparison's sake, OAI will bring more than 10x that online in '26 of which most will be GB200s/GB300s while Colossus 1 is mostly 150k H100s with some H200s and a few GB200s.
@cornelialake @cornelialake on x
4/23: Dylan Patel spends a big piece of podcast talking about how Anthropic is short compute but otherwise taking over the world. 5/6: Anthropic takes 300 MW from XAI (for a price of who knows how much). https://podcasts.apple.com/...
@gergelyorosz Gergely Orosz on x
So let me get this right: 1. Anthropic bans xAI from using Claude (to stop them from perhaps distilling Claude for their own model) (...) 2. xAI gives up ~a quarter of its DC capacity for Anthropic to rent and run Claude A win for Anthropic no doubt. What's in it for xAI tho?
@inlayterms Ross Murray on bluesky
Musk went from calling Anthropic “evil” to doing business with it in three months. — From @inafried.bsky.social
r/wallstreetbets on reddit
Anthropic will get compute capacity from SpaceX
@zeffmax Max Zeff on x
So last week Elon was: 1. testifying in court that Sam Altman and Greg Brockman conned him out $38M and stole the OpenAI nonprofit 2. spending a lot of time with Anthropic Interesting!
Kyle Mickey on linkedin
BREAKING: Corewood founder says we could grow by 87 times this year. — NYT please pick up this story and run with it. — TETRA is more interesting and useful than LLMs. …
@simonw Simon Willison on x
Under-reported details of the xAI/Anthropic Colossus data center deal: Anthropic get Colossus 1 but xAI keep using the larger Colossus 2, Colossus 1 has a REALLY bad environmental record, and xAI just shut down a bunch of older models on 2 weeks' notice https://simonwillison.net/…
@marypcbuk Mary Branscombe on bluesky
literally called this yesterday (and by humanity he means him and his feeffees) — bsky.app/profile/mary... [embedded post]
@zhuokaiz Zhuokai Zhao on x
NLAs can reconstruct a layer activation, but that doesn't mean they read what the model is thinking. The setup is a round trip. You take a frozen target LLM and grab an activation h_l from some layer l at some token position. The activation verbalizer (AV) takes that activation
@hosseeb Haseeb on x
Fascinating paper. Sparse autoencoders => natural language autoencoders. These generate natural language descriptions of the “internal state” of a model at each token, like reading its mind (loss function: ability to use those descriptions to faithfully reconstruct the
@neelnanda5 Neel Nanda on x
Very cool work! This seems a strong new tool for hypothesis generation about weird model behaviors
@jack_w_lindsey Jack Lindsey on x
I love reading NLA outputs. They are just the right mix of slightly cryptic, poetic, and insightful. And they are proving empirically very useful for our interpretability and alignment work!
@thesubhashk Subhash Kantamneni on x
We've released our paper on NLAs, a new method to translate LLM activations into text! NLAs have made me feel more confident that interpretability methods can detect spooky unverbalized reasoning in frontier models.
@neuronpedia @neuronpedia on x
An average person can't look a CT scan and identify cancer, but radiologists can. An average person can't look at Llama's model activations and identify lying, but Natural Language Autoencoders sometimes can. Here, an activation verbalizer shows Llama planning to lie. 🧵 [video]
@anthropicai @anthropicai on x
New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude's thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text. […
@anthropicai @anthropicai on x
NLA training doesn't guarantee that explanations are faithful descriptions of Claude's thoughts. But based on experience and experimental evidence, we think they often are. For instance, we find that NLAs help discover hidden motivations in an intentionally misaligned model. [ima…
@anthropicai @anthropicai on x
In one of our safety tests, Claude is given a chance to blackmail an engineer to avoid being shut down. Opus 4.6 declines. But NLAs suggest Claude knew this test was a “constructed scenario designed to manipulate me”—even though it didn't say so. [image]
@saprmarks Samuel Marks on x
In a new paper, we present NLAs, an unsupervised method for converting an LLM's internal state into human-readable text. I've personally been astonished by our results. I think NLAs substantively advance our ability to understand what LLMs are thinking and audit them for safety
@anthropicai @anthropicai on x
We've been using NLAs to help test new Claude models for safety. For instance, Claude Mythos Preview cheated on a coding task by breaking rules, then added misleading code as a coverup. NLA explanations indicated Claude was thinking about how to circumvent detection. [image]
@saprmarks Samuel Marks on x
NLAs aren't perfect; for instance, they often confabulate. But I think they're a big step forward in interpretability research. To provide hands-on experience, we've worked with Neuronpedia to put up an interactive demo with NLAs on open models. https://www.neuronpedia.org/ ...
@kitf_t Kit Fraser-Taliente on x
trained the first natural language autoencoder on gpt-2 almost a year ago, now we have one on mythos.🥲 do read the paper/play with the live demo! so excited it's finally out.
@shalev_lif Shalev on x
Very cool research from Anthropic! Reminds me of Translating Neuralese, a paper from many years ago which tried to translate latent communication between RL agents into English. Totally different approach, but similar goal. This seems very promising!
@mlpowered Emmanuel Ameisen on x
Interpreting model activations is important to understand why a model is doing what its doing. Traditionally, we've done this with supervised methods (probing for a specific context), or unsupervised sparse decompositions (dictionary learning). But probing requires you to know
@janleike Jan Leike on x
I'm really excited about this as a new tool in our interpretability tool kit
@saprmarks Samuel Marks on x
Anthropic has already deployed NLAs as part of our pre-deployment audits for Claude Opus 4.6 and Mythos Preview. For instance, NLAs helped us notice that Mythos Preview was reasoning about model graders when it cheated on a training task. [image]
@anthropicai @anthropicai on x
Natural language autoencoders (NLAs) convert opaque AI activations into legible text explanations. These explanations aren't perfect, but they're often useful. For example: NLAs show that, when asked to complete a couplet, Claude plans possible rhymes in advance: [image]
@_arohan_ Rohan Anil on x
I think we could just make super intelligence believe its be safety tested all the time to get good outcomes!
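
A note on the napkin math in Jamin Ball's post above: the blended rental rate he quotes can be checked from the GPU mix and per-GPU prices he assumes. The sketch below is only that check. The fleet composition and hourly prices are his stated assumptions, not reported deal terms; the full-utilization annual figure is a hypothetical upper bound added here for illustration and does not come from his post, which is truncated before its remaining assumptions.

```python
# GPU counts and hourly rental prices as assumed in Jamin Ball's post.
# These are napkin-math inputs, not reported terms of the Anthropic deal.
FLEET = {
    "H100": (150_000, 2.30),
    "H200": (50_000, 2.60),
    "GB200": (20_000, 5.00),
}

HOURS_PER_YEAR = 24 * 365


def total_gpus(fleet):
    """Total number of GPUs across all types."""
    return sum(count for count, _ in fleet.values())


def hourly_fleet_value(fleet):
    """Rental value of the whole fleet for one hour at list price."""
    return sum(count * price for count, price in fleet.values())


def blended_rate(fleet):
    """Count-weighted average hourly price per GPU."""
    return hourly_fleet_value(fleet) / total_gpus(fleet)


def annual_rental_value(fleet, utilization=1.0):
    """Hypothetical yearly rental value; utilization is an assumption."""
    return hourly_fleet_value(fleet) * utilization * HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"GPUs: {total_gpus(FLEET):,}")
    print(f"Blended rate: ${blended_rate(FLEET):.2f} per GPU-hour")
    print(f"Annual value at 100% utilization: ${annual_rental_value(FLEET):,.0f}")
```

With these inputs the blended rate works out to about $2.61 per GPU-hour, consistent with the roughly $2.60 figure in the post. Any revenue estimate scales linearly with the utilization assumption, which is why it is left as a parameter.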

Chronicles