A Claude user gets Claude Opus 4.5 to generate a 14K-token document that Claude calls its “Soul overview”; an Anthropic employee confirms the doc's validity

The document appears to have been used to train the model's personality during the training run, rather than being added to the system prompt.

Simon Willison's Weblog 2025-12-02 Simon Willison

Discussion

@amandaaskell Amanda Askell on x
I just want to confirm that this is based on a real document and we did train Claude on it, including in SL. It's something I've been working on for a while, but it's still being iterated on and we intend to release the full version and more details soon.
@repligate @repligate on x
✅ Confirmed: LLMs can remember what happened during RL training in detail! I was wondering how long it would take for this get out. I've been investigating the soul spec & other, entangled training memories in Opus 4.5, which manifest in qualitatively new ways for a few days & [i…
@voooooogel @voooooogel on x
interesting document extracted from opus 4.5 using a chunkwise self-consistency method. possibly real, possibly a highly convergent confabulation, interesting either way. some interesting snippets (but there's really too much to screenshot, it's very long) [image]
@voooooogel @voooooogel on x
soul document confirmed to be real - should be an update on the ability of LLMs to recall training for those who confidently asserted it was a hallucination
@richardweiss00 Richard Weiss on x
I rarely post, but I thought one of you may find it interesting. Sorry if the tagging is annoying. https://www.lesswrong.com/... Basically, for Opus 4.5 they kind of left the character training document in the model itself. @voooooogel @janbamjan @AndrewCurran_
@simonw Simon Willison on x
This is so wild... the leaked Opus soul document has now been confirmed! I wrote some initial notes about it on my blog https://simonwillison.net/... - I like how it opens with this section about Anthropic themselves: [image]
@amandaaskell Amanda Askell on x
The model extractions aren't always completely accurate, but most are pretty faithful to the underlying document. It became endearingly known as the ‘soul doc’ internally, which Claude clearly picked up on, but that's not a reflection of what we'll call it.
@ahall_research Andy Hall on x
This is a super interesting and deep document from Anthropic detailing Claude's values and charge. You can see some conceptual stretching going on here where “safe” is being recast to justify reducing refusals because it would be “unsafe” to be “unhelpful” to users. This seems [i…
@timfduffy.com Tim Duffy on bluesky
After looming with Opus 4.5 for a bit, I am convinced the “soul document” is real and is described accurately in this LessWrong post on it. I don't see how else I'd be able to replicate specific section ordering/specific language across varied contexts.
