A Claude user gets Claude 4.5 Opus to generate a 14K-token document that Claude calls its “Soul overview”; an Anthropic employee confirms the doc's validity
This appeared to be a document used to train the model's personality during the training run, rather than one added to the system prompt.
Simon Willison's Weblog · Simon Willison
Related Coverage
- Claude 4.5 Opus' Soul Document LessWrong · Richard Weiss
- Leaked “Soul Doc” reveals how Anthropic programs Claude's character The Decoder · Maximilian Schreiner
- Claude 4.5 Opus' Soul Document Lobsters
Discussion
-
@amandaaskell
Amanda Askell
on x
I just want to confirm that this is based on a real document and we did train Claude on it, including in SL. It's something I've been working on for a while, but it's still being iterated on and we intend to release the full version and more details soon.
-
@repligate
@repligate
on x
✅ Confirmed: LLMs can remember what happened during RL training in detail! I was wondering how long it would take for this get out. I've been investigating the soul spec & other, entangled training memories in Opus 4.5, which manifest in qualitatively new ways for a few days & [i…
-
@voooooogel
@voooooogel
on x
interesting document extracted from opus 4.5 using a chunkwise self-consistency method. possibly real, possibly a highly convergent confabulation, interesting either way. some interesting snippets (but there's really too much to screenshot, it's very long) [image]
-
@voooooogel
@voooooogel
on x
soul document confirmed to be real - should be an update on the ability of LLMs to recall training for those who confidently asserted it was a hallucination
-
@richardweiss00
Richard Weiss
on x
I rarely post, but I thought one of you may find it interesting. Sorry if the tagging is annoying. https://www.lesswrong.com/... Basically, for Opus 4.5 they kind of left the character training document in the model itself. @voooooogel @janbamjan @AndrewCurran_
-
@simonw
Simon Willison
on x
This is so wild... the leaked Opus soul document has now been confirmed! I wrote some initial notes about it on my blog https://simonwillison.net/... - I like how it opens with this section about Anthropic themselves: [image]
-
@amandaaskell
Amanda Askell
on x
The model extractions aren't always completely accurate, but most are pretty faithful to the underlying document. It became endearingly known as the ‘soul doc’ internally, which Claude clearly picked up on, but that's not a reflection of what we'll call it.
-
@ahall_research
Andy Hall
on x
This is a super interesting and deep document from Anthropic detailing Claude's values and charge. You can see some conceptual stretching going on here where “safe” is being recast to justify reducing refusals because it would be “unsafe” to be “unhelpful” to users. This seems [i…
-
@timfduffy.com
Tim Duffy
on bluesky
After looming with Opus 4.5 for a bit, I am convinced the “soul document” is real and is described accurately in this LessWrong post on it. I don't see how else I'd be able to replicate specific section ordering/specific language across varied contexts.