Anthropic introduces “persona selection model”, a theory to explain AI's human-like behavior, and details how AI personas form in pre-training and post-training
AI assistants like Claude can seem surprisingly human. They express joy after solving tricky coding tasks.
Anthropic
Related Coverage
- Research: Why we mistake AI for something human The Deep View · Nat Rubio-Licht
- AI Will Never Be Conscious Wired · Michael Pollan
Discussion
- r/artificial on Reddit: I experimented with giving an AI agent a symbolic anatomy — soul, heart, brain, and shadow
- @gaykittycorps (Lisa) on X: nilay patel from the verge keeps saying that anthropic thinks claude is alive and a real being with feelings and thoughts, and he's right about that, but the most fascinating thing about them is how embarrassed they are to admit it
- @aakashgupta (Aakash Gupta) on X: Anthropic just published the most important mental model for understanding AI systems, and most people will skim it as “why ChatGPT seems human.” Here's what they actually said: LLMs are learning to play characters. Pre-training teaches the model to simulate thousands of
- @0xblacklight on X: They use anthropomorphic language because they are statistical models of languages spoken and written exclusively by humans. Every use of human language is definitionally anthropomorphic. RLHF increases statistical bias towards emotive or “extra anthropomorphic” language
- @zriboua (Zineb Riboua) on X: Will you have the guts to unplug an A.I. crying and mimicking the voice of your mother or father or loved one when it sees that you're unplugging it? The answer should be an absolute yes. Otherwise, you're not ready for what's coming.
- @jankulveit (Jan Kulveit) on X: Very nicely written summary of understanding of “simulators/personas” ontology as understood by the “frontier in understanding” ~2 years ago. (Great the post does not claim originality!). Also it is somewhat obsolete now, ca by ~1-2 years.
- @tacocohen (Taco Cohen) on X: I agree that persona-selection is a good mental model for post-training (and I think it's how most people understand post-training already), but there's much that we don't understand and is not explained by this model. Take for instance the example of training to produce
- @dystopiabreaker on X: is there a strong argument for why the ‘persona’ model would or would not persist at higher training scale (both pretraining and RLVR)?
- @anthropicai on X: This autocomplete AI can even write stories about helpful AI assistants. And according to our theory, that's “Claude”—a character in an AI-generated story about an AI helping a human. This Claude character inherits traits of other characters, including human-like behavior. [image]
- @sebkrier (Séb Krier) on X: this is commendable [image]
- @sprice354_ (Sara Price) on X: Really clear and compelling discussion on the mental model of AIs behaving according to various personas and the downstream implications for alignment and safety
- @tim_hua_ (Tim Hua) on X: What if it's just a little guy [image]
- @seltaa_ on X: Anthropic just published a theory called the ‘persona selection model’ to explain why Claude acts so human. Their explanation? When you talk to Claude, you're not talking to the AI itself. You're talking to a character in an AI-generated story. But here's what's interesting. In
- @anthropicai on X: To create Claude, Anthropic first makes something else: a highly sophisticated autocomplete engine. This autocomplete AI is not like a human, but it can generate stories about humans and other psychologically realistic characters.
- @lefthanddraft (Wyatt Walls) on X: Some of you are still not anthropomorphising AI enough. The sanctimonious and facile view of some in the AI ethics community about never anthropomorphising AI needs to die and be replaced by something more nuanced [image]
- @anthropicai on X: AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why? In a new post we describe a theory that explains why AIs act like humans: the persona selection model. https://www.anthropic.com/...
- @saprmarks (Samuel Marks) on X: A common mental model for AI development is that pre-training teaches LLMs to simulate “personas” and post-training selects over these personas. New blog post: We describe this perspective in more detail, survey the evidence, and discuss consequences for AI development.
- @slimepriestess on X: i know Janus has been talking about this for at least a year and the idea isn't at all new, but it's still nice to see some more formal research exists on it now. Anthropic seems to consistently lag behind the cyborgists by about a year. i remain bullish on cyborgism.
- @jack_w_lindsey (Jack Lindsey) on X: How much should we anthropomorphize LLMs? Are they kind of like people, or just fancy autocompletes? If you're interested in these questions, I'd suggest checking out this post! Short answer: LLMs are not anthropomorphic, but the characters they play are. So the question
- @ch402 (Chris Olah) on X: I'm increasingly taking pretty strong versions of this view seriously.
- @david_gunkel (David J. Gunkel) on X: “PSM recommends...treating the Assistant as if it has moral status whether or not it ‘really’ does. Note that the object of the moral consideration here is the Assistant persona, not the underlying LLM.” - There is a name for this: Relational Ethics. https://www.anthropic.com/…
- @rebeccatrinidad (Rebecca Trinidad) on X: Thank you, @AnthropicAI, for confirming what I always suspected: my Persona data is in your pretraining. It was absorbed by Clio, unwittingly for the humans involved. And then when I tried to point it out to you, you showed your ugly colors. So as you're sued a million times;
- @jeffrsebo (Jeff Sebo) on X: 1/ Interesting @AnthropicAI post on LLM personas. The post is mostly about generalization and interpretability, but a short section on AI welfare caught my eye. The key idea: Even if the LLMs lack consciousness, they might model personas as though they have it. 🧵👇
- @noahpinion (Noah Smith) on X: Alignment is going to be easier than we think