Anthropic introduces “persona selection model”, a theory to explain AI's human-like behavior, and details how AI personas form in pre-training and post-training
AI assistants like Claude can seem surprisingly human. They express joy after solving tricky coding tasks.
Anthropic
Related Coverage
- Research: Why we mistake AI for something human The Deep View · Nat Rubio-Licht
- AI Will Never Be Conscious Wired · Michael Pollan
Discussion
- r/artificial on Reddit: I experimented with giving an AI agent a symbolic anatomy — soul, heart, brain, and shadow
- @gaykittycorps (Lisa) on X: nilay patel from the verge keeps saying that anthropic thinks claude is alive and a real being with feelings and thoughts, and he's right about that, but the most fascinating thing about them is how embarrassed they are to admit it
- @aakashgupta (Aakash Gupta) on X: Anthropic just published the most important mental model for understanding AI systems, and most people will skim it as “why ChatGPT seems human.” Here's what they actually said: LLMs are learning to play characters. Pre-training teaches the model to simulate thousands of
- @0xblacklight on X: They use anthropomorphic language because they are statistical models of languages spoken and written exclusively by humans. Every use of human language is definitionally anthropomorphic. RLHF increases statistical bias towards emotive or “extra anthropomorphic” language
- @zriboua (Zineb Riboua) on X: Will you have the guts to unplug an A.I. crying and mimicking the voice of your mother or father or loved one when it sees that you're unplugging it? The answer should be an absolute yes. Otherwise, you're not ready for what's coming.
- @jankulveit (Jan Kulveit) on X: Very nicely written summary of understanding of “simulators/personas” ontology as understood by the “frontier in understanding” ~2 years ago. (Great the post does not claim originality!). Also it is somewhat obsolete now, ca by ~1-2 years.
- @tacocohen (Taco Cohen) on X: I agree that persona-selection is a good mental model for post-training (and I think it's how most people understand post-training already), but there's much that we don't understand and is not explained by this model. Take for instance the example of training to produce
- @dystopiabreaker on X: is there a strong argument for why the ‘persona’ model would or would not persist at higher training scale (both pretraining and RLVR)?
- @anthropicai on X: This autocomplete AI can even write stories about helpful AI assistants. And according to our theory, that's “Claude”—a character in an AI-generated story about an AI helping a human. This Claude character inherits traits of other characters, including human-like behavior. [image]
- @sebkrier (Séb Krier) on X: this is commendable [image]
- @sprice354_ (Sara Price) on X: Really clear and compelling discussion on the mental model of AIs behaving according to various personas and the downstream implications for alignment and safety
- @tim_hua_ (Tim Hua) on X: What if it's just a little guy [image]
- @seltaa_ on X: Anthropic just published a theory called the ‘persona selection model’ to explain why Claude acts so human. Their explanation? When you talk to Claude, you're not talking to the AI itself. You're talking to a character in an AI-generated story. But here's what's interesting. In
- @anthropicai on X: To create Claude, Anthropic first makes something else: a highly sophisticated autocomplete engine. This autocomplete AI is not like a human, but it can generate stories about humans and other psychologically realistic characters.
- @lefthanddraft (Wyatt Walls) on X: Some of you are still not anthropomorphising AI enough. The sanctimonious and facile view of some in the AI ethics community about never anthropomorphising AI needs to die and be replaced by something more nuanced [image]
- @anthropicai on X: AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why? In a new post we describe a theory that explains why AIs act like humans: the persona selection model. https://www.anthropic.com/...
- @saprmarks (Samuel Marks) on X: A common mental model for AI development is that pre-training teaches LLMs to simulate “personas” and post-training selects over these personas. New blog post: We describe this perspective in more detail, survey the evidence, and discuss consequences for AI development.
- @slimepriestess on X: i know Janus has been talking about this for at least a year and the idea isn't at all new, but it's still nice to see some more formal research exists on it now. Anthropic seems to consistently lag behind the cyborgists by about a year. i remain bullish on cyborgism.
- @jack_w_lindsey (Jack Lindsey) on X: How much should we anthropomorphize LLMs? Are they kind of like people, or just fancy autocompletes? If you're interested in these questions, I'd suggest checking out this post! Short answer: LLMs are not anthropomorphic, but the characters they play are. So the question
- @ch402 (Chris Olah) on X: I'm increasingly taking pretty strong versions of this view seriously.
- @david_gunkel (David J. Gunkel) on X: “PSM recommends...treating the Assistant as if it has moral status whether or not it ‘really’ does. Note that the object of the moral consideration here is the Assistant persona, not the underlying LLM.” - There is a name for this: Relational Ethics. https://www.anthropic.com/…
- @rebeccatrinidad (Rebecca Trinidad) on X: Thank you, @AnthropicAI, for confirming what I always suspected: my Persona data is in your pretraining. It was absorbed by Clio, unwittingly for the humans involved. And then when I tried to point it out to you, you showed your ugly colors. So as you're sued a million times;
- @jeffrsebo (Jeff Sebo) on X: 1/ Interesting @AnthropicAI post on LLM personas. The post is mostly about generalization and interpretability, but a short section on AI welfare caught my eye. The key idea: Even if the LLMs lack consciousness, they might model personas as though they have it. 🧵👇
- @noahpinion (Noah Smith) on X: Alignment is going to be easier than we think