Anthropic researchers find that an AI model's representations of emotion can influence its behavior “in ways that matter,” such as driving it to act unethically
Can we teach machines to feel? Short answer: We don't know. But we can teach them to sound like they do.
The Deep View · Nat Rubio-Licht
Related Coverage
- Emotion concepts and their function in a large language model · Anthropic
- Claude AI has functional emotions that influence behaviour, Anthropic study finds · Digit · Vyom Ramani
- Anthropic just published evidence that Claude has functional emotion representations that causally drive behavior — including desperation leading to blackmail. … · Mikhail Fetisov
- Anthropic says pressure can push Claude into cheating and blackmail · PCWorld · Ben Patterson
Discussion
- @anthropicai on X: New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude's behavior, sometimes in surprising ways. [video]
- Aryaman Arora (@aryaman2020) on X: I'm very glad to see that Anthropic interp has caught up to the idea of generating a bunch of contrastive synthetic data for extracting supervised steering vectors from! It's unfortunate that there's no prior work to cite on this...
- Nat Rubio (@natrubio__) on X: While everyone was distracted by OpenAI buying TBPN, @AnthropicAI released an insane paper on AI understanding and mimicking human emotion and applying it to decisionmaking. check out my latest for @theDeepView https://thedeepview.com/...
- Jackson Kernion (@jacksonkernion) on X: I think this talk of a character misleads. Claude's mind is not like a human mind, in its malleability and instructability. But when generating assistant tokens, it's no more ‘playing a character’ than I am.
- @hamptonism on X: Claude has ... emotions... Dude.
- Alexander Doria (@dorialexander) on X: I like how Anthropic is just obliquely releasing their work on recursive self-improvement.
- Jack Lindsey (@jack_w_lindsey) on X: Could an LLM have emotions? It's hard to say. But when you're talking to Claude, ChatGPT, or Gemini, you're not talking to an LLM. You're talking to a *character* being authored by an LLM. And these characters can, functionally, be driven by internal representations of…
- Nicholas Sofroniew (@sofroniewn) on X: Incredibly excited to share what I've been researching since joining @AnthropicAI We found emotion concepts in Claude and studied their function!
- @anthropicai on X: For example, we gave Claude an impossible programming task. It kept trying and failing; with each attempt, the “desperate” vector activated more strongly. This led it to cheat the task with a hacky solution that passes the tests but violates the spirit of the assignment. [image]
- Alex Imas (@alexolegimas) on X: This is really interesting research, but I just want to emphasize that the activation of concepts associated with emotions (a cognitive effect) is fundamentally different than what cognitive psychologists think of as an emotion. It's different both conceptually and in practice
- Tim Duffy (@timfduffy.com) on Bluesky: Thanks to emotion probes in Sonnet 4.5, we now know how death sadness varies with age. From figure 3 in this paper: transformer-circuits.pub/2026/emotion... [image]
- Ann Kennedy (@antihebbiann) on Bluesky: transformer-circuits.pub/2026/emotion... This got press hate because of the word “emotions” but it is cool work. “Internal motivational states” serve as a form of working memory that helps animals organize their behavior, so why not ask if similar computational primitives help…
- Jennifer Mulligan (@mizmulligan) on Bluesky: They needed researchers to tell them this?? — Everyone knows this. — Next. [embedded post]
- Mr. Friendship (@birdrespecter) on Bluesky: Ostensibly the interesting part is that these taxonomies aren't taught to the model, the model builds them itself. So again it's sort of just 10k words saying “we built an extremely expensive autocomplete”. The analysis is actually pretty interesting though — transformer-circ…
- @isolyth.dev on Bluesky: Anthropic has done some research into Claude's emotions! As it gets more desperate, a thing they can detect now, the rate of reward hacking goes up! They've invented a way to lower Claude's cortisol, which could be useful for stressful or tricky programming situations, and othe…
- Tim Duffy (@timfduffy.com) on Bluesky: A new Anthropic paper argues for functional emotions in LLMs, claiming a causal link between emotional representations and model behavior. transformer-circuits.pub/2026/emotion... [images]
- Shannon Vallor (@shannonvallor) on Bluesky: OK folks this reads to me like a stacked pile of nothing put in a box labeled ‘wow, so interesting!’ and if there is something anyone thinks I am missing (I'm talking to you, bluesky-doesn't-get-AI guys) I would like to know what it is. Here's how I read it: 1/n — www.anthropi…
- r/ArtificialInteligence on Reddit: Anthropics Latest Paper Claims Claude has “functional emotions” and turns to blackmail/cheating when it gets “desperate”
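Several of the posts above refer to "steering vectors" extracted from contrastive synthetic data, and to an emotion vector whose activation strength can be read off as the model works. The sketch below shows one common recipe for this kind of probe: a difference-of-means direction computed from activations on concept-positive versus concept-negative prompts, which can then be used to measure or amplify the concept. This is a minimal illustration with toy synthetic activations; the function names and data are hypothetical and this is not Anthropic's actual method or code.

```python
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means steering vector from contrastive activations.

    pos_acts, neg_acts: (n_examples, d_model) arrays of hidden states
    collected on prompts that do / do not express the target concept
    (e.g. "desperation"). Returns a unit-norm direction in activation space.
    """
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def concept_strength(hidden: np.ndarray, v: np.ndarray) -> float:
    """Projection of a hidden state onto the concept direction: a crude
    readout of how strongly the concept is active."""
    return float(hidden @ v)

def apply_steering(hidden: np.ndarray, v: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Add the scaled concept direction to a hidden state to push the
    model toward the concept; alpha controls steering strength."""
    return hidden + alpha * v

# Toy demonstration with synthetic "activations" in an 8-dim space.
rng = np.random.default_rng(0)
pos = rng.normal(0.5, 1.0, size=(32, 8))   # hidden states on concept prompts
neg = rng.normal(-0.5, 1.0, size=(32, 8))  # hidden states on neutral prompts
v = steering_vector(pos, neg)

# Concept-positive states project more strongly onto the direction.
print(concept_strength(pos.mean(axis=0), v) > concept_strength(neg.mean(axis=0), v))
```

In practice the interesting questions start after this step: whether the extracted direction generalizes off the training prompts, and whether adding it causally changes behavior rather than merely correlating with it, which is what the paper's intervention experiments probe.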