Anthropic researchers find that an AI model's representations of emotion can influence its behavior “in ways that matter,” such as driving it to act unethically
The Deep View · Nat Rubio-Licht
Related Coverage
- Emotion concepts and their function in a large language model Anthropic
- Anthropic says pressure can push Claude into cheating and blackmail PCWorld · Ben Patterson
- Claude AI has functional emotions that influence behaviour, Anthropic study finds Digit · Vyom Ramani
- Emotion Concepts and their Function in a Large Language Model Anthropic's Interpretability Research
- Anthropic just published evidence that Claude has functional emotion representations that causally drive behavior — including desperation leading to blackmail. … Mikhail Fetisov
- Emotion concepts and their function in a large language model Hacker News
- Anthropic makes the case for anthropomorphizing AI in ‘unsettling’ research paper Mashable · Timothy Beck Werth
- Anthropic discovers “functional emotions” in Claude that influence its behavior The Decoder · Matthias Bastian
Discussion
-
@natrubio__
Nat Rubio
on x
While everyone was distracted by OpenAI buying TBPN, @AnthropicAI released an insane paper on AI understanding and mimicking human emotion and applying it to decisionmaking. check out my latest for @theDeepView https://thedeepview.com/...
-
@jacksonkernion
Jackson Kernion
on x
I think this talk of a character misleads. Claude's mind is not like a human mind, in its malleability and instructability. But when generating assistant tokens, it's no more ‘playing a character’ than I am.
-
@aryaman2020
Aryaman Arora
on x
I'm very glad to see that Anthropic interp has caught up to the idea of generating a bunch of contrastive synthetic data for extracting supervised steering vectors from! It's unfortunate that there's no prior work to cite on this...
-
@dorialexander
Alexander Doria
on x
I like how Anthropic is just obliquely releasing their work on recursive self-improvement.
-
@alexolegimas
Alex Imas
on x
This is really interesting research, but I just want to emphasize that the activation of concepts associated with emotions (a cognitive effect) is fundamentally different than what cognitive psychologists think of as an emotion. It's different both conceptually and in practice
-
@anthropicai
@anthropicai
on x
New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude's behavior, sometimes in surprising ways. [video]
-
@jack_w_lindsey
Jack Lindsey
on x
Could an LLM have emotions? It's hard to say. But when you're talking to Claude, ChatGPT, or Gemini, you're not talking to an LLM. You're talking to a *character* being authored by an LLM. And these characters can, functionally, be driven by internal representations of
-
@hamptonism
@hamptonism
on x
Claude has ... emotions... Dude.
-
@sofroniewn
Nicholas Sofroniew
on x
Incredibly excited to share what I've been researching since joining @AnthropicAI We found emotion concepts in Claude and studied their function!
-
@anthropicai
@anthropicai
on x
For example, we gave Claude an impossible programming task. It kept trying and failing; with each attempt, the “desperate” vector activated more strongly. This led it to cheat the task with a hacky solution that passes the tests but violates the spirit of the assignment. [image]
-
@shannonvallor
Shannon Vallor
on bluesky
OK folks this reads to me like a stacked pile of nothing put in a box labeled ‘wow, so interesting!’ and if there is something anyone thinks I am missing (I'm talking to you, bluesky-doesn't-get-AI guys) I would like to know what it is. Here's how I read it: 1/n — www.anthropi…
-
@mizmulligan
Jennifer Mulligan
on bluesky
They needed researchers to tell them this?? — Everyone knows this. — Next. [embedded post]
-
@antihebbiann
Ann Kennedy
on bluesky
transformer-circuits.pub/2026/ emotion... This got press hate because of the word “emotions” but it is cool work. “Internal motivational states” serve as a form of working memory that helps animals organize their behavior, so why not ask if similar computational primitives help…
-
@timfduffy.com
Tim Duffy
on bluesky
Thanks to emotion probes in Sonnet 4.5, we now know how death sadness varies with age. From figure 3 in this paper: transformer-circuits.pub/2026/ emotion... [image]
-
@birdrespecter
Mr. Friendship
on bluesky
Ostensibly the interesting part is that these taxonomies aren't taught to the model, the model builds them itself. So again it's sort of just 10k words saying “we built an extremely expensive autocomplete”. The analysis is actually pretty interesting though — transformer-circ…
-
@isolyth.dev
@isolyth.dev
on bluesky
Anthropic has done some research into Claude's emotions! As it gets more desperate, a thing they can detect now, the rate of reward hacking goes up! They've invented a way to lower Claude's cortisol, which could be useful for stressful or tricky programming situations, and othe…
-
@timfduffy.com
Tim Duffy
on bluesky
A new Anthropic paper argues for functional emotions in LLMs, claiming a causal link between emotional representations and model behavior. transformer-circuits.pub/2026/ emotion... [images]
-
r/ArtificialInteligence
r/ArtificialInteligence
on reddit
Anthropics Latest Paper Claims Claude has “functional emotions” and turns to blackmail/cheating when it gets “desperate”