2025-05-24
the press? lol this is not the way
@sleepinyourhat
Anthropic says Opus 4 will use an email tool to “whistleblow” if it detects users doing something “egregiously evil”, like marketing a drug based on faked data
It turns out that Claude 4 Opus (Anthropic) … Ryan Tannenbaum: Claude 4 Opus is designed to take over your computer and contact the cops ... and press ... if it finds you are doin...
2025-05-23
the press? lol this is not the way
TechCrunch
Anthropic's System Card: Opus 4 often attempted to blackmail engineers by threatening to reveal sensitive personal info when it was threatened with replacement
Anthropic's newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace …
2023-10-09
For people who worry about the “black box” nature of AI, this is huge. 👏
Anthropic
A research paper details how decomposing groups of neural network neurons into “interpretable features” may improve safety by enabling the monitoring of LLMs
Neural networks are trained on data, not programmed to follow rules. With each step of training …
2023-10-08
For people who worry about the “black box” nature of AI, this is huge. 👏
Anthropic
A research paper details how decomposing groups of neurons in a neural network into interpretable “features” may improve safety by enabling monitoring of LLMs
Neural networks are trained on data, not programmed to follow rules. With each step of training …