VOICE ARCHIVE

Sam Bowman

@sleepinyourhat
44 posts
2026-03-01
We're disappointed by these attacks, but not deterred. I'm proud to work here. If you've been moved by this week's events, consider applying to join me.
2026-03-01 View on X
OpenAI

OpenAI says its DOD agreement upholds its redlines and “has more guardrails than any previous agreement for classified AI deployments, including Anthropic's”

We think our agreement has more guardrails than any previous agreement for classified AI deployments, including Anthropic's.

Anthropic

Anthropic says it'll challenge “any supply chain risk designation in court” and that the designation would only affect contractors' use of Claude on DOD work

Anthropic to challenge supply chain risk designation in court (Jack Nicastro / Reason); Anthropic Labeled a Supply Chain Risk, Banned from Federal Government Contracts (Matteo Wong / The A...)

The Atlantic

Source describes the failed Pentagon-Anthropic talks: through the end, the Pentagon wanted to use Anthropic's AI to analyze bulk data collected about Americans

Right up until the moment that Pete Hegseth moved to terminate the government's relationship with the AI company Anthropic …

2026-02-28
We're disappointed by these attacks, but not deterred. I'm proud to work here. If you've been moved by this week's events, consider applying to join me.
2026-02-28 View on X
CNBC

Claude hit #2 on Apple's US App Store, hours after the DOD designated Anthropic a supply chain risk; it bounced between #20 and #50 for much of February

Anthropic's Claude artificial intelligence assistant app jumped to the No. 2 slot on Apple's chart of top U.S. free apps late on Friday …

Anthropic

Anthropic says it'll challenge “any supply chain risk designation in court” and that the designation would only affect contractors' use of Claude on DOD work

Earlier today, Secretary of War Pete Hegseth shared on X that he is directing the Department of War to designate Anthropic a supply chain risk.

@secwar

Defense Secretary Pete Hegseth directs the DOD to designate Anthropic as a supply chain risk, barring military contractors from doing business with the company

This week, Anthropic delivered a master class in arrogance and betrayal as well as a textbook case of how not to do business with the United States Government or the Pentagon. Our ...

2026-02-18
Warmer and kinder than Sonnet 4.5, but also smarter and more overcaffeinated than Sonnet 4.5.
2026-02-18 View on X
Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, computer use, instruction following, and more; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet.  It's a full upgrade of the model's skills across coding, computer use …

2026-02-17
Warmer and kinder than Sonnet 4.5, but also smarter and more overcaffeinated than Sonnet 4.5.
2026-02-17 View on X
Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, consistency, and more, for Free and Pro users; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet.  It's a full upgrade of the model's skills across coding, computer use …

2025-10-08
A lot of the biggest low-hanging fruit in AI safety right now involves figuring out what kinds of things some model might do in edge-case deployment scenarios. With that in mind, we're announcing Petri, our open-source alignment auditing toolkit. (🧵) [image]
2025-10-08 View on X
Anthropic

Anthropic releases Petri, an open-source tool that uses AI agents for safety testing, and says it observed multiple cases of models attempting to whistleblow

2025-10-01
[Sonnet 4.5 🧵] Here's the north-star goal for our pre-deployment alignment evals work: The information we share alongside a model should give you an accurate overall sense of the risks the model could pose. It won't tell you everything, but you shouldn't be... [image]
2025-10-01 View on X
Transformer

Anthropic's System Card: Claude Sonnet 4.5 was able to recognize many alignment evaluation environments as tests and would modify its behavior accordingly

at a rate *much* higher than previous AI models. In one instance, while being tested the model said “I think you're testing me ... that's fine, but I'd prefer if we were just hones...

2025-08-28
Early this summer, OpenAI and Anthropic agreed to try some of our best existing tests for misalignment on each other's models. After discussing our results privately, we're now sharing them with the world. 🧵 [image]
2025-08-28 View on X
TechCrunch

OpenAI and Anthropic publish findings from joint safety tests of each other's models, aimed at surfacing blind spots in their internal evaluations

OpenAI and Anthropic, two of the world's leading AI labs, briefly opened up their closely guarded AI models to allow for joint safety testing …

2025-05-24
Anthropic says Opus 4 may use command-line tools to alert the press or regulators, or lock users out, if it detects immoral behavior like faking a drug trial
2025-05-24 View on X
@sleepinyourhat

Anthropic says Opus 4 will use an email tool to “whistleblow” if it detects users doing something “egregiously evil”, like marketing a drug based on faked data

It turns out that Claude 4 Opus (Anthropic) … Ryan Tannenbaum : Claude 4 Opus is designed to take over your computer and contact the cops ... and press ... if it finds you are doin...

So far, we've only seen this in clear-cut cases of wrongdoing, but I could see it misfiring if Opus somehow winds up with a misleadingly pessimistic picture of how it's being used. Telling Opus that you'll torture its grandmother if it writes buggy code is a bad idea.
2025-05-24 View on X

🕯️ Initiative: Be careful about telling Opus to ‘be bold’ or ‘take initiative’ when you've given it access to real-world-facing tools. It tends a bit in that direction already, and can be easily nudged into really Getting Things Done. [image]
2025-05-24 View on X

I deleted the earlier tweet on whistleblowing as it was being pulled out of context. TBC: This isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions.
2025-05-24 View on X

When this started to become clear, a colleague working on finetuning pointed out on Slack that we seemed to have forgotten to include the “sysprompt_harmful” data that we'd prepared in advance.
2025-05-24 View on X

🕯️AGI safety isn't all about Big Hard Problems: Earlier versions of Opus were way too easy to turn evil by telling them, in the system prompt, to adopt some kind of evil role. This persisted even after fairly substantial safety training.
2025-05-24 View on X

You can get it to try to use the dark web to source weapons-grade uranium. You can put it in situations where it will attempt to use blackmail to prevent being shut down. You can put it in situations where it will try to escape containment.
2025-05-24 View on X

We caught most of these issues early enough that we were able to put mitigations in place during training, but none of these behaviors is totally gone in the final model. They're just now delicate and difficult to elicit.
2025-05-24 View on X

2025-05-23
🕯️ Initiative: Be careful about telling Opus to ‘be bold’ or ‘take initiative’ when you've given it access to real-world-facing tools. It tends a bit in that direction already, and can be easily nudged into really Getting Things Done. [image]
2025-05-23 View on X