tomekkorbak · TEXXR

2025-12-21

progress on monitoring AI agents by looking at their chains of thought was bottlenecked by lack of measures of monitorability we could *really* trust. this paper fills this hole and might be one of the most important pieces of AI safety research in 2025. [image]

2025-12-21 View on X

OpenAI

OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system

View original

2025-12-20

2025-12-20 View on X

OpenAI

OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system

We introduce evaluations for chain-of-thought monitorability and study how it scales with test-time compute, reinforcement learning, and pretraining.

View original

2025-07-16

The holy grail of AI safety has always been interpretability. But what if reasoning models just handed it to us in a stroke of serendipity? In our new paper, we argue that the AI community should turn this serendipity into a systematic AI safety agenda!🛡️

2025-07-16 View on X

TechCrunch

In a paper, AI researchers from OpenAI, Google DeepMind, Anthropic, and others recommend “further research into chain-of-thought monitorability” for AI safety

AI researchers from OpenAI, Google DeepMind, Anthropic, and a broad coalition of companies and nonprofit groups …

View original