2025-12-21
progress on monitoring AI agents by looking at their chains of thought was bottlenecked by lack of measures of monitorability we could *really* trust. this paper fills this hole and might be one of the most important pieces of AI safety research in 2025. [image]
OpenAI
OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system
2025-12-20
progress on monitoring AI agents by looking at their chains of thought was bottlenecked by lack of measures of monitorability we could *really* trust. this paper fills this hole and might be one of the most important pieces of AI safety research in 2025. [image]
OpenAI
OpenAI introduces a framework to evaluate chain-of-thought monitorability and a suite of 13 evaluations designed to measure the monitorability of an AI system
We introduce evaluations for chain-of-thought monitorability and study how it scales with test-time compute, reinforcement learning, and pretraining.
2025-07-16
The holy grail of AI safety has always been interpretability. But what if reasoning models just handed it to us in a stroke of serendipity? In our new paper, we argue that the AI community should turn this serendipity into a systematic AI safety agenda!🛡️
TechCrunch
In a paper, AI researchers from OpenAI, Google DeepMind, Anthropic, and others recommend “further research into chain-of-thought monitorability” for AI safety
AI researchers from OpenAI, Google DeepMind, Anthropic, and a broad coalition of companies and nonprofit groups …