METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year
just careful, meticulous rigor. Nikola Jurkovic / @nikolaj2030 : This result updates me towards 4 month doubling times being my median estimate for the next two years. That means by EOY 2026 the time ...
The Arc Prize Foundation says its new ARC-AGI-2 test stumps most AI models; humans get 60% of the questions right but GPT-4.5 and Claude 3.7 Sonnet score ~1%
[image] François Chollet / @fchollet : Unlike ARC-AGI-1, this new version is not easily brute-forced. Current top AI approaches score 0-4%. All base LLMs (GPT-4.5, Claude 3.7 Sonnet, Gemini 2, etc.)...
Anthropic's Claude 3.7 Sonnet reportedly cost “a few tens of millions of dollars” to train, similar to Claude 3.5 and cheaper than GPT-4, which cost over $100M
“Assuming Claude 3.7 Sonnet indeed cost just ‘a few tens of millions of dollars’ to train, not factoring in related expenses, it's a sign of how relatively cheap it's becoming to release state-of-the-...
Anthropic releases Claude 3.7 Sonnet, a hybrid model that can produce fast responses or extended, step-by-step thinking, and Claude Code, an agentic coding tool
and it could be a game changer Ghacks : Anthropic Unveils Claude 3.7: First Hybrid Reasoning AI Model Rowan Cheung / The Rundown AI : Claude enters the reasoning era Siddharth Jindal / Analytics India...