2025-12-22
@metr_evals
METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4, released earlier this year
just careful, meticulous rigor. Nikola Jurkovic / @nikolaj2030: This result updates me towards 4-month doubling times being my median estimate for the next two years. That means by EOY 2026 the time ...
2025-09-10
Wired
At the Man vs. Machine hackathon, co-hosted by AI nonprofit METR to test if AI helps people code faster and better, the top prize went to an “AI-supported” team
roughly 100 people were randomly assigned “human” or “AI-supported” projects, and the winner nabbed a $12,500 cash prize. www.wired.com/story/san-fr...