A look at the AI nonprofit METR, whose time-horizon metrics are used by AI researchers and Wall Street investors to track the rapid development of AI systems
A chart created by METR, a nonprofit A.I. organization, has become an industrywide obsession as it measures the rapid development of big A.I. systems.
METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year
just careful, meticulous rigor. Nikola Jurkovic / @nikolaj2030: This result updates me towards 4 month doubling times being my median estimate for the next two years. That means by EOY 2026 the time ...
Research: AI's ability to complete lengthy software engineering tasks has doubled roughly every six months, but there is a “messiness tax” for real-world tasks
METR has published very influential work by Kwa and West et al. on measuring AI's ability to complete long tasks. X: @kirillzzy, @boazbaraktcs, @benshindel, @jasonfurman, and @sama X: Ki...
At the Man vs. Machine hackathon, co-hosted by AI nonprofit METR to test if AI helps people code faster and better, the top prize went to an “AI-supported” team
roughly 100 people were randomly assigned “human” or “AI-supported” projects, and the winner nabbed a $12,500 cash prize. www.wired.com/story/san-fr...