2025-11-05
Windows On Theory
Research: AI's ability to complete lengthy software engineering tasks has doubled roughly every six months, but there is a “messiness tax” for real-world tasks
METR has had a very influential work by Kwa and West et al on measuring AI's ability to complete long tasks. X: @kirillzzy , @boazbaraktcs , @benshindel , @jasonfurman , @jasonfurman , and @sama X: Ki...
Loading articles...