2025-10-03
most benchmarks today don't measure model proficiency on realistic tasks. it's time to start *actually* measuring the frontier towards automating economically-valuable work. see the full paper here: https://arxiv.org/...
Mercor
Mercor launches the AI Productivity Index (APEX), which evaluates AI models' ability to perform “economically valuable knowledge work”; GPT-5 leads at 64.2%
still not production-ready
Nikita Ostrovsky / Time: AI Is Learning to Do the Jobs of Doctors, Lawyers, and Consultants
arXiv.org: The AI Productivity Index (APEX)
Agnee Ghosh / B...