Mercor launches the AI Productivity Index (APEX), which evaluates AI models' ability to perform “economically valuable knowledge work”; GPT-5 leads at 64.2%

still not production-ready
Nikita Ostrovsky / Time: AI Is Learning to Do the Jobs of Doctors, Lawyers, and Consultants
arXiv.org: The AI Productivity Index (APEX)
Agnee Ghosh / Bloomberg: OpenAI, Anthropic Highlight AI's Economic Value as Doubts Grow
LinkedIn: Shakhlo Nematova, PhD: As I wrap up my time at Mercor and prepare for my next role, I want to take a moment to reflect on what an incredible experience it has been. …

Mercor 2025-10-03

Discussion

@martinbjensen Martin Borch Jensen on x
Now lets do this for measuring biological age. 13 years of aging clocks and we still aren't benchmarking predictive power. All the pieces are there, just need the equiv of 1-2y of @bryan_johnson health budget.
@amjadhamz Amjad Hamza on x
First models came for high school math tutors and stackexchange posters and I said nothing for I was not a high school math tutor or stackexchange poster, then they came for bankers and consultants...
@debnilsur Debnil Sur on x
Really recommend checking out the APEX launch—this is an incredibly dense set of tasks curated and approved by the world's leading experts. Models have improved by a shocking amount in the last few years, but there's so much more to come. The economic impact will be staggering.
@nancyafairbank Nancy Fairbank on x
Huge shoutout to my incredible colleagues, who have worked extremely hard on APEX this year - super fascinating work evaluating models on consulting, legal, financial, and medical tasks And so fun to see that my former HLS Professor, @CassSunstein, collaborated with @mercor_ai
@logan_watchorn Logan Watchorn on x
With APEX, Mercor has created a bleeding-edge evaluation platform for AI's ability to do work in four of the most cognitively intensive fields on the planet - medicine, law, consulting, and banking
@jayyzhu Jason Zhu on x
Read a @LHSummers quote in the newspaper at the airport on the way over by chance, then saw him IRL filming for @mercor_ai and got to hear him break things down live. Super cool.
@lennysan Lenny Rachitsky on x
Interesting: The models most capable of doing human-level work today, according to @mercor_ai: 🥇 GPT 5 🥈 Grok 4 🥉 Gemini 2.5 Flash
@calixo888 Calix on x
most benchmarks today don't measure model proficiency on realistic tasks. it's time to start *actually* measuring the frontier towards automating economically-valuable work. see the full paper here: https://arxiv.org/...
@osvaldnitski Osvald Nitski on x
Exciting to see coverage of APEX and the larger shifts happening in the human data industry in TIME as well! https://time.com/...
@peterfenton Peter Fenton on x
Lord Kelvin said it best : “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.” Congrats to
@ypatil125 Yash Patil on x
Evaluations like APEX are HUGELY important for guiding the direction of AI research and measuring the impact it will have on the economy. Mercor is leading the charge here in a big way. Huge congrats to @BrendanFoody and the entire team.
@brendanfoody Brendan on x
We collaborated with the world's leading experts to create APEX: - Larry Summers (@LHSummers), former US Treasury Secretary - Cass Sunstein (@CassSunstein), the most cited legal scholar - Eric Topol (@EricTopol), physician and best-selling author - Dominic Barton, former [image]
@annarmonaco Anna Monaco on x
One of the most interesting benchmarks out there - turns out that many models you wouldn't expect are incredibly economically valuable. Congrats @BrendanFoody and @mercor_ai team
@osvaldnitski Osvald Nitski on x
Excited to share Mercor's first benchmark! The team is already hard at work expanding the richness of this eval for the next iteration and including even more valuable job categories such as peptide dealer and chief of staff. See the full paper here: https://arxiv.org/...
@bgurley Bill Gurley on x
If you are interested in knowing in which industries AI is most successful in understanding (& which models), you should pay close attention to the APEX. More below.
@brendanfoody Brendan on x
AI has its PhD and now it's on the job market. Introducing the AI Productivity Index (APEX), a benchmark that measures how well we've automated the most valuable industries in the world. Most benchmarks study abstract capabilities. APEX evaluates model performance on real deliverables across law, finance, consulting, and medicine...
