Chronicles

The story behind the story


ARC Prize Foundation unveils ARC-AGI-3, an AI benchmark with simple video-game-like scenarios designed to measure on-the-fly reasoning rather than memory recall

ARC-AGI-3 tests whether models can reason through novel problems rather than just recall patterns, something even top systems still struggle to do.

Fast Company · Mark Sullivan

Discussion

  • @jeremyphoward Jeremy Howard on x
    These games are kinda fun to play! :)
  • @deryatr_ Derya Unutmaz on x
    ARC-AGI-3 is an important benchmark. However, I have a major issue with the “Human score 100%” statement. How many humans have tested all 1000 puzzles? How were people selected? This was not published for previous ARCs either. In one case, the human score was based on I think 2
  • @fchollet François Chollet on x
    Keep in mind: ARC-AGI is *not* a final exam that you pass to claim AGI. Including ARC-AGI-3. The benchmarks target the residual gap between what's hard for AI and what's easy for humans. It's meant to be a tool to measure AGI progress and to drive researchers towards the most
  • @andykonwinski Andy Konwinski on x
    ARC-AGI-3 benchmark: - 100% solvable by humans - 1% solvable by AI Everybody keep building benchmarks that agents utterly fail at! Proud this was a Laude Slingshot; will fund other benchmarks that reset SotA to 1%: https://www.laude.org/...
  • @gregkamradt Greg Kamradt on x
    The 25 public ARC-AGI-3 games: on average, they are easier for humans and AI. However, the difficulty ranges; there are very easy games and games which are more difficult. Easy for AI: https://arcprize.org/... Hard for AI: https://arcprize.org/...
  • @fchollet François Chollet on x
    ARC-AGI-3 scores agents on how close they are to human action efficiency. All ARC-AGI-3 environments were solved by at least 2 human testers out of 10 (most of the time it was 5+). We use the action count of the 2nd best tester (to avoid outlier performance) as our human
  • @thezvi Zvi Mowshowitz on x
    The real metric is ‘time until he is pleased to announce ARC-AGI-4, the only unsaturated agentic AI benchmark.’
  • @emollick Ethan Mollick on x
    ARC-AGI-3 took me a few tries, but it is definitely human winnable. I am curious how much of the initially very low performance of frontier models is harness, vision, and tools, versus how much are limitations of LLMs. I guess we will find out! https://arcprize.org/...
  • @iruletheworldmo @iruletheworldmo on x
    if we saturate arc 3 this year and there's no meaningful shift in the economy, it's clear benchmarks have become a gimmick between labs and providers to hill climb, market, make bread. whilst it's exciting to see a benchmark the models perform so poorly at. and i love this
  • @fchollet François Chollet on x
    At the moment, ARC-AGI-3 is the only unsaturated agentic AI benchmark. Sub-1% scores from frontier models on the private test set. If you want to be among the first to know when an AGI breakthrough happens, monitor the ARC-AGI-3 leaderboard. Any sudden score jump will mean
  • @mikeknoop Mike Knoop on x
    ARC-AGI-3 and ARC Prize 2026 are now live with $2,000,000 in prizes! As of today, version 3 is the world's only unsaturated agentic intelligence benchmark. Humans score 100% and frontier AI scores ~0%. Play here: https://arcprize.org/... While no single version of ARC is
  • @fchollet François Chollet on x
    ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first [v…
  • @bryanlanders Bryan Landers on x
    It's alive! This 3rd version of ARC-AGI represents an incredible amount of work from the ARC Prize team. Hundreds of games. Thousands of levels. Go build agents!
  • @gregkamradt Greg Kamradt on x
    Today we're launching ARC-AGI-3 135 Novel Environments (nearly 1K levels) we build by hand It is the only unsaturated agent benchmark in the world Each game is 100% human solvable, AI scores <1% This gap between human and AI performance proves we do not have AGI Agents today [vid…
  • @arcprize @arcprize on x
    We created an in-house game studio and built 135 novel environments from scratch No instructions, Core Knowledge Priors-only In order to beat these games, AI must: • Explore the environment • Form hypotheses • Execute a plan • Learn and adapt [image]
  • @arcprize @arcprize on x
    Announcing ARC-AGI-3 The only unsaturated agentic intelligence benchmark in the world Humans score 100%, AI <1% This human-AI gap demonstrates we do not yet have AGI Most benchmarks test what models already know, ARC-AGI-3 tests how they learn [image]
  • @scaling01 @scaling01 on x
    ARC-AGI-3 scores for GPT-5.4, Gemini 3.1 Pro and Opus 4.6 Gemini 3.1 Pro: 0.37% GPT-5.4: 0.26% Opus 4.6: 0.25% Grok 4.2: 0% [image]
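
François Chollet's post above describes the scoring rule in enough detail to sketch: each environment's human baseline is the action count of the 2nd best human tester (to avoid outlier performance), and an agent is scored on how close it comes to that action efficiency. The per-environment formula and the mean aggregation below are assumptions for illustration only; the leaderboard's exact math is not given in the thread.

```python
# Hedged sketch of ARC-AGI-3-style action-efficiency scoring.
# From the thread: human baseline = action count of the 2nd best human tester.
# Assumptions (not stated in the thread): per-environment score is the ratio of
# the human baseline to the agent's action count, capped at 1.0, with 0 for
# unsolved environments, and the aggregate is the mean across environments.

from statistics import mean

def human_baseline(tester_action_counts: list[int]) -> int:
    """Action count of the 2nd best (2nd lowest) successful human tester."""
    ordered = sorted(tester_action_counts)
    if len(ordered) < 2:
        raise ValueError("need at least two successful human testers")
    return ordered[1]

def environment_score(agent_actions: int | None, tester_action_counts: list[int]) -> float:
    """Assumed per-environment score: 0 if unsolved, else closeness to the baseline."""
    if agent_actions is None:  # agent never solved this environment
        return 0.0
    baseline = human_baseline(tester_action_counts)
    return min(1.0, baseline / agent_actions)

def benchmark_score(results: dict[str, tuple[int | None, list[int]]]) -> float:
    """Assumed aggregate: mean of per-environment scores."""
    return mean(environment_score(agent, humans) for agent, humans in results.values())

# Toy example with made-up numbers: two environments, ten human testers each.
results = {
    "env-01": (250, [40, 45, 52, 60, 61, 70, 80, 90, 95, 120]),   # solved, but inefficiently
    "env-02": (None, [30, 33, 35, 41, 48, 50, 55, 60, 72, 100]),  # never solved
}
print(f"aggregate score: {benchmark_score(results):.2%}")  # ~9% for this toy data
```

With these toy numbers the aggregate lands around 9%; an agent that fails most environments outright would sit near the sub-1% frontier-model scores quoted in the thread.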