ARC Prize Foundation unveils ARC-AGI-3, an AI benchmark with simple video-game-like scenarios designed to measure on-the-fly reasoning rather than memory recall
ARC-AGI-3 tests whether models can reason through novel problems, not just recall patterns, a task even top systems still struggle with.
Fast Company / Mark Sullivan
Discussion
-
@jeremyphoward
Jeremy Howard
on x
These games are kinda fun to play! :)
-
@deryatr_
Derya Unutmaz
on x
ARC-AGI-3 is an important benchmark. However, I have a major issue with the “Human score 100%” statement. How many humans have tested all 1000 puzzles? How were people selected? This was not published for previous ARCs either. In one case, the human score was based on I think 2 …
-
@fchollet
François Chollet
on x
Keep in mind: ARC-AGI is *not* a final exam that you pass to claim AGI. Including ARC-AGI-3. The benchmarks target the residual gap between what's hard for AI and what's easy for humans. It's meant to be a tool to measure AGI progress and to drive researchers towards the most …
-
@andykonwinski
Andy Konwinski
on x
ARC-AGI-3 benchmark: 100% solvable by humans, 1% solvable by AI. Everybody keep building benchmarks that agents utterly fail at! Proud this was a Laude Slingshot; will fund other benchmarks that reset SotA to 1%: https://www.laude.org/...
-
@gregkamradt
Greg Kamradt
on x
The 25 public ARC-AGI-3 games: on average, they are easier for humans and AI. However, the difficulty ranges; there are very easy games and games which are more difficult. Easy for AI: https://arcprize.org/... Hard for AI: https://arcprize.org/...
-
@fchollet
François Chollet
on x
ARC-AGI-3 scores agents on how close they are to human action efficiency. All ARC-AGI-3 environments were solved by at least 2 human testers out of 10 (most of the time it was 5+). We use the action count of the 2nd best tester (to avoid outlier performance) as our human …
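The baseline rule in that thread is concrete enough to sketch. A minimal example, assuming each environment comes with a list of human testers' action counts; the capped-ratio efficiency score and the function names below are illustrative assumptions, not the published scoring formula:

```python
def human_baseline(action_counts):
    """Human baseline per Chollet: the action count of the 2nd best
    (2nd lowest) tester, so a single outlier run can't set the bar."""
    assert len(action_counts) >= 2, "needs at least 2 successful testers"
    return sorted(action_counts)[1]

def efficiency(agent_actions, baseline):
    """Assumed scoring shape: how close the agent gets to human action
    efficiency, capped at 1.0 for matching or beating the baseline."""
    if agent_actions is None:          # agent never solved the environment
        return 0.0
    return min(1.0, baseline / agent_actions)

# Ten hypothetical testers solved one environment in these action counts.
humans = [42, 57, 61, 66, 70, 74, 81, 95, 120, 300]
base = human_baseline(humans)            # 57: the 42-action run is ignored
print(f"{efficiency(190, base):.2f}")    # 0.30 -- far from human efficiency
```

Taking the 2nd lowest count rather than the minimum is what makes the baseline robust to one lucky or expert run, which is exactly the outlier concern the tweet raises.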
-
@thezvi
Zvi Mowshowitz
on x
The real metric is ‘time until he is pleased to announce ARC-AGI-4, the only unsaturated agentic AI benchmark.’
-
@emollick
Ethan Mollick
on x
ARC-AGI-3 took me a few tries, but it is definitely human winnable. I am curious how much of the initially very low performance of frontier models is harness, vision, and tools, versus how much reflects limitations of LLMs. I guess we will find out! https://arcprize.org/...
-
@iruletheworldmo
@iruletheworldmo
on x
if we saturate arc 3 this year and there's no meaningful shift in the economy, it's clear benchmarks have become a gimmick between labs and providers to hill climb, market, make bread. whilst it's exciting to see a benchmark the models perform so poorly at. and i love this …
-
@fchollet
François Chollet
on x
At the moment, ARC-AGI-3 is the only unsaturated agentic AI benchmark. Sub-1% scores from frontier models on the private test set. If you want to be among the first to know when an AGI breakthrough happens, monitor the ARC-AGI-3 leaderboard. Any sudden score jump will mean …
-
@mikeknoop
Mike Knoop
on x
ARC-AGI-3 and ARC Prize 2026 are now live with $2,000,000 in prizes! As of today, version 3 is the world's only unsaturated agentic intelligence benchmark. Humans score 100% and frontier AI scores ~0%. Play here: https://arcprize.org/... While no single version of ARC is …
-
@fchollet
François Chollet
on x
ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first time. [video]
-
@bryanlanders
Bryan Landers
on x
It's alive! This 3rd version of ARC-AGI represents an incredible amount of work from the ARC Prize team. Hundreds of games. Thousands of levels. Go build agents!
-
@gregkamradt
Greg Kamradt
on x
Today we're launching ARC-AGI-3: 135 novel environments (nearly 1K levels) we built by hand. It is the only unsaturated agent benchmark in the world. Each game is 100% human solvable; AI scores <1%. This gap between human and AI performance proves we do not have AGI agents today. [video]
-
@arcprize
@arcprize
on x
We created an in-house game studio and built 135 novel environments from scratch. No instructions, Core Knowledge Priors only. In order to beat these games, AI must: • Explore the environment • Form hypotheses • Execute a plan • Learn and adapt. [image]
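That explore / hypothesize / plan / adapt loop is easy to make concrete with a toy harness. A minimal sketch against a stubbed environment; the Env class, its step() method, and the whole setup are hypothetical stand-ins, not the real ARC-AGI-3 API documented at arcprize.org:

```python
import random

class Env:
    """Hypothetical stub: advance by finding the right one of four
    actions at each state; solved when the state reaches 10."""
    def __init__(self):
        self.state, self.done = 0, False

    def step(self, action):
        if action == self.state % 4:   # hidden rule the agent must discover
            self.state += 1
        self.done = self.state >= 10
        return self.state, self.done

def run_agent(env, max_actions=200):
    """No instructions: explore untried actions, keep a hypothesis about
    which ones make progress, execute the best-known one, and adapt."""
    value = {}                         # hypothesis: did (state, action) help?
    state, done, actions = 0, False, 0
    while not done and actions < max_actions:
        untried = [a for a in range(4) if (state, a) not in value]
        if untried:                                   # explore
            action = random.choice(untried)
        else:                                         # execute the plan
            action = max(range(4), key=lambda a: value[(state, a)])
        new_state, done = env.step(action)
        value[(state, action)] = 1.0 if new_state != state else 0.0  # adapt
        state, actions = new_state, actions + 1
    return actions if done else None   # action count is what gets scored

print(run_agent(Env()))  # ~25 actions on average; fewer = more human-efficient
```

The tabular hypothesis here is trivially swappable for an LLM policy, which is roughly the harness-versus-model question Mollick raises above.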
-
@arcprize
@arcprize
on x
Announcing ARC-AGI-3: the only unsaturated agentic intelligence benchmark in the world. Humans score 100%, AI <1%. This human-AI gap demonstrates we do not yet have AGI. Most benchmarks test what models already know; ARC-AGI-3 tests how they learn. [image]
-
@scaling01
@scaling01
on x
ARC-AGI-3 scores for GPT-5.4, Gemini 3.1 Pro, and Opus 4.6: Gemini 3.1 Pro: 0.37%; GPT-5.4: 0.26%; Opus 4.6: 0.25%; Grok 4.2: 0%. [image]