Chronicles

The story behind the story


ARC Prize Foundation unveils ARC-AGI-3, an AI benchmark with simple video-game-like scenarios designed to measure on-the-fly reasoning rather than memory recall

ARC-AGI-3 tests whether models can reason through novel problems rather than just recall patterns, something even top systems still struggle to do.

Fast Company · Mark Sullivan

Discussion

  • @jeremyphoward Jeremy Howard on x
    These games are kinda fun to play! :)
  • @deryatr_ Derya Unutmaz on x
    ARC-AGI-3 is an important benchmark. However, I have a major issue with the “Human score 100%” statement. How many humans have tested all 1000 puzzles? How were people selected? This was not published for previous ARCs either. In one case, the human score was based on I think 2
  • @fchollet François Chollet on x
    Keep in mind: ARC-AGI is *not* a final exam that you pass to claim AGI. Including ARC-AGI-3. The benchmarks target the residual gap between what's hard for AI and what's easy for humans. It's meant to be a tool to measure AGI progress and to drive researchers towards the most
  • @andykonwinski Andy Konwinski on x
    ARC-AGI-3 benchmark: - 100% solvable by humans - 1% solvable by AI Everybody keep building benchmarks that agents utterly fail at! Proud this was a Laude Slingshot; will fund other benchmarks that reset SotA to 1%: https://www.laude.org/...
  • @gregkamradt Greg Kamradt on x
    The 25 public ARC-AGI-3 games: on average, they are easier for humans and AI. However, the difficulty ranges; there are very easy games and games which are more difficult. Easy for AI: https://arcprize.org/... Hard for AI: https://arcprize.org/...
  • @fchollet François Chollet on x
    ARC-AGI-3 scores agents on how close they are to human action efficiency. All ARC-AGI-3 environments were solved by at least 2 human testers out of 10 (most of the time it was 5+). We use the action count of the 2nd best tester (to avoid outlier performance) as our human
  • @thezvi Zvi Mowshowitz on x
    The real metric is ‘time until he is pleased to announce ARC-AGI-4, the only unsaturated agentic AI benchmark.’
  • @emollick Ethan Mollick on x
    ARC-AGI-3 took me a few tries, but it is definitely human winnable. I am curious how much of the initially very low performance of frontier models is harness, vision, and tools, versus how much are limitations of LLMs. I guess we will find out! https://arcprize.org/...
  • @iruletheworldmo @iruletheworldmo on x
    if we saturate arc 3 this year and there's no meaningful shift in the economy, it's clear benchmarks have become a gimmick between labs and providers to hill climb, market, make bread. whilst it's exciting to see a benchmark the models perform so poorly at. and i love this
  • @fchollet François Chollet on x
    At the moment, ARC-AGI-3 is the only unsaturated agentic AI benchmark. Sub-1% scores from frontier models on the private test set. If you want to be among the first to know when an AGI breakthrough happens, monitor the ARC-AGI-3 leaderboard. Any sudden score jump will mean
  • @mikeknoop Mike Knoop on x
    ARC-AGI-3 and ARC Prize 2026 are now live with $2,000,000 in prizes! As of today, version 3 is the world's only unsaturated agentic intelligence benchmark. Humans score 100% and frontier AI scores ~0%. Play here: https://arcprize.org/... While no single version of ARC is
  • @fchollet François Chollet on x
    ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first [v…
  • @bryanlanders Bryan Landers on x
    It's alive! This 3rd version of ARC-AGI represents an incredible amount of work from the ARC Prize team. Hundreds of games. Thousands of levels. Go build agents!
  • @gregkamradt Greg Kamradt on x
    Today we're launching ARC-AGI-3 135 Novel Environments (nearly 1K levels) we build by hand It is the only unsaturated agent benchmark in the world Each game is 100% human solvable, AI scores <1% This gap between human and AI performance proves we do not have AGI Agents today [vid…
  • @arcprize @arcprize on x
    We created an in-house game studio and built 135 novel environments from scratch No instructions, Core Knowledge Priors-only In order to beat these games, AI must: • Explore the environment • Form hypotheses • Execute a plan • Learn and adapt [image]
  • @arcprize @arcprize on x
    Announcing ARC-AGI-3 The only unsaturated agentic intelligence benchmark in the world Humans score 100%, AI <1% This human-AI gap demonstrates we do not yet have AGI Most benchmarks test what models already know, ARC-AGI-3 tests how they learn [image]
  • @scaling01 @scaling01 on x
    ARC-AGI-3 scores for GPT-5.4, Gemini 3.1 Pro and Opus 4.6 Gemini 3.1 Pro: 0.37% GPT-5.4: 0.26% Opus 4.6: 0.25% Grok 4.2: 0% [image]
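
François Chollet's post above describes the scoring rule in enough detail to sketch: each environment's human baseline is the action count of the 2nd best human tester (to avoid outlier performance), and an agent is scored on how close it comes to that action efficiency. The per-environment formula and the mean aggregation below are assumptions for illustration only; the leaderboard's exact math is not given in the thread.

```python
# Hedged sketch of ARC-AGI-3-style action-efficiency scoring.
# From the thread: human baseline = action count of the 2nd best human tester.
# Assumptions (not stated in the thread): per-environment score is the ratio of
# the human baseline to the agent's action count, capped at 1.0, with 0 for
# unsolved environments, and the aggregate is the mean across environments.

from statistics import mean

def human_baseline(tester_action_counts: list[int]) -> int:
    """Action count of the 2nd best (2nd lowest) successful human tester."""
    ordered = sorted(tester_action_counts)
    if len(ordered) < 2:
        raise ValueError("need at least two successful human testers")
    return ordered[1]

def environment_score(agent_actions: int | None, tester_action_counts: list[int]) -> float:
    """Assumed per-environment score: 0 if unsolved, else closeness to the baseline."""
    if agent_actions is None:  # agent never solved this environment
        return 0.0
    baseline = human_baseline(tester_action_counts)
    return min(1.0, baseline / agent_actions)

def benchmark_score(results: dict[str, tuple[int | None, list[int]]]) -> float:
    """Assumed aggregate: mean of per-environment scores."""
    return mean(environment_score(agent, humans) for agent, humans in results.values())

# Toy example with made-up numbers: two environments, ten human testers each.
results = {
    "env-01": (250, [40, 45, 52, 60, 61, 70, 80, 90, 95, 120]),   # solved, but inefficiently
    "env-02": (None, [30, 33, 35, 41, 48, 50, 55, 60, 72, 100]),  # never solved
}
print(f"aggregate score: {benchmark_score(results):.2%}")  # ~9% for this toy data
```

With these toy numbers the aggregate lands around 9%; an agent that fails most environments outright would sit near the sub-1% frontier-model scores quoted in the thread.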