The ARC Prize Foundation unveils ARC-AGI-3, an AI benchmark with simple video-game-like scenarios designed to measure on-the-fly reasoning, not memory recall
The influential AI researcher François Chollet has long argued that the field measures intelligence incorrectly …
GPT-5 hands-on: it exudes competence but doesn't feel like a dramatic leap ahead of other LLMs, and the pricing is aggressively competitive with other providers
And It Changes Everything Tyler Cowen / Marginal Revolution : GPT-5, a short and enthusiastic review GPT-5 : GPT-5 — Our hands-on review of OpenAI's newest model based on weeks of testing — The Ve...
A look at the ARC-AGI exam designed by French computer scientist François Chollet to show the gulf between AI models' memorized answers and “fluid intelligence”
Matteo Wong / The Atlantic :
The ARC Prize Foundation says its new ARC-AGI-2 test stumps most AI models; humans get 60% of the questions right, but GPT-4.5 and Claude 3.7 Sonnet score ~1%
François Chollet / @fchollet : Unlike ARC-AGI-1, this new version is not easily brute-forced. Current top AI approaches score 0-4%. All base LLMs (GPT-4.5, Claude 3.7 Sonnet, Gemini 2, etc.)...
OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025
12 Days of OpenAI: Day 12 Naomi Li Gan / Tech in Asia : OpenAI unveils AI model for advanced reasoning Bojan Stojkovski / Interesting Engineering : OpenAI unveils o3 reasoning AI model to tackle compl...
OpenAI says safety researchers can sign up for an o3 preview today and that it decided not to name the new model o2 “out of respect” for the UK telecom company O2
here's why these ‘reasoning’ models are a giant leap Jaspreet Singh / Reuters : OpenAI unveils ‘o3’ reasoning AI models in test phase Bluesky: Glenn White / @justicar.xyz : “Out of respect” for not ge...
o3, trained on the ARC-AGI-1 Public Training set, scored 87.5% on ARC Prize's Semi-Private Evaluation in a high-compute configuration; GPT-4o scored 5% in 2024
This is “The AI Economy,” a weekly LinkedIn-first newsletter … Sharon Goldman / Fortune : Sam Altman says OpenAI's new o3 ‘reasoning’ models begin the ‘next phase’ of AI. Is this AGI? Pradeep Viswanat...