Apple researchers detail the limitations of top LLMs and large reasoning models, including on classic problems like the Tower of Hanoi, which AI solved in 1957
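For context on the headline: the Tower of Hanoi has had a complete algorithmic solution since the earliest days of AI research, which is why it serves as a benchmark here. A minimal recursive sketch (illustrative only; not code from the Apple paper):

```python
def hanoi(n, source, target, spare):
    """Return the optimal move list for n disks: move n-1 disks to the
    spare peg, move the largest disk to the target, then move the n-1
    disks back on top of it."""
    if n == 0:
        return []
    return (hanoi(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi(n - 1, spare, target, source))

moves = hanoi(3, "A", "C", "B")
print(len(moves))  # optimal solution uses 2**n - 1 = 7 moves
```

The point of the benchmark is that this ~10-line program solves the puzzle for any disk count, whereas the paper reports that reasoning models collapse as the disk count grows.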
LLM “reasoning” is so cooked they turned my name into a verb — Quoth Josh Wolfe, well-respected venture capitalist at Lux Capital:
Marcus on AI · Gary Marcus
Related Coverage
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Apple Machine Learning Research
- Apple's LLM study draws an important distinction about reasoning models 9to5Mac · Marcus Mendes
- Canada's Bengio brothers offer new AI critiques, solutions BetaKit · Douglas Soltys
- AI models still far from AGI-level reasoning: Apple researchers Cointelegraph · Martin Young
- Apple Research Questions AI Reasoning Models Just Days Before WWDC MacRumors · Tim Hardwick
- Approaching WWDC, Apple researchers dispute claims that AI is capable of reasoning 9to5Mac · Ben Lovejoy
- Explained: Apple's new study questions the ‘reasoning’ in large reasoning models TechCircle · Shraddha Goled
- Apple Researchers Publish Paper on the Limits of Reasoning Models (Showing That They're Not Really ‘Reasoning’ at All) Daring Fireball · John Gruber
- Reasoning About the Reasoning of Reasoning Models — Twas the night before WWDC and all through … Spyglass · M.G. Siegler
- Apple research claims popular AI models fail at hard reasoning: Why does it matter? Digit · Jayesh Shinde
- Enterprise hits and misses - AI agents need definition, Apple issues fresh data on LLM reasoning flaws, and a tech CEO hammers return to office mandates diginomica · Jon Reed
- Apple Study Calls Out “Reasoning” AI as Overhyped While Siri Struggles 9meters · Caitlyn Pauley
- New Research: The Harder the Problem, the Dumber the Model implicator.ai · Marcus Schuler
- 233. “The Illusion of Thinking” — Thoughts on This Important Paper Hardcore Software · Steven Sinofsky
- The illusion of “The Illusion of Thinking” Sean Goedecke
- That's exactly my intuition, too (although the writer is far more qualified to have an opinion on this than me.) — “AI is not hitting a wall. But LLMs probably are... we need new approaches...” — Lots of other good insights in this piece. — https://garymarcus.substack.com/ ... @j12t@j12t.social · Johannes Ernst
- “What the Apple paper shows, most fundamentally, regardless of how you define AGI, is that LLMs are no substitute for good well-specified conventional algorithms. (They also can't play chess as well as conventional algorithms, can't fold proteins like special-purpose neurosymbolic hybrids, can't run databases as well as conventional databases, etc.) … @remixtures@tldr.nettime.org · Miguel Afonso Caetano
- “A knockout blow for LLMs? - by Gary Marcus - Marcus on AI” — https://garymarcus.substack.com/ ... > But anybody who thinks LLMs are a direct route to the sort of AGI that could fundamentally transform society for the good is kidding themselves @baldur@toot.cafe · Baldur Bjarnason
- A paper found that older LLMs fail to solve problems when they run out of space in memory. Gary Marcus wonders if this is a ‘knockout blow for LLMs’ 🤔 @crumbler · Casey Newton
- A lot of talk about the Apple paper today. — The main conclusion is that reasoning models don't reason. — Are we surprised? — I, for one, am not. … Mike Kentz
- Apple's latest paper on “reasoning” in LLMs is pretty devastating. I explain why (and consider a possible objection) in a weekend longread that explains why nobody should really be surprised: Gary Marcus
- A Knockout Blow for LLMs? Hacker News
- Apple Research Questions AI Reasoning Models Just Days Before WWDC MacRumors Forums
Discussion
-
@carnage4life
Dare Obasanjo
on bluesky
Apple has published a research paper claiming LLMs outperform large reasoning models on simple tasks, LRMs outperform LLMs at medium complexity tasks and they both totally fail at handling very complex tasks. — The optics of Apple publishing a paper saying LRMs don't work has b…
-
@nothingistrue.net
Jeremy
on bluesky
Our brains naturally impose meaning on coherent language—and LLMs leverage that. We mistakenly equate confidence with competence, potentially mistaking polished output for understanding. — Related, Apple's recent research paper: The Illusion of Thinking: — machinelearning.ap…
-
@lukaszolejnik
Lukasz Olejnik
on bluesky
According to Apple researchers, the so-called “thinking” LLM/AI models do not think. They have no goals or intentions. They demonstrate impressive capabilities on tasks of moderate complexity, yet their true capacities and scalability are illusory. ml-site.cdn-apple.com/papers/…
-
@ka81
Caroline Keep
on bluesky
LLMs don't learn. They just reproduce what we already know and fail at high complexity. Apple's new paper. Why using them for anything new is an issue. They can't think. — machinelearning.apple.com/research/ ill...
-
@jasongorman
Jason Gorman
on bluesky
When I'm programming, and reading code, I'm predicting what the code will do ("semantics"). — This is what's called “dynamic reasoning”. — New research from Apple presents strong evidence that LLMs and LRMs aren't capable of dynamic reasoning. — QED. — machinelearning.app…
-
@clauswilke.com
Claus Wilke
on bluesky
New paper by Apple: Reasoning models completely collapse on sufficiently complex tasks. They simply give up. — ml-site.cdn-apple.com/papers/the- i... [image]
-
@sungkim
Sung Kim
on bluesky
This is the most talked-about paper in AI social media right now. Don't ask me why - I thought it was already well known. — “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity” — machinelearning.apple.…
-
@dorialexander
Alexander Doria
on bluesky
Ok I guess I have to go through that Apple paper. — My immediate issue is the framing which is super binary: “Are these models capable of generalizable reasoning, or are they leveraging different forms of pattern matching?” ml-site.cdn-apple.com/papers/the- i... [image]
-
@abeba
Dr Abeba Birhane
on bluesky
“frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget” ml-site.cdn-apple…
-
@markriedl
Mark Riedl
on bluesky
It's fascinating to see Microsoft and Apple go in such different directions when it comes to AI. — Microsoft went all-in and wrote “Sparks of AGI” www.microsoft.com/en-us/resear... Apple has done the bare minimum and just wrote “The Illusion of Thinking” ml-site.cdn-apple.com/…
-
@stroughtonsmith …
Steve Troughton-Smith
on mastodon
I feel like Apple has shipped more ‘AIs are dumb, actually’ research papers than tentpole Apple Intelligence features over the past year, while the rest of the industry laps them over and over
-
@andrewwhite01
@andrewwhite01
on x
Apple's AI researchers have embraced a kind of anti-LLM cynic ethos, publishing multiple papers trying to argue that reasoning LLMs are somehow limited/cannot generalize. Apple also has the worst AI products (Siri, Apple Intelligence). No idea what their “strategy” is here [image]
-
@jhochderffer
Jeffrey Dean Hochderffer
on x
It's becoming increasingly apparent that @GaryMarcus is correct. New LLM models are producing diminishing returns. Current models are powerful and useful, but they are short of being reliable autonomous agents capable of replacing the vast human workforce. New breakthroughs
-
@stevesi
Steven Sinofsky
on x
Many AI doomers are grasping on to the Apple paper on “Illusion of [AI] Thinking”, claiming they have said this all along. That was a relatively minor criticism they made of AI compared to ending all of the human race. The paper does not extrapolate to cover those concerns.
-
@rubenhssd
Ruben Hassid
on x
BREAKING: Apple just proved AI “reasoning” models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. Here's what Apple discovered: (hint: we're not as close to AGI as the hype suggests) [image]
-
@rubenhssd
Ruben Hassid
on x
What do you think? Is Apple just “coping” because they've been outpaced in AI developments over the past two years? Or is Apple correct? Comment below and I'll respond to all.
-
@burkov
Andriy Burkov
on x
Apple did more for AI than anyone else: they proved through peer-reviewed publications that LLMs are just neural networks and, as such, have all the limitations of other neural networks trained in a supervised way, which I and a few other voices tried to convey, but the noise
-
@timsweeneyepic
Tim Sweeney
on x
AI is in an awkward state of development. We have these tiny (4K) prompt embedding vectors capturing relations among abstract notions, connected to small (64K+) literal text or image contexts, and a static large (2T+) model. It's enough for some magic but not true intelligence.
-
@bgurley
Bill Gurley
on x
With that said, I don't think it's a good thing that Apple researchers tilt to the skeptical. How can you innovate if the core thinkers/leaders in your organization are non-believers? Might explain why Apple may be failing to lead in this important area.
-
@andrewwhite01
@andrewwhite01
on x
@anshulkundaje Sure! It's not hard. The content of these papers is actually pretty good, but the abstract/conclusions/narrative are just in bad faith. Let's take the earlier paper about reasoning models in October 2024: “...it remains unclear whether their mathematical reasoning …
-
@garymarcus
Gary Marcus
on x
@andrewwhite01 it's not a corporate strategy; they are reporting what they found. and conspiracy theories won't change that.
-
@garymarcus
Gary Marcus
on x
Cope. The fact that LLMs and LRMs can't reliably reason has nothing to do with what is (or isn't) scary. It has to do with how they are implemented. People who can't be bothered to dig into how current models actually work—and what their actual limitations are—aren't HELPING
-
@ns123abc
Nik
on x
lmao the apple paper was written by an intern😭 [image]
-
@chargoddard
Charles Goddard
on x
🤯 MIND-BLOWN! A new paper just SHATTERED everything we thought we knew about AI reasoning! This is paradigm-shifting. A MUST-READ. Full breakdown below 👇 🧵 1/23 [image]
-
@bgurley
Bill Gurley
on x
On one hand, this may be breakthrough research & challenges the common Silicon Valley wisdom that these models will scale ad infinitum. If they uncovered a fundamental blocker - it's important. Would love to hear the counter analysis from one of the large model providers?
-
@garymarcus
Gary Marcus
on x
Healthy and unhealthy strategies for coping with the Apple paper: - attack Apple for publishing it (which does nothing to address the underlying problems they pointed out) or - figure out its implications and develop a robust alternative (the healthier option)
-
@kylemorgenstein
Kyle
on x
... who do people think write the papers??
-
@wolfejosh
Josh Wolfe
on x
Apple just GaryMarcus'd LLM reasoning ability [image]
-
@bgurley
Bill Gurley
on x
Interesting paper out of Apple research on the reasoning limitations of LLM models, especially for complex problems. The paper is short and consumable. 🧵https://ml-site.cdn- apple.com/...
-
@emollick
Ethan Mollick
on x
I think the Apple paper on the limits of reasoning models in particular tests is useful & important, but the “LLMs are hitting a wall” narrative on X around it feels premature at best. Reminds me of the buzz over model collapse - limitations that were overcome quickly in practice
-
r/ClaudeAI
r
on reddit
reasoning models getting absolutely cooked rn
-
r/programming
r
on reddit
A knockout blow for LLMs?