Apple researchers detail the limitations of top LLMs and large reasoning models, including on classic problems like the Tower of Hanoi, which AI solved in 1957
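For context on the headline: the Tower of Hanoi has had a complete algorithmic solution since the earliest days of AI research, which is why it serves as a benchmark here. A minimal recursive sketch (illustrative only; not code from the Apple paper):

```python
def hanoi(n, source, target, spare):
    """Return the optimal move list for n disks: move n-1 disks to the
    spare peg, move the largest disk to the target, then move the n-1
    disks back on top of it."""
    if n == 0:
        return []
    return (hanoi(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi(n - 1, spare, target, source))

moves = hanoi(3, "A", "C", "B")
print(len(moves))  # optimal solution uses 2**n - 1 = 7 moves
```

The point of the benchmark is that this ~10-line program solves the puzzle for any disk count, whereas the paper reports that reasoning models collapse as the disk count grows.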
LLM “reasoning” is so cooked they turned my name into a verb — Quoth Josh Wolfe, well-respected venture capitalist at Lux Capital:
Marcus on AI · Gary Marcus
Related Coverage
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Apple Machine Learning Research
- Apple's LLM study draws an important distinction about reasoning models 9to5Mac · Marcus Mendes
- Canada's Bengio brothers offer new AI critiques, solutions BetaKit · Douglas Soltys
- AI models still far from AGI-level reasoning: Apple researchers Cointelegraph · Martin Young
- Apple Research Questions AI Reasoning Models Just Days Before WWDC MacRumors · Tim Hardwick
- Approaching WWDC, Apple researchers dispute claims that AI is capable of reasoning 9to5Mac · Ben Lovejoy
- Explained: Apple's new study questions the ‘reasoning’ in large reasoning models TechCircle · Shraddha Goled
- Apple Researchers Publish Paper on the Limits of Reasoning Models (Showing That They're Not Really ‘Reasoning’ at All) Daring Fireball · John Gruber
- Reasoning About the Reasoning of Reasoning Models — Twas the night before WWDC and all through … Spyglass · M.G. Siegler
- Apple research claims popular AI models fail at hard reasoning: Why does it matter? Digit · Jayesh Shinde
- Enterprise hits and misses - AI agents need definition, Apple issues fresh data on LLM reasoning flaws, and a tech CEO hammers return to office mandates diginomica · Jon Reed
- Apple Study Calls Out “Reasoning” AI as Overhyped While Siri Struggles 9meters · Caitlyn Pauley
- New Research: The Harder the Problem, the Dumber the Model implicator.ai · Marcus Schuler
- 233. “The Illusion of Thinking” — Thoughts on This Important Paper Hardcore Software · Steven Sinofsky
- The illusion of “The Illusion of Thinking” Sean Goedecke
- That's exactly my intuition, too (although the writer is far more qualified to have an opinion on this than me.) — “AI is not hitting a wall. But LLMs probably are... we need new approaches...” — Lots of other good insights in this piece. — https://garymarcus.substack.com/ ... @j12t@j12t.social · Johannes Ernst
- “What the Apple paper shows, most fundamentally, regardless of how you define AGI, is that LLMs are no substitute for good well-specified conventional algorithms. (They also can't play chess as well as conventional algorithms, can't fold proteins like special-purpose neurosymbolic hybrids, can't run databases as well as conventional databases, etc.) … @remixtures@tldr.nettime.org · Miguel Afonso Caetano
- “A knockout blow for LLMs? - by Gary Marcus - Marcus on AI” — https://garymarcus.substack.com/ ... > But anybody who thinks LLMs are a direct route to the sort of AGI that could fundamentally transform society for the good is kidding themselves @baldur@toot.cafe · Baldur Bjarnason
- A paper found that older LLMs fail to solve problems when they run out of space in memory. Gary Marcus wonders if this is a ‘knockout blow for LLMs’ 🤔 @crumbler · Casey Newton
- A lot of talk about the Apple paper today. — The main conclusion is that reasoning models don't reason. — Are we surprised? — I, for one, am not. … Mike Kentz
- Apple's latest paper on “reasoning” in LLMs is pretty devastating. I explain why (and consider a possible objection) in a weekend longread that explains why nobody should really be surprised: Gary Marcus
- A Knockout Blow for LLMs? Hacker News
- Apple Research Questions AI Reasoning Models Just Days Before WWDC MacRumors Forums
Discussion
-
@carnage4life
Dare Obasanjo
on bluesky
Apple has published a research paper claiming LLMs outperform large reasoning models on simple tasks, LRMs outperform LLMs at medium complexity tasks and they both totally fail at handling very complex tasks. — The optics of Apple publishing a paper saying LRMs don't work has b…
-
@nothingistrue.net
Jeremy
on bluesky
Our brains naturally impose meaning on coherent language—and LLMs leverage that. We mistakenly equate confidence with competence, potentially mistaking polished output for understanding. — Related, Apple's recent research paper: The Illusion of Thinking: — machinelearning.ap…
-
@lukaszolejnik
Lukasz Olejnik
on bluesky
According to Apple researchers, the so-called “thinking” LLM/AI models do not think. They have no goals or intentions. They demonstrate impressive capabilities on tasks of moderate complexity, yet their true capacities and scalability are illusory. ml-site.cdn-apple.com/papers/…
-
@ka81
Caroline Keep
on bluesky
LLMs don't learn. They just reproduce what we already know and fail at high complexity. Apple's new paper. Why using them for anything new is an issue. They can't think. — machinelearning.apple.com/research/ ill...
-
@jasongorman
Jason Gorman
on bluesky
When I'm programming, and reading code, I'm predicting what the code will do ("semantics"). — This is what's called “dynamic reasoning”. — New research from Apple presents strong evidence that LLMs and LRMs aren't capable of dynamic reasoning. — QED. — machinelearning.app…
-
@clauswilke.com
Claus Wilke
on bluesky
New paper by Apple: Reasoning models completely collapse on sufficiently complex tasks. They simply give up. — ml-site.cdn-apple.com/papers/the- i... [image]
-
@sungkim
Sung Kim
on bluesky
This is the most talked-about paper in AI social media right now. Don't ask me why - I thought it was already well known. — “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity” — machinelearning.apple.…
-
@dorialexander
Alexander Doria
on bluesky
Ok I guess I have to go through that Apple paper. — My immediate issue is the framing which is super binary: “Are these models capable of generalizable reasoning, or are they leveraging different forms of pattern matching?” ml-site.cdn-apple.com/papers/the- i... [image]
-
@abeba
Dr Abeba Birhane
on bluesky
“frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget” ml-site.cdn-apple…
-
@markriedl
Mark Riedl
on bluesky
It's fascinating to see Microsoft and Apple go in such different directions when it comes to AI. — Microsoft went all-in and wrote “Sparks of AGI” www.microsoft.com/en-us/resear... Apple has done the bare minimum and just wrote “The Illusion of Thinking” ml-site.cdn-apple.com/…
-
@stroughtonsmith …
Steve Troughton-Smith
on mastodon
I feel like Apple has shipped more ‘AIs are dumb, actually’ research papers than tentpole Apple Intelligence features over the past year, while the rest of the industry laps them over and over
-
@andrewwhite01
@andrewwhite01
on x
Apple's AI researchers have embraced a kind of anti-LLM cynic ethos, publishing multiple papers trying to argue that reasoning LLMs are somehow limited/cannot generalize. Apple also has the worst AI products (Siri, Apple Intelligence). No idea what their “strategy” is here [image]
-
@jhochderffer
Jeffrey Dean Hochderffer
on x
It's becoming increasingly apparent that @GaryMarcus is correct. New LLM models are producing diminishing returns. Current models are powerful and useful, but they are short of being reliable autonomous agents capable of replacing the vast human workforce. New breakthroughs
-
@stevesi
Steven Sinofsky
on x
Many AI doomers are grasping on to the Apple paper on “Illusion of [AI] Thinking”, claiming they have said this all along. That was a relatively minor criticism they made of AI compared to ending all of the human race. The paper does not extrapolate to cover those concerns.
-
@rubenhssd
Ruben Hassid
on x
BREAKING: Apple just proved AI “reasoning” models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. Here's what Apple discovered: (hint: we're not as close to AGI as the hype suggests) [image]
-
@rubenhssd
Ruben Hassid
on x
What do you think? Is Apple just “coping” because they've been outpaced in AI developments over the past two years? Or is Apple correct? Comment below and I'll respond to all.
-
@burkov
Andriy Burkov
on x
Apple did more for AI than anyone else: they proved through peer-reviewed publications that LLMs are just neural networks and, as such, have all the limitations of other neural networks trained in a supervised way, which I and a few other voices tried to convey, but the noise
-
@timsweeneyepic
Tim Sweeney
on x
AI is in an awkward state of development. We have these tiny (4K) prompt embedding vectors capturing relations among abstract notions, connected to small (64K+) literal text or image contexts, and a static large (2T+) model. It's enough for some magic but not true intelligence.
-
@bgurley
Bill Gurley
on x
With that said, I don't think it's a good thing that Apple researchers tilt to the skeptical. How can you innovate if the core thinkers/leaders in your organization are non-believers? Might explain why Apple may be failing to lead in this important area.
-
@andrewwhite01
@andrewwhite01
on x
@anshulkundaje Sure! It's not hard. The content of these papers is actually pretty good, but the abstract/conclusions/narrative are just in bad faith. Let's take the earlier paper about reasoning models in October 2024: “...it remains unclear whether their mathematical reasoning …
-
@garymarcus
Gary Marcus
on x
@andrewwhite01 it's not a corporate strategy; they are reporting what they found. and conspiracy theories won't change that.
-
@garymarcus
Gary Marcus
on x
Cope. The fact that LLMs and LRMs can't reliably reason has nothing to do with what is (or isn't) scary. It has to do with how they are implemented. People who can't be bothered to dig into how current models actually work—and what their actual limitations are—aren't HELPING
-
@ns123abc
Nik
on x
lmao the apple paper was written by an intern😭 [image]
-
@chargoddard
Charles Goddard
on x
🤯 MIND-BLOWN! A new paper just SHATTERED everything we thought we knew about AI reasoning! This is paradigm-shifting. A MUST-READ. Full breakdown below 👇 🧵 1/23 [image]
-
@bgurley
Bill Gurley
on x
On one hand, this may be breakthrough research & challenges the common Silicon Valley wisdom that these models will scale ad infinitum. If they uncovered a fundamental blocker - it's important. Would love to hear the counter analysis from one of the large model providers?
-
@garymarcus
Gary Marcus
on x
Healthy and unhealthy strategies for coping with the Apple paper: - attack Apple for publishing it (which does nothing to address the underlying problems they pointed out) or - figure out its implications and develop a robust alternative (the healthier option)
-
@kylemorgenstein
Kyle
on x
... who do people think write the papers??
-
@wolfejosh
Josh Wolfe
on x
Apple just GaryMarcus'd LLM reasoning ability [image]
-
@bgurley
Bill Gurley
on x
Interesting paper out of Apple research on the reasoning limitations of LLM models, especially for complex problems. The paper is short and consumable. 🧵https://ml-site.cdn- apple.com/...
-
@emollick
Ethan Mollick
on x
I think the Apple paper on the limits of reasoning models in particular tests is useful & important, but the “LLMs are hitting a wall” narrative on X around it feels premature at best. Reminds me of the buzz over model collapse - limitations that were overcome quickly in practice
-
r/ClaudeAI
r
on reddit
reasoning models getting absolutely cooked rn
-
r/programming
r
on reddit
A knockout blow for LLMs?