A gap in understanding AI is growing, as casual users cite flaws in old free models while power users point to new models' staggering gains in technical domains
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT sometime last year and allowed it to inform their views on AI a little too much. This is
@karpathy Andrej Karpathy
Discussion
-
@juberti
Justin Uberti
on x
Not great to be called out by an AI OG about AVM, but he's right that the recent capability gains of text models have been >> those of speech models, mostly by thinking harder. But at the same time we need speech models to be faster + more humanlike. The impossible just takes longer.
-
@lateinteraction
Omar Khattab
on x
I get this, of course, but I think this dismisses some underlying valid criticism that even laypeople have. And we can't just move the standard every 2 months by saying “well, this model is *so* 2025, so your experience with it can't carry much weight”. The faults with every
-
@chiefagenteer
@chiefagenteer
on x
@GaryMarcus I am awed not by the code written but by the mistakes made, the short termism and the lack of capacity to think holistically when it comes to complex coding tasks. Anyone telling different stories is working more on creating content than creating serious software.
-
@emollick
Ethan Mollick
on x
AI is jagged, but I think sometimes it is easy to overly focus on that. The generalness is a surprise too! LLMs may be optimized for verifiable fields like coding, but AI is also not bad at corporate strategy & medical advice & writing a sestina & expressing empathy & ideation.
-
@femisapien_z
@femisapien_z
on x
@staysaasy Not true. 80% of my use is coding, and I'm not awed at all. In fact it seems more awed by me by far... [image]
-
@arthurcdent
Arthur Dent
on x
@binarybits I'm a research professor and can attest to Karpathy's point which I think was broader than you imply: not coding narrowly but a wide range of analytic and technical tasks.
-
@alanmcole
Alan Cole
on x
@binarybits 100%. I had Claude Opus code a financial widget, and it made some fundamental conceptual mistakes about finance while also coding perfectly at breakneck speed.
-
@lordofafew
@lordofafew
on x
this is why everyone in the government should be mandated to use it. they simply do not grasp the urgency
-
@paranoidchip
@paranoidchip
on x
@staysaasy And how big your codebase is. And if you have to be on call. I know CEOs who shit out small apps and think that AI can do everything. They're also the type to shit out a PR and peace out, never needing to feel the consequences of their slop.
-
@garrytan
Garry Tan
on x
You need to use frontier models with giant context and actually have systems that give them the right context at the right time to understand what's happening now in AI. Everyone else is guessing. There is both massive cost (a $20/mo sub is not going to unlock the awesomeness) …
-
@jenzhuscott
Jen Zhu
on x
Strongly agree @karpathy - the perception gap is real & widening fast. An even sharper framing of the usage divergence: //Conversational users// (the majority right now) - treat frontier models as a "super Google" - one-shot prompts for research, writing, brainstorming, or
-
@paulfreeman99
Paul Freeman
on x
@staysaasy OpenClaw, Hermes, and other meta harnesses like that are going to change that perception when they hit mainstream. 600 million Alexa devices are just waiting to be replaced with something that actually does something that saves time and money more meaningfully.
-
@sameedmed
Sameed Khan
on x
Love this; yeah I think the vibe of these models is more statistical mech / curve fitter than intelligence but I think we collectively drastically underestimated the usefulness of just scaling the “in-distribution” training data to just cover everything lmao
-
@karpathy
Andrej Karpathy
on x
Someone recently suggested to me that the reason OpenClaw moment was so big is because it's the first time a large group of non-technical people (who otherwise only knew AI as synonymous with ChatGPT as a website) experienced the latest agentic models.
-
@mlstreettalk
@mlstreettalk
on x
This feels like a complicated way of saying that some experts can leverage automation (AI) technology well because they understand (and can iteratively specify) their domains and those domains are verifiable.
-
@aakashgupta
Aakash Gupta
on x
Karpathy just gave you the most concise explanation of why AI feels like two completely different technologies depending on who you ask. The answer is one concept from machine learning: reward signal quality. When you train a model to write code, every attempt gets an automatic
-
@patarino
Adam Patarino
on x
@garrytan This is a misconception. Tighter, more deterministic skills go much further than massive context + expensive model
-
@howdymary
Mary
on x
the only people who appreciate how advanced LLMs have gotten are:
- developers using parallel agent swarms
- marketers mass producing AI UGC slop
- CEOs that want to cut 70% of their workforce
-
@gagansaluja08
Gagan
on x
@staysaasy corollary: the people most dismissive of it are almost always the ones who haven't taken it seriously enough to hit the ceiling. and the ceiling isn't where they think it is. you can't form an accurate opinion from the outside looking in
-
@danveloper
Dan Woods
on x
@lateinteraction I think it's more about the ways of working model than anything... when you're using AI to tab-complete some code or treating it like a copilot summarizing your emails, the gains aren't obvious or impressive at all. If you use AI as a collaborator, the advances i…
-
@garymarcus
Gary Marcus
on x
We are not getting to the G in Artificial General Intelligence; we are getting to (impressive) advances in particular areas where particular (verifiable) techniques can be used, on problems with advantageous economics. AGI itself is NOT “in striking distance”; inferring that
-
@zdch
Zac Hill
on x
@binarybits I sorta agree but sorta don't agree. My experience is that their utility function is highly related to how well I can impute a manageable amount of a) form, b) substance, and c) imperative into their context window. Coding is an instance of this but I think it general…
-
@larrypanozzo
Larry Panozzo
on x
@tunguz In my experience too, it is like having a grad student employee right alongside you for non-coding tasks. The expertise is high enough. Opus 4.6 (or maybe 4.5) crossed a threshold. (GPT-5.4 didn't, except for some targeted questions.)
-
@pawelhuryn
Paweł Huryn
on x
Karpathy nails the gap. But I'd attribute it differently. Most people use LLMs as chatbots. Claude Code, when used right, is an operating system - CLAUDE.md, hooks, subagents, MCPs, skills, and knowledge that compounds. The “awe gap” isn't model intelligence or what you use it
-
@tszzl
Roon
on x
@karpathy @soumitrashukla9 non technical people are downloading something called openclaw and using it in their terminal?
-
@shanumathew93
Shanu Mathew
on x
100% - he nails it. I feel like I'm an insane person at times, using and talking about the tools nonstop when most of the people I interact with still think it's fancy autocorrect that hallucinates most of the time, or that it's "bad at math". Most people have yet to truly test out
-
@davidkyang
David Yang
on x
@karpathy Another challenge I've witnessed is that the AI tools provided by employers in the workplace (even in companies trying to embody org-wide AI transformation) fall into your first use case because they provide limited or bad proprietary models
-
@levie
Aaron Levie
on x
AI adoption is a tale of two cities. On one end (most) users right now are interacting with AI via chat tools and on the other end people are deploying agents to do long running tasks that create and produce real work output or automate workflows. The former is super useful but
-
@binarybits
Timothy B. Lee
on x
tldr: models are astonishingly good at coding, kind of bad at a lot of other tasks. I think this should make people at least a little more skeptical about the idea that we're heading toward “AGI.”
-
@alexberenson
Alex Berenson
on x
TL/DR: Turns out massive software models trained by breaking human language into data are really good at coding, which is the science of using language to manipulate data... and less good at everything else. Who would have guessed?
-
@steffenpharai
Steffen
on x
@karpathy I honestly feel this way daily. I try to communicate how capable AI has become, but I'm still met with skepticism and plain lack of awareness.
-
@tunguz
Bojan Tunguz
on x
Exactly right. If you are using AI for anything technical, you are flabbergasted by the advancement in its capabilities. If you are using it for anything else, not so much. Although I've also been increasingly using it for legal/business/professional use cases with great amount
-
@pmarca
Marc Andreessen
on x
Well said.
-
@scobleizer
Robert Scoble
on x
After building with bleeding edge AI I get this separation that @karpathy lays out deeply. Family and friends have no idea how good the bleeding edge is. Completely uneducated about AI.
-
@pmarca
Marc Andreessen
on x
Yep!
-
@cryptopunk7213
@cryptopunk7213
on x
andrej's spot on. 99% of people don't take AI seriously because they don't use it properly. if your job doesn't include programming, research, or math, chances are you think AI's a fucking toy. "silver lining": the next tier of models (mythos, spud) will cook other professions
-
@missmi1973
@missmi1973
on x
@karpathy According to OpenAI's own data and a Harvard NBER study, coding queries account for only about 4% of ChatGPT messages, while non-work queries make up over 73%. For non-coding use cases, even $200/month subscribers have experienced stagnation or regression from 2025 thro…
-
@alex_peys
Alex Peysakhovich
on x
this was true for a long time, although with the latest wave of models I'm finding them (ok, mostly Opus 4.6) useful for complex tasks outside coding - my personal benchmark is whether it helps me with race car setup stuff, and this was false until basically 1-2 months ago
-
@staysaasy
@staysaasy
on x
The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.