A gap in understanding AI is growing, as casual users cite flaws in old free models while power users point to new models' staggering gains in technical domains
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT sometime last year and allowed it to inform their views on AI a little too much. This is
@karpathy Andrej Karpathy
Discussion
-
@juberti
Justin Uberti
on x
Not great to be called out by an AI OG about AVM, but he's right that the recent capability gains of text models have been >> those of speech models, mostly by thinking harder. But at the same time we need speech models to be faster + more humanlike. The impossible just takes longer.
-
@lateinteraction
Omar Khattab
on x
I get this, of course, but I think this dismisses some underlying valid criticism that even laypeople have. And we can't just move the standard every 2 months by saying “well, this model is *so* 2025, so your experience with it can't carry much weight”. The faults with every
-
@chiefagenteer
@chiefagenteer
on x
@GaryMarcus I am awed not by the code written but by the mistakes made, the short termism and the lack of capacity to think holistically when it comes to complex coding tasks. Anyone telling different stories is working more on creating content than creating serious software.
-
@emollick
Ethan Mollick
on x
AI is jagged, but I think sometimes it is easy to overly focus on that. The generalness is a surprise too! LLMs may be optimized for verifiable fields like coding, but AI is also not bad at corporate strategy & medical advice & writing a sestina & expressing empathy & ideation.
-
@femisapien_z
@femisapien_z
on x
@staysaasy Not true. 80% of my use is coding, and I'm not awed at all. In fact it seems more awed by me by far... [image]
-
@arthurcdent
Arthur Dent
on x
@binarybits I'm a research professor and can attest to Karpathy's point which I think was broader than you imply: not coding narrowly but a wide range of analytic and technical tasks.
-
@alanmcole
Alan Cole
on x
@binarybits 100%. I had Claude Opus code a financial widget, and it made some fundamental conceptual mistakes about finance while also coding perfectly at breakneck speed.
-
@lordofafew
@lordofafew
on x
this is why everyone in the government should be mandated to use it. they simply do not grasp the urgency
-
@paranoidchip
@paranoidchip
on x
@staysaasy And how big your codebase is. And if you have to be on call. I know CEOs who shit out small apps and think that AI can do everything. They're also the type to shit out a PR and peace out, never needing to feel the consequences of their slop.
-
@garrytan
Garry Tan
on x
You need to use frontier models with giant context and actually have systems that give them the right context at the right time to understand what's happening now in AI. Everyone else is guessing. There is both massive cost (a $20/mo sub is not going to unlock the awesomeness) …
-
@jenzhuscott
Jen Zhu
on x
Strongly agree @karpathy - the perception gap is real & widening fast. An even sharper framing of the usage divergence: //Conversational users// (the majority right now) - treat frontier models as a "super Google" - one-shot prompts for research, writing, brainstorming, or
-
@paulfreeman99
Paul Freeman
on x
@staysaasy OpenClaw, Hermes, and other meta harnesses like that are going to change that perception when they hit mainstream. 600 million Alexa devices are just waiting to be replaced with something that actually does something that saves time and money more meaningfully.
-
@sameedmed
Sameed Khan
on x
Love this; yeah I think the vibe of these models is more statistical mech / curve fitter than intelligence but I think we collectively drastically underestimated the usefulness of just scaling the “in-distribution” training data to just cover everything lmao
-
@karpathy
Andrej Karpathy
on x
Someone recently suggested to me that the reason OpenClaw moment was so big is because it's the first time a large group of non-technical people (who otherwise only knew AI as synonymous with ChatGPT as a website) experienced the latest agentic models.
-
@mlstreettalk
@mlstreettalk
on x
This feels like a complicated way of saying that some experts can leverage automation (AI) technology well because they understand (and can iteratively specify) their domains and those domains are verifiable.
-
@aakashgupta
Aakash Gupta
on x
Karpathy just gave you the most concise explanation of why AI feels like two completely different technologies depending on who you ask. The answer is one concept from machine learning: reward signal quality. When you train a model to write code, every attempt gets an automatic
-
@patarino
Adam Patarino
on x
@garrytan This is a misconception. Tighter, more deterministic skills go much further than massive context + expensive model
-
@howdymary
Mary
on x
the only people who appreciate how advanced LLMs have gotten are:
- developers using parallel agent swarms
- marketers mass producing AI UGC slop
- CEOs that want to cut 70% of their workforce
-
@gagansaluja08
Gagan
on x
@staysaasy corollary: the people most dismissive of it are almost always the ones who haven't taken it seriously enough to hit the ceiling. and the ceiling isn't where they think it is. you can't form an accurate opinion from the outside looking in
-
@danveloper
Dan Woods
on x
@lateinteraction I think it's more about the ways of working model than anything... when you're using AI to tab-complete some code or treating it like a copilot summarizing your emails, the gains aren't obvious or impressive at all. If you use AI as a collaborator, the advances i…
-
@garymarcus
Gary Marcus
on x
We are not getting to the G in Artificial General Intelligence; we are getting to (impressive) advances in particular areas where particular (verifiable) techniques can be used, on problems with advantageous economics. AGI itself is NOT “in striking distance”; inferring that
-
@zdch
Zac Hill
on x
@binarybits I sorta agree but sorta don't agree. My experience is that their utility function is highly related to how well I can impute a manageable amount of a) form, b) substance, and c) imperative into their context window. Coding is an instance of this but I think it general…
-
@larrypanozzo
Larry Panozzo
on x
@tunguz In my experience too, it is like having a grad student employee right alongside you for non-coding tasks. The expertise is high enough. Opus 4.6 (or maybe 4.5) crossed a threshold. (GPT-5.4 didn't, except for some targeted questions.)
-
@pawelhuryn
Paweł Huryn
on x
Karpathy nails the gap. But I'd attribute it differently. Most people use LLMs as chatbots. Claude Code, when used right, is an operating system - CLAUDE.md, hooks, subagents, MCPs, skills, and knowledge that compounds. The “awe gap” isn't model intelligence or what you use it
-
@tszzl
Roon
on x
@karpathy @soumitrashukla9 non technical people are downloading something called openclaw and using it in their terminal?
-
@shanumathew93
Shanu Mathew
on x
100% - he nails it. I feel like I'm an insane person at times, using and talking about the tools nonstop when most of the people I interact with still think it's fancy autocorrect that hallucinates most of the time, or that it's "bad at math". Most people have yet to truly test out
-
@davidkyang
David Yang
on x
@karpathy Another challenge I've witnessed is that the AI tools provided by employers in the workplace (even in companies trying to embody org-wide AI transformation) fall into your first use case because they provide limited or bad proprietary models
-
@levie
Aaron Levie
on x
AI adoption is a tale of two cities. On one end (most) users right now are interacting with AI via chat tools and on the other end people are deploying agents to do long running tasks that create and produce real work output or automate workflows. The former is super useful but
-
@binarybits
Timothy B. Lee
on x
tldr: models are astonishingly good at coding, kind of bad at a lot of other tasks. I think this should make people at least a little more skeptical about the idea that we're heading toward “AGI.”
-
@alexberenson
Alex Berenson
on x
TL/DR: Turns out massive software models trained by breaking human language into data are really good at coding, which is the science of using language to manipulate data... and less good at everything else. Who would have guessed?
-
@steffenpharai
Steffen
on x
@karpathy I honestly feel this way daily. I try to communicate how capable AI has become, but I'm still met with skepticism and plain lack of awareness.
-
@tunguz
Bojan Tunguz
on x
Exactly right. If you are using AI for anything technical, you are flabbergasted by the advancement in its capabilities. If you are using it for anything else, not so much. Although I've also been increasingly using it for legal/business/professional use cases with great amount
-
@pmarca
Marc Andreessen
on x
Well said.
-
@scobleizer
Robert Scoble
on x
After building with bleeding edge AI I get this separation that @karpathy lays out deeply. Family and friends have no idea how good the bleeding edge is. Completely uneducated about AI.
-
@pmarca
Marc Andreessen
on x
Yep!
-
@cryptopunk7213
@cryptopunk7213
on x
andrej's spot on. 99% of people don't take AI seriously because they don't use it properly. if your job doesn't include programming, research, or math, chances are you think AI's a fucking toy. "silver lining": the next tier of models (mythos, spud) will cook other professions
-
@missmi1973
@missmi1973
on x
@karpathy According to OpenAI's own data and a Harvard NBER study, coding queries account for only about 4% of ChatGPT messages, while non-work queries make up over 73%. For non-coding use cases, even $200/month subscribers have experienced stagnation or regression from 2025 thro…
-
@alex_peys
Alex Peysakhovich
on x
this was true for a long time, although with the latest wave of models I'm finding them (ok, mostly Opus 4.6) useful for complex tasks outside coding - my personal benchmark is whether it helps me with race car setup stuff, and this was false until basically 1-2 months ago
-
@staysaasy
@staysaasy
on x
The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.