2025 LLM Year in Review: shift toward RLVR, Claude Code emerged as the first convincing example of an LLM agent, Nano Banana was paradigm shifting, and more
Andrej Karpathy / karpathy:
Related Coverage
- How AI truly advanced in 2025: Andrej Karpathy highlights 3 key points Digit
- Reflections on AI at the end of 2025 antirez.com
- LLM Year in Review Hacker News
Discussion
-
@ethanpierce_ai
Ethan Pierce
on x
This hits because thinking about why models behave that way starts to matter once they're running in systems, not just passing benchmarks.
-
@imprashantrai1
Prashant Rai
on x
Code becoming free, disposable, and ephemeral is a bigger shift than most people realize.
-
@clkbsfth
Fatih Celikbas
on x
This is such a good recap of the decade of 2025
-
@andyxandersen
@andyxandersen
on x
“In 2025, Reinforcement Learning from Verifiable Rewards (RLVR) emerged as the de facto new major stage” “Supervision bits-wise, human neural nets are optimized for survival ... but LLM neural nets are optimized for imitating humanity's text”
-
@_thomasip
Thomas Ip
on x
JUST IN: Karpathy no longer trusts LLM benchmarks. It wasn't long ago that LMArena was his go-to benchmark to gauge how good a model is in real-world settings. Now he simply disregards benchmarks altogether due to the prevalence of benchmaxxing. [image]
-
@chrishartdev
Chris Hart
on x
“simultaneously a lot smarter than expected and a lot dumber than expected” Sounds about right
-
@_eddie_dempsey
@_eddie_dempsey
on x
Nice article. I can't emphasize the “vibe coding” section enough. The ability to create your own software tools through text and/or speech is so incredibly liberating and empowering. The (digital) world is your oyster!
-
@kbhaskar_95
Karthik Bhaskar
on x
Best review of the LLM landscape 2025
-
@noodle_saint_7
William Stoner
on x
- Claude has made some big strides in attracting enterprise users with their high-quality agents. - OAI has had some public sentiment shift, but shouldn't underestimate how sticky the ChatGPT interface is. Plus Jony Ive's design still to come (!) - Wonder how Gemini and xAI
-
@datadelaurier
@datadelaurier
on x
Absolutely incredible read, and we also get another phrase from Karpathy: Jagged Intelligence [image]
-
@rezendi
Jon Evans
on x
“The core issue is that benchmarks are almost by construction verifiable environments and are therefore immediately susceptible to RLVR and weaker forms of it via synthetic data generation.”
-
@jherritz
Josh Herritz
on x
“I suspect that LLM labs will trend to graduate the generally capable college student, but LLM apps will organize, finetune and actually animate teams of them into deployed professionals in specific verticals by supplying private data, sensors and actuators and feedback loops.”
-
@manosaie
@manosaie
on x
Benchmarks are a rigged game. We've all known it. But have we played out where this nets out? Eventually, we'll ignore them completely. For now, we're in an uncanny valley. We think they're strictly better than no approximation of a model's capability. But that's a shaky
-
@yuchenj_uw
Yuchen Jin
on x
Andrej's 2025 LLM Review is the PewDiePie YouTube Rewind of AI. His naming superpower is unmatched. 2 tweets. 2 most influential AI terms of the year: - “Vibe coding” - “Jagged Intelligence” [image]
-
@kylebrussell
Kyle Russell
on x
if anything gets to be an automatic Must Read...
-
@karpathy
Andrej Karpathy
on x
2025 LLM Year in Review
-
r/mlscaling
r
on reddit
“2025 LLM Year in Review”, Andrej Karpathy