2025 LLM Year in Review: shift toward RLVR, Claude Code emerged as the first convincing example of an LLM agent, Nano Banana was paradigm shifting, and more
Andrej Karpathy / karpathy:
Related Coverage
- How AI truly advanced in 2025: Andrej Karpathy highlights 3 key points Digit
- Reflections on AI at the end of 2025 antirez.com
- LLM Year in Review Hacker News
Discussion
-
@ethanpierce_ai
Ethan Pierce
on x
This hits because thinking about why models behave that way starts to matter once they're running in systems, not just passing benchmarks.
-
@imprashantrai1
Prashant Rai
on x
Code becoming free, disposable, and ephemeral is a bigger shift than most people realize.
-
@clkbsfth
Fatih Celikbas
on x
This is such a good recap of the decade of 2025
-
@andyxandersen
@andyxandersen
on x
“In 2025, Reinforcement Learning from Verifiable Rewards (RLVR) emerged as the de facto new major stage” “Supervision bits-wise, human neural nets are optimized for survival ... but LLM neural nets are optimized for imitating humanity's text”
-
@_thomasip
Thomas Ip
on x
JUST IN: Karpathy no longer trusts LLM benchmarks. It wasn't long ago that LMArena was his go-to benchmark to gauge how good a model is in real-world settings. Now he simply disregards benchmarks altogether due to the prevalence of benchmaxxing. [image]
-
@chrishartdev
Chris Hart
on x
“simultaneously a lot smarter than expected and a lot dumber than expected” Sounds about right
-
@_eddie_dempsey
@_eddie_dempsey
on x
Nice article. I can't emphasize the “vibe coding” section enough. The ability to create your own software tools through text and/or speech is so incredibly liberating and empowering. The (digital) world is your oyster!
-
@kbhaskar_95
Karthik Bhaskar
on x
Best review of the LLM landscape 2025
-
@noodle_saint_7
William Stoner
on x
- Claude has made some big strides in attracting enterprise users with their high-quality agents. - OAI has had some public sentiment shift, but shouldn't underestimate how sticky the ChatGPT interface is. Plus Jony Ive's design still to come (!) - Wonder how Gemini and xAI
-
@datadelaurier
@datadelaurier
on x
Absolutely incredible read, and we also get another phrase from Karpathy: Jagged Intelligence [image]
-
@rezendi
Jon Evans
on x
“The core issue is that benchmarks are almost by construction verifiable environments and are therefore immediately susceptible to RLVR and weaker forms of it via synthetic data generation.”
-
@jherritz
Josh Herritz
on x
“I suspect that LLM labs will trend to graduate the generally capable college student, but LLM apps will organize, finetune and actually animate teams of them into deployed professionals in specific verticals by supplying private data, sensors and actuators and feedback loops.”
-
@manosaie
@manosaie
on x
Benchmarks are a rigged game. We've all known it. But have we played out where this nets out? Eventually, we'll ignore them completely. For now, we're in an uncanny valley. We think they're strictly better than no approximation of a model's capability. But that's a shaky
-
@yuchenj_uw
Yuchen Jin
on x
Andrej's 2025 LLM Review is the PewDiePie YouTube Rewind of AI. His naming superpower is unmatched. 2 tweets. 2 most influential AI terms of the year: - “Vibe coding” - “Jagged Intelligence” [image]
-
@kylebrussell
Kyle Russell
on x
if anything gets to be an automatic Must Read...
-
@karpathy
Andrej Karpathy
on x
2025 LLM Year in Review
-
r/mlscaling
r
on reddit
“2025 LLM Year in Review”, Andrej Karpathy