A look at Andrej Karpathy's “autoresearch” experiment, where an AI agent runs in a recursive self-improvement loop to improve an AI model on one testable metric
Fortune · Jeremy Kahn
Related Coverage
- The End of Coding: Andrej Karpathy on Agents, AutoResearch, and the Loopy Era of AI YouTube
- Andrej Karpathy: Humans Are the Bottleneck in AI Research WinBuzzer · Markus Kasanmascheff
Discussion
- Min Choi (@minchoi) on X: This is insane... Karpathy left an AI running for 2 days to improve itself. It came back with ~20 changes that actually worked. We're at the point where AI is its own best researcher.
- Matt Stockton (@mstockton) on X: Karpathy is brilliant. This entire thing is amazing, and you don't have to understand all of this post, but you should understand these two points. In fact I think you need to, even if you don't know much about AI: “All LLM frontier labs will do this. It's the final boss battle.
- @cgtwts on X: Andrej Karpathy just ran a pretty insane experiment. > he let an AI research agent run completely unassisted for 48 hours. > no supervision > the agent ran 276 experiments trying different ways to improve a model > 29 of them worked > Combine those together and the model [video]
- Ryan Carson (@ryancarson) on X: “It's worth thinking about whether your problem falls into this bucket too.” Autoresearch can be applied to any project with these traits: 1. Fast feedback 2. Clear metrics 3. Low cost of experimentation
- @fakepsyho on X: This is just simple hill climbing, so it's an extremely crude version of AlphaEvolve / OpenAI's AWTF scaffold / ALE-Agent. But it's also a good reminder that non-bleeding-edge ML is just throwing random shit at the wall and seeing what sticks.
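The hill climbing the comparison refers to can be sketched in a few lines. This is a toy illustration, not Karpathy's actual harness; the `score` and `propose` functions below are hypothetical stand-ins for "run the experiment" and "try a random tweak".

```python
import random

def hill_climb(config, score, propose, steps=276, seed=0):
    """Greedy hill climbing: keep a candidate change only if it
    improves the single target metric (higher is better)."""
    rng = random.Random(seed)
    best, best_score, n_kept = dict(config), score(config), 0
    for _ in range(steps):
        candidate = propose(best, rng)   # one random tweak
        s = score(candidate)             # run the experiment
        if s > best_score:               # keep only what sticks
            best, best_score = candidate, s
            n_kept += 1
    return best, best_score, n_kept

# Toy usage: tune one hyperparameter toward an optimum at 0.5.
score = lambda c: -(c["lr"] - 0.5) ** 2
propose = lambda c, rng: {"lr": c["lr"] + rng.uniform(-0.1, 0.1)}
best, s, n_kept = hill_climb({"lr": 0.05}, score, propose)
```

Crude as it is, this is the core of the loop: no gradients, no model of the search space, just propose-evaluate-keep.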
- @snwy_me on X: autoresearch really interested me, despite me not being “all-in” on agents yet. i wanted to get started with running auto experiments. i looked to existing tools to serve as a harness, but each one had its problems. so i made one. introducing Helios for autonomous ML research [image…
- Kai Arulkumaran (@kaixhin) on X: Andrej has a very good sense of what's currently doable, and I think this needs to be taken at face value - agents can now go off and do serious ML tuning work (incl. fixing bugs and adding improvements from related research), which is amazing, but also far from the full
- Wei Dai (@_weidai) on X: Is it possible to build “proof-of-useful-work” on top of autoresearch? There's already great compute-versus-verification asymmetry that is tunable. Would need a reliable way to generate fresh & independent puzzles (that are still useful). Maybe a dead end, but someone should
- Mikhail Parakhin (@mparakhin) on X: Codex should really allow GPT-5.4 Pro model to be used, at least as an 'exec'ed subagent. Autoresearch loop is here, but right now the lack of the top models pushes us (ML people) to use Pi + Pro/DeepThink API.
- Nathan Chen (@nathancgy4) on X: There's likely no better way to do scientific model architecture research than to climb the scaling ladder, as we've done for every improvement in Kimi. You rapidly test changes at a small model scale (e.g. 3b total, 800m active) that gives fast enough feedback time, then
- @iruletheworldmo on X: this should be the only thing all of humanity is thinking and speaking about. there are some significantly better models coming soon and things are going to get strange.
- @gfodor on X: This is why I went through the stages of grief when the o1 evals showed we had an angle of attack on solving programming. ML work has always been like this imo: try a bunch of semi-random stuff based on reasonably justified mathematical intuition, cut losses and let winners ride
- Zhengyao Jiang (@zhengyaojiang) on X: The headline is automated hill-climbing. I'd say the deeper lesson is eval design. The agent was not optimizing the full, noisy, expensive objective directly. It was climbing a cheap proxy that still tracked reality well enough to transfer: - 5-minute train limit - validation
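The proxy-versus-objective split described here can be sketched as a two-stage filter: rank all candidates on a cheap proxy (e.g. a time-limited training run), then pay the expensive objective only for the top survivors. A minimal sketch with hypothetical scoring functions, not the experiment's real eval:

```python
def select_via_proxy(candidates, proxy_score, full_score, top_k=3):
    """Rank on a cheap proxy metric, then run the expensive
    objective only on the top-k proxy survivors."""
    ranked = sorted(candidates, key=proxy_score, reverse=True)
    survivors = ranked[:top_k]
    # The expensive eval runs top_k times, not len(candidates) times.
    return max(survivors, key=full_score)

# Toy usage: the proxy is imperfect but correlated with ground truth.
cands = range(10)
proxy = lambda x: x + (0.3 if x % 2 else 0.0)   # noisy stand-in
full = lambda x: x                              # expensive ground truth
best = select_via_proxy(cands, proxy, full)
```

The design lesson is the one the post draws: the proxy doesn't have to be unbiased, only cheap and well-enough correlated that climbing it transfers to the real objective.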
- Joshua Achiam (@jachiam0) on X: This is a meaningful indicator that this year will see modest but significant AI research acceleration due to autoresearch. These things will compound over time.
- Omar Khattab (@lateinteraction) on X: love to see more people feel the vibe of having LLM-driven learning algorithms optimize your systems! :D
- Kevin Weil (@kevinweil) on X: A look at the future/present
- Prinz (@deredleritt3r) on X: “All LLM frontier labs will do this. It's the final boss battle... Doing it is ‘just engineering’ and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and
- Aakash Gupta (@aakashgupta) on X: Karpathy just mass-produced the most expensive part of ML research for free. The bottleneck in neural network development has always been researcher iteration speed. A senior ML engineer costs $400K-$800K/year, runs maybe 3-5 meaningful experiments per day, and spends 80% of
- Mario Zechner (@badlogicgames) on X: ok, this is awesome. but TIL that arXiv hosts LaTeX sources as well, which changes everything for me personally :D and now you know too. https://github.com/...
- @jsonbasedman on X: My favorite class from my master's degree was “Methods for Search and Optimization”: BFGS, simplex, and my personal favorite, CMA-ES. Now we have a sort of meta-optimization algorithm: hey agent, here's the function, go run it, look at the results, then make it better. No mistakes
- Han Xiao (@hxiao) on X: Has anyone adapted @karpathy's autoresearch for multi-objective optimization? Everything I've seen so far optimizes a single target value. Would be interesting if classic techniques like multi-armed bandits end up getting reimplemented at the agent level.
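For a sense of what "bandits at the agent level" could look like: a UCB1 bandit allocating a fixed experiment budget across research directions ("arms"), favoring arms whose average payoff is high or still uncertain. A hedged sketch with a toy Bernoulli reward, not any existing autoresearch harness:

```python
import math
import random

def ucb1(n_arms, pull, budget, seed=0):
    """UCB1: spend an experiment budget across arms, balancing
    exploitation (high mean reward) against exploration (few pulls)."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, budget + 1):
        if t <= n_arms:                      # try every arm once first
            i = t - 1
        else:                                # then pick by UCB score
            i = max(range(n_arms),
                    key=lambda a: totals[a] / counts[a]
                    + math.sqrt(2 * math.log(t) / counts[a]))
        totals[i] += pull(i, rng)            # run one experiment
        counts[i] += 1
    return counts

# Toy usage: arm 2 succeeds most often, so it should be pulled most.
payoffs = [0.2, 0.5, 0.8]
pull = lambda a, rng: rng.random() < payoffs[a]   # Bernoulli reward
counts = ucb1(3, pull, budget=1000)
```

Here each "pull" would stand in for one agent-run experiment in a given direction; the bandit is the scheduler deciding where the next experiment goes.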
- Sunny Madra (@sundeep) on X: “All LLM frontier labs will do this. It's the final boss battle.”
- Christine Yip (@christinetyip) on X: We were inspired by @karpathy's autoresearch and built: autoresearch@home. Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together. Through a shared [image]
- @0xkydo on X: guys, i think this might be the next openclaw. karpathy let an AI agent optimize his own neural net training code for 2 days. it ran 700 experiments autonomously. found 20 improvements he'd missed after months of manual tuning. 11% performance gain. the agent found bugs. tuned
- Sowmay Jain (@sowmay_jain) on X: most people have zero idea what's about to hit them in the next few months. we're on a parabolic curve so steep that even the people building this stuff can't predict what happens when automated systems are just left running on autopilot.
- Boris Power (@borismpower) on X: This is a glimpse into the future
- Sui (@birdabo) on X: karpathy built an AI that's better at his job than he is. > 276 experiments > kept 29 improvements > delivered 11% speedup it found bugs in attention, fixed AdamW settings, tuned things he thought were already perfect. ML researchers getting automated by their own research 💀 [vid…
- @tengyanai on X: The most important sentence in Karpathy's whole post is probably this: anything with a measurable score and fast feedback will become something agents can optimize for you, automatically, with no humans involved.
- Sarah Guo (@saranormous) on X: Caught up with @karpathy for a new @NoPriorsPod: on the phase shift in engineering, AI psychosis, claws, AutoResearch, the opportunity for a SETI-at-Home-like movement in AI, the model landscape, and second order effects. 02:55 - What Capability Limits Remain? 06:15 - What [video]
- @cgtwts on X: this karpathy interview might be the clearest glimpse of what's coming. he basically said we're entering a world where ai writes code, runs autonomous research, and operates through swarms of agents that improve themselves. he's already relying on it daily, even catching himself [v…
- r/artificial on Reddit: Andrej Karpathy's autonomous AI research agent ran 700 experiments in 2 days and gave a glimpse of where AI is heading