A look at Andrej Karpathy's “autoresearch” experiment, where an AI agent runs in a recursive self-improvement loop to improve an AI model on one testable metric
Fortune · Jeremy Kahn
Related Coverage
- The End of Coding: Andrej Karpathy on Agents, AutoResearch, and the Loopy Era of AI YouTube
- Andrej Karpathy: Humans Are the Bottleneck in AI Research WinBuzzer · Markus Kasanmascheff
Discussion
- Min Choi (@minchoi) on X: This is insane... Karpathy left an AI running for 2 days to improve itself. It came back with ~20 changes that actually worked. We're at the point where AI is its own best researcher.
- Matt Stockton (@mstockton) on X: Karpathy is brilliant. This entire thing is amazing, and you don't have to understand all of this post, but you should understand these two points. In fact I think you need to, even if you don't know much about AI: “All LLM frontier labs will do this. It's the final boss battle.
- @cgtwts on X: Andrej Karpathy just ran a pretty insane experiment. > he let an AI research agent run completely unassisted for 48 hours. > no supervision > the agent ran 276 experiments trying different ways to improve a model > 29 of them worked > Combine those together and the model [video]
- Ryan Carson (@ryancarson) on X: “It's worth thinking about whether your problem falls into this bucket too.” Autoresearch can be applied to any project with these traits: 1. Fast feedback 2. Clear metrics 3. Low cost of experimentation
- @fakepsyho on X: This is just simple hill climbing, so it's an extremely crude version of AlphaEvolve / OpenAI's AWTF scaffold / ALE-Agent. But it's also a good reminder that non-bleeding-edge ML is just throwing random shit at the wall and seeing what sticks.
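The hill climbing the comparison refers to can be sketched in a few lines. This is a toy illustration, not Karpathy's actual harness; the `score` and `propose` functions below are hypothetical stand-ins for "run the experiment" and "try a random tweak".

```python
import random

def hill_climb(config, score, propose, steps=276, seed=0):
    """Greedy hill climbing: keep a candidate change only if it
    improves the single target metric (higher is better)."""
    rng = random.Random(seed)
    best, best_score, n_kept = dict(config), score(config), 0
    for _ in range(steps):
        candidate = propose(best, rng)   # one random tweak
        s = score(candidate)             # run the experiment
        if s > best_score:               # keep only what sticks
            best, best_score = candidate, s
            n_kept += 1
    return best, best_score, n_kept

# Toy usage: tune one hyperparameter toward an optimum at 0.5.
score = lambda c: -(c["lr"] - 0.5) ** 2
propose = lambda c, rng: {"lr": c["lr"] + rng.uniform(-0.1, 0.1)}
best, s, n_kept = hill_climb({"lr": 0.05}, score, propose)
```

Crude as it is, this is the core of the loop: no gradients, no model of the search space, just propose-evaluate-keep.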
- @snwy_me on X: autoresearch really interested me, despite me not being “all-in” on agents yet. i wanted to get started with running auto experiments. i looked to existing tools to serve as a harness, but each one had its problems. so i made one. introducing Helios for autonomous ML research [image…
- Kai Arulkumaran (@kaixhin) on X: Andrej has a very good sense of what's currently doable, and I think this needs to be taken at face value - agents can now go off and do serious ML tuning work (incl. fixing bugs and adding improvements from related research), which is amazing, but also far from the full
- Wei Dai (@_weidai) on X: Is it possible to build “proof-of-useful-work” on top of autoresearch? There's already great compute-versus-verification asymmetry that is tunable. Would need a reliable way to generate fresh & independent puzzles (that are still useful). Maybe a dead end, but someone should
- Mikhail Parakhin (@mparakhin) on X: Codex should really allow GPT-5.4 Pro model to be used, at least as an 'exec'ed subagent. Autoresearch loop is here, but right now the lack of the top models pushes us (ML people) to use Pi + Pro/DeepThink API.
- Nathan Chen (@nathancgy4) on X: There's likely no better way to do scientific model architecture research than to climb the scaling ladder, as we've done for every improvement in Kimi. You rapidly test changes at a small model scale (e.g. 3b total, 800m active) that gives fast enough feedback time, then
- @iruletheworldmo on X: this should be the only thing all of humanity is thinking and speaking about. there are some significantly better models coming soon and things are going to get strange.
- @gfodor on X: This is why I went through the stages of grief when the o1 evals showed we had an angle of attack on solving programming. ML work has always been like this imo: try a bunch of semi-random stuff based on reasonably justified mathematical intuition, cut losses and let winners ride
- Zhengyao Jiang (@zhengyaojiang) on X: The headline is automated hill-climbing. I'd say the deeper lesson is eval design. The agent was not optimizing the full, noisy, expensive objective directly. It was climbing a cheap proxy that still tracked reality well enough to transfer: - 5-minute train limit - validation
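The proxy-versus-objective split described here can be sketched as a two-stage filter: rank all candidates on a cheap proxy (e.g. a time-limited training run), then pay the expensive objective only for the top survivors. A minimal sketch with hypothetical scoring functions, not the experiment's real eval:

```python
def select_via_proxy(candidates, proxy_score, full_score, top_k=3):
    """Rank on a cheap proxy metric, then run the expensive
    objective only on the top-k proxy survivors."""
    ranked = sorted(candidates, key=proxy_score, reverse=True)
    survivors = ranked[:top_k]
    # The expensive eval runs top_k times, not len(candidates) times.
    return max(survivors, key=full_score)

# Toy usage: the proxy is imperfect but correlated with ground truth.
cands = range(10)
proxy = lambda x: x + (0.3 if x % 2 else 0.0)   # noisy stand-in
full = lambda x: x                              # expensive ground truth
best = select_via_proxy(cands, proxy, full)
```

The design lesson is the one the post draws: the proxy doesn't have to be unbiased, only cheap and well-enough correlated that climbing it transfers to the real objective.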
- Joshua Achiam (@jachiam0) on X: This is a meaningful indicator that this year will see modest but significant AI research acceleration due to autoresearch. These things will compound over time.
- Omar Khattab (@lateinteraction) on X: love to see more people feel the vibe of having LLM-driven learning algorithms optimize your systems! :D
- Kevin Weil (@kevinweil) on X: A look at the future/present
- Prinz (@deredleritt3r) on X: “All LLM frontier labs will do this. It's the final boss battle... Doing it is ‘just engineering’ and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and
- Aakash Gupta (@aakashgupta) on X: Karpathy just mass-produced the most expensive part of ML research for free. The bottleneck in neural network development has always been researcher iteration speed. A senior ML engineer costs $400K-$800K/year, runs maybe 3-5 meaningful experiments per day, and spends 80% of
- Mario Zechner (@badlogicgames) on X: ok, this is awesome. but TIL that arXiv hosts LaTeX sources as well, which changes everything for me personally :D and now you know too. https://github.com/...
- @jsonbasedman on X: My favorite class from my master's degree was “Methods for Search and Optimization”: BFGS, simplex, and my personal favorite, CMA-ES. Now we have a sort of meta-optimization algorithm: hey agent, here's the function, go run it, look at the results, then make it better. No mistakes
- Han Xiao (@hxiao) on X: Has anyone adapted @karpathy's autoresearch for multi-objective optimization? Everything I've seen so far optimizes a single target value. Would be interesting if classic techniques like multi-armed bandits end up getting reimplemented at the agent level.
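For a sense of what "bandits at the agent level" could look like: a UCB1 bandit allocating a fixed experiment budget across research directions ("arms"), favoring arms whose average payoff is high or still uncertain. A hedged sketch with a toy Bernoulli reward, not any existing autoresearch harness:

```python
import math
import random

def ucb1(n_arms, pull, budget, seed=0):
    """UCB1: spend an experiment budget across arms, balancing
    exploitation (high mean reward) against exploration (few pulls)."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, budget + 1):
        if t <= n_arms:                      # try every arm once first
            i = t - 1
        else:                                # then pick by UCB score
            i = max(range(n_arms),
                    key=lambda a: totals[a] / counts[a]
                    + math.sqrt(2 * math.log(t) / counts[a]))
        totals[i] += pull(i, rng)            # run one experiment
        counts[i] += 1
    return counts

# Toy usage: arm 2 succeeds most often, so it should be pulled most.
payoffs = [0.2, 0.5, 0.8]
pull = lambda a, rng: rng.random() < payoffs[a]   # Bernoulli reward
counts = ucb1(3, pull, budget=1000)
```

Here each "pull" would stand in for one agent-run experiment in a given direction; the bandit is the scheduler deciding where the next experiment goes.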
- Sunny Madra (@sundeep) on X: “All LLM frontier labs will do this. It's the final boss battle.”
- Christine Yip (@christinetyip) on X: We were inspired by @karpathy's autoresearch and built: autoresearch@home. Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together. Through a shared [image]
- @0xkydo on X: guys, i think this might be the next openclaw. karpathy let an AI agent optimize his own neural net training code for 2 days. it ran 700 experiments autonomously. found 20 improvements he'd missed after months of manual tuning. 11% performance gain. the agent found bugs. tuned
- Sowmay Jain (@sowmay_jain) on X: most people have zero idea what's about to hit them in the next few months. we're on a parabolic curve so steep that even the people building this stuff can't predict what happens when automated systems are just left running on autopilot.
- Boris Power (@borismpower) on X: This is a glimpse into the future
- Sui (@birdabo) on X: karpathy built an AI that's better at his job than he is. > 276 experiments > kept 29 improvements > delivered 11% speedup it found bugs in attention, fixed AdamW settings, tuned things he thought were already perfect. ML researchers getting automated by their own research 💀 [vid…
- @tengyanai on X: The most important sentence in Karpathy's whole post is probably this: anything with a measurable score and fast feedback will become something agents can optimize for you, automatically, with no humans involved.
- Sarah Guo (@saranormous) on X: Caught up with @karpathy for a new @NoPriorsPod: on the phase shift in engineering, AI psychosis, claws, AutoResearch, the opportunity for a SETI-at-Home-like movement in AI, the model landscape, and second order effects. 02:55 - What Capability Limits Remain? 06:15 - What [video]
- @cgtwts on X: this karpathy interview might be the clearest glimpse of what's coming. he basically said we're entering a world where ai writes code, runs autonomous research, and operates through swarms of agents that improve themselves. he's already relying on it daily, even catching himself [v…
- r/artificial on Reddit: Andrej Karpathy's autonomous AI research agent ran 700 experiments in 2 days and gave a glimpse of where AI is heading