TEXXR

Chronicles

The story behind the story


Andrej Karpathy unveils nanochat, a full-stack training and inference implementation of an LLM in a single, dependency-minimal codebase that can be trained end to end in about 4 hours for ~$100
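A back-of-envelope check on the "$100, 4-hour" figure. The GPU count and hourly rate below are assumptions for illustration (the excerpt only mentions "a H100 node"; an 8-GPU node at ~$3 per GPU-hour is a plausible cloud price, not a number from the repo):

```python
# Rough sanity check of the "$100, 4-hour" speedrun cost.
# gpus and usd_per_gpu_hour are assumptions, not from the repo.
gpus = 8                 # assumed: one 8xH100 node
usd_per_gpu_hour = 3.0   # assumed cloud rate; varies by provider
hours = 4                # the quoted speedrun duration
cost = gpus * usd_per_gpu_hour * hours
print(cost)  # 96.0 -- in the ballpark of the quoted ~$100
```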

It provides a full ChatGPT-style LLM, including training, inference, and a web UI …

X:
- Clem / @clementdelangue: Am I wrong in sensing a paradigm shift in AI? Feels like we're moving from a world obsessed with generalist LLM APIs to one where more and more companies are training, optimizing, and running their own models built on open source (especially smaller, specialized ones) Some [image]
- Andrej Karpathy / @karpathy: GitHub repo: https://github.com/... A lot more detailed and technical walkthrough: https://github.com/... Example conversation with the $100, 4-hour nanochat in the WebUI. It's... entertaining :) Larger models (e.g. a 12-hour depth 26 or a 24-hour depth 30) quickly get more [image]
- Andrej Karpathy / @karpathy: @ClementDelangue @huggingface Ty! huggingface work/infra/datasets are critical to projects like nanochat - to be accurate, the source code of nanochat (e.g. at the $100 tier) is ~8KB of Python and ~30GB of fineweb/smoltalk.
- Dimitris Papailiopoulos / @dimitrispapail: Here's a scaling law that matters more: every two years the gap to frontier closes by a factor of 2x
- Simon Willison / @simonw: @karpathy Any chance of a copy of the trained weights at the $100/$300/$1000 levels? I'd love to try them out without spending $1400 to train them myself first!
- Andrej Karpathy / @karpathy: @zenitsu_aprntc Good question, it's basically entirely hand-written (with tab autocomplete). I tried to use claude/codex agents a few times but they just didn't work well enough and were net unhelpful; possibly the repo is too far off the data distribution.
- Cody Blakeney / @code_star: Give it a year or two and we will have nanoagent. Then nanoagent speed run. Then nanoASI and so on.
- Yuchen Jin / @yuchenj_uw: @karpathy Love the Nano series as always! This minimal end-to-end training/inference stack is going to influence a lot of ML learners and researchers. Looks like Eureka Labs' LLM101n course is finally coming! Interesting to see Alec Radford involved (have him teach some Eureka course) [image]
- Andrej Karpathy / @karpathy: And an example of some of the summary metrics produced by the $100 speedrun in the report card to start. The current code base is a bit over 8,000 lines, but I tried to keep them clean and well-commented. Now comes the fun part - of tuning and hillclimbing. [image]
- Andrej Karpathy / @karpathy: @simonw Good idea, I'm running/tuning these tiers now. I'll bunch up some of the low-hanging fruit here over the next few days and make it available (most of my energy for this v0.1 went into the overall harness).
- Clem / @clementdelangue: It's easier than ever to train, optimize, and run your own models thanks to open source (versus delegating all learning, control, and capabilities to black-box APIs). Cool to see @karpathy proving it once more by leveraging @huggingface fineweb ([link])!
- Alexander Doria / @dorialexander: A new frontier of training pipelines has just arrived for small models. And if all goes well, we're pushing the new data frontier in a week.
- Andrej Karpathy / @karpathy: Basically Llama-like, a bit simpler, with some influences from modded-nanoGPT. Tried to find a solid baseline for this scale: - dense transformer - rotary embeddings (and no positional embeddings) - QK norm - untied weights for embedding and unembedding - norm after token embedding - relu^2 activation in MLP...
- Marco Mascorro / @mascobot: This is pretty cool and basically includes "full stack" model training with most of the concepts there, but something you can train on an H100 node for ~$100: - Train a tokenizer (Rust implementation) - Pre-train a Transformer on FineWeb, evaluate CORE score across a number of

Bluesky:
- Garvan Walshe / @garvanwalshe.org: I love this *LLM developer* saying LLMs don't help him with coding! [embedded post]
- Michael Love / @elkmovie: I continue to think that while "LLMs aren't actually that world-changing" is a risk people have at least started paying attention to, "good-enough LLMs are going to be cheap and ubiquitous and those $1B data centers are going to become a liability rather than a moat" is an equally serious problem [embedded post]

Forums:
- Lobsters: nanochat: The best ChatGPT that $100 can buy
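Two of the architecture choices Karpathy lists, relu^2 MLP activation and QK norm, can be sketched in a few lines. This is an illustrative, dependency-free sketch, not nanochat's actual code; in particular, the exact normalization variant (plain L2 vs. RMSNorm with learned scale) is an assumption here:

```python
import math

def relu2(x):
    # relu^2 activation (squared ReLU) for the MLP, as listed in the thread
    return [max(v, 0.0) ** 2 for v in x]

def l2_normalize(vec, eps=1e-6):
    # QK-norm sketch: normalize each query/key head vector to unit length
    # before the attention dot product, which bounds attention logits.
    # nanochat's exact variant may differ; this is an assumption.
    n = math.sqrt(sum(v * v for v in vec)) + eps
    return [v / n for v in vec]

print(relu2([-2.0, -0.5, 0.5, 2.0]))  # [0.0, 0.0, 0.25, 4.0]
q = l2_normalize([3.0, 4.0])          # roughly [0.6, 0.8], unit norm
print(q)
```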

@karpathy