Gemini co-lead Oriol Vinyals says Gemini 3's gains come from better pre-training and post-training, pushing back on the idea that returns from pre-training are diminishing
which we discussed in our NeurIPS '25 talk with @ilyasut and @quocleix—the team delivered a drastic jump. The delta between 2.5 and 3.0 is ... Andrej Karpathy / @karpathy : I played with Gemini 3 ...
Andrej Karpathy unveils nanochat, a full-stack training and inference implementation of an LLM in a single, dependency-minimal codebase, trainable and deployable end to end in about 4 hours
It provides a full ChatGPT-style LLM, including training, inference, and a web UI … Clem / @clementdelangue : Am I wrong in sensing a paradigm shift in AI? Feels like we're moving from a world obses...
Google announces Gemma 3 270M, a compact model designed for task-specific fine-tuning with strong capabilities in instruction following and text structuring
ai.google.dev/gemma/docs/c... Tim Duffy / @timfduffy.com : Google just released a 270M parameter Gemma model. As a tiny model lover I'm excited. Models in this size class are usually barely coherent...
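To make "task-specific fine-tuning" concrete, here is a minimal sketch of adapting a small Gemma-class model to an extraction task with Hugging Face transformers. The model id "google/gemma-3-270m", the toy instruction pairs, and all hyperparameters are illustrative assumptions, not details from the announcement.

```python
# Minimal fine-tuning sketch for a ~270M-parameter Gemma-class model.
# Assumptions: the Hugging Face id "google/gemma-3-270m" and the toy data below.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_ID = "google/gemma-3-270m"  # assumed model id for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Toy instruction-following pairs; a real run would use a task-specific corpus.
pairs = [
    ("Extract the city from: 'Meet me in Paris on Friday.'", "Paris"),
    ("Extract the city from: 'Flights to Tokyo are delayed.'", "Tokyo"),
]

class InstructionDataset(Dataset):
    """Tokenizes prompt/answer pairs into fixed-length causal-LM examples."""
    def __init__(self, pairs, tokenizer, max_len=128):
        self.examples = []
        for prompt, answer in pairs:
            text = f"Instruction: {prompt}\nAnswer: {answer}{tokenizer.eos_token}"
            enc = tokenizer(text, truncation=True, max_length=max_len,
                            padding="max_length", return_tensors="pt")
            input_ids = enc.input_ids.squeeze(0)
            labels = input_ids.clone()
            labels[enc.attention_mask.squeeze(0) == 0] = -100  # mask padding
            self.examples.append({"input_ids": input_ids,
                                  "attention_mask": enc.attention_mask.squeeze(0),
                                  "labels": labels})
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, idx):
        return self.examples[idx]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-270m-ft",
                           per_device_train_batch_size=2,
                           num_train_epochs=3, learning_rate=5e-5,
                           logging_steps=1),
    train_dataset=InstructionDataset(pairs, tokenizer),
)
trainer.train()
```

The appeal of this size class is exactly this workflow: the whole model fits easily on a single consumer GPU, so full-parameter fine-tuning on a narrow task is cheap.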
A study by Meta researchers suggests that training LLMs to predict multiple tokens at once, instead of just the next token, results in better and faster models
Related: KAN: Kolmogorov-Arnold Networks — "promising alternatives to Multi-Layer Perceptrons". Ethan / @ethan_smith_20 : it was only briefly touched upon, but is ...
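As a sketch of what multi-token prediction looks like: a shared trunk produces hidden states, one output head per future offset predicts the token that many steps ahead, and the per-offset cross-entropies are summed. The toy GRU trunk, vocabulary size, and dimensions below are assumptions for illustration; the paper attaches its heads to a full transformer trunk.

```python
# Multi-token prediction sketch: shared trunk + one head per future offset,
# trained with summed cross-entropy. GRU trunk and all sizes are toy choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Unidirectional GRU stands in for a causal transformer trunk.
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)
        # Head i predicts the token (i + 1) positions ahead.
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, tokens):                    # tokens: (batch, seq)
        h, _ = self.trunk(self.embed(tokens))     # (batch, seq, d_model)
        return [head(h) for head in self.heads]   # n_future logit tensors

def multi_token_loss(model, tokens):
    # Score head i against the token (i + 1) steps ahead; positions without
    # a valid target at that offset are dropped.
    loss = 0.0
    for i, logits in enumerate(model(tokens)):
        offset = i + 1
        pred = logits[:, :-offset]                # (batch, seq-offset, vocab)
        target = tokens[:, offset:]               # (batch, seq-offset)
        loss = loss + F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                                      target.reshape(-1))
    return loss

tokens = torch.randint(0, 1000, (2, 32))          # toy batch of token ids
print(multi_token_loss(MultiTokenLM(), tokens).item())
```

At inference the extra heads can be discarded, keeping ordinary next-token generation, or used to draft several tokens at once for self-speculative decoding, which is where the reported speedups come from.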