METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4, released earlier this year
...just careful, meticulous rigor. Nikola Jurkovic / @nikolaj2030 : This result updates me towards 4 month doubling times being my median estimate for the next two years. That means by EOY 2026 the time ...
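To make the "4 month doubling times" idea concrete, here is a minimal sketch of constant-doubling extrapolation starting from Opus 4.5's reported 4h49m horizon. It assumes pure exponential growth; the function name is hypothetical, and this is illustrative arithmetic, not METR's methodology or Jurkovic's actual projection.

```python
def projected_horizon_minutes(current_minutes, months_ahead, doubling_months=4.0):
    """Extrapolate a 50% time horizon under a constant doubling time (pure exponential growth)."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return current_minutes * 2 ** (months_ahead / doubling_months)


# METR's reported horizon for Claude Opus 4.5: 4 hours 49 minutes.
OPUS_4_5_MINUTES = 4 * 60 + 49  # 289 minutes

for months in (4, 8):
    hours = projected_horizon_minutes(OPUS_4_5_MINUTES, months) / 60
    print(f"+{months} months at a 4-month doubling time: ~{hours:.1f} hours")
```

Each doubling period multiplies the horizon by 2, so the projection is very sensitive to the assumed doubling time.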
2025 LLM Year in Review: shift toward RLVR, Claude Code emerged as the first convincing example of an LLM agent, Nano Banana was paradigm shifting, and more
Andrej Karpathy / karpathy
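RLVR (reinforcement learning from verifiable rewards) trains on tasks whose outputs can be checked programmatically rather than scored by a learned reward model. A minimal sketch of such a reward, assuming a hypothetical prompt convention in which the model ends its response with "Answer: <value>":

```python
import re


def verifiable_reward(completion, reference):
    """Binary reward: 1.0 if the completion's final answer matches the reference, else 0.0.

    Assumes (hypothetically) that the model was asked to finish with 'Answer: <value>'.
    """
    matches = re.findall(r"Answer:\s*(\S+)", completion)
    if not matches:
        return 0.0  # no checkable answer means no reward
    final = matches[-1].rstrip(".")  # use the last stated answer, ignoring a trailing period
    return 1.0 if final == reference.strip() else 0.0


print(verifiable_reward("6 * 7 is 42, so Answer: 42.", "42"))
print(verifiable_reward("Answer: 41", "42"))
```

Because the check is exact and automatic, the reward is cheap to compute at scale; real RLVR setups use task-specific verifiers such as unit tests or math checkers.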
Xiaomi releases MiMo-V2-Flash, an open-weight MoE model with 309B total and 15B active parameters, saying it excels in reasoning, coding, and agentic scenarios
...improve math, break coding. Enhance reasoning, hurt safety. ✅ Solution: Train specialized expert... Elie / @eliebakouch : wow, this looks like a very solid open model by Xiaomi, competing with K2...
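The gap between 309B total and 15B active parameters comes from mixture-of-experts routing: a router picks a few experts per token and only those run. Below is a minimal, generic top-k routing sketch; the expert count, top_k, dot-product router, and toy "scaler" experts are illustrative assumptions, not MiMo-V2-Flash's actual configuration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, router_weights, top_k=2):
    # Router score for each expert: dot product of its weight vector with the input.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Keep only the top_k highest-scoring experts; the rest never run for this input.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Gate weights are renormalized over the chosen experts only.
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * y_j for o, y_j in zip(out, y)]
    return out, top


def make_scaler(s):
    """Toy 'expert' (hypothetical): scales its input by a constant."""
    return lambda v: [s * t for t in v]


random.seed(0)
dim, n_experts = 4, 8
experts = [make_scaler(k + 1) for k in range(n_experts)]
router = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
y, chosen = moe_forward([1.0, 0.5, -0.5, 2.0], experts, router, top_k=2)
print("experts used:", chosen)
print(f"MiMo-V2-Flash active share of parameters: {15 / 309:.1%}")
```

Only 2 of 8 toy experts run per input here; by the same logic, about 4.9% of MiMo-V2-Flash's parameters are active per token.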
Allen Institute for AI launches Bolmo 7B and Bolmo 1B, claiming they are “the first fully open byte-level language models”, built on its Olmo 3 models
...and every token gets the same compute, regardless of complexity. Benjamin Minixhofer / @bminixhofer : There are also some things Bolmo lets us do which we just can't do using subword-level LMs. For ex...
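For contrast with subword tokenizers, byte-level input treats each UTF-8 byte as one token, giving a fixed vocabulary of 256 values. This sketch only illustrates that input representation; it does not show how Bolmo itself processes or groups bytes, and the helper names are hypothetical.

```python
def byte_tokens(text):
    """Byte-level 'tokenization': each UTF-8 byte becomes one token id in range 0..255."""
    return list(text.encode("utf-8"))


def from_byte_tokens(tokens):
    """Invert byte_tokens by decoding the bytes back to a string."""
    return bytes(tokens).decode("utf-8")


for word in ["hello", "héllo", "日本"]:
    toks = byte_tokens(word)
    print(word, len(toks), toks)
```

Non-ASCII characters expand to several byte tokens ("é" is 2 bytes, each CJK character here is 3), which is one reason sequence length and per-token compute matter more for byte-level models.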
Pebble unveils the Pebble Index 01, a $99 smart ring with an on-device LLM for processing voice notes, shipping in March 2026 and initially priced at $75
You can speak into the Pebble Index to have it remember things or set reminders, timers, and tasks. No cloud processing, no subscription, and best of all, no charging.
Hugging Face details how it used its new tool, Skills, to fine-tune an LLM using Claude, including for writing scripts, submitting jobs to cloud GPUs, and more
We gave Claude the ability to fine-tune language models using a new tool called Hugging Face Skills. X: @ben_burtenshaw , @donvito , and @arig23498 . Forums: r/Loca...
Sources: OpenAI is developing a new LLM, codenamed Garlic, that outperforms Gemini 3 and Claude Opus 4.5 in coding and reasoning tasks, per internal evaluations
OpenAI, which in recent weeks has appeared to fall behind Google in AI development, is fighting back with a new large language model codenamed Garlic. X: Amir Efrati / @amir : new: OpenAI dev...
Elon Musk says Grok 5 would be released “in Q1 sometime”, later than a deadline he previously set of releasing the model by the end of 2025
...Says Musk Is ‘Like Da Vinci’ X: @scaling01 : Huge Leaks on Grok-5 and its predecessors from recent Elon Musk interview: - “Grok-5 is a 6 trillion parameter model, whereas Grok 3 and 4 are based on a 3...
Apple refreshes the web version of the App Store, letting users view and search for apps for the iPhone, iPad, Mac, Apple TV, Apple Watch, and Apple Vision Pro
...Apple Issues Critical Update For 1 Billion iPhones; Jason Cross / Macworld : Apple brings the App Store to the web; Appleosophy : Apple launches new App Store for the web; Brad Bennett / MobileSyrup : Ap...
DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute
the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model su...
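The "vision-text compression" claim can be read as a ratio: how many text tokens' worth of content each vision token carries. A minimal sketch with hypothetical page sizes and budgets (not DeepSeek's reported figures), assuming the text can still be decoded accurately at that ratio:

```python
def compression_ratio(text_tokens, vision_tokens):
    """Text tokens represented per vision token (higher means more compression)."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def text_equivalent_capacity(vision_token_budget, ratio):
    """Rough text-token equivalent of a vision-token budget, if the ratio holds."""
    return vision_token_budget * ratio


# Hypothetical page: 1,000 text tokens rendered as an image and encoded as 100 vision tokens.
ratio = compression_ratio(1000, 100)
print(f"compression: {ratio:.0f}x")
print(f"8,000 vision tokens ~ {text_equivalent_capacity(8000, ratio):,.0f} text tokens of content")
```

Under these assumed numbers, the same context budget covers ten times as much text, which is the sense in which optical compression could enable longer contexts with less compute.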