LLM (Company)

@metr_evals 4 related

METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year

just careful, meticulous rigor. Nikola Jurkovic / @nikolaj2030 : This result updates me towards 4 month doubling times being my median estimate for the next two years. That means by EOY 2026 the time ...

2025-12-22

karpathy 3 related

2025 LLM Year in Review: shift toward RLVR, Claude Code emerged as the first convincing example of an LLM agent, Nano Banana was paradigm shifting, and more

Andrej Karpathy / karpathy :

2025-12-20

Xiaomi Mimo

Xiaomi releases MiMo-V2-Flash, an open-weight MoE model with 309B total and 15B active parameters, saying it excels in reasoning, coding, and agentic scenarios

improve math, break coding. Enhance reasoning, hurt safety. ✅ Solution: Train specialized expert [image] Elie / @eliebakouch : wow, this looks like a very solid open model by Xiaomi, competing with K2...

2025-12-17

VentureBeat 4 related

Allen Institute for AI launches Bolmo 7B and Bolmo 1B, claiming they are “the first fully open byte-level language models”, built on its Olmo 3 models

and every token gets the same compute, regardless of complexity. Benjamin Minixhofer / @bminixhofer : There are also some things Bolmo lets us do which we just can't do using subword-level LMs. For ex...

2025-12-16

Wired 14 related

Pebble unveils the Pebble Index 01, a $99 smart ring with an on-device LLM for processing voice notes, shipping in March 2026, initially for $75

You can speak into the Pebble Index to have it remember things or set reminders, timers, and tasks. No cloud processing, no subscription, and best of all, no charging.

2025-12-09

Hugging Face

Hugging Face details how it used its new tool, Skills, to fine-tune an LLM using Claude, including for writing scripts, submitting jobs to cloud GPUs, and more

We gave Claude the ability to fine-tune language models using a new tool called Hugging Face Skills. X: @ben_burtenshaw, @donvito, and @arig23498. Forums: r/Loca...

2025-12-05

The Information

Sources: OpenAI is developing a new LLM, codenamed Garlic, that outperforms Gemini 3 and Claude Opus 4.5 in coding and reasoning tasks, per internal evaluations

OpenAI, which in recent weeks has appeared to fall behind Google in AI development, is fighting back with a new large language model codenamed Garlic. X: Amir Efrati / @amir : new: OpenAI dev...

2025-12-02

The Information 3 related

Elon Musk says Grok 5 would be released “in Q1 sometime”, later than a deadline he previously set of releasing the model by the end of 2025

Says Musk Is ‘Like Da Vinci’ X: @scaling01 : Huge Leaks on Grok-5 and its predecessors from recent Elon Musk interview: - “Grok-5 is a 6 trillion parameter model, whereas Grok 3 and 4 are based on a 3...

2025-11-15

9to5Mac 34 related

Apple refreshes the web version of the App Store, letting users view and search for apps for the iPhone, iPad, Mac, Apple TV, Apple Watch, and Apple Vision Pro

Apple Issues Critical Update For 1 Billion iPhones Jason Cross / Macworld : Apple brings the App Store to the web Appleosophy : Apple launches new App Store for the web Brad Bennett / MobileSyrup : Ap...

2025-11-04

The Decoder 12 related

DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute

the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model su...

2025-10-21


