MoE

The Decoder 5 related

Nvidia launches Nemotron 3 Ultra, a 550B-parameter MoE open model; Artificial Analysis says it is the smartest open US model, but trails Chinese model Kimi K2.6

It has roughly 550 billion total parameters, with about 55 billion active at any given time.

2026-06-02 View

The Information 1 related

Gemini co-lead Oriol Vinyals says Gemini 3's gains come from better pre-training and post-training, contradicting the idea that pre-training gains are falling

which we discussed in our NeurIPS '25 talk with @ilyasut and @quocleix—the team delivered a drastic jump. The delta between 2.5 and 3.0 is [image] Andrej Karpathy / @karpathy : I played with Gemini 3 ...

2025-11-19 View

South China Morning Post 6 related

Baidu unveils two AI chips: the M100 for efficient MoE inference, coming in early 2026, and the M300 for training super-large multimodal models, coming in 2027

The M100 and M300 provide ‘powerful, low-cost and controllable AI computing power’ to support the nation's self-reliance push, the firm says

2025-11-13 View

Wired 51 related

OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2; the smaller gpt-oss-20b can run locally on a device with 16GB+ of RAM

gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models Simon Willison / Simon Willison's Weblog : OpenAI's new open weight (Apache 2) models are really good OpenAI on GitHub : ...

2025-08-06 View

CNBC 16 related

Z.ai, formerly known as Zhipu and that has raised $1.5B from Tencent and others, releases GLM-4.5, an open-source AI model that it says is cheaper than DeepSeek

chinese models really are taking over huh Simon Willison / @simonwillison.net : Pretty decent pelicans from the new GLM-4.5 and GLM-4.5 Air models. Both models are MIT licensed, released by Chinese A...

2025-07-29 View

Qwen 14 related

Alibaba debuts the Qwen3-Coder model for agentic coding, including a 480B-parameter MoE variant, and open sources Qwen Code, a CLI tool adapted from Gemini CLI

Qwen 39.4k — Text Generation Transformers Safetensors qwen3_moe conversational Coco Feng / South China Morning Post : Alibaba upgrades flagship Qwen3 model to outperform OpenAI, DeepSeek in maths, c...

2025-07-23 View

VentureBeat 7 related

Moonshot's Kimi K2 uses a 1T-parameter MoE architecture with 32B active parameters and outperforms models like GPT-4.1 and DeepSeek-V3 on key benchmarks

Moonshot AI, the Chinese artificial intelligence startup behind the popular Kimi chatbot, released an open-source language model on Friday …

2025-07-13 View

Engadget 13 related

Mark Zuckerberg says Meta will share news about a Llama 4 Reasoning model “in the next month”

David Sacks Jesus Rodriguez / TheSequence : The Sequence Radar #526: Llama 4 Scout and Maverick are Here! Aman Gupta / Livemint : ChatGPT vs Meta AI: Which AI chatbot is better after the Llama 4 launc...

2025-04-06 View

Dave2D on YouTube 13 related

Mac Studio with M3 Ultra and 512GB of unified memory review: opens up new workflows on a ~$10,000 desktop, like running a quantized version of DeepSeek R1 671B

if confusing — performance magic Federico Viticci / MacStories : The M3 Ultra Mac Studio for Local LLMs Brandon Hill / Tom's Hardware : Apple Mac Studio review: M3 Ultra offers amazing performance, si...

2025-03-12 View

The Verge 35 related

Google launches Gemini 1.5 for developers and enterprise users, with a context window of up to 1M tokens, and says Gemini 1.5 Pro is on par with Gemini Ultra

but you won't be able to enjoy it just yet Ryan Morrison / Tom's Guide : I test AI for a living and Google's Gemini Pro 1.5 is a true turning point Shaurya Tomer / Hindustan Times : Google DeepMind an...

2024-02-16 View

Patterns

Related Entities

Top Voices

Explore Further

Coverage Timeline

Nvidia launches Nemotron 3 Ultra, a 550B-parameter MoE open model; Artificial Analysis says it is the smartest open US model, but trails Chinese model Kimi K2.6

Gemini co-lead Oriol Vinyals says Gemini 3's gains come from better pre-training and post-training, contradicting the idea that pre-training gains are falling

Baidu unveils two AI chips: the M100 for efficient MoE inference, coming in early 2026, and the M300 for training super-large multimodal models, coming in 2027

OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2; the smaller gpt-oss-20b can run locally on a device with 16GB+ of RAM

Z.ai, formerly known as Zhipu and that has raised $1.5B from Tencent and others, releases GLM-4.5, an open-source AI model that it says is cheaper than DeepSeek

Alibaba debuts the Qwen3-Coder model for agentic coding, including a 480B-parameter MoE variant, and open sources Qwen Code, a CLI tool adapted from Gemini CLI

Moonshot's Kimi K2 uses a 1T-parameter MoE architecture with 32B active parameters and outperforms models like GPT-4.1 and DeepSeek-V3 on key benchmarks

Mark Zuckerberg says Meta will share news about a Llama 4 Reasoning model “in the next month”

Mac Studio with M3 Ultra and 512GB of unified memory review: opens up new workflows on a ~$10,000 desktop, like running a quantized version of DeepSeek R1 671B

Google launches Gemini 1.5 for developers and enterprise users, with a context window of up to 1M tokens, and says Gemini 1.5 Pro is on par with Gemini Ultra

Quarterly Coverage

Top Sources

Narrative

Key Moments

Relationships