Nvidia debuts Nemotron 3 Super, a 120B-parameter hybrid MoE open-weight model; filing: Nvidia plans to spend $26B over the next five years to build open models
Wired · Will Knight
Related Coverage
- NVIDIA Nemotron 3 Super NVIDIA Nemotron
- Nvidia launches Nemotron 3 Super, a 120B open model for large-scale AI systems The New Stack · Frederic Lardinois
- New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI NVIDIA · Kari Briski
- NVIDIA releases Nemotron 3 Super, 120B open model with 1M-token context VideoCardz.com
- NVIDIA Nemotron 3 Super now available on Workers AI Cloudflare
- Nvidia boosts open models with Nemotron 3 Super The Deep View
Discussion
-
@kuchaev
Oleksii Kuchaiev
on x
Nemotron 3 Super is here — 120B total / 12B active, Hybrid SSM Latent MoE, designed for Blackwell. Truly open: permissive license, open data, open training infra. See analysis on @ArtificialAnlys. Details in thread 🧵 below: [image]
-
@igtmn
Igor Gitman
on x
Nemotron 3 Super is out! It's really good and it will only get better from here. And we release all the details - tech report, training code, training data, model weights. Everything you need to build a model like this yourself!
-
@ggerganov
Georgi Gerganov
on x
In collaboration with NVIDIA we announce support for the new NVIDIA Nemotron 3 Super model in llama.cpp. NVIDIA Nemotron 3 Super is a 120B open MoE model activating just 12B parameters to deliver maximum compute efficiency and accuracy for complex multi-agent applications.
-
@mweinbach
Max Weinbach
on x
Trying out the new Nvidia Nemotron 3 Super model on my Mac Studio! [image]
-
@samhogan
Sam Hogan
on x
We've been testing Nemotron 3 Super for the last few weeks. TL;DR: it's easily the best Open Source American model for its size. Super fast. Great for agents and tool-calling use cases. We'll be shipping a series of post-trained Nemotron models in the coming weeks.
-
@manuelfaysse
Manuel Faysse
on x
If you ever wondered how LLMs became so good at MMLU, the Nvidia Nemotron 3 Super report states that 11.1% of Pretraining Phase 1 data (20T tokens) is MMLU-style SFT data: over 2T synthetic tokens specifically designed to reach the coveted 86% score. [image]
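The "over 2T tokens" figure in the post above is simple arithmetic on the two numbers it cites (11.1% share, 20T-token Phase 1 corpus); a one-line sanity check:

```python
# Back-of-the-envelope check of the figure quoted above.
phase1_tokens = 20e12       # 20T tokens, as stated in the post
mmlu_style_share = 0.111    # 11.1%, as stated in the post

synthetic_tokens = phase1_tokens * mmlu_style_share
print(f"{synthetic_tokens / 1e12:.2f}T tokens")  # → 2.22T tokens
```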
-
@nvidiaaidev
@nvidiaaidev
on x
This latest addition to the Nemotron family isn't just a bigger Nano. ✅ Up to 5x higher throughput and 2x accuracy than the previous version ✅ Latent MoE that calls 4x as many expert specialists for the same inference cost ✅ Multi-token prediction that dramatically reduces [imag…
-
@nvidianewsroom
@nvidianewsroom
on x
NVIDIA Nemotron 3 Super is here to accelerate the era of agentic AI. Optimized for NVIDIA Blackwell, this 120B open model uses a hybrid Mixture-of-Experts (MoE) architecture that delivers 5x higher throughput and 2x higher accuracy. The model combines advanced reasoning with a
-
@_albertgu
Albert Gu
on x
as always, exciting to see NVIDIA continue to invest in Mamba hybrids and true open source. very impressive results!
-
@jiantaoj
Jiantao Jiao
on x
Nemotron 3 Super arrived! With efficiency in mind (Hybrid SSM Latent MoE, designed for Blackwell), the accuracy is also incredible. The most important aspect is scaling RL, utilizing the highly efficient and scalable Nemo Gym backend for RL environments and Nemo RL for model
-
@ctnzr
Bryan Catanzaro
on x
Announcing NVIDIA Nemotron 3 Super! 💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights Models, Tech report, etc. here: https://research.nvidia.com/ ... And yes, Ultra is comin…
-
@artificialanlys
@artificialanlys
on x
NVIDIA has released Nemotron 3 Super, a 120B (12B active) open weights reasoning model that scores 36 on the Artificial Analysis Intelligence Index with a hybrid Mamba-Transformer MoE architecture. We were given access to this model ahead of launch and evaluated it across [image]
-
@cloudflaredev
@cloudflaredev
on x
Building multi-agent systems? @NVIDIA's Nemotron 3 Super (120B A12B) is now on Workers AI. - Reasoning and tool calling for complex multi-agent workflows - Built for code, finance, cybersecurity, and search agent use cases Learn more: https://developers.cloudflare.com/ ...
-
@nvidia
@nvidia
on x
New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI
-
@natolambert
Nathan Lambert
on x
This looks like a model that's competitive with GPT OSS 120B or similar Qwen3.5 models on intelligence & speed, while coming with tons of open data + training details. It's a huge contribution for the ecosystem. Congrats Nvidia on the Nemotron 3 Super release!
-
@dr_alphalyrae
Vega Shah
on x
Today we launch NVIDIA's Nemotron 3 Super, a 120B param open model designed to run agentic AI systems across scientific, enterprise and industrial applications. Partners working with us include Dassault Systèmes, Palantir Technologies, Lila Sciences and Edison Scientific Key [ima…
-
@nvidiaaidev
@nvidiaaidev
on x
🦞These innovations come together to create a model that is well suited for long-running autonomous agents. On PinchBench—a benchmark for evaluating LLMs as @OpenClaw coding agents—Nemotron 3 Super scores 85.6% across the full test suite, making it the best open model in its [imag…
-
@kimmonismus
@kimmonismus
on x
NVIDIA just dropped Nemotron 3 Super - and the architecture is wild. I was able to check it out early, and I love it (thanks, @nvidia) - 120B parameters, but only 12B active. - A hybrid Mamba-Transformer MoE design that squeezes serious intelligence out of minimal compute. What [im…
-
@nvidiaaidev
@nvidiaaidev
on x
Introducing NVIDIA Nemotron 3 Super 🎉 Open 120B-parameter (12B active) hybrid Mamba-Transformer MoE model Native 1M-token context Built for compute-efficient, high-accuracy multi-agent applications Plus, fully open weights, datasets and recipes for easy customization and [video]
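The "120B-parameter (12B active)" framing repeated across these posts is standard sparse-MoE accounting: each token is routed to only a few experts, so per-token compute tracks the active subset rather than the full parameter count. A minimal toy sketch of plain top-k routing (illustrative sizes and a hypothetical router; Nemotron's actual Latent MoE design differs):

```python
# Toy sparse Mixture-of-Experts layer: many experts stored, few run per token.
# All sizes here are illustrative, not Nemotron's real configuration.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256
n_experts, top_k = 8, 2   # only top_k of n_experts run for each token

# Each expert is a small 2-layer ReLU MLP.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector x to its top_k experts and mix their outputs."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]      # indices of the top_k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                  # softmax over the chosen experts
    out = np.zeros_like(x)
    for w, i in zip(weights, chosen):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2)
    return out

total_params = sum(w1.size + w2.size for w1, w2 in experts)
active_params = top_k * 2 * d_model * d_ff    # params actually used per token
print(total_params, active_params)            # → 262144 65536
```

With 2 of 8 experts active, per-token parameters are a quarter of the total stored; the same kind of gap (12B active of 120B total) is what lets the posts above claim large-model accuracy at small-model inference cost.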
-
@benitoz
Ben Pouladian
on x
Nemotron 3 Super ships exactly what I mapped in December: Mamba hybrid, Latent MoE, multi-token prediction, NVFP4 on Blackwell. 120B params, 12B active, 5x throughput. Full-stack co-design, silicon to model. No paywall👇🏽 https://bepresearch.substack.com/ ...
-
r/technology
r
on reddit
Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show
-
@jack
@jack
on x
this would be excellent
-
@miles_brundage
Miles Brundage
on x
I don't think there's a *super* strong reason to take this more seriously than Meta's earlier commitment to open source which was walked back, but a weak reason to think it's real is that NVIDIA benefits from model commoditization more than Meta did https://x.com/...