DeepSeek researchers detail mHC, a new architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden
DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture
South China Morning Post · Vincent Chow
Related Coverage
- mHC: Manifold-Constrained Hyper-Connections arXiv.org
- Get Ready to Talk About DeepSeek Again Bloomberg
- China's DeepSeek kicked off 2026 with a new AI training method that analysts say is a ‘breakthrough’ for scaling Business Insider · Lee Chong Ming
- DeepSeek Unveils ‘mHC’ Architecture to Fix AI Training Instability Amid Chip Bans WinBuzzer · Markus Kasanmascheff
- DeepSeek touts new training method as China pushes AI efficiency The Economic Times
- DeepSeek's New Architecture Paper Signals More Than Technical Progress Implicator.ai · Maria Garcia
- Deepseek says new method can train AI more efficiently and cheaply Computerworld · Viktor Eriksson
- DeepSeek develops mHC AI architecture to boost model performance SiliconANGLE · Maria Deutscher
- DeepSeek Kicks Off the Year With New Paper Introducing the mHC Architecture Pandaily
- DeepSeek Introduces mHC Architecture to Improve Large Model Training Blockonomi · Maxwell Mutuma
- DeepSeek just dropped a new architecture update. — If you've been following the evolution of Foundation Models, you know that Hyper-Connections … Somya Rai
- What if LLMs could scale across 3B, 9B, and 27B parameters—without an increase in computational burden? — Wait, does it mean we don't need nearly as many data centers as anticipated? … Hào Lǐ
Discussion
-
@saritharai
Saritha Rai
on x
DeepSeek touts new training method in a paper co-authored by reclusive founder Liang Wenfeng, as Chinese AI companies strive to build more efficient AI systems @business https://www.bloomberg.com/...
-
@chrmanning
Christopher Manning
on x
Great to see an AI lab doing and publishing science (as well as discussing engineering efficiencies)! Some of the other “frontier” labs should try it! Thx, @deepseek_ai!
-
@meer_aiit
Meer
on x
DeepSeek just dropped a core transformer architecture change. Manifold-Constrained Hyper-Connections replace the single residual stream with multiple parallel signal paths. Standard residual connections have been the foundation of deep learning since ResNets. They allow [image]
-
@iamgrigorev
George Grigorev
on x
residuals in transformers are great for stability and scaling; deeper layers update the signal along the residual stream. few people questioned this choice publicly, and since 2025 there's been progress. few thoughts about hyper connections (wrt the newly released DeepSeek paper …
-
@arjunkocher
Arjun
on x
mHC: Manifold-Constrained Hyper-Connections most recent paper from @deepseek_ai [image]
-
@jiqizhixin
@jiqizhixin
on x
On the first day of the New Year, DeepSeek released a major paper. They try to fix the training instability that plagues advanced neural network designs. Enter mHC: Manifold-Constrained Hyper-Connections. They take the powerful but unstable “Hyper-Connections” architecture [image…
-
@nathancgy4
Nathan Chen
on x
last week, a deepseek researcher told me that he believes the two biggest architectural innovations in 2025 are 1) muon and 2) hyper-connections. since muon was already heavily explored by kimi, i asked him why don't they do something with hyper-connections now it's [image]
-
@scaling01
@scaling01
on x
of course 2026 starts with a banger DeepSeek paper I quoted a good explanation of what they are doing [image]
-
@zephyr_z9
@zephyr_z9
on x
What are Chinese quant companies smoking to get this kind of performance??? Mogging Sonnet 4.5 with a 40B [image]
-
@dorialexander
Alexander Doria
on x
Unsurprisingly a new DeepSeek banger. I'll post my reading notes later but I already recommend going through the original hyper-connection ByteDance paper first: clearly explain the expected benefits, better layer specialization/management ("enhance the impact of each layer") [im…
-
@zephyr_z9
@zephyr_z9
on x
DeepSeek Moment 2.0 🤔🤔 (the timing matches)
-
@zephyr_z9
@zephyr_z9
on x
BRUH These numbers are absolutely insane for a 40B Beating everyone on a bunch of hard benchmarks [image]
-
@scottstts
Scott
on x
DeepSeek dropped a pretty significant paper yesterday mHC building on hyper connection, introduced clever math and algo maneuvers to prevent the unbounded amplification and attenuation issue of normal HC when scaled up (doubly stochastic matrices and Sinkhorn-Knopp projection) [i…
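The doubly stochastic constraint mentioned above can be made concrete with a small sketch. This is an illustrative NumPy implementation of the classic Sinkhorn-Knopp iteration (alternating row and column normalization of a positive matrix), not DeepSeek's actual projection code; the function name and iteration count are assumptions for the example.

```python
import numpy as np

def sinkhorn_knopp(M, iters=200):
    """Drive a positive matrix toward the doubly stochastic set
    (all rows and all columns summing to 1) by alternately
    normalizing rows, then columns — the Sinkhorn-Knopp iteration."""
    M = np.asarray(M, dtype=float)
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)  # make each row sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # make each column sum to 1
    return M

# Any strictly positive matrix converges; exp() guarantees positivity.
A = np.exp(np.random.default_rng(0).normal(size=(4, 4)))
P = sinkhorn_knopp(A)
print(np.allclose(P.sum(axis=1), 1, atol=1e-5))  # rows sum to ~1
print(np.allclose(P.sum(axis=0), 1, atol=1e-5))  # columns sum to ~1
```

Because a doubly stochastic matrix neither amplifies nor attenuates the total mass it mixes, constraining the inter-stream mixing matrix this way is a natural fix for the unbounded growth/decay the tweet describes.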
-
@dorialexander
Alexander Doria
on x
So the first major paper of 2026, DeepSeek mHC: Manifold-Constrained Hyper-Connections. This is actually an engineering paper, taking as a starting point ideas already exposed in the original Hyper-Connections (HC) paper from ByteDance, which is consequently a prerequisite for
-
@joelc_eth
Joel
on x
DeepSeek drops new paper: mHC (Manifold-Constrained Hyper-Connections). According to a DeepSeek researcher, “the two biggest architectural innovations in 2025 are 1) Muon and 2) Hyper-Connections.” So what does mHC bring to the table? Residual connections have been the backbone […
-
@shiwei_liu66
Shiwei Liu
on x
DeepSeek's new mHC is a nice step toward mitigating the curse of depth issues tied to residual, highlighted in our recent work https://arxiv.org/.... Glad to see frontier labs engaging with this direction. Congrats to Seed-Foundation-Model Team too https://arxiv.org/... [image]
-
@norxornor
Nor
on x
Quick read through of Deepseek's new Manifold-Constrained Hyper-Connections paper: - You want to increase residual size from 1×C to n×C (n streams instead of 1). Earlier residual update: x' = x + layer(x). Make the x be n×C, and use x' = Ax + B layer(Cx) instead. A, B, C are all …
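The update rule in that thread, x' = Ax + B·layer(Cx) over an n×C residual instead of x' = x + layer(x), can be sketched in a few lines of NumPy. This is a toy illustration of the hyper-connections idea only: `layer` is a stand-in for a transformer sublayer, the matrix shapes follow the tweet's description, and none of mHC's actual constraints (e.g. the doubly stochastic projection) are applied here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, C = 4, 8  # n residual streams, each of width C (toy sizes)

def layer(h):
    """Stand-in for a transformer sublayer acting on one C-wide vector."""
    return np.tanh(h)

x = rng.normal(size=(n, C))  # widened residual: n streams instead of 1

# Illustrative mixing matrices for x' = A x + B layer(C x):
A = np.eye(n) + 0.01 * rng.normal(size=(n, n))  # stream-to-stream carry
B = 0.1 * rng.normal(size=(n, 1))               # scatter layer output back to n streams
Cmix = 0.1 * rng.normal(size=(1, n))            # read one layer input out of n streams

x_new = A @ x + B @ layer(Cmix @ x)
print(x_new.shape)  # same widened (n, C) residual shape as the input
```

Setting n = 1 with A = B = Cmix = 1 recovers the standard residual update x' = x + layer(x), which is why hyper-connections are described as a strict generalization of the residual stream.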
-
@teortaxestex
@teortaxestex
on x
ALERT, NEW YEAR GIFT FROM DEEPSEEK mHC: Manifold-Constrained Hyper-Connections it's a pretty crazy fundamental result! They show stable hyper-connection training. This lets them *scale residual stream width*, with minor compute & memory overhead This is a *huge model smell* recipe.…
-
@novasarc01
Λux
on x
interesting paper by deepseek. the part i liked most is how mHC keeps the multi-stream idea of hyper-connections but puts hard constraints on residual mixing. nice example of real research progress coming from stability analysis rather than chasing more expressivity. [image]
-
@jenzhuscott
Jen Zhu
on x
Excellent thread summarising @deepseek_ai Jan 1st mHC paper. Two quant funds' open-sourced labs in 🇨🇳 have already set a dizzying pace for 2026 in its first 24 hours. This builds nicely on the ByteDance Hyper-Connections paper - mHC's manifold constraint feels like a principled [image]
-
@rryssf_
Robert Youssef
on x
🚨 DeepSeek just dropped a paper that quietly exposes why modern neural networks get unstable as they scale. It's called mHC: Manifold-Constrained Hyper-Connections, and the core idea is deceptively simple: Neural networks keep breaking their own geometry. Here's what that means…
-
@adinayakup
Adina Yakup
on x
To start 2026, @deepseek_ai released mHC🔥 A new architecture that makes hyper-connections more stable when training large models, without losing their performance benefits. https://huggingface.co/...