DeepSeek researchers detail a new mHC architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden
DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture
South China Morning Post · Vincent Chow
Related Coverage
- DeepSeek Introduces mHC Architecture to Improve Large Model Training Blockonomi · Maxwell Mutuma
Discussion
-
@arjunkocher
Arjun
on x
mHC: Manifold-Constrained Hyper-Connections most recent paper from @deepseek_ai [image]
-
@jiqizhixin
@jiqizhixin
on x
On the first day of the New Year, DeepSeek released a major paper. They try to fix the training instability that plagues advanced neural network designs. Enter mHC: Manifold-Constrained Hyper-Connections. They take the powerful but unstable “Hyper-Connections” architecture [image…
-
@nathancgy4
Nathan Chen
on x
last week, a deepseek researcher told me that he believes the two biggest architectural innovations in 2025 are 1) muon and 2) hyper-connections. since muon was already heavily explored by kimi, i asked him why don't they do something with hyper-connections. now it's [image]
-
@scaling01
@scaling01
on x
of course 2026 starts with a banger DeepSeek paper. I quoted a good explanation of what they are doing [image]
-
@zephyr_z9
@zephyr_z9
on x
What are Chinese quant companies smoking to get this kind of performance??? Mogging Sonnet 4.5 with a 40B [image]
-
@dorialexander
Alexander Doria
on x
Unsurprisingly a new DeepSeek banger. I'll post my reading notes later but I already recommend going through the original hyper-connection ByteDance paper first: clearly explain the expected benefits, better layer specialization/management ("enhance the impact of each layer") [im…
-
@zephyr_z9
@zephyr_z9
on x
DeepSeek Moment 2.0 🤔🤔 (the timing matches)
-
@iamgrigorev
George Grigorev
on x
residuals in transformers are great for stability and scaling; deeper layers update the signal along the residual stream. few people questioned this choice publicly, and since 2025 there's been progress. few thoughts about hyper connections (wrt the newly released DeepSeek paper …
-
@zephyr_z9
@zephyr_z9
on x
BRUH These numbers are absolutely insane for a 40B Beating everyone on a bunch of hard benchmarks [image]
-
@scottstts
Scott
on x
DeepSeek dropped a pretty significant paper yesterday mHC building on hyper connection, introduced clever math and algo maneuvers to prevent the unbounded amplification and attenuation issue of normal HC when scaled up (doubly stochastic matrices and Sinkhorn-Knopp projection) [i…
-
@dorialexander
Alexander Doria
on x
So the first major paper of 2026, DeepSeek mHC: Manifold-Constrained Hyper-Connections. This is actually an engineering paper, taking as a starting point ideas already exposed in the original Hyper-Connections (HC) paper from ByteDance, which is consequently a prerequisite for
-
@joelc_eth
Joel
on x
DeepSeek drops new paper: mHC (Manifold-Constrained Hyper-Connections). According to a DeepSeek researcher, “the two biggest architectural innovations in 2025 are 1) Muon and 2) Hyper-Connections.” So what does mHC bring to the table? Residual connections have been the backbone […
-
@shiwei_liu66
Shiwei Liu
on x
DeepSeek's new mHC is a nice step toward mitigating the curse of depth issues tied to residual, highlighted in our recent work https://arxiv.org/.... Glad to see frontier labs engaging with this direction. Congrats to Seed-Foundation-Model Team too https://arxiv.org/... [image]
-
@norxornor
Nor
on x
Quick read through of Deepseek's new Manifold-Constrained Hyper-Connections paper: - You want to increase residual size from 1×C to n×C (n streams instead of 1). Earlier residual update: x' = x + layer(x). Make the x be n×C, and use x' = Ax + B layer(Cx) instead. A, B, C are all …
-
@teortaxestex
@teortaxestex
on x
ALERT, NEW YEAR GIFT FROM DEEPSEEK. mHC: Manifold-Constrained Hyper-Connections. it's a pretty crazy fundamental result! They show stable hyper-connection training. This lets them *scale residual stream width*, with minor compute & memory overhead. This is a *huge model smell* recipe.…
-
@novasarc01
Λux
on x
interesting paper by deepseek. the part i liked most is how mHC keeps the multi-stream idea of hyper-connections but puts hard constraints on residual mixing. nice example of real research progress coming from stability analysis rather than chasing more expressivity. [image]
-
@jenzhuscott
Jen Zhu
on x
Excellent thread summarising @deepseek_ai Jan 1st mHC paper. Two quant funds' open-sourced labs in 🇨🇳 already set a dizzy pace for 2026 in its first 24 hours. This builds nicely on the ByteDance Hyper-Connections paper - mHC's manifold constraint feels like a principled [image]
-
@rryssf_
Robert Youssef
on x
🚨 DeepSeek just dropped a paper that quietly exposes why modern neural networks get unstable as they scale. It's called mHC: Manifold-Constrained Hyper-Connections, and the core idea is deceptively simple: Neural networks keep breaking their own geometry. Here's what that [image]
-
@adinayakup
Adina Yakup
on x
To start 2026, @deepseek_ai released mHC🔥 A new architecture that makes hyper-connections more stable when training large models, without losing their performance benefits. https://huggingface.co/...
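
Several posts above describe the mechanism in words: the residual stream is widened from 1×C to n×C, the update becomes x' = Ax + B layer(Cx), and the mixing is constrained to doubly stochastic matrices via a Sinkhorn-Knopp projection. The sketch below illustrates that idea in NumPy. It is not DeepSeek's implementation. The shapes of A, B, and C, the `tanh` stand-in for a layer, the iteration count, and all function names are assumptions made for illustration.

```python
import numpy as np


def sinkhorn_knopp(logits, n_iters=100):
    """Approximately project a square matrix of logits onto the set of
    doubly stochastic matrices by alternating row/column normalization."""
    m = np.exp(logits - logits.max())  # strictly positive entries
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m


def multi_stream_residual(x, a, b, c, layer):
    """One update x' = A x + B layer(C x) on an (n, C) residual state.

    Assumed shapes (hypothetical): a is (n, n), b is (n, 1), c is (1, n).
    """
    layer_in = c @ x                   # (1, C): mix n streams into one input
    return a @ x + b @ layer(layer_in)  # (n, C): mix streams, add layer output


rng = np.random.default_rng(0)
n_streams, width = 4, 8

x = rng.normal(size=(n_streams, width))
a = sinkhorn_knopp(rng.normal(size=(n_streams, n_streams)))
b = np.ones((n_streams, 1))                   # write layer output to every stream
c = np.full((1, n_streams), 1.0 / n_streams)  # read the average of the streams

y = multi_stream_residual(x, a, b, c, np.tanh)

# A doubly stochastic matrix has spectral norm 1, so applying it repeatedly
# neither amplifies nor collapses the overall scale of the residual state.
spectral_norm = np.linalg.norm(a, 2)
```

Because rows and columns of `a` each sum to one, stacking many such updates keeps the stream-mixing term bounded, which is the stability property the posts attribute to the constraint.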