2026-01-02
DeepSeek's new mHC is a nice step toward mitigating the curse-of-depth issues tied to residual connections, highlighted in our recent work https://arxiv.org/.... Glad to see frontier labs engaging with this direction. Congrats to the Seed-Foundation-Model Team too https://arxiv.org/... [image]
South China Morning Post
DeepSeek researchers detail mHC, a new architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden
DeepSeek has published a technical paper, co-authored by founder Liang Wenfeng, proposing a rethink of its core deep learning architecture