adinayakup · TEXXR

To start 2026, @deepseek_ai released mHC🔥 A new architecture that makes hyper-connections more stable when training large models, without losing their performance benefits. https://huggingface.co/...

2026-01-02 View on X

South China Morning Post

DeepSeek researchers detail mHC, a new architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden

DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture

China just passed the U.S. in open model downloads for the first time 👀 New data from Economies of Open Intelligence led by @huggingface policy team & community collaborators, presents some notable observations: ✨ Developer adoption In 2025, Chinese model developers saw [image]

2025-12-01 View on X

NBC News

US startups are increasingly adopting open-weight Chinese AI models, which are cheaper, more customizable, and sufficiently capable compared to frontier US ones

DeepSeek-OCR is out 🔥 https://huggingface.co/... ✨High-accuracy OCR - MIT license ✨Fast GPU inference (FlashAttention 2, BF16) ✨Docs > Markdown ✨Works with transformers

2025-10-21 View on X

The Decoder

DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute

the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8....

Hunyuan-MT-7B 🔥 open translation model released by @TencentHunyuan https://huggingface.co/... ✨ Supports 33 languages, including 5 ethnic minority languages in China 👀 ✨ Including a translation ensemble model: Chimera-7B ✨ Full pipeline: pretrain > CPT > SFT > enhancement >

2025-09-02 View on X

The Decoder

Tencent open sources translation models Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B, which support 33 languages, claiming they beat established models in benchmarks

Ben Jiang / South China Morning Post: Tencent's open-source translation model beats Google,...

Xiaomi @Xiaomi just entered the open source as a new player🔥 And dropped MiMo - a 7B model trained from scratch for reasoning 🚀 https://huggingface.co/... ✨ 7B - Base/RL/SFT/RL zero ✨ Surpasses 32B models in math & code ✨ Apache 2.0 licensed

2025-04-30 View on X

Bloomberg

Xiaomi unveils open-source AI reasoning model MiMo, joining other Chinese tech leaders hoping to make a splash in the burgeoning AI field endorsed by Beijing

Xiaomi debuted MiMo a day after Alibaba unveiled the latest version of its own flagship model, amplifying a race between China's tech players …

QvQ-72B-Preview🎄 an open weight model for visual reasoning just released by @Alibaba_Qwen 🎉 https://huggingface.co/... ✨ Combines visual understanding & language reasoning. ✨ Scores 70.3 on MMMU ✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving [image]

2024-12-26 View on X

Qwen

Alibaba releases QvQ-72B-Preview, an experimental research model focused on “enhancing visual reasoning capabilities”, built on Qwen2-VL-72B

QVQ-72B-Preview is an experimental research model developed by the Qwen team … QwenLM on GitHub : Qwen2-VL — Introduction After a year's relentless efforts, today we are thrilled...
