DeepSeek researchers detail mHC, a new architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden
DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture
US startups are increasingly adopting open-weight Chinese AI models, which are cheaper and more customizable than frontier US models while remaining capable enough for their needs
DeepSeek releases DeepSeek-OCR, a vision language model designed for efficient vision-text compression, enabling longer contexts with less compute
the new frontier of OCR from @deepseek_ai, exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G), powered by vllm==0.8....
Tencent open-sources translation models Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B, which support 33 languages, and claims they beat established models on benchmarks
Ben Jiang / South China Morning Post: Tencent's open-source translation model beats Google,...
Xiaomi unveils open-source AI reasoning model MiMo, joining other Chinese tech leaders hoping to make a splash in the burgeoning, Beijing-endorsed AI field
Xiaomi debuted MiMo a day after Alibaba unveiled the latest version of its own flagship model, amplifying a race between China's tech players …
Alibaba releases QVQ-72B-Preview, an experimental research model focused on “enhancing visual reasoning capabilities”, built on Qwen2-VL-72B
QVQ-72B-Preview is an experimental research model developed by the Qwen team … QwenLM on GitHub: Qwen2-VL. After a year's relentless efforts, today we are thrilled...