Gemini co-lead Oriol Vinyals says Gemini 3's gains come from better pre-training and post-training, contradicting the idea that pre-training gains are falling
which we discussed in our NeurIPS '25 talk with @ilyasut and @quocleix—the team delivered a drastic jump. The delta between 2.5 and 3.0 is [image] Andrej Karpathy / @karpathy : I played with Gemini 3 ...
Baidu unveils two AI chips: the M100 for efficient MoE inference, coming in early 2026, and the M300 for training super-large multimodal models, coming in 2027
The M100 and M300 provide ‘powerful, low-cost and controllable AI computing power’ to support the nation's self-reliance push, the firm says
OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2; the smaller gpt-oss-20b can run locally on a device with 16GB+ of RAM
gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models Simon Willison / Simon Willison's Weblog : OpenAI's new open weight (Apache 2) models are really good OpenAI on GitHub : ...
Z.ai, formerly known as Zhipu and that has raised $1.5B from Tencent and others, releases GLM-4.5, an open-source AI model that it says is cheaper than DeepSeek
chinese models really are taking over huh Simon Willison / @simonwillison.net : Pretty decent pelicans from the new GLM-4.5 and GLM-4.5 Air models. Both models are MIT licensed, released by Chinese A...
Alibaba debuts the Qwen3-Coder model for agentic coding, including a 480B-parameter MoE variant, and open sources Qwen Code, a CLI tool adapted from Gemini CLI
Qwen 39.4k — Text Generation Transformers Safetensors qwen3_moe conversational Coco Feng / South China Morning Post : Alibaba upgrades flagship Qwen3 model to outperform OpenAI, DeepSeek in maths, c...
Moonshot's Kimi K2 uses a 1T-parameter MoE architecture with 32B active parameters and outperforms models like GPT-4.1 and DeepSeek-V3 on key benchmarks
Moonshot AI, the Chinese artificial intelligence startup behind the popular Kimi chatbot, released an open-source language model on Friday …
Mark Zuckerberg says Meta will share news about a Llama 4 Reasoning model “in the next month”
David Sacks Jesus Rodriguez / TheSequence : The Sequence Radar #526: Llama 4 Scout and Maverick are Here! Aman Gupta / Livemint : ChatGPT vs Meta AI: Which AI chatbot is better after the Llama 4 launc...
Mac Studio with M3 Ultra and 512GB of unified memory review: opens up new workflows on a ~$10,000 desktop, like running a quantized version of DeepSeek R1 671B
if confusing — performance magic Federico Viticci / MacStories : The M3 Ultra Mac Studio for Local LLMs Brandon Hill / Tom's Hardware : Apple Mac Studio review: M3 Ultra offers amazing performance, si...
Sources: Meta set up four war rooms to analyze High-Flyer's DeepSeek, including two for how High-Flyer cut training costs and one on what data it may have used
“Wait, how much are we spending on research of little/no utility to Meta proper?” — “Wait, what? How much? How many Stanford PhDs did LeCun hire to endlessly fellate his ego?” — “Are you kidding...
A look at the many lawsuits against RealPage, the rent-fixing software company accused of artificially inflating the real estate rental prices across the US
RealPage, the rent-fixing software company currently under FBI investigation, also has apps for bogus fees, monetizing vacant apartments and inflating toxic property bubbles. Mastodon: @carnage4life@m...