Alibaba's Hong Kong-listed shares hit a nearly four-year high after CEO Eddie Wu announced plans to increase AI spending beyond the $53B target over three years
Alibaba Group Holding Ltd.'s shares surged to their highest in nearly four years after the company revealed plans to ramp up AI spending past …
Alibaba releases the Qwen3-VL vision models, the Qwen3Guard “safety moderation” models, and three closed-weight models, including Qwen3-Max with 1T+ parameters
Julian Nabil / Forbes Middle East : Alibaba Introduces Qwen3-Max AI Model With Over 1T Parameters
Markus Kasanmascheff / WinBuzzer : Alibaba...
OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2; the smaller gpt-oss-20b can run locally on a device with 16GB+ of RAM
gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models
Simon Willison / Simon Willison's Weblog : OpenAI's new open weight (Apache 2) models are really good...
Amazon plans to make OpenAI's new gpt-oss open-weight models available on Bedrock and SageMaker, the first time it has offered OpenAI's models to AWS customers
A study from Cohere, Stanford, MIT, and Ai2 accuses LMArena of helping Meta, OpenAI, Google, and Amazon game its popular crowdsourced AI benchmark Chatbot Arena
A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI …
Stanford and University of Washington AI researchers claim they trained AI reasoning model s1, distilled from a Gemini 2.0 model, for under $50 in cloud compute
AI researchers at Stanford and the University of Washington were able to train an AI “reasoning” model for under $50 in cloud compute credits …
OpenAI launches o3-mini, its latest reasoning model that the company says is largely on par with o1 and o1-mini in capabilities, but runs faster and costs less
OpenAI on Friday launched a new AI “reasoning” model, o3-mini, the newest in the company's o family of reasoning models.
DeepSeek releases DeepSeek-V3, an open-source MoE model of 671B total parameters, with 37B activated per token, claiming it outperforms top models like GPT-4o
Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.
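The headline numbers describe a mixture-of-experts (MoE) design: all 671B parameters exist in the model, but a router activates only a few experts per token, so roughly 37B parameters (about 5.5%) do work on any given token. A minimal toy sketch of top-k expert routing — illustrative sizes and function names, not DeepSeek's actual architecture:

```python
import random

# Toy mixture-of-experts routing: many experts exist, but only the
# top-k scorers run for each token, so per-token compute is a small
# fraction of total parameters. All sizes here are illustrative.
NUM_EXPERTS = 8   # DeepSeek-V3 uses far more experts per layer
TOP_K = 2         # experts activated per token

def route(token_scores):
    """Pick the TOP_K experts with the highest router scores."""
    ranked = sorted(range(len(token_scores)), key=lambda i: -token_scores[i])
    return ranked[:TOP_K]

def moe_forward(x, experts, router_scores):
    """Run only the selected experts and mix outputs by normalized score."""
    chosen = route(router_scores)
    total = sum(router_scores[i] for i in chosen)
    return sum(router_scores[i] / total * experts[i](x) for i in chosen)

# Each "expert" is just a scalar function in this toy.
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]
scores = [random.random() for _ in range(NUM_EXPERTS)]
y = moe_forward(1.0, experts, scores)  # only 2 of 8 experts ever ran
```

The same ratio logic is what lets a 671B-parameter model serve tokens at roughly the cost of a dense 37B model.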
OpenAI details “deliberative alignment”, a new method it used to make o1 and o3 “think” about OpenAI's safety policy before responding, to improve overall alignment
Maxwell Zeff / TechCrunch :
Google and OpenAI's AI product announcements over the past month have transformed the state of AI and show the breadth and pace of change
The last month has transformed the state of AI, with the pace picking up dramatically in just the last week. AI labs have unleashed a flood of new products …
Meta announces Movie Gen, a suite of AI models for generating realistic video and audio clips; Movie Gen Video has 30B parameters and Movie Gen Audio has 13B
The next frontier in generative AI is video—and with Movie Gen, Meta has now staked its claim.
AI training data provider Scale AI releases SEAL Leaderboards, which uses private datasets to rank LLMs in domains like coding, instruction following, and math
A study by Meta researchers suggests that training LLMs to predict multiple tokens at once, instead of just the next token, results in better and faster models
LLM approach to predict multiple tokens
KAN: Kolmogorov-Arnold Networks — “promising alternatives to Multi-Layer Perceptrons”
Ethan / @ethan_smith_20 : it was only briefly t...
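The core idea is that instead of training a single head to predict only the next token, several output heads share one trunk and each predicts a different future token, with the losses summed. A hypothetical toy illustration of how the training targets change (the function name is made up; this is not Meta's implementation):

```python
def multi_token_targets(tokens, n_future=4):
    """For each position t, produce n_future targets (one per
    prediction head) instead of the single next token. Standard
    next-token training is the special case n_future=1."""
    examples = []
    for t in range(len(tokens) - n_future):
        context = tokens[: t + 1]
        targets = tokens[t + 1 : t + 1 + n_future]  # one target per head
        examples.append((context, targets))
    return examples

seq = list("abcdefg")
pairs = multi_token_targets(seq, n_future=2)
# first example: context ['a'], head targets ['b', 'c']
```

Each extra head adds training signal at negligible trunk cost, and at inference the extra heads can be used for speculative decoding, which is where the reported speedups come from.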
Apple researchers share OpenELM, a family of LLMs with 270M to 3B parameters, designed to run on-device, and pre-trained and fine-tuned on public datasets
Shubham Sharma / VentureBeat :
Microsoft debuts Phi-3 Mini, a small 3.8B-parameter model about as capable as GPT-3.5, and plans Phi-3 Small and Phi-3 Medium models with 7B and 14B parameters
Microsoft launched the next version of its lightweight AI model Phi-3 Mini, the first of three small models the company plans to release.
Google researchers detail a technique that gives LLMs the ability to work with text of infinite length while keeping memory and compute requirements constant
A new paper by researchers at Google claims to give large language models (LLMs) the ability to work with text of infinite length.
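The trick behind constant memory and compute is to process the input in fixed-size segments and compress older attention states into a bounded memory, rather than keeping the full context around. A toy stand-in for that idea — a hypothetical function, not the paper's actual mechanism — that consumes an arbitrarily long stream while holding only one chunk and one summary in memory:

```python
def constant_memory_process(stream, chunk_size=4):
    """Consume an arbitrarily long token stream in fixed-size chunks,
    folding each chunk into a single bounded summary (a running mean
    of chunk means). Memory held at any moment: one chunk plus one
    summary, regardless of total input length — a toy analogue of
    compressing old key-value states into a fixed-size memory."""
    summary, chunks_seen, chunk = 0.0, 0, []
    for tok in stream:
        chunk.append(tok)
        if len(chunk) == chunk_size:
            chunks_seen += 1
            # incremental mean: old state is folded into the summary
            # instead of being retained, so memory never grows
            summary += (sum(chunk) / chunk_size - summary) / chunks_seen
            chunk.clear()
    return summary

# Works on a generator: a million tokens, bounded memory throughout.
result = constant_memory_process(iter(range(1_000_000)), chunk_size=1000)
```

The real method keeps a fixed-size attention memory with learned read/write operations rather than a running mean, but the memory-vs-length tradeoff it buys is the same shape as this sketch.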