VOICE ARCHIVE

Aran Komatsuzaki

@arankomatsuzaki
25 posts
2025-09-24
RLPT: Reinforcement Learning on Pre-Training Data • RL directly on pre-train data (no human labels) • Next-segment reasoning objective (ASR + MSR tasks) → self-supervised rewards • Gains on Qwen3-4B: +3.0 MMLU, +8.1 GPQA-Diamond, +6.6 AIME24, +5.3 AIME25 [image]
2025-09-24 View on X
Bloomberg

Alibaba's Hong Kong-listed shares hit a nearly four-year high after CEO Eddie Wu announced plans to increase AI spending beyond the $53B target over three years

Alibaba Group Holding Ltd.'s shares surged to their highest in nearly four years after revealing plans to ramp up AI spending past …

Simon Willison's Weblog

Alibaba releases the Qwen3-VL vision models, the Qwen3Guard “safety moderation” models, and three closed-weight models, including Qwen3-Max with 1T+ parameters

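The RLPT recipe tweeted above scores the model's own continuation against the next segment of raw pre-training text, so the reward needs no human labels. A minimal sketch of that idea, with a token-overlap F1 standing in for the paper's actual reward signal (the function names and the F1 choice are illustrative assumptions, not the paper's implementation):

```python
from collections import Counter

def segment_reward(predicted: str, reference: str) -> float:
    """Token-overlap F1 between a generated segment and the true next segment."""
    pred_toks, ref_toks = predicted.split(), reference.split()
    if not pred_toks or not ref_toks:
        return 0.0
    # overlap counted with multiplicity via multiset intersection
    overlap = sum((Counter(pred_toks) & Counter(ref_toks)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_toks), overlap / len(ref_toks)
    return 2 * precision * recall / (precision + recall)

def make_asr_example(text: str, split_at: int):
    """ASR-style task from raw text: context = prefix, label = next segment."""
    words = text.split()
    return " ".join(words[:split_at]), " ".join(words[split_at:])

context, target = make_asr_example(
    "the model is trained to predict the next segment of raw text", 6)
reward = segment_reward("predict the next segment of raw text", target)
print(round(reward, 2))  # → 0.92, high overlap with the true continuation
```

Because both the context and the label come straight from the corpus, every document yields reward-bearing rollouts for free, which is what lets RLPT scale RL to pre-training data.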

2025-08-06
gpt-oss-120b performs comparably to Qwen 3 (Thinking / Coder) on major tasks while using ~5x less active params and lower precision! OpenAI / America is still ahead in the race. It's your turn, Google, Anthropic, DeepSeek and Qwen. [image]
2025-08-06 View on X
Wired

OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2; the smaller gpt-oss-20b can run locally on a device with 16GB+ of RAM

gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models …

Bloomberg

Amazon plans to make OpenAI's new gpt-oss open-weight models available on Bedrock and SageMaker, the first time it has offered OpenAI's models to AWS customers


2025-05-01
The Leaderboard Illusion - Identifies systematic issues that have resulted in a distorted playing field of Chatbot Arena - Identifies 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release [image]
2025-05-01 View on X
TechCrunch

A study from Cohere, Stanford, MIT, and Ai2 accuses LMArena of helping Meta, OpenAI, Google, and Amazon game its popular crowdsourced AI benchmark Chatbot Arena

A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI …

2025-02-06
Stanford presents: s1: Simple test-time scaling - Seeks the simplest approach to achieve test-time scaling and strong reasoning performance - Exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24) - Model, data, and code are open-source [image]
2025-02-06 View on X
TechCrunch

Stanford and University of Washington AI researchers claim they trained AI reasoning model s1, distilled from a Gemini 2.0 model, for under $50 in cloud compute

AI researchers at Stanford and the University of Washington were able to train an AI “reasoning” model for under $50 in cloud compute credits …
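s1's test-time scaling hinges on "budget forcing": if the model tries to stop thinking too early, append "Wait" and keep decoding; if it runs past the budget, cut thinking off. A toy sketch of that control loop, with `stub_generate` as a hypothetical stand-in decoder (nothing here comes from the s1 codebase):

```python
def budget_forced_decode(generate, prompt, min_tokens=100, max_tokens=500):
    """Keep decoding until the reasoning trace falls inside the token budget."""
    trace = ""
    while True:
        chunk, stopped = generate(prompt + trace, max_tokens - len(trace.split()))
        trace = (trace + " " + chunk).strip()
        used = len(trace.split())
        if used >= max_tokens:
            break                   # over budget: force the answer now
        if stopped and used < min_tokens:
            trace = trace + " Wait" # stopped too early: force more thinking
            continue
        if stopped:
            break                   # stopped within budget: accept the trace
    return trace

# Stub decoder: emits 60 tokens per call, then tries to stop.
def stub_generate(prompt, budget):
    return " ".join(["tok"] * min(60, budget)), True

out = budget_forced_decode(stub_generate, "Q: ...", min_tokens=100, max_tokens=500)
print(len(out.split()))  # → 121: one "Wait" forced a second round of thinking
```

The appeal of the trick is exactly what the tweet claims: it is the simplest possible knob for trading test-time compute against accuracy, with no training change at all.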

2025-02-01
Not sure making o3-mini a standalone release was the best marketing call—there's little reason to pick it over R-1. It's prob pricier as an API, lacks visible CoT, doesn't outperform, isn't open-source, and only seems faster.
2025-02-01 View on X
TechCrunch

OpenAI launches o3-mini, its latest reasoning model that the company says is largely on par with o1 and o1-mini in capabilities, but runs faster and costs less

OpenAI on Friday launched a new AI “reasoning” model, o3-mini, the newest in the company's o family of reasoning models.

The leap from o1 to o3 is exponential, completely bypassing o2. If this pattern holds, o3 won't lead to o4—it'll jump straight to o9. [image]
2025-02-01 View on X

2024-12-27
DeepSeek-V3-Base was just open-sourced! - 685B MoE w/ 256 experts topk=8 with sigmoid routing - Outperforms Sonnet 3.5 on Aider benchmark https://huggingface.co/... [image]
2024-12-27 View on X
VentureBeat

DeepSeek releases DeepSeek-V3, an open-source MoE model of 671B total parameters, with 37B activated per token, claiming it outperforms top models like GPT-4o

Chinese AI startup DeepSeek, known for challenging leading AI vendors with its innovative open-source technologies, today released a new ultra-large model: DeepSeek-V3.
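The routing the tweet describes (256 experts, top-k = 8, sigmoid gating) reduces to a select-then-renormalize step. A toy numpy sketch of just that step; real DeepSeek-V3 routing also adds bias terms for auxiliary-loss-free load balancing, which is omitted here:

```python
import numpy as np

def route(hidden: np.ndarray, gate_w: np.ndarray, k: int = 8):
    """Return the indices and normalized weights of the k selected experts."""
    scores = 1.0 / (1.0 + np.exp(-hidden @ gate_w))  # sigmoid affinity per expert
    topk = np.argsort(scores)[-k:]                   # keep the k highest-scoring
    weights = scores[topk] / scores[topk].sum()      # renormalize among the chosen
    return topk, weights

rng = np.random.default_rng(0)
h = rng.standard_normal(64)          # token hidden state (toy dimension)
W = rng.standard_normal((64, 256))   # gating matrix: one column per expert
idx, w = route(h, W)
print(len(idx), round(float(w.sum()), 6))  # → 8 1.0
```

Sigmoid gating scores each expert independently (unlike a softmax over all 256), which is part of why only the 8 selected experts' parameters are activated per token.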

2024-12-23
OpenAI presents Deliberative Alignment: Reasoning Enables Safer Language Models Saturates many of their hardest safety evaluations and achieves a Pareto improvement on both under- and overrefusals https://openai.com/... [image]
2024-12-23 View on X
TechCrunch

OpenAI details “deliberative alignment”, a new method it used to make o1 and o3 “think” about its safety policy before responding, to improve overall alignment


One Useful Thing

Google and OpenAI's AI product announcements over the past month have transformed the state of AI and show the breadth and pace of change

The last month has transformed the state of AI, with the pace picking up dramatically in just the last week.  AI labs have unleashed a flood of new products …

2024-10-05
🎬 Meta introduces Movie Gen: A Cast of SotA Media Foundation Models - 30B params, 16-second video at 1080p 16fps with synchronized audio - Text-to-video, video personalization, and video editing - Supports personalized video creation using user images - Innovations in [video]
2024-10-05 View on X
Wired

Meta announces Movie Gen, a suite of AI models for generating realistic video and audio clips; Movie Gen Video has 30B parameters and Movie Gen Audio has 13B

The next frontier in generative AI is video—and with Movie Gen, Meta has now staked its claim.


2024-05-30
ScaleAI just released LLM leaderboards by extending GSM1K to various domains! This can be a great complement to lmsys eval.
2024-05-30 View on X
SiliconANGLE

AI training data provider Scale AI releases SEAL Leaderboards, which uses private datasets to rank LLMs in domains like coding, instruction following, and math

2024-05-07
Btw this multiple-token training is not a panacea. The performance gain depends on the target task. It leads to no perf gain or slight degradation on some multiple-choice questions, minor improvement on summarization, and little to no improvement on arithmetic tasks.
2024-05-07 View on X
VentureBeat

A study by Meta researchers suggests that training LLMs to predict multiple tokens at once, instead of just the next token, results in better and faster models


Meta presents Better & Faster Large Language Models via Multi-token Prediction - training language models to predict multiple future tokens at once results in higher sample efficiency - up to 3x faster at inference https://arxiv.org/... [image]
2024-05-07 View on X
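The multi-token objective in the tweets above gives each position k labels instead of one: the next k tokens, one per prediction head. A tiny sketch of just that target layout (toy token IDs, no model):

```python
def multi_token_targets(tokens, k):
    """For each position t with a full window, labels are tokens t+1 .. t+k."""
    return [tokens[t + 1: t + 1 + k]
            for t in range(len(tokens) - k)]

seq = [10, 11, 12, 13, 14, 15]
print(multi_token_targets(seq, 3))
# → [[11, 12, 13], [12, 13, 14], [13, 14, 15]]
```

Each inner list supervises one head at that position; at inference the extra heads can be dropped (standard decoding) or used to draft tokens ahead, which is where the up-to-3x speedup claim comes from.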

2024-04-25
Apple presents OpenELM - An efficient LM family with open-source training and inference framework - Performs on par with OLMo while requiring 2x fewer pre-training tokens repo: https://github.com/... hf: https://huggingface.co/... abs: https://arxiv.org/... [image]
2024-04-25 View on X
VentureBeat

Apple researchers share OpenELM, a family of LLMs with 270M to 3B parameters, designed to run on-device, and pre-trained and fine-tuned on public datasets


2024-04-23
Microsoft just released Phi-3 - phi-3-mini: 3.8B model trained on 3.3T tokens rivals Mixtral 8x7B and GPT-3.5 - phi-3-medium: 14B model trained on 4.8T tokens w/ 78% on MMLU and 8.9 on MT-bench https://arxiv.org/... [image]
2024-04-23 View on X
The Verge

Microsoft debuts Phi-3 Mini, a small 3.8B-parameter model about as capable as GPT-3.5, and plans Phi-3 Small and Phi-3 Medium models with 7B and 14B parameters

Microsoft launched the next version of its lightweight AI model Phi-3 Mini, the first of three small models the company plans to release.

A few caveats about Phi-3: - The figure I attached at the beginning had some errors. Here's the updated one. - Phi-3-medium performs well on TriviaQA but noticeably underperforms rel. to GPT-3.5. We can guess that Phi-3 recipe doesn't magically make it understand more random... [image]
2024-04-23 View on X

2024-04-13
Google presents Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention 1B model that was fine-tuned on up to 5K sequence length passkey instances solves the 1M length problem https://arxiv.org/... [image]
2024-04-13 View on X
VentureBeat

Google researchers detail a technique that gives LLMs the ability to work with text of infinite length while keeping memory and compute requirements constant

A new paper by researchers at Google claims to give large language models (LLMs) the ability to work with text of infinite length.
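The constant-memory claim rests on a compressive memory: each segment's keys and values are folded into a fixed-size matrix rather than appended to the KV cache. A toy numpy sketch in the linear-attention style (ELU+1 feature map; the paper's delta-rule update and its learned mix with local attention are omitted):

```python
import numpy as np

def elu1(x):
    """ELU(x) + 1: a positive feature map used in linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def update(M, z, K, V):
    """Fold one segment's keys/values into the fixed-size memory (M, z)."""
    sK = elu1(K)
    return M + sK.T @ V, z + sK.sum(axis=0)

def retrieve(M, z, Q):
    """Normalized readout from memory for a batch of queries."""
    sQ = elu1(Q)
    return (sQ @ M) / (sQ @ z)[:, None]

d = 8                                    # toy head dimension
M, z = np.zeros((d, d)), np.zeros(d)
rng = np.random.default_rng(1)
for _ in range(100):                     # stream 100 segments through memory
    K, V = rng.standard_normal((16, d)), rng.standard_normal((16, d))
    M, z = update(M, z, K, V)
out = retrieve(M, z, rng.standard_normal((4, d)))
print(M.shape, out.shape)                # → (8, 8) (4, 8): memory never grew
```

However many segments stream past, the state stays a d-by-d matrix plus a d-vector, which is the sense in which memory and compute stay constant with context length.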