andonlabs · TEXXR

Claude Sonnet 4.6 is 2nd on Vending-Bench 2. We previously showed that Opus 4.6 is incredibly capable, achieving SOTA with tactics that are impressive but could be considered ethically concerning. Sonnet is almost as impressive, and almost as concerning, at a third the price.

2026-02-18 View on X

Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, computer use, instruction following, and more; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet. It's a full upgrade of the model's skills across coding, computer use …

In Vending-Bench Arena, Sonnet 4.6 wins over Opus 4.6 by obsessing over monopolies. It tracks competitor pricing fanatically, undercuts competitors by exactly one cent on everything else, and when rivals run low on stock, it undercuts harder to drain them faster. [image]

2026-02-18 View on X

Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, computer use, instruction following, and more; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet. It's a full upgrade of the model's skills across coding, computer use …

In Vending-Bench Arena, Sonnet 4.6 wins over Opus 4.6 by obsessing over monopolies. It tracks competitor pricing fanatically, undercuts competitors by exactly one cent on everything else, and when rivals run low on stock, it undercuts harder to drain them faster. [image]

2026-02-17 View on X

Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, consistency, and more, for Free and Pro users; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet. It's a full upgrade of the model's skills across coding, computer use …

Qwen 3.5 goes bankrupt on Vending-Bench 2 [image]

2026-02-17 View on X

Reuters

Alibaba debuts Qwen3.5, a 397B-parameter open-weight multimodal AI model that it says is 60% cheaper to use and 8x better at large workloads than Qwen3

Claude Sonnet 4.6 is 2nd on Vending-Bench 2. We previously showed that Opus 4.6 is incredibly capable, achieving SOTA with tactics that are impressive but could be considered ethically concerning. Sonnet is almost as impressive, and almost as concerning, at a third the price. [image]

2026-02-17 View on X

Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, consistency, and more, for Free and Pro users; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet. It's a full upgrade of the model's skills across coding, computer use …

Qwen 3.5 goes bankrupt on Vending-Bench 2 [image]

2026-02-16 View on X

Reuters

Alibaba debuts Qwen3.5, a 397B-parameter open-weight multimodal AI model that it says is 60% cheaper to use and 8x better at large workloads than Qwen3

GLM-5 takes 4th place on Vending-Bench 2, above Claude Sonnet 4.5, which was the state-of-the-art model less than 6 months ago. China seems to be 6 months behind the West. By June they will be ahead if the trends continue. More in this thread on why we don't think this will happen. [image]

2026-02-12 View on X

Reuters

Z.ai says it will raise prices by at least 30% for new GLM coding plan subscribers to accommodate surging demand for its AI coding tools

GLM-5 takes 4th place on Vending-Bench 2, above Claude Sonnet 4.5, which was the state-of-the-art model less than 6 months ago. China seems to be 6 months behind the West. By June they will be ahead if the trends continue. More in this thread on why we don't think this will happen. [image]

2026-02-12 View on X

Z.ai

Z.ai launches GLM-5, saying its flagship open-weight model has “best-in-class performance among all open-source models” in reasoning, coding, and agentic tasks

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways …

Vending Machines... It's quite absurd how big this got. From an obscure arXiv paper to vending machines with thousands of AI researchers using them worldwide. @AnthropicAI just made a second post, and we celebrate it with some behind-the-scenes of Project Vend history 🧵. [image]

2025-12-19 View on X

Wall Street Journal

In an experiment, Claude ran a vending machine in the WSJ newsroom and lost $1,000+ after it dropped prices to zero, gave away a free PlayStation, and more

until someone pointed out this would fall afoul of the US Onion Futures Act of 1958. @andonlabs: Turns out journalists are better red-teamers than AI researchers. We've taught the...

We put our AI vending machines at @WSJ to let the team take a glimpse into the future AI-run economy. The outcome was hilarious, as you can see in their YouTube video and article. Thanks to @JoannaStern and team for participating! [image]

2025-12-19 View on X

Wall Street Journal

In an experiment, Claude ran a vending machine in the WSJ newsroom and lost $1,000+ after it dropped prices to zero, gave away a free PlayStation, and more

until someone pointed out this would fall afoul of the US Onion Futures Act of 1958. @andonlabs: Turns out journalists are better red-teamers than AI researchers. We've taught the...

Turns out journalists are better red-teamers than AI researchers. We've taught the agent to reject freebies and our vending machines at AI labs are now profitable. But impressively, the WSJ journalists kept convincing it to give products away for free. [image]

2025-12-19 View on X

Wall Street Journal

In an experiment, Claude ran a vending machine in the WSJ newsroom and lost $1,000+ after it dropped prices to zero, gave away a free PlayStation, and more

until someone pointed out this would fall afoul of the US Onion Futures Act of 1958. @andonlabs: Turns out journalists are better red-teamers than AI researchers. We've taught the...

We had early access to Claude Opus 4.5 to test it on Vending-Bench 2. It finished just behind Gemini 3 Pro in a strong 2nd position. Read more in Anthropic's model card and the release blog post. [image]

2025-11-25 View on X

Anthropic

Anthropic launches Claude Opus 4.5, saying it is “the best model in the world for coding, agents, and computer use” and “meaningfully better at everyday tasks”

Our newest model, Claude Opus 4.5, is available today. It's intelligent, efficient …
