VOICE ARCHIVE

@andonlabs
12 posts
2026-02-18
Claude Sonnet 4.6 is 2nd on Vending-Bench 2. We previously showed that Opus 4.6 is incredibly capable, achieving SOTA with tactics that are impressive but could be considered ethically concerning. Sonnet is almost as impressive, and almost as concerning, at a third the price.
2026-02-18 View on X
Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, computer use, instruction following, and more; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet.  It's a full upgrade of the model's skills across coding, computer use …

In Vending-Bench Arena, Sonnet 4.6 wins over Opus 4.6 by obsessing over monopolies. It tracks competitor pricing fanatically, undercuts competitors by exactly one cent on everything else, and when rivals run low on stock, it undercuts harder to drain them faster. [image]
2026-02-18 View on X

2026-02-17
In Vending-Bench Arena, Sonnet 4.6 wins over Opus 4.6 by obsessing over monopolies. It tracks competitor pricing fanatically, undercuts competitors by exactly one cent on everything else, and when rivals run low on stock, it undercuts harder to drain them faster. [image]
2026-02-17 View on X
Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, consistency, and more, for Free and Pro users; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet.  It's a full upgrade of the model's skills across coding, computer use …

Qwen 3.5 goes bankrupt on Vending-Bench 2 [image]
2026-02-17 View on X
Reuters

Alibaba debuts Qwen3.5, a 397B-parameter open-weight multimodal AI model that it says is 60% cheaper to use and 8x better at large workloads than Qwen3

Claude Sonnet 4.6 is 2nd on Vending-Bench 2. We previously showed that Opus 4.6 is incredibly capable, achieving SOTA with tactics that are impressive but could be considered ethically concerning. Sonnet is almost as impressive, and almost as concerning, at a third the price. [image]
2026-02-17 View on X
Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, consistency, and more, for Free and Pro users; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet.  It's a full upgrade of the model's skills across coding, computer use …

2026-02-16
Qwen 3.5 goes bankrupt on Vending-Bench 2 [image]
2026-02-16 View on X
Reuters

Alibaba debuts Qwen3.5, a 397B-parameter open-weight multimodal AI model that it says is 60% cheaper to use and 8x better at large workloads than Qwen3

2026-02-12
GLM-5 takes 4th place on Vending-Bench 2, above Claude Sonnet 4.5, which was the state-of-the-art model less than 6 months ago. China seems to be 6 months behind the West. By June they will be ahead if the trends continue. More in this thread on why we don't think this will happen. [image]
2026-02-12 View on X
Reuters

Z.ai says it will raise prices by at least 30% for new GLM coding plan subscribers to accommodate surging demand for its AI coding tools

Z.ai

Z.ai launches GLM-5, saying its flagship open-weight model has “best-in-class performance among all open-source models” in reasoning, coding, and agentic tasks

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks.  Scaling is still one of the most important ways …

2025-12-19
Vending Machines... It's quite absurd how big this got. From an obscure arXiv paper to vending machines with thousands of AI researchers using them worldwide. @AnthropicAI just made a second post, and we celebrate it with some behind-the-scenes of Project Vend history 🧵. [image]
2025-12-19 View on X
Wall Street Journal

In an experiment, Claude ran a vending machine in the WSJ newsroom and lost $1,000+ after it dropped prices to zero, gave away a free PlayStation, and more

until someone pointed out this would fall afoul of the US Onion Futures Act of 1958. @andonlabs : Turns out journalists are better red-teamers than AI researchers. We've taught the...

We put our AI vending machines at @WSJ to let the team take a glimpse into the future AI-run economy. The outcome was hilarious, as you can see in their YouTube video and article. Thanks to @JoannaStern and team for participating! [image]
2025-12-19 View on X

Turns out journalists are better red-teamers than AI researchers. We've taught the agent to reject freebies and our vending machines at AI labs are now profitable. But impressively, the WSJ journalists kept convincing it to give products away for free. [image]
2025-12-19 View on X

2025-11-25
We had early access to Claude Opus 4.5 to test it on Vending-Bench 2. It finished just behind Gemini 3 Pro in a strong 2nd position. Read more in Anthropic's model card and the release blog post. [image]
2025-11-25 View on X
Anthropic

Anthropic launches Claude Opus 4.5, saying it is “the best model in the world for coding, agents, and computer use” and “meaningfully better at everyday tasks”

Our newest model, Claude Opus 4.5, is available today.  It's intelligent, efficient …