Anthropic says Mythos Preview achieves 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro, above 53.4% for Opus 4.6
Anthropic on Tuesday announced Project Glasswing, a sweeping cybersecurity initiative that pairs an unreleased frontier AI model …
Michael Nuñez / VentureBeat
OpenAI touts GPT-5's scores on math, coding, and health benchmarks: 94.6% on AIME 2025 without tools, 74.9% on SWE-bench Verified, and 46.2% on HealthBench Hard
After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs) …
Anthropic releases Claude 3.5 Haiku at $1 per million input tokens, up 4x from Claude 3 Haiku's $0.25 per million input tokens, and without image analysis features
Anthropic released Claude 3.5 Haiku today, a few days later than expected … X: @anthropicai : During final testing, Haiku surpassed Claude 3 Opus, our previous flagship model, on many benchmarks—at a ...
Anthropic claims that its new Sonnet 3.5 model scores 49% on SWE-bench Verified, up from 33.4% and “higher than all” public models, and debuts Claude 3.5 Haiku
Today, we're announcing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku.
Google unveils a feature for its Google Phone app called Verified Calls that lets businesses display a reason for their call, shown under their logo and name
Google today is introducing a new feature for Android phones that will help legitimate businesses reach their customers by phone …