Verified (Person)

VentureBeat 28 related

Anthropic says Mythos Preview achieves 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro, above 53.4% for Opus 4.6

Anthropic on Tuesday announced Project Glasswing, a sweeping cybersecurity initiative that pairs an unreleased frontier AI model …

2026-04-08

VentureBeat 5 related

Anthropic says Mythos Preview achieves 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro, versus 53.4% for Opus 4.6

Michael Nuñez / VentureBeat

2026-04-07

VentureBeat 97 related

OpenAI touts GPT-5's scores on math, coding, and health benchmarks: 94.6% on AIME 2025 without tools, 74.9% on SWE-bench Verified, and 46.2% on HealthBench Hard

After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs) …

2025-08-08

TechCrunch 5 related

Anthropic releases Claude 3.5 Haiku at $1 per million input tokens, up 4x from Claude 3.0 Haiku's 25¢ per million tokens, and without image analysis features

Anthropic released Claude 3.5 Haiku today, a few days later than expected … X: @anthropicai : During final testing, Haiku surpassed Claude 3 Opus, our previous flagship model, on many benchmarks—at a ...

2024-11-05

Anthropic 4 related

Anthropic claims that its new Sonnet 3.5 model scores 49% on SWE-bench Verified, up from 33.4% and “higher than all” public models, and debuts Claude 3.5 Haiku

Today, we're announcing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku.

2024-10-22

TechCrunch 41 related

Google unveils a feature for its Google Phone app called Verified Calls that lets businesses display a reason for their call, shown under their logo and name

Google today is introducing a new feature for Android phones that will help legitimate businesses reach their customers by phone …

2020-09-09
