TEXXR

Chronicles

The story behind the story


Anthropic unveils BioMysteryBench to test Claude's bioinformatics skills against human experts, and says Mythos solved ~30% of 23 questions that stumped experts

Anthropic

Discussion

  • @anthropicai on X
    New on the Science Blog: We gave Claude 99 problems analyzing real biological data and compared its performance against an expert panel. On 23 problems, the experts were stumped. Our most recent models solved roughly 30% of those—and most of the rest.
  • @kimmonismus on X
    Anthropic just dropped a benchmark that should make every scientist pay attention. BioMysteryBench puts AI models through 99 real bioinformatics challenges, using raw, messy datasets from actual research, think unprocessed DNA sequences and clinical samples. However: these …
  • @artchad on X
    Seeing this on my tl is analogous to somebody posting my full address to X and saying they are going to rape and murder me tomorrow and there is nothing I could possibly do to stop it.
  • @anthropicai on X
    BioMysteryBench, our new bioinformatics eval, tests whether Claude can devise creative solutions to open-ended research problems. Read more: https://www.anthropic.com/...
  • @parmita Parmita Mishra on X
    interesting benchmark; we must build on it open-source. for instance, look at this example. claude looked at prior data to determine the ‘right’ answer, but is it better science? if this had been a cell type discovered after claude's training cutoff, or a novel state with no …
  • @deryatr_ Derya Unutmaz on X
    Excellent bioAI benchmark from Anthropic & it's great that Claude is good at it. However, I suspect GPT-5.5 Pro can solve most of these biological problems too difficult for human experts. In my experience, it's just extraordinary in its understanding of biomedical sciences!