Anthropic unveils BioMysteryBench to test Claude's bioinformatics skills against human experts, and says its most recent Claude models solved ~30% of 23 questions that stumped experts
Anthropic
Related Coverage
- Anthropic's new benchmark claims Claude can match human experts in bioinformatics The Decoder · Maximilian Schreiner
Discussion
-
@anthropicai
on x
New on the Science Blog: We gave Claude 99 problems analyzing real biological data and compared its performance against an expert panel. On 23 problems, the experts were stumped. Our most recent models solved roughly 30% of those—and most of the rest. [image]
-
@kimmonismus
on x
Anthropic just dropped a benchmark that should make every scientist pay attention. BioMysteryBench puts AI models through 99 real bioinformatics challenges, using raw, messy datasets from actual research, think unprocessed DNA sequences and clinical samples. However: these [image…
-
@anthropicai
on x
BioMysteryBench, our new bioinformatics eval, tests whether Claude can devise creative solutions to open-ended research problems. Read more: https://www.anthropic.com/...
-
@parmita
Parmita Mishra
on x
interesting benchmark; we must build on it open-source. for instance, look at this example. claude looked at prior data to determine the ‘right’ answer, but is it better science? if this had been a cell type discovered after claude's training cutoff, or a novel state with no [ima…
-
@deryatr_
Derya Unutmaz
on x
Excellent bioAI benchmark from Anthropic & it's great that Claude is good at it. However, I suspect GPT-5.5 Pro can solve most of these biological problems too difficult for human experts. In my experience, it's just extraordinary in its understanding of biomedical sciences!