Anthropic unveils BioMysteryBench to test Claude's bioinformatics skills against human experts, and says its most recent Claude models solved ~30% of 23 questions that stumped experts
Anthropic
Related Coverage
- Anthropic's new benchmark claims Claude can match human experts in bioinformatics The Decoder · Maximilian Schreiner
Discussion
-
@anthropicai
on x
New on the Science Blog: We gave Claude 99 problems analyzing real biological data and compared its performance against an expert panel. On 23 problems, the experts were stumped. Our most recent models solved roughly 30% of those—and most of the rest. [image]
-
@kimmonismus
on x
Anthropic just dropped a benchmark that should make every scientist pay attention. BioMysteryBench puts AI models through 99 real bioinformatics challenges, using raw, messy datasets from actual research, think unprocessed DNA sequences and clinical samples. However: these [image…
-
@anthropicai
on x
BioMysteryBench, our new bioinformatics eval, tests whether Claude can devise creative solutions to open-ended research problems. Read more: https://www.anthropic.com/...
-
@parmita
Parmita Mishra
on x
interesting benchmark; we must build on it open-source. for instance, look at this example. claude looked at prior data to determine the ‘right’ answer, but is it better science? if this had been a cell type discovered after claude's training cutoff, or a novel state with no [ima…
-
@deryatr_
Derya Unutmaz
on x
Excellent bioAI benchmark from Anthropic & it's great that Claude is good at it. However, I suspect GPT-5.5 Pro can solve most of these biological problems too difficult for human experts. In my experience, it's just extraordinary in its understanding of biomedical sciences!