Study: OpenAI's o1 correctly diagnosed 67% of emergency room patients using electronic records and a few sentences from nurses, vs. 50-55% for triage doctors

Researchers say results mark a ‘profound change in technology that will reshape medicine’ — From George Clooney in ER …

The Guardian 2026-05-02 Robert Booth

Context & Ripple Effects

The related coverage traces a long shift from AI as a diagnostic aid to hospital use in risk prioritization, targeted care, and point-of-care tools such as AI stethoscopes. This study puts a general-purpose OpenAI model into the more consequential emergency-department triage context.

Its result also sits beside recent coverage finding that ChatGPT Health frequently misclassified the urgency of both emergency and nonurgent cases. That contrast makes validation conditions, not just a headline accuracy comparison, central to whether such systems can be trusted in clinical workflows.

First-order effects

OpenAI gains a high-profile piece of evidence for o1 in clinical reasoning from electronic records and nurse notes, while triage teams get another benchmark against which AI-assisted intake can be evaluated.
The reported result will increase scrutiny of how the study defined correct diagnoses, selected cases, and handled urgency, because emergency-room triage requires safe prioritization as well as diagnostic accuracy.

Second-order effects

Hospitals and clinical-AI vendors will face pressure to distinguish narrowly validated diagnostic-support uses from autonomous triage claims, especially given the conflicting severity-assessment result in related coverage.
Health systems already using predictive prioritization tools may seek comparable evaluations of their models and workflows rather than treating general-model performance as directly transferable to patient care.

Third-order effects

If the result is replicated across settings and paired with reliable safety performance, clinical AI is likely to move from specialist point tools toward workflow-level support built around medical records, clinician notes, and escalation decisions.
The durable constraint will be evidence and accountability: inconsistent results across medical-AI evaluations could push the market toward more rigorous, use-specific validation before broad deployment.

The trend: This is one data point in the transition from AI diagnostic augmentation to AI embedded in frontline clinical decision workflows, with validation quality determining adoption.
