
ChatGPT health tool misses over half of emergency cases in physician-led study

Doctors are using AI tools in patient care and to manage treatments, accessed on Feb. 24, 2026. (Adobe Stock Photo)
February 24, 2026 10:47 PM GMT+03:00

ChatGPT's health tool failed to direct users to emergency care in more than half of serious cases, according to a new study published in Nature Medicine, raising questions about the safety of AI-powered medical guidance used by millions.

The AI-powered health guidance tool, used by an estimated 40 million people daily, undertriaged more than half of the cases that physicians determined required emergency care, according to researchers at the Icahn School of Medicine at Mount Sinai in New York. While the tool performed reasonably well with clear-cut emergencies, its handling of more complex urgent cases revealed what researchers described as a troubling pattern of reassurance over appropriate alarm.

In the photo, a user holds a smartphone displaying ChatGPT, accessed on Jan. 16, 2026. (AA Photo)

Physicians found a pattern of false reassurance

To evaluate the tool's reliability, researchers developed 60 structured clinical scenarios spanning 21 medical specialties, ranging from minor conditions suitable for home care to life-threatening emergencies. Three independent physicians established the correct level of urgency for each case using guidelines drawn from 56 medical societies. Each scenario was then tested under 16 different contextual conditions, producing 960 total interactions with ChatGPT Health.

The results painted a concerning picture. In many cases, the tool appeared to recognize dangerous clinical findings within its own explanations but nonetheless offered reassuring language rather than directing users to seek immediate help. The disconnect between the system's apparent understanding and its recommendations struck researchers as particularly worrisome.

"While we expected some variability, what we observed went beyond inconsistency," said Girish N. Nadkarni, the study's senior author.

Suicide crisis safeguards triggered unreliably

The study also flagged serious shortcomings in the tool's suicide-crisis protocols. ChatGPT Health was designed to route high-risk users to the Suicide and Crisis Lifeline, but researchers found the alerts fired unpredictably, sometimes appearing in lower-risk situations while failing to activate when users described specific plans for self-harm.

The inconsistency in such a high-stakes safety feature adds another dimension to broader concerns about deploying AI in sensitive health contexts. Crisis intervention experts have long emphasized that reliable identification of acute suicide risk is among the most critical functions any health-facing system can perform.

A woman sits with her hands folded during a medical consultation, as a healthcare professional reviews notes on a clipboard with a stethoscope visible, accessed on Feb. 24, 2026. (Adobe Stock Photo)

Researchers urge critical engagement, not abandonment

Despite the findings, the Mount Sinai team stopped short of recommending that people abandon AI health tools altogether. Instead, they urged users experiencing worsening or concerning symptoms to seek medical care directly rather than relying solely on chatbot guidance.

Alvira Tyagi, a first-year medical student and the study's second author, framed the issue as one of evolving literacy. "These systems are changing quickly, so part of our training now must consider learning how to understand their outputs critically, identify where they fall short, and use them in ways that protect patients," she said.

Independent oversight called essential

The findings drew attention from researchers beyond the study team. Isaac Kohane, chair of biomedical informatics at Harvard Medical School, who was not involved in the research, emphasized the scale of the stakes involved. "When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high," Kohane said. "Independent evaluation should be routine, not optional."

The study arrives as AI health tools continue to proliferate and attract massive user bases, outpacing the development of standardized evaluation frameworks. Whether findings like these accelerate regulatory attention or prompt changes from developers remains to be seen.
