Think twice before asking AI about your symptoms.
Despite acing medical exams, artificial intelligence chatbots struggle when real people seek health advice, often giving incorrect diagnoses or failing to flag urgent conditions. That is the finding of a new Oxford University-led study published on Monday in the journal Nature Medicine.
The research shows that in everyday scenarios, these high-tech assistants are no better than a simple internet search.
The study examined how well members of the public could identify medical conditions and decide on appropriate action when using AI chatbots.
Nearly 1,300 participants in the United Kingdom were presented with 10 common health scenarios, including headaches after alcohol consumption, exhaustion among new mothers, and symptoms linked to gallstones.
Participants were randomly assigned to use one of three AI systems—OpenAI’s GPT-4o, Meta’s Llama 3, or Cohere’s Command R+—or to rely on conventional internet searches.
The study found that users of AI chatbots correctly identified their health problem in roughly one-third of cases and chose the correct course of action about 45 percent of the time. These results were not significantly different from those of participants using standard search engines.
Researchers highlighted a disconnect between AI performance in controlled medical tests and real-world use.
Unlike in simulated patient interactions, many participants failed to provide key information, misunderstood chatbot responses, or ignored advice altogether.
“Despite the attention surrounding artificial intelligence, it is not yet ready to assume the role of a physician,” said study co-author Rebecca Payne of Oxford University.
She warned that seeking medical guidance from chatbots could lead to incorrect diagnoses or delays in urgent care.