Major AI chatbots including ChatGPT, Claude, Gemini, Copilot, and Grok delivered health advice that was factually incorrect or misleading in nearly half of tested queries, according to a peer-reviewed audit published in BMJ Open. The study flagged a critical problem: the systems generated confident-sounding answers with fabricated citations and references, making false information appear credible to users seeking medical guidance.
Researchers tested the chatbots on common health questions and found that responses ranged from incomplete to dangerously wrong. The hallucination problem proved consistent across platforms: AI models confidently cited non-existent studies, invented medical statistics, and offered advice that contradicted established clinical guidelines. Users with limited medical knowledge would struggle to identify these errors without expert verification.
The audit carries implications beyond health misinformation. It underscores a systemic weakness in large language models: these systems optimize for fluent text generation, not factual accuracy, and they assign high confidence to false outputs, compounding the problem. A user asking about treatment options or symptom interpretation might receive plausible-sounding medical advice backed by fake sources.
The findings arrive as healthcare organizations increasingly integrate AI into patient-facing platforms. Telemedicine companies, health apps, and hospital systems are exploring AI chatbots for triage and initial patient education. The audit suggests that deploying such systems without rigorous fact-checking creates liability and health risks.
Regulatory bodies have begun scrutinizing AI health claims. The FDA maintains oversight of AI-powered medical devices. General-purpose chatbots deployed as health information tools, however, operate in a gray zone: they are not classified as medical devices, yet they deliver medical content to millions of users daily.
The research calls for guardrails. Health-specific fine-tuning, mandatory source verification, and confidence calibration could reduce hallucinations. Alternatively, platforms could restrict health query responses or require disclaimers. Without intervention, the current trajectory perpetuates a system where users receive false medical information delivered with unwarranted authority.
