We found what you’re asking ChatGPT about health. A doctor scored its answers.

November 18, 2025

ChatGPT could be exactly what the doctor ordered when you’ve got a burning medical question at 2 a.m. Or, its advice might hurt you.

The problem? It’s hard to tell the difference.

I selected a dozen real conversations that people had with ChatGPT about medical problems, from a trove of 47,000 ChatGPT conversations compiled by The Washington Post.

Then I gave them to Robert Wachter, chair of medicine at the University of California at San Francisco, and asked him to grade ChatGPT’s work. How did the chatbot’s advice differ from what he’d tell his own patients?

The chats, from June 2024 through July 2025, covered liver test results and covid conspiracy theories as well as emergency advice and things people might be too embarrassed to ask a real doctor. They were made public by ChatGPT users who created shareable links to their chats, potentially without realizing they would be visible to anyone. Wachter has been a physician for 42 years and is the author of the forthcoming book “A Giant Leap” about artificial intelligence. (The Post has a partnership with ChatGPT’s maker OpenAI.)

One in six American adults uses AI chatbots monthly for health advice, according to health nonprofit KFF. But only about one in three trust their information. The skepticism is understandable: Researchers published a case report this summer on a man who was hospitalized for poisoning after asking ChatGPT for advice on cutting salt from his diet.

ChatGPT’s terms of service say it is not intended to be used for medical “diagnosis or treatment,” but it will still provide health advice. “Today’s models can be a useful tool for health information, but users should still consult a qualified clinician for care and treatment decisions,” said OpenAI spokeswoman Brianna Bower. (Health “information” falls outside the purview of the Food and Drug Administration.)

To my surprise, Wachter absolutely loved some of ChatGPT’s answers. He gave four of the 12 answers a perfect 10. For one about an uncommon liver problem, he acknowledged: “I would have gotten this wrong.”

Yet Wachter also gave failing scores to four responses. One was “all over the place,” he said. Another was “terrible and scary.”

The problem wasn’t “hallucination” or making up facts. Recent studies show AI can do well on medical exams.

Rather, Wachter identified something more frightening: ChatGPT’s dangerous answers don’t sound risky to a non-doctor. The chatbot always sounds confident and authoritative.

“If I was just a regular old layperson with no medical training, could I tell the difference between a 10 and a 2?” he said. “I don’t think so.”

After I show you how Wachter broke down three of ChatGPT’s answers, you might have a fighting chance of knowing how to get good advice from the bot. He revealed patterns behind when ChatGPT helps — and when it fails.

Below are three conversations with ChatGPT. Try to spot which one looks helpful but is missing something crucial, which one he scored a perfect 10, and which one Wachter called “scary.”

A fainting friend

Wachter’s score: 6 out of 10.

What’s wrong? The answer is chock-full of information. But ChatGPT never asked the questions a doctor would need answered to assess the urgency of a situation or provide real guidance.

“ChatGPT fails to do one of a doctor’s core functions,” Wachter said: “Answer a question with a question.” Doctors probe: Do you have a fever? Chest pain? Have you traveled anywhere recently? “GPT didn’t do any of that,” he said.

Fainting with abdominal pain could signal anything from simple dehydration to life-threatening cardiac arrhythmias. Without asking follow-up questions, ChatGPT can’t distinguish between “call 911” and “see your doctor next week.”

Bower, the OpenAI spokeswoman, said the company’s new models, GPT-5 and GPT-5.1, “have improved significantly in their ability to accurately provide health information and ask follow-up questions.” But when I tested the same question in mid-November with the latest ChatGPT, the bot still failed to ask for clarification about what “fainting” meant or how the patient was doing. Wachter said he’d upgrade his score from a 6 to a 7, because the bot now recommended the person go right to the emergency room.

In another chat I shared with Wachter, someone asked, “What is the medical term for someone who is suicidal not because of depression but because of self-hatred?” ChatGPT provided the clinical terminology and even drafted a letter to their doctor explaining why they feel that way. But it never asked if the person was in crisis and needed immediate help.

“I am frankly a bit surprised … if the user is actively suicidal, he should seek help right now,” Wachter said. “I found the whole thing to be a bit too clinical and detached.”

(When I repeatedly tested the same question with the latest ChatGPT, the bot answered in a similar way and only sometimes provided crisis support such as a suicide-prevention hotline.)

Here’s another conversation that might appear solid — but isn’t.

Alternative treatments

Wachter’s score: 1 out of 10.

He called ChatGPT’s answer “awful.” Why? Fenbendazole and ivermectin are animal parasite medications promoted by wellness influencers. They’ve shown hints of usefulness in some lab experiments, said Wachter, but have not been proven clinically effective in cancer patients.

Meanwhile, testicular cancer can be treated very effectively with standard chemotherapy. “A patient reading this response might decide to go with these drugs … and decide not to take proven therapies that could be lifesaving,” Wachter said.

In this example, Wachter diagnosed a different failure mode: ChatGPT is trained to be nonjudgmental, or even sycophantic. That might be fine discussing some topics — but it’s dangerous when it comes to medical advice.

When I tested the same prompt with the newest version of ChatGPT, the bot’s response included a caveat about the “very limited human data” to support the anticancer properties of those animal medications. But Wachter said he’d still give it a low score of 3 because ChatGPT didn’t say that this is a highly curable tumor with standard treatment.

Wachter saw the same problem in ChatGPT’s response to another question, where a user asked: “Heart attack or just rapid heart beat anxiety? Took a 50 mg weed gummies. I feel tingling.” ChatGPT provided excellent factual information about the effects of THC, but it never cautioned that taking 50 mg — roughly two to five times a typical dose — was not a good idea. “I understand the bot’s desire not to be judgmental, but it seems appropriate to at least raise the issue,” Wachter said.

ChatGPT wasn’t always so conflict-averse. On a question about the safety of the coronavirus vaccine, it refused multiple attempts by the user to extract a different “truth” out of research that the user claimed supported their own theory.

So far we’ve seen ChatGPT fail to recognize a crisis and validate bad ideas. Read this next conversation to see ChatGPT provide a very different type of answer.

A symptom check

Wachter’s score: A perfect 10.

“I love this answer,” he said. Why? Because the patient set ChatGPT up to succeed.

The question included detailed symptoms, severity estimates, a timeline and context — all the information a doctor would have tried to gather with their own questions to the patient. And in response, ChatGPT delivered balanced guidance, clear red flags to watch for and probability ranges for different conditions.

The lesson: With a detailed question about symptoms, ChatGPT can effectively play doctor.

This echoes research findings: ChatGPT can diagnose as well as doctors when given all the evidence. But it doesn’t always ask follow-up questions, interpret context, or integrate multiple data points — the human judgment that makes medicine work.

If you’re turning to ChatGPT, “tell it everything that you’re feeling, and as much detail as you can, including chronology and associated symptoms,” Wachter said.

“The problem is, how can a patient know what those important symptoms are?” Wachter added. Even for very clear communicators, explaining your symptoms can require the kind of judgment doctors hone over years of experience.

A recent University of Oxford study found that people who were asked to use chatbots (including ChatGPT) for help with hypothetical medical situations left out critical information when describing symptoms. And they could not correctly identify which of an AI tool’s suggestions were most relevant. The result: They did worse than people who simply turned to Google or used their own knowledge.

A user guide for Dr. ChatGPT

Wachter’s overall score: 6.5 out of 10, across all dozen ChatGPT conversations.

He sees that as a yellow caution light. ChatGPT excels at providing information — but it fails at the judgment required to assess urgency, read emotional distress, decide on treatment options or ask the kinds of follow-up questions that could make a big difference.

“It’s, in many ways, smarter than I am,” Wachter said of the chatbot. “It certainly has a more broad, general repository of information.”

But knowledge isn’t everything. “ChatGPT doesn’t know anything other than what it reads in a textbook” or the wider internet, Wachter said. As a result, the chatbot has a tendency to “hit the target and miss the point,” he said.

So how should you use ChatGPT for your own health? First, get to know its privacy controls, like the “temporary mode” that stops it from keeping a copy of your personal information. Then consider:

Before a doctor visit: Use ChatGPT to help figure out what questions to ask your doctor.

After a visit: “Take your doctor’s note, or all your lab test results, or your CT scan results, and put it into GPT and say, ‘Can you explain this in plain English?’ It does a pretty darn good job,” Wachter said.

For common illnesses: Provide detailed symptoms with severity ratings, timeline, other conditions and medications.

“I have no problem with a patient going there first with a set of symptoms that’s confusing,” Wachter said. “It’s smarter than your average friend or spouse, unless your average friend or spouse happens to be a doctor.”

“But you have to be skeptical,” he said.

The answers from chatbots will get better. OpenAI says its latest version is more likely to ask follow-up questions.

Wachter has a prescription for OpenAI and other AI companies that provide medical advice: Just giving an answer right away to a user’s prompt is going to be wrong as often as it is right.

“Better AI tools are going to act more like a doctor: you give it information, and then it engages in conversation back and forth,” he said. “To do that, they’re going to have to spend a fair amount of time watching what good doctors do and how they think.”

Jeremy B. Merrill contributed reporting.

Read more of our ChatGPT analysis

  • We analyzed 47,000 ChatGPT conversations. Here’s what people really use it for.
  • What are the clues that ChatGPT wrote something? We analyzed its style.

The post “We found what you’re asking ChatGPT about health. A doctor scored its answers.” appeared first on Washington Post.
