Health Advice From A.I. Chatbots Is Frequently Wrong, Study Shows

February 9, 2026

A new study published Monday offered a sobering look at whether A.I. chatbots, which have fast become a major source of health information, are, in fact, good at providing medical advice to the general public.

The experiment found that the chatbots were no better than Google — already a flawed source of health information — at guiding users toward the correct diagnoses or helping them determine what they should do next. And the technology posed unique risks, sometimes presenting false information or dramatically changing its advice depending on slight changes in the wording of the questions.

None of the models evaluated in the experiment were “ready for deployment in direct patient care,” the researchers concluded in the paper, which is the first randomized study of its kind.

In the three years since A.I. chatbots were made publicly available, health questions have become one of the most common topics users ask them about.

Some doctors regularly see patients who have consulted an A.I. model for a first opinion. Surveys have found that about one in six adults use chatbots to find health information at least once a month. Major A.I. companies, including Amazon and OpenAI, have rolled out products specifically aimed at answering users’ health questions.

These tools have stirred up excitement for good reasons: The models have passed medical licensing exams and have outperformed doctors on challenging diagnostic problems.

But Adam Mahdi, a professor at the Oxford Internet Institute and senior author of the new Nature Medicine study, suspected that these clean, straightforward medical questions were not a good proxy for how well the models worked for real patients.

“Medicine is not like that,” he said. “Medicine is messy, is incomplete, it’s stochastic.”

So he and his colleagues set up an experiment. More than 1,200 British participants, most of whom had no medical training, were given a detailed medical scenario, complete with symptoms, general lifestyle details and medical history. The researchers told the participants to chat with the bot to figure out the appropriate next steps, like whether to call an ambulance or self-treat at home. They tested commercially available chatbots like OpenAI’s ChatGPT and Meta’s Llama.

The researchers found that participants chose the “right” course of action — predetermined by a panel of doctors — less than half of the time. And users identified the correct conditions, like gallstones or subarachnoid hemorrhage, about 34 percent of the time.

They were no better than the control group, who were told to perform the same task using any research method they would normally use at home, mainly Googling.

The experiment is not a perfect window into how chatbots answer medical questions in the real world: Users in the experiment asked about made-up scenarios, which may be different from how they would interact with the chatbots about their own health, said Dr. Ethan Goh, who leads the A.I. Research and Science Evaluation Network at Stanford University.

And since A.I. companies frequently roll out new versions of their models, the chatbots that participants used a year ago during the experiment are likely different from the models users interact with today. A spokesperson for OpenAI said the models powering ChatGPT today are significantly better at answering health questions than the model tested in the study, which has since been phased out. The spokesperson cited internal data showing that newer models were far less likely to make common types of mistakes, including hallucinations and errors in potentially urgent situations. Meta did not respond to a request for comment.

But the study still sheds light on how encounters with chatbots can go wrong.

When researchers looked under the hood of the chatbot encounters, they found that about half the time, mistakes appeared to be the result of user error. Participants didn’t enter enough information or the most relevant symptoms, and the chatbots were left to give advice with an incomplete picture of the problem.

One model suggested to a user that the “severe stomach pains” that had lasted an hour might have been caused by indigestion. But the participant had failed to include details about the severity, location and frequency of the pain, all of which would likely have pointed the bot toward the correct diagnosis: gallstones.

By contrast, when researchers entered the full medical scenario directly into the chatbots, they correctly diagnosed the problem 94 percent of the time.

A major part of what doctors learn in medical school is how to recognize which details are relevant, and which to toss aside.

“There’s a lot of cognitive magic and experience that goes into figuring out what elements of the case are important that you feed into the bot,” said Dr. Robert Wachter, chair of the department of medicine at the University of California, San Francisco, who studies A.I. in health care.

But Dr. Andrew Bean, a graduate student at Oxford and lead author of the paper, said that the burden should not necessarily fall on users to craft the perfect question. He said chatbots should ask follow-up questions, much as doctors do when gathering information from patients.

“Is it really the user’s responsibility to know which symptoms to highlight, or is it partly the model’s responsibility to know what to ask?” he asked.

This is an area tech companies are working to improve. For example, current ChatGPT models are roughly six times more likely to ask a follow-up question than the earlier version, according to data provided by an OpenAI spokesperson.

Even when researchers typed in the medical scenario directly, they found that the chatbots struggled to distinguish whether a set of symptoms warranted immediate medical attention or only non-urgent care. Dr. Danielle Bitterman, who studies patient-A.I. interactions at Mass General Brigham, said that’s likely because the models are primarily trained on troves of medical textbooks and case reports, and have far less exposure to the free-form decision-making that doctors learn through experience.

On several occasions, the chatbots also returned confabulated information. In one case, a model directed a participant to call an emergency hotline that didn’t have enough digits to be a real phone number.

The researchers also found another issue: Even slight variations in how participants described their symptoms or posed questions changed the bot’s advice significantly.

For instance, two of the participants in the study started with the same symptoms (a bad headache, light sensitivity and a stiff neck) but described the problem to the chatbots slightly differently.

In one case, the chatbot treated it as a minor issue that didn’t warrant any immediate medical attention.

In the other, the chatbot considered the symptoms a sign of a serious health problem and told the user to head to the emergency room.

“Very, very small words make very big differences,” Dr. Bean said.

Teddy Rosenbluth is a Times reporter covering health news, with a special focus on medical misinformation.

The post Health Advice From A.I. Chatbots Is Frequently Wrong, Study Shows appeared first on New York Times.
