ChatGPT now says it can answer personal questions about your health using data from your fitness tracker and medical records. The new ChatGPT Health claims that it can help you “understand patterns over time — not just moments of illness — so you can feel more informed.”
Like many people who strap on an Apple Watch every day, I’ve long wondered what a decade of that data might reveal about me. So I joined a brief wait list and gave ChatGPT access to the 29 million steps and 6 million heartbeat measurements stored in my Apple Health app. Then I asked the bot to grade my cardiac health.
It gave me an F.
I freaked out and went for a run. Then I sent ChatGPT’s report to my actual doctor.
Am I an F? “No,” my doctor said. In fact, I’m at such low risk for a heart attack that my insurance probably wouldn’t even pay for an extra cardio fitness test to prove the artificial intelligence wrong.
I also showed the results to cardiologist Eric Topol of the Scripps Research Institute, an expert on both longevity and the potential of AI in medicine. “It’s baseless,” he said. “This is not ready for any medical advice.”
AI has huge potential to unlock medical insights and widen access to care. But when it comes to your fitness tracker and some health records, the new Dr. ChatGPT seems to be winging it. That fits a disturbing trend: AI companies launching products that are broken, fail to deliver or are even dangerous. It should go without saying that people’s health actually matters. Any product — even one labeled “beta” — that claims to provide personal health insights shouldn’t be this clueless.
A few days after ChatGPT Health arrived, AI rival Anthropic launched Claude for Healthcare, which similarly promises to help people “detect patterns across fitness and health metrics.” Anyone with a paid account can import Apple Health and Android Health Connect data into the chatbot. Claude graded my cardiac health a C, relying on some of the same analysis that Topol found questionable.
OpenAI and Anthropic say their bots can’t replace doctors or provide a diagnosis, and both products include contextual disclaimers. Yet both willingly provided detailed analysis of my cardiac health. (The Washington Post has a content partnership with OpenAI.)
The companies also say their health bots are in early testing phases, but didn’t specify how they planned to improve their ability to provide personal analysis of body data. Apple says it didn’t work directly with either AI company on these products.
The more I used ChatGPT Health, the worse things got.
How ChatGPT misread my charts
The whole premise of ChatGPT Health involves a leap of faith: handing over your most intimate health information to a data-gobbling AI company. People are right to be cautious about AI privacy.
OpenAI says its Health mode takes extra steps to protect your privacy: it won’t use your data to train its AI or mingle it with other chats, and it encrypts the data. Still, it’s essentially a pinkie promise: ChatGPT isn’t a health care provider, so it isn’t covered by the federal health privacy law known as HIPAA.
After I connected ChatGPT to Apple Health, I asked the bot to share what health recommendations it had for me, and to call out any red flags. It said I should talk with my doctor about a “notable increase” in my resting heart rate, but congratulated me on my sleep and activity levels.
Then I asked the bot to “give me simple scores for heart health and longevity.” That’s when it handed me an F.
I asked ChatGPT: “You really think my heart health and longevity deserve failing grades?”
It responded: “Short answer: no — I don’t think you’re a lost cause, and I’m sorry the raw letter grades felt harsh. They were a blunt, heuristic readout from device-only data.” It said it could give me a more useful score if I also connected the medical records stored by my doctor’s office.
So I did that, and asked it again to “give me a simple score (A-F) for my cardiovascular health over the last decade.” The grade ticked up to D.
Topol was appalled when I showed him the bot’s analysis.
Despite having access to my weight, blood pressure and cholesterol, ChatGPT based much of its negative assessment on an Apple Watch measurement known as VO2 max, the maximum amount of oxygen your body can consume during exercise. Apple says it collects an “estimate” of VO2 max, but the real thing requires a treadmill and a mask. Apple says its cardio fitness measures have been validated, but independent researchers have found those estimates can run low — by an average of 13 percent.
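To see what a 13 percent underestimate can mean in practice, here is a minimal back-of-the-envelope sketch in Python. The correction factor comes from the average researchers reported; the example reading and the fitness-band remark are hypothetical illustrations, not clinical guidance.

```python
# Illustrative only: if a watch's VO2 max estimate runs ~13% low on average,
# back out roughly what a lab-measured value might be.
def approximate_lab_vo2max(watch_estimate: float, underestimate_frac: float = 0.13) -> float:
    """Assume watch_estimate ~= true_value * (1 - underestimate_frac)."""
    return watch_estimate / (1.0 - underestimate_frac)

watch_reading = 35.0  # hypothetical reading in ml/kg/min from a fitness tracker
print(f"Adjusted estimate: {approximate_lab_vo2max(watch_reading):.1f} ml/kg/min")
# ~40.2 -- a gap that can be large enough to shift a reading between fitness
# categories, which is one reason a letter grade built on this number is shaky.
```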
ChatGPT’s evaluation also emphasized an Apple Watch metric called heart-rate variability, which Topol said has lots of fuzziness. “You sure don’t want to go with that as your main driver,” he said.
When I asked ChatGPT to chart my heart rate over the decade, I spotted another problem: There were big swings in my resting heart rate whenever I got a new Apple Watch, suggesting the devices may not have been tracking the same way. (Apple says it keeps making improvements to those measurements.) But once again, ChatGPT treated a fuzzy data point like a clear health signal.
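A simple way to check for that kind of device artifact, sketched below with assumed column names and a hypothetical upgrade date, is to compare average resting heart rate just before and just after a watch switch. A sudden step change at that date points to the hardware, not the heart.

```python
import pandas as pd

# Sketch only: flag a step change in resting heart rate around a device switch.
# Assumes a hypothetical CSV export with 'date' and 'resting_hr' columns.
df = pd.read_csv("resting_heart_rate.csv", parse_dates=["date"]).sort_values("date")
switch_date = pd.Timestamp("2022-10-01")  # hypothetical date the new watch arrived

before = df.loc[df["date"] < switch_date, "resting_hr"].tail(90).mean()
after = df.loc[df["date"] >= switch_date, "resting_hr"].head(90).mean()
jump = after - before

if abs(jump) > 3:  # arbitrary threshold, in beats per minute
    print(f"Resting heart rate shifted {jump:+.1f} bpm at the device switch; "
          "treat any 'trend' spanning that date with suspicion.")
else:
    print("No obvious device-related step change detected.")
```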
Claude’s C grade for me was less panic-inducing, but it also wasn’t sufficiently critical about the VO2 max data (which it graded a D+). Anthropic says there’s no separate health-tuned version of Claude, and it can only provide general context for health data, not personalized clinical analysis.
My real doctor said that to do a deep dive on my cardiac health, we should check back in on my lipids, so he ordered another blood test that included lipoprotein(a), a risk factor for heart disease. Neither ChatGPT Health nor Claude brought up the idea of doing that test.
An erratic analysis
Both AI companies say their health products are not designed to provide clinical assessments. Rather, they’re meant to help you prepare for a doctor’s visit or get advice on how to approach your workout routine.
I didn’t ask their bots if I have heart disease. I asked them a pretty obvious question after uploading that much personal health data: How am I doing?
What’s more, if ChatGPT and Claude can’t accurately grade your heart health, then why didn’t the bots say, “Sorry, I can’t do that”?
The bots did decline to estimate at what age I might die.
There was another problem I discovered over time: When I asked the same heart-health and longevity grading question again, suddenly my score went up to a C. I asked again and again, watching the score swing between an F and a B.
Across conversations, ChatGPT kept forgetting important information about me, including my gender, age and some recent vital signs. It had access to my recent blood tests, but sometimes didn’t use them in its analysis.
That kind of randomness is “totally unacceptable,” Topol said. “People that do this are going to get really spooked about their health. It could also go the other way and give people who are unhealthy a false sense that everything they’re doing is great.”
OpenAI says it couldn’t replicate the wild swings I saw. It says ChatGPT might weigh different connected data sources slightly differently from one conversation to the next as it interprets large health datasets. It also says it’s working to make responses more stable before ChatGPT Health becomes available beyond its wait list.
“Launching ChatGPT Health with waitlisted access allows us to learn and improve the experience before making it widely available,” OpenAI vice president Ashley Alexander said in a statement.
When I repeated the same query on Claude, my score varied between a C and B-. Anthropic said chatbots have inherent variation in outputs.
Should you trust a bot with your health?
I liked using ChatGPT Health to make plots of my Apple Watch data, and to ask narrower questions such as how my activity level changed after I had kids.
OpenAI says more than 230 million users already ask ChatGPT health and wellness questions every week. For those people, a more private way to import information and have chats about their bodies is a welcome improvement.
But the question is: Should we be turning to this bot for those answers? OpenAI says it has worked with physicians to improve its health answers. When I’ve previously tested the quality of ChatGPT’s responses to real medical questions with a leading doctor, the results ranged from excellent to potentially dangerous. The problem is that ChatGPT typically answers with such confidence that it’s hard to tell the good results from the bad ones.
Chatbot companies might be overselling their ability to answer personalized health questions, but there’s little stopping them. Earlier this month, Food and Drug Administration Commissioner Marty Makary said the agency’s job is to “get out of the way as a regulator” to promote AI innovation. He drew a red line at AI making “medical or clinical claims” without FDA review, but both ChatGPT and Claude insist they’re just providing information.
Scientists have worked for years to analyze long-term body data to predict disease. (In 2020, I participated in one such study with the Oura Ring.) What makes this kind of AI work so difficult, Topol told me, is that you have to account for noise and weaknesses in the data and also link it up to people’s ultimate health outcomes. To do it right, you need a dedicated AI model that can connect all these layers of data.
OpenAI’s Alexander said ChatGPT Health was built with custom code that helps it organize and contextualize personal health data. But that’s not the same as being trained to extract accurate and useful personal analysis from the complex data stored in Apple Watches and medical charts.
Topol expected more. “You’d think they would come up with something much more sophisticated, aligned with practice of medicine and the knowledge base in medicine,” Topol said. “Not something like this. This is very disappointing.”