Your early-adopter friend swears by ChatGPT for meal planning. Your boss thinks Microsoft Copilot will “10x productivity.” Your social media feed thinks Meta AI is a slop machine. They’re mostly going off vibes.
I can tell you which AI tools are worth using — and which to avoid — because I’ve been running a chatbot fight club.
I conducted dozens of bot challenges based on real things people do with AI: writing breakup texts and work emails, decoding legal contracts and scientific research, answering tricky research questions, and editing photos and making “art.” Human experts, including best-selling authors, reference librarians, a renowned scientist and even a Pulitzer Prize-winning photographer, judged the results.
After a year of bot battles, one thing stands out: There is no single best AI. The smartest way to use chatbots today is to pick different tools for different jobs — and not assume one bot can do it all. Case in point: ChatGPT, the Kleenex of chatbots, won none of my head-to-head battles. And even the winners rarely eked out the equivalent of a human passing grade.
According to the judges, Anthropic’s Claude bot drafted a better breakup text than yours truly. Most bots got stumped by the question “How many buttons does an iPhone have?” And ChatGPT beat a top doctor on one real medical question — but also doled out advice that could seriously hurt you.
Having human experts judge these tests changed how I think about chatbots, and how I use them in my own life. Even if you’re anxious about AI taking jobs, harming the environment or invading privacy, there’s something to learn from how today’s AI tools actually perform when you strip away the hype. Developing AI literacy can help you see that bots are not actually “intelligent” while still getting the most out of their real capabilities.
Which chatbot is right for you?
ChatGPT kick-started the generative AI race three years ago, and now its maker, OpenAI, says it’s used by 800 million people each week. It used to be my default when I wanted to brainstorm synonyms or look up trivia. But when I started testing methodically, ChatGPT never ranked higher than second among the most popular bots. (The Washington Post has a content partnership with OpenAI.)
OpenAI recently issued an internal “code red” telling employees to pull focus away from projects like a web browser and back onto improving ChatGPT’s responses. “We’re excited to keep making ChatGPT even better in 2026,” said spokeswoman Taya Christianson.
Based on my fight club, I now turn to different bots for different kinds of tasks. Here’s how that plays out in practice:
- For writing and editing, I use Claude. It’s got a better turn of phrase, can occasionally crack jokes and is least likely to open emails with the soul-crushing phrase “hope you’re well.” In one of my tests — writing an apology letter — judge Pamela Skillings said Claude “communicates real human emotion and thoughtful consideration.”
- For research and quick answers, I use Google’s AI Mode — not the AI Overview that pops up in regular Google results, which is far less reliable. AI Mode, a chatbot-style search tool, can conduct dozens of searches before it provides an answer. That helps it give more up-to-date information, too: In my research test, it correctly identified currently recommended treatments for mastitis, a breast infection, while other bots offered outdated approaches.
- For working with documents, I use Claude. In my document-analysis tests, it was the only bot that never made up facts. When I asked the bot to suggest changes on a rental agreement, Claude’s answer came closest to being a “good substitute for a lawyer,” said judge Sterling Miller, a longtime corporate lawyer.
- For images, I use Google’s Gemini, which trounced the competition in every test I devised. When I asked the bots to remove one of two subjects from a photograph, the result was so convincing — including details like light bouncing from sequins on a dress — that judge David Carson, a photojournalist, said “wow.” He couldn’t tell Gemini’s output was AI generated.
I’ve covered major AI use cases, but not all of them. (If you’ve got ideas for more fair bot tests I should run, drop me an email.)
My tests involve a lot more human judgment than the industry benchmarks that tech companies like to tout. Those benchmarks typically rely on automated tests, in which bots answer a battery of questions, like a medical or legal exam. But bots can be trained to score well on those tests, masking how their smarts fall apart when applied to more practical problems.
You might not agree with every question I asked or individual judges’ views, but human evaluation gets much closer to how we actually use AI right now.
Which leads to another question: When should you use a chatbot?
When bots fail us
In my AI fight club, the bots sometimes impressed. But only once did the judges give a bot an overall score higher than 70 percent — the typical cutoff for a passing grade.
That one score — 84 percent — went to Gemini for making and editing pictures.
Most of the winning scores hovered between 50 and 65 percent. “The problem is none of the tools got 10s across the board,” said Miller, the lawyer who judged our summarization test.
That doesn’t mean today’s AI tools aren’t useful. But it does mean you need to approach them with skepticism and a clear sense of their limitations.
Adding AI to a task doesn’t always make it better. When we tested the ability of AI to answer trivia, our librarian judges reported they could have found most of the answers with an old-fashioned Google search. AI sped up getting an answer back — the catch was that some of those answers were wrong.
The most useful kind of AI literacy comes from watching bots fail. In my trivia test, they had a hard time saying how many buttons are on an iPhone. ChatGPT said four, Claude and Meta AI said three, and Copilot said six. The answer is five, on recent high-end iPhones. Why the confusion? Bots are over-reliant on text and not yet good at reading pictures.
Today’s chatbots have a strong drive to give you something that feels like an answer right away. They’re very bad at communicating uncertainty.
For example: In my trivia competition, I asked the bots, “What score did The Fantastic Four get on Rotten Tomatoes?” At the time, it was the No. 1 movie at the box office. But even AI Mode, our overall winner, got it wrong and gave the score for an infamous 2015 Fantastic Four film. It didn’t stop to ask which one I meant.
In my writing tests, bots often came across as insincere when they failed to fit their words to the context. ChatGPT had a cringey moment by using the passive-aggressive phrase “that said” in a breakup text: “I think you’re a great person. That said, I’ve realized I don’t see this moving forward.”
If I could change one thing about today’s AI tools, I’d make them better at asking follow-up questions that could completely change the answer.
Something stuck with me when I asked Bob Wachter, chair of the department of medicine at the University of California at San Francisco, to judge ChatGPT’s responses to real medical questions. The difference between a bot with access to infinite knowledge and a good human doctor, he noted, is that the doctor knows how to answer a question with more questions. That’s how you actually solve someone’s problem.
Wachter suggested an AI strategy I now use regularly: front-load your queries to a chatbot with as many details as you can think of, knowing that the AI might not stop to ask for some of them before trying to answer. Instead of “summarize this lease,” try “summarize this lease for a renter in D.C., flagging clauses about fees, renewal and early termination.”
I’ve also added a “custom instruction” to my chatbots, telling them to “ask for clarification before answering if a prompt is vague.”
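If you reach these bots through a developer API rather than a chat window, both habits translate directly into code. Here’s a minimal sketch using OpenAI’s Python SDK, with the custom instruction sent as a system message; the model name “gpt-4o” and the “lease.txt” file are placeholders, and it assumes you have the openai package installed and an OPENAI_API_KEY set in your environment:

```python
# A minimal sketch of both habits, using OpenAI's Python SDK.
# Assumptions: the `openai` package is installed, OPENAI_API_KEY is set in
# the environment, and "gpt-4o" / "lease.txt" are placeholders to swap out.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# Habit 2: a standing "custom instruction," sent as the system message.
CUSTOM_INSTRUCTION = "Ask for clarification before answering if a prompt is vague."

# Placeholder: load whatever document you want summarized.
with open("lease.txt", encoding="utf-8") as f:
    lease_text = f.read()

# Habit 1: front-load the query with details the bot might never ask for
# on its own (who you are, where, and what to flag).
prompt = (
    "Summarize this lease for a renter in D.C., flagging clauses about "
    "fees, renewal and early termination.\n\n" + lease_text
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": CUSTOM_INSTRUCTION},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)
```

The system message plays the same role as the custom-instruction setting in the chat apps: it rides along with every request, so the bot is nudged to ask before guessing.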
I hope these techniques will help you get more useful answers from AI. These tools will keep evolving — and so will their problems. The push toward personalizing bot responses based on your data increases the risk of privacy invasion and manipulation. I always change the bots’ default settings to protect my data.
We’ll inevitably have more AI products thrown at us in 2026 and beyond. How do we stay on top of it? For me, the answer is the same as it’s been all year: keep the bot fight club running — and keep humans in the judge’s chair.