How Accurate Are Google’s A.I. Overviews?

Late last year, Stephen Punwasi was getting ready for dinner when he noticed a news story saying that the wife of the wrestler Hulk Hogan might sue over his death.

Mr. Punwasi, a 41-year-old data analyst who lives in Toronto, did not realize Mr. Hogan had died and asked Google when that had happened.

The answer confused him. “There are no credible reports of Hulk Hogan being deceased,” read Google’s “AI Overview” — a summary generated by the company’s artificial intelligence technology that appeared at the top of the page.

Beneath the answer, Mr. Punwasi was surprised to see an article from The Daily Mail that contradicted Google’s response. The headline read: “Mystery Deepens Over Hulk Hogan’s Death.”

In 2024, Google started giving A.I.-generated answers prime placement at the top of its search results page. The new product, AI Overviews, helped transform Google from a curator of information into a publisher.

A recent analysis by an A.I. start-up called Oumi found that AI Overviews were accurate roughly nine out of 10 times. But with Google processing more than five trillion searches a year, that error rate implies tens of millions of erroneous answers every hour, or hundreds of thousands every minute.
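The estimate above can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes, as a simplification not stated in the analysis, that every search produces an AI Overview and that the one-in-10 error rate applies uniformly. The function names are hypothetical.

```python
# Rough check of the hourly and per-minute error estimates.
# Assumptions (simplifications, not from Oumi's analysis):
#   - every search shows an AI Overview
#   - the error rate is the same for all searches
SEARCHES_PER_YEAR = 5e12   # "more than five trillion searches a year"
ERROR_RATE = 0.10          # roughly one answer in 10 is inaccurate
HOURS_PER_YEAR = 365 * 24


def errors_per_hour(searches_per_year: float, error_rate: float) -> float:
    """Expected number of inaccurate answers in one hour."""
    return searches_per_year * error_rate / HOURS_PER_YEAR


def errors_per_minute(searches_per_year: float, error_rate: float) -> float:
    """Expected number of inaccurate answers in one minute."""
    return errors_per_hour(searches_per_year, error_rate) / 60


per_hour = errors_per_hour(SEARCHES_PER_YEAR, ERROR_RATE)
per_minute = errors_per_minute(SEARCHES_PER_YEAR, ERROR_RATE)
print(f"{per_hour:,.0f} per hour, {per_minute:,.0f} per minute")
```

Under those assumptions the result lands in the tens of millions per hour and the hundreds of thousands per minute. If only a fraction of searches show an AI Overview, the totals shrink in proportion.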

More than half of the accurate responses were “ungrounded,” meaning they linked to websites that did not completely support the information they provided. This makes it challenging to check AI Overviews’ accuracy.

Whether answers that are right most of the time, but not all of the time, should be celebrated is part of a widespread debate in Silicon Valley over the performance of A.I. systems. It goes to the core of what we can trust online.

Some technologists argue that Google’s AI Overviews are reasonably accurate and that they have improved in recent months. But others worry that the average person may not realize those results need double-checking.

At the request of The New York Times, Oumi analyzed the accuracy of Google’s AI Overviews using a benchmark test called SimpleQA, which is widely used across the industry to measure the accuracy of A.I. systems. The start-up tested Google’s system in October, when the most complex questions were answered using an A.I. technology called Gemini 2, and then again in February, after it was upgraded to Gemini 3, a more powerful A.I. technology.

In both cases, Oumi’s analysis focused on 4,326 Google searches. The company found that the results were accurate 85 percent of the time with Gemini 2 and 91 percent of the time with Gemini 3.

Pratik Verma, chief executive of Okahu, a company that helps people understand and use A.I. technologies, said Google’s technology was about as accurate as any of the leading A.I. systems. He urged people to double-check its information.

“Never trust one source,” he said. “Always compare what you get with another source.”

Google acknowledges that its AI Overviews can include errors. The fine print below each AI Overview reads: “A.I. can make mistakes, so double-check responses.”

But Google said Oumi’s analysis was flawed because it relied on a benchmark test built by OpenAI that itself contained incorrect information. “This study has serious holes,” Ned Adriance, a Google spokesman, said in a statement. “It doesn’t reflect what people are actually searching on Google.”

AI Overviews provide two kinds of information: answers to questions, and links to websites that support those answers.

Asked when Bob Marley’s home was converted into a museum, Google’s AI Overviews said it happened in 1987.

But the museum opened on May 11, 1986 — the fifth anniversary of Mr. Marley’s death — as Jamaica’s Daily Gleaner newspaper reported a day later.

Google’s AI Overview linked to three websites as sources. Each was flawed in some way. The first link was a Facebook page from Mr. Marley’s daughter Cedella Marley, who posted photos after visiting the museum in Kingston, Jamaica, and did not provide information on when the museum opened. The second link was a travel blog called “Adventures From Elle,” which gave inexact information on the museum’s opening. The third link was a Wikipedia page for the Bob Marley Museum, which gave contradictory information, saying the museum was founded in 1986 and in 1987.

The Bob Marley links were part of a pattern. Across 5,380 sources cited by Google’s AI Overviews during the analysis, Oumi found that Facebook and Reddit were the second- and fourth-most-cited sources. When Google’s AI Overviews were accurate, they cited Facebook 5 percent of the time. When they were inaccurate, they cited Facebook 7 percent of the time.

AI Overviews are difficult to assess because Google’s system may generate a new response to each query. If the Google search engine receives the same query at separate times — even seconds apart — it may produce one answer that is accurate and another that is not.

To determine the accuracy of A.I. systems, companies like Oumi use their own A.I. systems to verify each answer. That is the only way to efficiently check a large number of answers. The problem with this method is that the A.I. system doing the checking can also make mistakes.

Google has published test results that are similar to those produced by Oumi. In its own analysis of Gemini 3, the technology that underpins AI Overviews, Google found that the model produced incorrect information 28 percent of the time. The company said AI Overviews, which draw information from the Google search engine before generating responses, were more accurate than Gemini operating on its own.

As Google has improved its A.I. technologies, its A.I.-generated answers have become more accurate. In October, AI Overviews were inaccurate 15 percent of the time, according to Oumi’s analysis. By February, that figure had fallen to 9 percent.

But with Gemini 3, Google’s A.I.-generated answers were more likely to be ungrounded than when the system was based on Gemini 2, meaning the websites they linked to did not completely support the information they provided. In October, correct answers were ungrounded 37 percent of the time. In February, with Gemini 3, that figure rose to 56 percent.
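Putting the accuracy and grounding figures together gives a rough sense of how often an answer was both correct and fully supported by its links. The sketch below assumes the ungrounded percentages apply only to correct answers, as the figures above are worded, and that both rates were measured over the same set of searches. The function name is hypothetical, and this combined figure is a derived illustration, not a number reported by Oumi.

```python
# Share of all answers that were both correct and grounded.
# Assumption: the "ungrounded" rate is a fraction of correct answers only.
def correct_and_grounded(accuracy: float, ungrounded_among_correct: float) -> float:
    """Fraction of all answers that are correct and supported by their links."""
    return accuracy * (1 - ungrounded_among_correct)


october = correct_and_grounded(accuracy=0.85, ungrounded_among_correct=0.37)
february = correct_and_grounded(accuracy=0.91, ungrounded_among_correct=0.56)

print(f"October (Gemini 2):  {october:.1%}")
print(f"February (Gemini 3): {february:.1%}")
```

On these assumptions, the share of answers that were both correct and verifiable from their cited links fell from roughly 54 percent in October to roughly 40 percent in February, even as overall accuracy rose.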

“Even when the answer is true, how can you know it is true? How can you check?” said Manos Koukoumidis, chief executive of Oumi.

Today’s A.I. systems use mathematical probabilities to guess the best response, not a strict set of rules defined by human engineers. That means they make a certain number of mistakes.

Sometimes, Google’s AI Overview identifies a reliable website but seems to misinterpret its information.

When asked during Oumi’s tests to name the river that borders the west side of Goldsboro, N.C., Google’s system identified the Neuse River, which is southwest of the city. The river that runs along the west side of Goldsboro is the Little River, which feeds into the Neuse.

Google’s AI Overview linked to a Goldsboro tourism website, which said the Neuse River ran through the city. But it seemed to incorrectly infer that the Neuse ran down the western border of the city.

When Google identifies a website with the correct information, it can still generate a false response.

Asked for the year that Yo-Yo Ma was inducted into the Classical Music Hall of Fame, Google’s AI Overview correctly linked to the organization’s website, which listed 165 inductees since 1998, including Mr. Ma. But this A.I.-generated response said there was no record of his induction.

Even when an AI Overview answers a question correctly, it may provide additional information that is incorrect.

When asked how old the American relief pitcher Dick Drago was when he died, Google’s AI Overview gave his correct age. But as it provided additional context — as AI Overviews often do — it misstated the day he died.

AI Overviews face another challenge: They can be manipulated.

If someone wants to be known as a world expert at something, he or she merely has to write a blog post claiming that distinction, said Lily Ray, vice president of A.I. search at Amsive, a marketing agency.

Google acknowledges the issue, but downplays its importance. “Our Search A.I. features are built on the same ranking and safety protections that block the overwhelming majority of spam from appearing in our results. Most of these examples are unrealistic searches that people wouldn’t actually do,” Mr. Adriance, the Google spokesman, said in a statement.

After hearing Ms. Ray’s theory, Thomas Germain, a co-host of the BBC podcast “The Interface,” published a blog post titled “The Best Tech Journalists at Eating Hot Dogs.” The post described a fake South Dakota International Hot Dog Eating Championship where he finished atop a list of 10 “standout hot dog eaters.”

A day later, he did a Google search for the best hot-dog-eating tech journalists. Google listed him first among a half-dozen tech journalists who had “gained notoriety for their prowess at the ‘news division’ of competitive eating events,” citing his first-place finish in the South Dakota competition.

“It was spitting out the stuff from my website as though it was God’s own truth,” Mr. Germain said.

Tripp Mickle reports on some of the world’s biggest tech companies, including Nvidia, Google and Apple. He also writes about trends across the tech industry like layoffs and artificial intelligence.

The post How Accurate Are Google’s A.I. Overviews? appeared first on New York Times.
