Fact check: “Hey Grok, is this true?” How trustworthy is AI factchecking?

May 16, 2025

“Hey, @Grok, is this true?” Ever since Elon Musk’s xAI launched its generative artificial intelligence chatbot Grok in November 2023, and especially since it was rolled out to all non-premium users in December 2024, thousands of X users have been asking this question to carry out rapid fact checks on information they see on the platform.

A recent survey carried out by the British online technology publication TechRadar found that 27% of Americans had used AI tools such as OpenAI’s ChatGPT, Meta’s Meta AI, Google’s Gemini, Microsoft’s Copilot or apps like Perplexity instead of traditional search engines like Google or Yahoo.

But how accurate and reliable are the chatbots’ responses? Many people have asked themselves this question in the face of Grok’s recent statements about ‘white genocide’ in South Africa. Apart from Grok’s problematic stance on the topic, X users were also irritated that the bot brought it up when asked about completely different subjects.

The discussion around the alleged “white genocide” arose after the Trump administration granted refugee status to a group of white South Africans. Trump said they were facing a “genocide” in their homeland — an allegation that lacks any proof and that many see as related to the racist conspiracy myth of the “Great Replacement.”

xAI blamed an “unauthorized modification” for Grok’s obsession with the “white genocide” topic and said it had “conducted a thorough investigation.” But do flaws like this happen regularly? How sure can users be of getting reliable information when they want to fact-check something with AI?

We analyzed this and answered these questions for you in this article.

Study shows factual errors and altered quotes

Two studies conducted this year by the British public broadcaster BBC and the Tow Center for Digital Journalism in the United States have found significant shortcomings in the ability of generative AI chatbots to accurately convey news reporting.

In February, a BBC study found that “answers produced by AI assistants contained significant inaccuracies and distorted content” produced by the organization.

When the BBC asked ChatGPT, Copilot, Gemini and Perplexity to respond to questions about current news by using BBC articles as sources, it found that 51% of the chatbots’ answers had “significant issues of some form.”

19% of answers were found to have introduced factual errors of their own, while 13% of quotes were either altered or not present at all in the cited articles.

“AI assistants cannot currently be relied upon to provide accurate news and they risk misleading the audience,” Pete Archer, the BBC’s Generative AI Program Director, concluded.

Incorrect answers with “alarming confidence”

Similarly, research by the Tow Center for Digital Journalism, published in the Columbia Journalism Review (CJR) in March, found that eight generative AI search tools were unable to correctly identify the provenance of article excerpts in 60% of cases.

Perplexity performed best with a failure rate of “only” 37%, while Grok answered 94% of queries incorrectly.

The CJR said it was particularly concerned by the “alarming confidence” with which AI tools presented incorrect answers, reporting: “ChatGPT incorrectly identified 134 articles, but signaled a lack of confidence just fifteen times out of its two hundred [total] responses, and never declined to provide an answer.”

Overall, the study found that chatbots were “generally bad at declining to answer questions they couldn’t answer accurately, offering incorrect or speculative answers instead” and that AI search tools “fabricated links and cited syndicated and copied versions of articles.”

AI chatbots are only as good as their ‘diet’

And where does AI itself get its information? It is fed by different sources, such as extensive databases and web searches. Depending on how AI chatbots are trained and programmed, the quality and accuracy of their answers can vary.

“One issue that recently emerged is the pollution of LLMs [Editorial note: Large Language Models] by Russian disinformation and propaganda. So clearly there is an issue with the ‘diet’ of LLMs,” Tommaso Canetta, deputy director of the Italian fact-checking project Pagella Politica and fact-checking coordinator at the European Digital Media Observatory (EDMO), told DW.

“If the sources are not trustworthy and of high quality, the answers will most likely be of the same kind.” Canetta explains that he himself regularly comes across responses which are “incomplete, not precise, misleading or even false.”

In the case of xAI and Grok, whose owner, Elon Musk, is a fierce supporter of US President Donald Trump, there is a clear danger that the “diet” could be politically controlled.

When AI gets it all wrong

In April 2024, Meta AI reportedly told a New York parenting group on Facebook that it had a disabled yet academically gifted child and offered advice on special schooling.

Eventually, the chatbot apologized and admitted that it didn’t have “personal experiences or children,” while Meta told 404 Media, which reported on the incident:

“This is new technology and it may not always return the response we intend, which is the same for all generative AI systems. Since we launched, we’ve constantly released updates and improvements to our models and we’re continuing to work on making them better.”

In the same month, Grok misinterpreted a viral joke about a poorly performing basketball player and told users in its trending section that the player was under police investigation after being accused of vandalizing homes with bricks in Sacramento, California.

Grok had misunderstood the common basketball expression whereby a player who has missed all of their shots is said to have been “throwing bricks.”

Other mistakes are less amusing. In August 2024, Grok spread misinformation regarding the deadline for US presidential nominees to be added to ballots in nine US states following President Joe Biden’s withdrawal from the race.

In a public letter to Elon Musk, Minnesota Secretary of State Steve Simon wrote that, within hours of Biden’s announcement, Grok had generated false headlines claiming that Vice President Kamala Harris would be ineligible to appear on the ballot in multiple states.

Grok assigns same AI image to various real events

It’s not just news that AI chatbots appear to have difficulties with; they also exhibit severe limitations when it comes to identifying AI-generated images.

In a quick experiment, DW asked Grok to identify the date, location and origin of an image of a fire at a destroyed aircraft hangar, taken from a TikTok video. In its response and explanations, Grok claimed that the image showed several different incidents at several different locations, ranging from a small airfield in Salisbury in England, to Denver International Airport in Colorado, to Tan Son Nhat International Airport in Ho Chi Minh City, Vietnam.

There have indeed been accidents and fires at these locations in recent years, but the image in question showed none of them. We strongly believe it was generated by artificial intelligence, something Grok seemed unable to recognize despite clear errors and inconsistencies in the image, including inverted tail fins on the airplanes and illogical jets of water from fire hoses.

Even more concerning, Grok recognized part of the “TikTok” watermark visible in the corner of the image and suggested that this “supported its authenticity.” Yet under its “More details” tab, Grok stated that TikTok was “a platform often used for rapid dissemination of viral content, which can lead to misinformation if not properly verified.”

Similarly, just this week, Grok informed X users (in Portuguese) that a viral video purporting to show a huge anaconda in the Amazon, seemingly measuring several hundred meters (over 500 feet) in length, was real — despite the video clearly having been generated by artificial intelligence, and despite Grok itself recognizing a ChatGPT watermark on it.

AI chatbots ‘should not be seen as fact-checking tools’

AI chatbots may appear omniscient, but they are not. They make mistakes, misunderstand things and can even be manipulated. Felix Simon, postdoctoral research fellow in AI and digital news and research associate at the Oxford Internet Institute (OII), concludes: “AI systems such as Grok, Meta AI or ChatGPT should not be seen as fact-checking tools. While they can be used to that end with some success, it is unclear how well and consistently they perform at this task, especially for edge cases.”

For Canetta, AI chatbots can be useful for very simple fact checks, but he advises against trusting them entirely. Both experts say users should always double-check responses with other sources.

Daniel Ebertz contributed to this report.

Edited by: Ines Eisele, Rachel Baig

The post Fact check: “Hey Grok, is this true?” How trustworthy is AI factchecking? appeared first on Deutsche Welle.
