Contextual AI unveiled its grounded language model (GLM) today, claiming it delivers the highest factual accuracy in the industry by outperforming leading AI systems from Google, Anthropic and OpenAI on a key benchmark for truthfulness.
The startup, founded by the pioneers of retrieval-augmented generation (RAG) technology, reported that its GLM achieved an 88% factuality score on the FACTS benchmark, compared to 84.6% for Google’s Gemini 2.0 Flash, 79.4% for Anthropic’s Claude 3.5 Sonnet and 78.8% for OpenAI’s GPT-4o.
While large language models have transformed enterprise software, factual inaccuracies — often called hallucinations — remain a critical challenge for business adoption. Contextual AI aims to solve this by creating a model specifically optimized for enterprise RAG applications where accuracy is paramount.
“We knew that part of the solution would be a technique called RAG — retrieval-augmented generation,” said Douwe Kiela, CEO and cofounder of Contextual AI, in an exclusive interview with VentureBeat. “And we knew that because RAG is originally my idea. What this company is about is really about doing RAG the right way, to kind of the next level of doing RAG.”
The company’s focus differs significantly from general-purpose models like ChatGPT or Claude, which are designed to handle everything from creative writing to technical documentation. Contextual AI instead targets high-stakes enterprise environments where factual precision outweighs creative flexibility.
“If you have a RAG problem and you’re in an enterprise setting in a highly regulated industry, you have no tolerance whatsoever for hallucination,” explained Kiela. “The same general-purpose language model that is useful for the marketing department is not what you want in an enterprise setting where you are much more sensitive to mistakes.”
How Contextual AI makes ‘groundedness’ the new gold standard for enterprise language models
The concept of “groundedness” — ensuring AI responses stick strictly to information explicitly provided in the context — has emerged as a critical requirement for enterprise AI systems. In regulated industries like finance, healthcare and telecommunications, companies need AI that either delivers accurate information or explicitly acknowledges when it doesn’t know something.
Kiela offered an example of how this strict groundedness works: “If you give a recipe or a formula to a standard language model, and somewhere in it, you say, ‘but this is only true for most cases,’ most language models are still just going to give you the recipe assuming it’s true. But our language model says, ‘Actually, it only says that this is true for most cases.’ It’s capturing this additional bit of nuance.”
The ability to say “I don’t know” is crucial in enterprise settings. “Which is really a very powerful feature, if you think about it in an enterprise setting,” Kiela added.
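The refusal behavior Kiela describes can be illustrated with a toy heuristic. This is not Contextual AI’s method — a real grounded model learns this behavior during training — but a minimal sketch that answers only from supplied context and falls back to “I don’t know” when the context offers no support, using token overlap as a crude proxy:

```python
def grounded_answer(question, context_passages, min_overlap=2):
    """Return the best-supported passage, or refuse if support is too weak.

    Illustrative only: token overlap stands in for a learned
    groundedness judgment.
    """
    q_tokens = set(question.lower().split())
    best, best_overlap = None, 0
    for passage in context_passages:
        overlap = len(q_tokens & set(passage.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = passage, overlap
    if best_overlap < min_overlap:
        return "I don't know"
    return best
```

The key design choice is the explicit refusal branch: when no passage clears the support threshold, the function declines rather than guessing — the property regulated industries require.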
Contextual AI’s RAG 2.0: A more integrated way to process company information
Contextual AI’s platform is built on what it calls “RAG 2.0,” an approach that moves beyond simply connecting off-the-shelf components.
“A typical RAG system uses a frozen off-the-shelf model for embeddings, a vector database for retrieval, and a black-box language model for generation, stitched together through prompting or an orchestration framework,” according to a company statement. “This leads to a ‘Frankenstein’s monster’ of generative AI: the individual components technically work, but the whole is far from optimal.”
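The “stitched-together” pipeline the statement criticizes can be sketched in a few lines. Everything here is illustrative — a bag-of-words stand-in for a frozen embedding model, a sorted list standing in for a vector database, and a prompt string standing in for a black-box LLM call — but it shows the structural point: each stage is independent, with no joint optimization or feedback between them.

```python
def embed(text):
    # Stand-in for a frozen off-the-shelf embedding model:
    # a bag-of-words vector keyed by lowercase token.
    vec = {}
    for token in text.lower().split():
        vec[token] = vec.get(token, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    # Stand-in for a vector database: rank stored docs by similarity.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(query, passages):
    # Stand-in for a black-box LLM call: retrieved context is simply
    # pasted into a prompt, with no signal flowing back to the retriever.
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The refund policy allows returns within 30 days.",
    "Shipping is free for orders over $50.",
]
prompt = generate("What is the refund policy?",
                  retrieve("What is the refund policy?", docs))
```

Because the embedder, retriever, and generator never see each other’s errors, a weak link in any stage degrades the whole answer — the “Frankenstein’s monster” problem that joint optimization is meant to fix.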
Instead, Contextual AI jointly optimizes all components of the system. “We have this mixture-of-retrievers component, which is really a way to do intelligent retrieval,” Kiela explained. “It looks at the question, and then it thinks, essentially, like most of the latest generation of models, it thinks, [and] first it plans a strategy for doing a retrieval.”
This entire system works in coordination with what Kiela calls “the best re-ranker in the world,” which helps prioritize the most relevant information before sending it to the grounded language model.
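The retrieve-then-rerank pattern itself is simple to sketch. The scoring function below is a placeholder (Jaccard token overlap); a production re-ranker like the one Kiela describes would use a trained model to score each candidate against the query before the top passages reach the grounded language model:

```python
def rerank(query, candidates, top_k=3):
    """Re-order retrieved candidates by relevance and keep the best few.

    Jaccard overlap is a stand-in for a learned relevance model.
    """
    q_tokens = set(query.lower().split())

    def score(passage):
        p_tokens = set(passage.lower().split())
        return len(q_tokens & p_tokens) / len(q_tokens | p_tokens)

    return sorted(candidates, key=score, reverse=True)[:top_k]
```

The value of this stage is in narrowing a large, noisy candidate pool down to the few passages most worth grounding an answer in, which keeps irrelevant context out of the generator’s prompt.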
Beyond plain text: Contextual AI now reads charts and connects to databases
While the newly announced GLM focuses on text generation, Contextual AI’s platform has recently added support for multimodal content including charts, diagrams and structured data from popular platforms like BigQuery, Snowflake, Redshift and Postgres.
“The most challenging problems in enterprises are at the intersection of unstructured and structured data,” Kiela noted. “What I’m mostly excited about is really this intersection of structured and unstructured data. Most of the really exciting problems in large enterprises are smack bang at the intersection of structured and unstructured, where you have some database records, some transactions, maybe some policy documents, maybe a bunch of other things.”
The platform already supports a variety of complex visualizations, including circuit diagrams in the semiconductor industry, according to Kiela.
Contextual AI’s future plans: Creating more reliable tools for everyday business
Contextual AI plans to release its specialized re-ranker component shortly after the GLM launch, followed by expanded document-understanding capabilities. The company also has experimental features for more agentic capabilities in development.
Founded in 2023 by Kiela and Amanpreet Singh, who previously worked at Meta’s Fundamental AI Research (FAIR) team and Hugging Face, Contextual AI has secured customers including HSBC, Qualcomm and The Economist. The company positions itself as helping enterprises finally realize concrete returns on their AI investments.
“This is really an opportunity for companies who are maybe under pressure to start delivering ROI from AI to start looking at more specialized solutions that actually solve their problems,” Kiela said. “And part of that really is having a grounded language model that is maybe a bit more boring than a standard language model, but it’s really good at making sure that it’s grounded in the context and that you can really trust it to do its job.”
The post Contextual AI’s new AI model crushes GPT-4o in accuracy — here’s why it matters appeared first on VentureBeat.