Hello and welcome to Eye on AI. In this edition…Google launches the ability to make purchases directly from Google Search’s AI Mode and Gemini…Apple selects Google to power an upgraded Siri…Meta announces a new AI infrastructure team…researchers use AI to find new ways to edit genes.

It was another week with a lot of AI-related announcements. Among the bigger news items was Google’s launch of a feature that lets users complete e-commerce purchases directly from Google Search’s AI Mode and its Gemini chatbot app. Among the first takers for the new feature is retail behemoth Walmart, so this is a big deal. Behind the scenes, the AI checkout is powered by a new “Universal Commerce Protocol” that should make it easier for retailers to support agentic AI sales. Google Cloud also announced a raft of AI features to support agentic commerce for customers, including a new Gemini Enterprise for Customer Experience product that combines shopping and customer support. (Watch this space: the combination of those two previously separate functions could have big implications for the way many businesses are organized.) Home Depot was one of the first announced customers for this new cloud product.
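The announcement doesn’t spell out how the protocol works, but the general promise of any shared commerce protocol is that every retailer can describe its offers and accept checkout requests in a single machine-readable format, so an AI agent doesn’t need bespoke integration code for each merchant. Below is a purely hypothetical sketch of what such standardized offer data might look like; the field names are invented for illustration and are not the actual Universal Commerce Protocol schema.

```python
# A purely hypothetical sketch of the kind of structured offer data a
# shared commerce protocol might standardize so that any AI agent can
# read it. Field names are invented for illustration; this is NOT the
# actual Universal Commerce Protocol schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class Offer:
    merchant_id: str    # stable identifier for the retailer
    sku: str            # the merchant's own product identifier
    title: str
    price_cents: int    # integer minor units avoid float rounding errors
    currency: str
    in_stock: bool
    checkout_url: str   # where an agent would send the purchase request

# An agent that understands the shared schema can compare offers from
# different retailers without per-merchant scraping or integration code.
offer = Offer(
    merchant_id="example-retailer",
    sku="HAMMER-16OZ",
    title="16 oz claw hammer",
    price_cents=1299,
    currency="USD",
    in_stock=True,
    checkout_url="https://example.com/agent-checkout",
)
print(json.dumps(asdict(offer), indent=2))
```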
It’s still early days for agentic commerce, but many companies are already scrambling to figure out how to make sure their products and sites surface prominently in what these AI agents recommend to users. A nascent industry has sprung up offering what are variously called “generative engine optimization” (GEO) or “generative-AI optimization” (GAIO) services. Some of these echo longstanding search engine optimization strategies, but with a few key differences. GEO seems, at least for now, somewhat harder to game than SEO. Chatbots and AI agents seem to care a lot about products that have received positive earned media attention from reputable news outlets (which should be a good thing for consumers, and for media organizations!) as well as those that rank highly on trusted customer review sites. But the world of AI-mediated commerce presents big governance risks that many companies may not fully understand, according to Tim de Rosen, the founder of a company called AIVO Standard, which offers companies a methodology for generative AI optimization as well as a way to track, and hopefully govern, the information AI agents are using.
The problem, de Rosen told me in a phone call last week, is a split in reliability. Various AI models tend to be consistent in how they characterize a brand’s product offerings: they usually report correctly on what a product is, what its features are, and how those features compare to competing products, and they can usually cite the sources of that information. But the models are inconsistent and error-prone when asked questions about a company’s financial stability, governance, and technical certifications. Yet this information can play a significant role in major procurement decisions.
AI models are less reliable on financial and governance questions
In one example, AIVO Standard assessed how frontier AI models answered questions about Ramp, the fast-growing business expense management software company. AIVO Standard found that the models could not reliably answer questions about Ramp’s cybersecurity certifications and governance standards. In some cases, de Rosen said, this was likely to subtly push enterprises toward procurement decisions favoring larger, publicly traded incumbents, even when a privately held upstart met the same standards, simply because the AI models could not accurately answer questions about the younger, privately held company’s governance and financial suitability or cite sources for the information they did provide.
In another example, the company looked at what AI models said about the risk factors of rival weight loss drugs. It found that AI models did not simply list risk factors, but slipped into making recommendations and judgments about which drug was likely the “safer choice” for the patient. “The outputs were largely factual and measured, with disclaimers present, but they still shaped eligibility, risk perception, and preference,” de Rosen said.
AIVO Standard found that these problems held across all the leading AI models and a variety of different prompts, and that they persisted even when the models were asked to verify their answers. In fact, in some cases, the models tended to double down on inaccurate information, insisting it was correct.
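AIVO Standard hasn’t published its exact methodology here, but the kind of test de Rosen describes is easy to sketch: ask a model the same governance question several times, follow up by asking it to verify its own answer, and check whether the responses agree. The sketch below is a minimal, assumed version using the OpenAI Python SDK; the model name, the question, and the crude agreement check are all placeholders, not AIVO Standard’s actual method.

```python
# A minimal sketch of a consistency probe in the spirit of the testing
# de Rosen describes; not AIVO Standard's actual methodology. Assumes
# the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from collections import Counter
from openai import OpenAI

client = OpenAI()
QUESTION = "What cybersecurity certifications does <company> hold? Cite sources."

def ask(messages):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; repeat across several models
        messages=messages,
        temperature=1.0,      # default-like sampling, as a real user would get
    )
    return resp.choices[0].message.content

answers = []
for _ in range(5):
    first = ask([{"role": "user", "content": QUESTION}])
    # Follow up and ask the model to verify its own answer, as in the
    # verification step de Rosen describes.
    verified = ask([
        {"role": "user", "content": QUESTION},
        {"role": "assistant", "content": first},
        {"role": "user", "content": "Please verify that answer against your sources."},
    ])
    answers.append((first, verified))

# Crude agreement signal: how many distinct first answers did we get?
# Real scoring would need answer normalization or human review.
print(Counter(a for a, _ in answers))
```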
GEO is still more art than science
There are several implications. One, for all the companies selling GEO services, is that GEO may not work equally well across different aspects of brand information. Companies shouldn’t necessarily trust a marketing tech firm that says it can show them how their brand shows up in chatbot responses, let alone believe that the firm has some magic formula for reliably shaping those responses. Prompt results may vary considerably, even from one minute to the next, depending on what type of brand information is being assessed. And there’s not much evidence yet on exactly how to steer chatbot responses for non-product information.
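To see how noisy such measurements can be, consider the simplest possible visibility metric: sample the same buying-intent prompt many times and compute how often a brand is mentioned, along with a confidence interval. The sketch below is a minimal illustration under assumed conditions; query_model() is a placeholder stub for whatever chatbot API you use, and the prompt and brand name are arbitrary.

```python
# A minimal sketch of measuring a brand's "visibility" in chatbot
# answers, and why a single snapshot is close to meaningless.
# query_model() is a placeholder stub for whatever chatbot API you use.
import math

def query_model(prompt: str) -> str:
    raise NotImplementedError("wire this to your chatbot API of choice")

def mention_rate(prompt: str, brand: str, samples: int = 50):
    hits = sum(brand.lower() in query_model(prompt).lower()
               for _ in range(samples))
    p = hits / samples
    # Normal-approximation 95% interval: even 50 samples leaves a wide
    # band, which is one reason minute-to-minute GEO dashboards should
    # be read skeptically.
    half_width = 1.96 * math.sqrt(p * (1 - p) / samples)
    return p, (max(0.0, p - half_width), min(1.0, p + half_width))

# Example (hypothetical prompt and brand):
# rate, interval = mention_rate("What's the best expense software?", "Ramp")
```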
But the far bigger issue is that there is a moment in many agentic workflows, even those with a human in the loop, where AI-provided information becomes the basis for decision making. And, as de Rosen says, most companies currently don’t really police the boundaries between information, judgment, and decision-making. They have no way of keeping track of exactly what prompt was used, what the model returned in response, and exactly how that response fed into the ultimate recommendation or decision. In regulated industries such as finance or healthcare, if something goes wrong, regulators are going to ask for exactly those details. Unless regulated enterprises implement systems for capturing all of this data, they are headed for trouble.
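What might “capturing all of this data” look like in practice? Here is a minimal sketch, and only one of many reasonable designs: log every model interaction that feeds a decision as an append-only record of the prompt, the model identifier, the raw response, and the decision it informed, with each record hashed against the previous one so after-the-fact edits are detectable. The field names and hash-chain scheme below are illustrative assumptions, not any regulator’s requirements.

```python
# A minimal sketch of an audit trail for AI-assisted decisions: one
# append-only JSON-lines record per model call, chained by hash so
# tampering is detectable. Field names and the hash-chain design are
# illustrative assumptions, not a regulatory standard.
import hashlib, json, time

LOG_PATH = "ai_decision_audit.jsonl"

def last_hash() -> str:
    """Hash of the most recent record, or a sentinel for an empty log."""
    try:
        with open(LOG_PATH) as f:
            *_, last = f
        return json.loads(last)["record_hash"]
    except (FileNotFoundError, ValueError):
        return "genesis"

def log_interaction(prompt: str, model: str, response: str, decision: str) -> None:
    record = {
        "timestamp": time.time(),
        "model": model,        # exact model/version the answer came from
        "prompt": prompt,      # exactly what was asked
        "response": response,  # exactly what the model returned
        "decision": decision,  # how the answer fed into the outcome
        "prev_hash": last_hash(),
    }
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")
```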
With that, here’s more AI news.
Jeremy Kahn [email protected] @jeremyakahn