Engineer Alexey Grigorev was using Claude Code—a popular Anthropic tool that helps developers write and run code—to update a new website.
At first everything seemed normal, until he realized the system had begun destroying the site’s live environment: the network, services and, most critically, the database holding years of course data.
The root cause was a small setup mistake on a new laptop that confused the automation about what was “real” and what was safe to delete, so it erased the actual production system instead of just cleaning up duplicates.
While Grigorev eventually managed to restore his data with help from AWS support, he later wrote that he had “over‑relied on the AI agent” and, by letting it make and execute the changes end‑to‑end, had removed safety checks that should have prevented the deletion.
“AI assistants are great and saving a lot of time,” Grigorev told Fortune. “But I hope people learn from mistakes I made and incorporate the safeguards into their workflow.”
Anthropic’s Claude Code has settings that give a user control over when and how often the agent checks back with the user before taking actions. A user can specify that the agent should not take certain actions without asking for permission from the user. But some coders prefer to let the AI agent execute more decisions autonomously, in part because it saves time. As of press time, Anthropic had not responded to a request to comment for this story.
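For readers who use the tool, those guardrails live in a project-level settings file. The sketch below is based on Anthropic's documented `.claude/settings.json` permissions format, which lets a user deny destructive shell commands outright while pre-approving routine ones; the exact rule patterns shown here are illustrative and may vary by version:

```json
{
  "permissions": {
    "deny": [
      "Bash(rm -rf:*)",
      "Bash(dropdb:*)"
    ],
    "allow": [
      "Bash(npm run lint)",
      "Bash(npm run test:*)"
    ]
  }
}
```

With rules like these, the agent can run linting and tests without interruption but is blocked from recursive deletes or dropping a database; anything not matched by a rule falls back to asking the user for permission.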
Even as AI coding tools promise faster development and automation, mistakes in AI-generated code are common and risk bringing down critical systems, wiping out years of work, and creating unexpected costs. Last week, Amazon convened a “deep dive” meeting after a series of outages affected its website and app. At least one of the system failures, according to news reports in several publications, involved AI-assisted changes.
A spokesperson for Amazon told Fortune that the meeting was a “regular weekly operations meeting.” The company has also said publicly that only one of the incidents involved AI, and “the cause was unrelated to AI and instead our systems allowed an engineering team user error to have broader impact than it should have.”
However, internal Amazon documents viewed by both CNBC and the Financial Times originally cited “Gen-AI assisted changes” as a factor in a “trend of incidents.” The reference to AI’s role in the outages was later deleted from the document ahead of the meeting, CNBC reported. According to the Financial Times, a December outage at Amazon Web Services occurred after engineers allowed Amazon’s own Kiro AI coding tool to make changes—something Amazon has since said was a “user error.”
The excitement around AI-assisted software development has reached a fever pitch over the last few months, but the errors are starting to pile up. Companies, emboldened by advances in AI coding agents and stories of dramatic productivity gains within AI labs, have started pushing engineers to produce more and more code with AI tools, often without proper oversight in place. For large enterprises, the poor quality of some of this code may prove to be AI’s Achilles heel.
An over-reliance on AI tools
Across the industry, engineers say that reliance on AI assistants to write and deploy code is rapidly changing the nature of software development jobs—and introducing new risks.
“People are becoming so reliant on AI that essentially they stop reviewing the code altogether,” one Amazon engineer, who asked to remain anonymous, told Fortune.
The developer said that even technically skilled staff are moving into more of a “review role” rather than actively coding, with AI handling much of the actual implementation. While these tools allow for faster feature delivery, they also create what some call “production noise,” code that is delivered quickly but isn’t always needed or fully tested. In some cases, it could even affect critical systems.
David Loker, VP of AI at CodeRabbit, said the consequences aren’t always as visible as an outage. In one instance, he said, an AI assistant generated code that looked perfectly valid but was built on faulty assumptions about CodeRabbit’s underlying systems, and it might easily have passed a quick review.
“If you just rolled that out, it would have taken down our database in production,” he said.
Because AI coding lowers the technical knowledge needed to perform certain software development tasks, engineers say companies are also outsourcing tasks normally done by senior engineers to junior or less technical staff, only to find that low-quality output creates more work than it saves.
“A lot of what was built was fairly bad quality, broke often, and ended up being more of a burden,” one London-based engineer at an enterprise software company, who asked to remain anonymous because they were not authorized to discuss company matters with the press, said. “The time won by getting the cheap people to write it is offset by having someone paid far more—a senior or principal—to have to go fix it when it breaks.”
Broader data suggests the burden of reviewing and repairing AI-assisted work is falling disproportionately on more experienced engineers. While senior engineers have the skills to spot a logic error or security flaw that a junior might miss, allowing them to ship faster, they’re also paying a growing “correction tax.”
A July 2025 Fastly survey found that senior engineers ship nearly 2.5x more AI-generated code than junior ones, because they’re better at catching mistakes before they compound. But nearly 30% of seniors said fixing AI output ate up most of the time they’d saved, compared to 17% of junior developers. Junior developers often feel like they’ve banked bigger productivity gains because they don’t yet see the full technical debt or latent vulnerabilities that their AI-assisted changes are quietly adding to the system.
The productivity paradox
Part of the problem is C-Suite FOMO. Engineers at leading AI labs have been claiming productivity surges that would have seemed implausible just a few years ago, and larger organizations across a variety of industries want to encourage similar gains.
For example, Anthropic’s head of Claude Code, Boris Cherny, previously said he hasn’t written a line of code in months, instead relying on the company’s AI model to generate it. Across Anthropic more broadly, the company told Fortune, between 70% and 90% of its total code is now AI-generated. At Spotify, co-CEO Gustav Söderström said last month that the company’s best developers hadn’t written a single line of code since December and have shipped over 50 new features in 2025 using AI-assisted workflows.
But, as demonstrated by Amazon’s recent issues, the productivity gains that are most visible at AI labs and agile startups may be harder to replicate at large enterprises with legacy systems and complex codebases. Where smaller teams can move fast and absorb mistakes, companies like Amazon operate infrastructure where a single bad deployment can affect millions of customers.
A September report from Bain & Company found that even though programming was “one of the first areas to deploy generative AI,” the actual savings have been modest and the results “haven’t lived up to the hype.” Meanwhile, research from security firm Apiiro showed that developers using AI introduced roughly ten times more security issues than those who did not.
AI models, as AI researcher Andrej Karpathy has noted, can make subtle conceptual errors, over-complicate code, and leave unused code behind—problems that are manageable in a controlled environment but harder to catch and fix at scale. A December report from code review firm CodeRabbit, which analyzed 470 open-source GitHub pull requests, found that AI-authored code contained roughly 1.7 times more issues overall than human-written code. Larger organizations tend to have more stakeholders, more review layers, and more dependencies, an environment where AI-generated code is more likely to introduce unexpected failures.
“It’s just going to take longer for larger organizations like AWS, or like Nvidia to implement this…because you have so much legacy code,” Loker said. “There’s way less documentation within it, there’s less searchability for the AI to pick up on…so it’s harder to find the context sometimes. You’re going to end up introducing problems.”
There are also questions about whether the benchmarks used to measure AI’s coding ability reflect real-world tasks. A recent study by METR, an AI evaluation organization, found that half of AI coding solutions graded as passing on a prominent industry test—which is itself graded by an AI model—would actually have been rejected by human reviewers for inadequate quality.
Toby Ord, Senior Researcher at the Oxford Martin AI Governance Initiative, said current estimates of AI coding ability are “indeed overstating things, and perhaps by a significant factor.”
Another issue is how the companies themselves are measuring the “success” of AI coding, according to Loker. “It’s very easy to measure throughput increase,” he said. “What is not easy to measure at this point is the causality of what happens after.” The metrics traditionally used to gauge developer productivity—features shipped, code committed—look strong when AI is involved, but don’t capture downstream consequences like bugs, rollbacks, or time spent cleaning up. “That’s not necessarily the only metric of my company’s code health as a whole,” he said.
Companies rolling out AI at scale also risk accumulating what engineers call technical debt—code that functions in the short term but becomes increasingly costly to maintain. “We’re producing tech debt using AI at a clip that I can’t even fathom,” Loker said. “It’s probably three to four times what it was previously.”