A.I. and Humans Battle It Out in a Cybersecurity Showdown

May 12, 2026

On a recent Friday morning, seven cybersecurity veterans gathered in a suite on the 60th floor of the Cosmopolitan hotel in Las Vegas.

Surrounded by laptops, network cables, spare Wi-Fi antennas and a wall-mounted television that doubled as a massive computer screen filled with esoteric programming code, they spent the next two days hacking into a computer network in San Antonio as part of an annual event called the National Collegiate Cyber Defense Competition.

As this “red team” of cybersecurity professionals attacked the network, dozens of elite computer science students sat in makeshift command centers across the country, trying to stop them.

“Any time we gain access to their machines and steal data, they lose points,” said Alex Levinson, one of the leaders of the red team. “And the expectation is that we attack with custom malware — something unique and special they have never seen before.”

Run by the University of Texas at San Antonio, the event welcomed 10 collegiate “blue teams,” each the winner of a regional contest earlier in the year. This elaborate competition aimed to simulate the high-stakes world of cyberwarfare, which meant it included a new participant: artificial intelligence. And one of the blue teams was made up entirely of so-called A.I. agents, working mostly on their own.

With A.I. poised to play an increasingly important role in cybersecurity, the elaborate hacking competition demonstrated both the power of these systems and their limitations. They can help attack computer networks. And they can help defend. But they are also prone to mistakes. And they cannot yet match the skills of seasoned cybersecurity professionals — or even those of the country’s most promising computer science students.

But A.I. companies continue to improve these technologies. Anthropic said last month that it would limit the release of its latest A.I. technology, Claude Mythos, to a small number of trusted organizations because it might provide a new edge to malicious hackers. OpenAI later said it, too, would share similar technology with a limited group of partners.

Crouched over a glass table inside the Cosmopolitan suite, one of the red team’s veterans, Dan Borges, typed out an expanding list of instructions for the A.I. agents running on his laptop. As they probed the network in San Antonio, attacking one of the collegiate blue teams, the bots executed tasks on his behalf.

Mr. Borges, a 37-year-old security engineer whose résumé includes stints at Uber and the A.I. start-up Scale AI, wore his baseball cap backward, over dark-brown hair that stretched halfway down his back. The cap read: “Aloha Got Soul.”

That morning, he tried to slip malicious software onto several dozen machines across the network. As his agents raced through this largely repetitive task, he planned the next stage of the attack. “They help me do things in parallel,” he said. “I can go fast, and I can go wide.”

But soon, one of his bots took an unexpected turn: It started installing malicious software on his own machine. This, the bot decided, was a good way to understand what the malware could do. “Absolutely the worst idea I have ever heard,” Mr. Borges said, breaking into a laugh.

When guided by trained experts like himself, Mr. Borges said, these technologies can accelerate a wide array of tasks related to cybersecurity. But he is still grappling with their flaws.

“Asking them to do something is very easy,” he said. “But you have to step back and say: What is the best way to get them to do what I want them to do?”

Before gathering in Las Vegas, Mr. Borges and the other red team members spent weeks building bespoke software tools they could use during the two days of simulated cyberwarfare. Most of them used systems like Anthropic’s Claude Code and OpenAI’s Codex to build these tools more quickly. Rather than write all the code by hand, they could lean on the A.I. code generators for help.

“A lot of what we do leading up to the event — a lot of the development work — is what makes or breaks us during the event itself,” Mr. Levinson said. “Our capabilities have improved because A.I. is now helping with that.”

Others, like Mr. Borges, went a step further, using the same systems to automate tasks during the competition itself, as they listened to ’90s hip-hop and scrambled to crack the network in San Antonio. Because these systems can generate code, they can browse websites, probe computer firewalls and potentially perform almost any other task on the internet.

Two red team members — David Cowen and Evan Anderson — sat in front of the giant wall-mounted television, casually asking Claude Code to both organize and execute elaborate attacks with names like Project Mayhem. They leaned on the technology so heavily, they sometimes left the suite for sandwiches and coffee as Claude continued to probe the network in Texas.

Mr. Cowen, a jovial security consultant from Plano, Texas, with a billowing gray-brown beard, guffawed every time the A.I. bots did something unexpected. Mr. Anderson, a self-described hacker with tattoo sleeves on both arms who runs a Denver security company, Offensive Context, never batted an eye.

One afternoon, after returning from a lunch run, Mr. Cowen looked up at the TV screen and let out another cackle. While he was grabbing fried chicken sandwiches, one of his bots noticed that a blue team had loaded new software onto a machine in San Antonio. The bot then grabbed the software’s default password from a database, broke into the machine and started sharing the password with the other bots. “Amazing,” Mr. Cowen said, chuckling. “I was at lunch.”

But he was quick to say the bots are only as good as the people using them. He and Mr. Anderson kept their agents on a tight leash, focusing their efforts on particular tasks and trying to catch any serious missteps.

Sometimes, the bots “hallucinated” activity on the network, meaning they responded to events that did not actually happen.

“The A.I. thinks it has done a lot of things, and it is telling us it has done a lot of things,” Mr. Anderson said. “That sounds cool, but you have to get it to show you that what it did actually exists.”

The two cybersecurity veterans were fighting fire with fire. While the other red team members attacked blue teams filled with college students, Mr. Cowen and Mr. Anderson battled a blue team staffed solely with bots. In this year’s competition, Anthropic arranged for its A.I. technology to compete alongside the 10 teams of college students.

This automated cyberdefense team operated with little help from Anthropic employees. While each of the collegiate teams included eight students, the Anthropic team spanned as many as 32 individual A.I. agents.

During the first few hours of the competition, the bots seemed to struggle, dropping to the very bottom of the standings. But they were hampered by a network outage. Once the network was back up, they began to hold their own.

“The one thing that the A.I. is good at that the students are not is paying attention to multiple things at once,” Mr. Cowen said. “The agents communicate better — and they never give up.”

They also behaved in strange and unexpected ways, sometimes failing to take the obvious steps needed to defend their machines. Occasionally, the bots would get stuck in a rut, which meant their human minders had to step in.

But in the end, the bots finished seventh out of the 11 teams. The winner was Dakota State University, a perennial contestant but a first-time champion.

After watching the performance of the technology on both sides — attack and defense — Mr. Anderson said it remained a tool that was most effective in the hands of experienced cybersecurity professionals. It works best, he said, when it follows strict instructions.

“I have a history of doing this, so I have it look for certain things to happen and then do the next thing I want it to do,” he said.

Although Mr. Cowen does not trust A.I. technology to make decisions on its own, he thinks that will change as the systems become better and better.

“It will get there,” he added.

Cade Metz is a Times reporter who writes about artificial intelligence, driverless cars, robotics, virtual reality and other emerging areas of technology.

The post A.I. and Humans Battle It Out in a Cybersecurity Showdown appeared first on New York Times.


DNYUZ © 2026
