What Are A.I. Agents Actually Doing?

When OpenAI unveiled ChatGPT at the end of 2022, it kicked off the chatbot boom. Then last year, new systems from OpenAI and Anthropic set off a new technological push around so-called A.I. agents, which can perform tasks as personal digital helpers.

Now, a San Francisco start-up called Arena, which tracks hundreds of thousands of artificial intelligence users, is trying to take some of the mystery out of what exactly those digital tasks are.

The company’s service, Agent Mode, showed that over the past few weeks, people used agents for code-writing tasks about 17 percent of the time. Roughly 10 percent of the time, the company said, people used agents to do research.

Research agents were closely followed by agents that create images, generate documents like graphs and spreadsheets or brainstorm ideas. About 5 percent of the time, users applied agents to creative writing or to tutoring and education. Other areas included code debugging, which is related to building software, and chatting.

Systems from OpenAI, Anthropic and other companies can generate, test and edit computer code, letting experienced programmers automate many tasks they once performed on their own. Agents can also spend minutes or even days on the wider internet researching specific topics, including finance, health care, the law and practically anything else.

Some of these tasks overlap with what a chatbot can do. But the main difference with an agent is that it can use other software apps on behalf of users, including spreadsheets, calendars and email programs.

“An agent can access the internet, search the web, create files and even access other A.I. models to complete its work,” said Arena’s chief executive, Anastasios Angelopoulos, a co-founder of the start-up.

In Silicon Valley, some people treat these bots almost as employees they can delegate work to at any time of day. Many A.I. researchers, tech executives and pundits believe that agents could soon replace white-collar office workers.

In February, Block, the financial technology company that owns Square, Cash App and Tidal, said it was cutting 40 percent of its work force as it anticipated the rise of this kind of technology. This was perhaps the most striking example of a company’s eliminating employees because of what A.I. may soon do.

The rub is that this digital employee can handle only some tasks — and sometimes, it is less than reliable. Like chatbots, A.I. agents can make mistakes and exhibit completely unexpected behavior.

These mistakes can get particularly dicey when people use agents to send emails, texts and other instant messages. For that reason, Arena does not allow the people it tracks to connect their agents to email programs and messaging apps. (The company is selling its data and analysis of that data.)

The company also prevents people from using agents outside a digital “sandbox,” which keeps agents from doing serious harm on people’s computers. If let outside a sandbox, agents can accidentally delete files and software apps.

But the company’s service gives an indication of how often agents get things wrong. About 8 percent of the time, agents said they had completed a task when they hadn’t, Arena said. Because many tasks build on one another, the company added, this kind of agent “bluffing” or “blustering” can compound and create greater errors.

“The models will just say, ‘Yeah, I did this.’ But they lied, and they didn’t do it,” Mr. Angelopoulos said. “They might say they created a file, and then it’s not there.”

Arena also compares the technologies offered by OpenAI, Anthropic and other companies. The most effective agents are driven by OpenAI’s GPT-5.5 High technology, according to Arena’s data.

The next most effective technology was Anthropic’s Claude Opus 4.7 Thinking. These technologies, Arena said, were significantly more effective than those from Google, the leading Chinese companies and Elon Musk’s xAI.

Cade Metz is a Times reporter who writes about artificial intelligence, driverless cars, robotics, virtual reality and other emerging areas of technology.

The post What Are A.I. Agents Actually Doing? appeared first on New York Times.
