OpenAI has launched a research preview of its artificial intelligence agent, Operator, which can perform tasks on the web on behalf of users.
Operator uses its own browser and can interact with a webpage by typing, clicking, and scrolling, OpenAI said. Users can have Operator do tasks such as completing online forms and grocery shopping, according to the startup.
The AI agent is powered by a new OpenAI model called Computer-Using Agent (CUA), which combines vision capabilities from OpenAI’s multimodal GPT-4o model with advanced reasoning from reinforcement learning. CUA was trained to interact with graphical user interfaces, or GUIs, such as buttons and text fields on webpages. Because Operator has “reasoning” skills, it can “self-correct” and give users back control when it needs help.
The research preview is available only to ChatGPT Pro users in the U.S. for now, OpenAI said, because it has “limitations and will evolve based on user feedback.” One example, the startup said, is “challenges with complex interfaces like creating slideshows or managing calendars.”
The startup plans to roll the AI agent out to other ChatGPT users, and eventually integrate Operator’s capabilities into the chatbot.
Operator was designed “to refuse harmful requests and block disallowed content,” OpenAI said, adding that the startup can send warnings and revoke access over multiple violations through its moderation systems. The AI agent “is trained to ensure that the person using it is always in control and asks for input at critical points,” the startup added.
For example, Operator will prompt the user to take over when a task calls for entering sensitive information, such as login credentials or credit card details.
“While Operator is designed with these safeguards, no system is flawless and this is still a research preview,” OpenAI said.
The post OpenAI’s agent that can do work for you is here appeared first on Quartz.