Sam Altman sits with his legs pretzeled in an office chair, staring deeply into the ceiling. To be fair, the new OpenAI headquarters—a temple of glass and blond wood in San Francisco’s Mission Bay—seems to invite this kind of contemplation. A kiosk behind reception holds booklets that describe the “Eras of AI” as if they were steps on the path to enlightenment. Posters along the stairs mark AI’s milestone victories, like the time thousands of humans watched on livestream as a machine beat a top-ranked esports team at Dota 2. In the hallways, researchers pass by in sacred merch. One shirt reads “Good research takes time.” Ideally, not too much.
Altman and I are in an enormous conference room. The question I put to him is about the AI coding revolution—and why OpenAI doesn’t seem to be leading it. Millions of software engineers have started delegating their programming tasks to AI, forcing many in Silicon Valley to reckon with the automation of their jobs for the first time. Coding agents have emerged as one of the few areas where enterprises are willing to pay a lot for AI. This moment could, and arguably should, be the next triumphant poster along the stairs for OpenAI. But the name in big print right now belongs to someone else.
Anthropic, a smaller rival started by OpenAI defectors, has found runaway success with its programming agent, Claude Code. The product accounts for nearly a fifth of its business—more than $2.5 billion in annualized revenue, the company said in February. By the end of January, OpenAI’s version, Codex, was bringing in just over $1 billion in annualized revenue, according to a person with direct knowledge of the matter. What gives?
“First to market is worth a lot,” Altman says finally. “We had that with ChatGPT.” But the time is right for OpenAI to lean into coding, he says. He thinks the company’s AI models are now good enough to power very capable coding agents. (Of course, the company spent billions training them to be that way.) “It’s going to be a huge business—just the economic value of it, and then also the general-purpose work that coding can unlock,” Altman says. “I don’t throw this around lightly, but I think it’s one of these rare multitrillion-dollar markets.” What’s more, he says, Codex is “probably the most likely path” to building artificial general intelligence. By OpenAI’s definition, that’s an AI system that can outperform humans at most economically valuable work.
But while Altman makes confident pronouncements from the serenity of pretzel pose, the reality within the company over the past few years has been messier. To get the inside story, I spoke with more than 30 people, including current OpenAI leaders and employees who participated with the company’s approval and others who spoke on the condition of anonymity to discuss the inner workings of private companies. Their accounts paint a picture of OpenAI in a position it has rarely been in: racing to catch up.
Back in 2021, Altman and other OpenAI leaders invited WIRED journalist Steven Levy to their original office in San Francisco’s Mission district to see something new. It was an offshoot of OpenAI’s GPT-3 model, trained on billions of lines of open source code from GitHub. In a demo, executives showed how the tool, Codex, could take in English commands and output simple snippets of code.
“It can actually act in the computer world on your behalf,” Greg Brockman, OpenAI’s president and cofounder, said at the time. “You actually have a system that can carry out commands.” Even then, OpenAI researchers thought it was obvious that Codex would be key to developing a “super assistant.”
At this time, Altman’s and Brockman’s lives revolved around meetings with Microsoft, OpenAI’s biggest investor. The software giant was tapping Codex to power one of its first commercial AI products, a code completion tool called GitHub Copilot that worked inside a programmer’s regular environment. Codex “couldn’t do much more than autocomplete” at this stage, an early OpenAI employee told me, but Microsoft executives heralded it as a sign of the AI future. When GitHub Copilot launched publicly in June 2022, it attracted hundreds of thousands of users within months.
OpenAI’s first Codex team moved on to other projects. The company planned for coding abilities to be baked into future models, the employee said, and didn’t see the need for a separate effort. Some engineers were reassigned to DALL-E 2, the company’s image generator. Others moved to train GPT-4, which was seen as the best way to get OpenAI closer to AGI.
Then ChatGPT launched in November 2022 and gained more than 100 million users in two months. Every other project ground to a halt. For years afterward, OpenAI didn’t have a dedicated team working on an AI coding product. It seemed to fall outside the company’s newfound consumer focus, one former member of the Codex team says. It also “felt like the sector was ‘covered’ by GitHub Copilot,” they continue. OpenAI would supply new models to power the tool, but this was Microsoft’s turf.
OpenAI spent much of 2023 and 2024 investing instead in its multimodal AI models and agents—designed to understand text, images, video, and audio and control a cursor and keyboard much like a human would. This effort seemed more in line with where the AI industry was headed. The startup Midjourney was going viral for its AI image models, and there was a prevailing notion that LLMs needed to see and hear the world in order to gain true intelligence.
Anthropic took a different path. It too dabbled in chatbots and multimodal models, but the company seemed to recognize the promise of coding sooner than OpenAI. On a recent podcast, Brockman commended Anthropic for being “focused very hard on coding” from an early stage. He noted that Anthropic trained its AI models not only on difficult coding problems from academic competitions but also on real-world problems from messy code repositories. “That was a lesson that we were delayed on,” Brockman said.
In early 2024, Anthropic was training Claude 3.5 Sonnet on some of those messy code repositories. When the model launched that June, many users were impressed with its coding abilities. This was particularly true at a startup called Cursor, founded by a group of twentysomethings, which let developers code with AI by asking for changes in plain English. When the company incorporated Anthropic’s new model, Cursor’s usage began rocketing upward, according to a person close to the startup. Within months, Anthropic would begin internal testing of its own version: Claude Code.
As Cursor took off in popularity, OpenAI approached the startup about an acquisition. The founders declined the offer before talks ever reached an advanced stage, people close to the startup told me. They saw the potential of the coding industry and wanted to stay independent.
At the time, OpenAI was training its first so-called reasoning model, o1, which could work through a problem step by step before delivering an answer. At launch, OpenAI said the model “excels at accurately generating and debugging complex code.” Andrey Mishchenko, OpenAI’s research lead for Codex, says a key reason AI models have become better at coding is because it’s a verifiable task. Code either runs or it doesn’t—which gives the model a clear signal when it gets something wrong. OpenAI used this feedback loop to train o1 on increasingly difficult coding problems. “Without the ability to crawl around a code base, implement changes, and test their own work—these are all under the umbrella of reasoning—coding agents would not be anywhere near as capable as they are today,” he says.
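Mishchenko’s point about verifiability can be illustrated with a toy reward function — a hypothetical sketch, not OpenAI’s actual training code; the function name and the binary pass/fail reward are assumptions for illustration:

```python
import subprocess
import sys
import tempfile

def code_reward(candidate_code: str, test_code: str) -> int:
    """Reward signal for a coding model: run the candidate solution
    against its tests and return 1 if everything passes, 0 otherwise.
    The binary outcome is the 'clear signal' a verifiable task provides."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=30)
    return 1 if result.returncode == 0 else 0
```

A correct implementation earns a reward of 1; a buggy one fails its tests and earns 0, giving a training loop an unambiguous error signal to learn from.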
By December 2024, several small groups inside OpenAI were starting to focus on AI coding agents. One of them was led by Mishchenko and Thibault Sottiaux, a former Google DeepMind researcher who’s now OpenAI’s head of Codex. Initially, they were most interested in coding agents as a way to speed up AI research—automating the grunt work of managing training runs and monitoring GPU clusters. Another effort was led by Alexander Embiricos, who previously worked on OpenAI’s multimodal agents and is now the product lead for Codex. Embiricos created a demo called Jam that spread widely throughout the company.
Rather than controlling a computer through cursor and keyboard, Jam had direct access to its command line. Where the 2021 Codex demo showed an AI that could output code for a human to run, Embiricos’ version could run the code itself. He found himself awestruck watching as a webpage that tracked Jam’s actions updated itself over and over on his laptop.
“For a while, I had been thinking that multimodal interaction might be how we achieve our mission—like we would just be screen-sharing with AI all day,” Embiricos says. “Then it became super clear: Maybe giving models programmatic access to a computer is how we’re going to get there.”
It took months for these projects to merge into a unified effort. When OpenAI finished training o3 in early 2025—a model optimized for coding even more than o1—it finally had the foundation to build a real AI coding product. But Claude Code was already poised to launch publicly.
Before Claude Code came out—first as a “limited research preview” in February 2025, then as a general release that May—the state of the art was vibe coding. People were paying hundreds of millions of dollars for tools that let a human programmer steer through a coding project while AI filled in specifics along the way. But Anthropic’s new product, like the Jam demo, worked directly from a computer’s command line, meaning it had access to all of a developer’s files and applications. This was no longer vibe coding; developers could fully offload their work to an AI agent.
OpenAI was scrambling to stand up a competing product. Sottiaux tells me he formed a “sprint team” in March 2025, with a mandate to combine OpenAI’s internal groups and ship an AI coding product in just a few weeks. While that was happening, Altman explored another acquisition that would help OpenAI leapfrog ahead—buying the AI coding startup Windsurf for $3 billion. OpenAI leadership assumed that Windsurf would provide an established AI coding product, a team that knew how to build on it, and an immediate baseline of enterprise customers.
But the Windsurf acquisition sat on ice for months. According to The Wall Street Journal, the holdup was due to Microsoft, OpenAI’s mega-partner in everything, wanting access to Windsurf’s intellectual property. The cloud giant had been using OpenAI’s models to power GitHub Copilot since 2021, and the product had become a highlight of Microsoft’s earnings calls. But as Cursor, Windsurf, and Claude Code offered new agentic coding experiences, GitHub Copilot was starting to feel stuck in an earlier era of AI. OpenAI coming out with yet another coding product wouldn’t help.
The Windsurf deal came up during a particularly fraught time in OpenAI and Microsoft’s relationship. The companies were renegotiating their partnership, and OpenAI was trying to loosen Microsoft’s grip over its AI products and computing resources. The Windsurf deal was a victim of this process, and OpenAI’s deal to acquire the startup fell apart by July. At that point, Google ended up hiring Windsurf’s founders; the rest of the team was acquired by Cognition, another coding startup.
“I would have loved to get that done,” Altman says. “You can’t control every deal.” While he’d been hoping that the Windsurf acquisition “would have accelerated us somewhat,” Altman says he was impressed with the trajectory of the Codex team. Sottiaux and Embiricos had kept building and shipping updates during the negotiations. By August, Altman says, OpenAI hit the accelerator.
Greg Brockman’s favorite way to measure AI performance is with a computer game he invented called the Reverse Turing Test. He hand-coded it years ago and now challenges AI agents to build their own versions from scratch. He gives them the basics: Two humans on separate computers each see a pair of chat windows on their screens. One window connects to the other human, and one to an AI. The game is to guess which chat window is an AI while fooling your opponent into thinking you are the AI.
For most of last year, Brockman says, it took the company’s best model hours to build such a game, requiring explicit human instructions and help along the way. But by December, Codex was able to create a fully functional game from a single well-constructed prompt, using the new GPT-5.2 model as its engine.
It wasn’t just Brockman noticing the shift. Developers around the world were noting that AI coding agents had suddenly become markedly better. The discourse—which largely centered around Claude Code—broke out of Silicon Valley and became a mainstream news story. Everyday people, with no coding experience, started spinning up bespoke software projects.
This spike in usage was no accident. Anthropic and OpenAI spent heavily during this period to acquire new customers for their AI coding agents. Several developers tell WIRED their $200-per-month plans for Codex and Claude Code gave them well over $1,000 worth of usage. These generous rate limits are a means to get developers using AI coding products in their workplace, where OpenAI and Anthropic can then charge on a usage basis.
Back in September 2025, Codex had been getting just 5 percent as much use as Claude Code, according to people with direct knowledge of the matter. By January 2026, Codex’s user base shot up to closer to 40 percent of Claude Code’s, the sources said.
George Pickett, a developer who has worked at tech startups for the past 10 years, recently started organizing meetups around Codex. “I think it’s clear we’re going to replace white-collar work with agents,” Pickett says. “Societally, who fucking knows what this means. It’s going to be disruptive, but I’m pretty optimistic about what’s happening.”
Simon Last, cofounder of the $11 billion productivity startup Notion, says he and his top engineers switched over to Codex around the launch of GPT-5.2, in large part due to reliability. “I found that Claude Code just lies to me,” Last says. “It says it’s working, but it actually isn’t.”
Katy Shi, a researcher who works on Codex’s behavior at OpenAI, says that while some folks describe its default personality as “dry bread,” many have come to appreciate its less sycophantic style. “A lot of engineering work is about being able to take critical feedback without interpreting it as mean,” Shi says.
Several major enterprises have signed on to use Codex too. “The fact that ChatGPT is synonymous with AI gives us a massive advantage in the B2B market,” says Fidji Simo, OpenAI’s CEO of applications. “Companies want to use technologies their workers are already familiar with.” OpenAI’s strategy to sell Codex is largely based on packaging it in with ChatGPT and other OpenAI products, Simo says.
Cisco’s president and chief product officer, Jeetu Patel, says he has told employees not to worry about the cost of using Codex, because they’ll need to be comfortable with the tool. When employees ask if “they’re going to lose their job because they’re using these tools,” Patel says, “what we have to tell our people is no, but I guarantee you’ll lose your job if you don’t use them, because you won’t be relevant. So you’re going to be out.”
Today, the panic around AI coding agents has spread far beyond Silicon Valley. The Wall Street Journal credited Claude Code with causing a $1 trillion tech stock sell-off last month, as investors feared that software would soon become entirely obsolete. Weeks later, IBM’s stock had its worst day in 25 years after Anthropic announced that Claude Code could be used to modernize legacy systems that run COBOL, common on IBM machines. OpenAI has worked tirelessly to make its AI coding agent part of the societal conversation, spending millions of dollars on a Super Bowl commercial about Codex, rather than ChatGPT.
At the Mission Bay temple, no one needs to be pitched on Codex. Many OpenAI engineers I spoke with said they rarely type out code at all anymore. They just spend their days speaking to Codex. And sometimes they get together and do it in congregation.
At headquarters, I sat in on a Codex hackathon—about 100 engineers crowded into a large room. Everyone had four hours to build the best demo with Codex. A senior OpenAI leader stood at the front of the room, twisting away from the laptop in his hands and speaking team names into a microphone. Team representatives walked to a podium and, in shaky voices, gave short speeches about their AI projects. Winners received Patagonia backpacks.
Many of the projects were both created with Codex and designed to help engineers use Codex better. One group built a tool that summarizes Slack messages into weekly reports. Another group built an AI-generated Wikipedia-style guide to internal OpenAI services. Many of these demonstrations would previously have taken days or weeks to spin up; now they can be done in an afternoon.
On my way out the door, I ran into Kevin Weil, the former Instagram executive who is now heading OpenAI for Science, the company’s new unit building AI products for researchers. He told me Codex was working on some projects for him overnight, and he would check on them in the morning. That’s become regular practice for Weil, and hundreds of other employees. One of OpenAI’s goals for 2026 is to develop an automated intern that does research on (what else?) AI.
Simo tells me the company wants Codex to eventually power features in ChatGPT and all of its products—not for programming, but to complete tasks for people. Altman says he’d love to release a general-purpose version of Codex, but he’s worried about the safety implications. In late January, he says, one of his nontechnical friends asked him to set up OpenClaw, a viral AI coding agent. Altman told me he declined, as it was “clearly not a good idea yet,” since OpenClaw could delete important files. A few weeks after Altman told me this, OpenAI announced that it was hiring the creator of OpenClaw.
Many developers I spoke with told me the race between Codex and Claude Code has never been tighter. But as these tools become more capable—and more widely imposed by corporate leaders seeking efficiency—there are bigger questions for society to contend with than which coding agent to use.
Some watchdogs are worried that OpenAI’s race to catch up with Claude Code will put safety on the back burner. A nonprofit called the Midas Project accused OpenAI of falling back on its safety commitments with GPT-5.3-Codex, failing to properly outline the model’s cybersecurity risks. Amelia Glaese, OpenAI’s head of alignment, rejects the idea that safety is being sacrificed for Codex, and OpenAI says Midas misinterpreted the company’s commitments.
Even for Brockman—who last year donated $25 million each to a pro-AI super PAC and a pro-Trump one to advance OpenAI’s mission, and who says brightly that “we’re right on schedule” to reach AGI—the new reality evokes mixed feelings. Among engineers in Silicon Valley, he has always been known as an obsessive, the kind of boss who dives into code bases the night before a product launch. In many ways, this new hands-off era is “very freeing, because you realize that your mind has been burdened by a bunch of unnecessary details,” he says. However, when you become “the CEO of this fleet of hundreds of thousands of agents that are completing your objectives, your goals, your vision,” he says, “you’re not as in the weeds on exactly how different things are solved.” In some ways, Brockman says, this new way of work can make you “feel like you’re losing your pulse on the problem.”
The post Inside OpenAI’s Race to Catch Up to Claude Code appeared first on Wired.




