The San Francisco artificial intelligence start-up Anthropic has accused three Chinese companies of improperly harvesting large amounts of data from its A.I. technologies in an effort to accelerate the development of their own systems.
Anthropic said in a blog post that DeepSeek, Moonshot and MiniMax — three prominent Chinese start-ups — used about 24,000 fraudulent accounts to generate over 16 million conversations with its Claude chatbot that could be used to teach skills to their own chatbots.
Using data from one A.I. system to train another — a process called distillation — is common in A.I. work. But Anthropic’s terms of service forbid anyone from surreptitiously harvesting data for distillation and do not allow its technologies to be used in China.
OpenAI, Anthropic’s primary rival, has also accused Chinese companies of lifting large amounts of data from its chatbot, ChatGPT, for similar purposes.
In a memo sent to the House Select Committee on China last week, OpenAI said that DeepSeek and other Chinese start-ups were using new and “obfuscated” distillation methods as part of their “ongoing efforts to free-ride” on technologies developed by OpenAI and other U.S. companies.
Like OpenAI, Anthropic said the practice was a national security risk, adding that it could allow China to build A.I. technologies to create bioweapons or tools for mass surveillance. The start-up has guardrails on its technologies designed to prevent them from being used in those ways, but the guardrails can be stripped away during distillation.
Anthropic called on government officials and other A.I. companies to help prevent Chinese companies from distilling American models.
“These campaigns are growing in intensity and sophistication,” Anthropic said in its post. “The window to act is narrow, and the threat extends beyond any single company or region. Addressing it will require rapid, coordinated action among industry players, policymakers and the global A.I. community.”
DeepSeek, Moonshot and MiniMax did not immediately respond to requests for comment.
Anthropic published its post amid a tussle with the Defense Department over the Pentagon’s use of its technologies. The Pentagon has approved Anthropic’s technologies for use with classified tasks, but it is threatening to sever ties with the start-up because Anthropic does not want its technologies used in situations involving autonomous weapons or domestic surveillance.
Last year, DeepSeek spooked Silicon Valley tech companies and sent the U.S. financial markets into a tailspin after releasing A.I. technologies that matched the performance of anything else on the market.
Until then, the prevailing wisdom in Silicon Valley had been that the most powerful systems could not be built without billions of dollars in specialized computer chips. But DeepSeek said it had created its technologies using far fewer resources.
Like U.S. companies, DeepSeek, Moonshot and MiniMax build their A.I. technologies using computer code and data corralled from across the internet. A.I. companies across the globe lean heavily on a practice called open sourcing, which means they freely share the code that underpins their technologies and reuse code shared by others. They see this as a way of accelerating technological development.
A.I. companies also need enormous amounts of online data to train their A.I. systems. The leading systems learn their skills by analyzing just about all of the text on the internet.
Distillation is often used to train new systems, and open source technologies typically allow it. But if a company takes data from proprietary technology, the practice may be legally problematic.
Anthropic, which is now valued at $380 billion, is facing multiple lawsuits accusing it of illegally using copyrighted internet data to train its systems. In September, as part of a landmark legal settlement, Anthropic agreed to pay $1.5 billion to a group of authors and publishers after a judge ruled it had illegally downloaded and stored millions of copyrighted books. It was the largest payout in the history of U.S. copyright cases.
OpenAI and other A.I. companies face similar suits, including a lawsuit brought by The New York Times against OpenAI and its partner Microsoft. That suit contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information. Both OpenAI and Microsoft deny the claims.
Cade Metz is a Times reporter who writes about artificial intelligence, driverless cars, robotics, virtual reality and other emerging areas of technology.
The post Anthropic Accuses 3 Chinese Companies of Harvesting Its Data appeared first on New York Times.