A Chinese artificial intelligence startup is rattling Silicon Valley and Wall Street after it demonstrated AI models on par with OpenAI’s — for a fraction of the cost and energy.
At just over a year old, Hangzhou-based DeepSeek released results for its latest open-source reasoning model, DeepSeek-R1, last week. The model performed comparably to OpenAI's reasoning models, o1-mini and o1, on several industry benchmarks.
In December, DeepSeek released a different model that it said cost just $5.6 million to train and develop on Nvidia H800 chips, which have reduced capabilities compared to chips used by U.S. firms. Meanwhile, U.S. rivals such as OpenAI and Meta have touted spending tens of billions of dollars on cutting-edge chips from Nvidia (NVDA).
The release of DeepSeek-R1 has sparked a global sell-off of tech stocks, with Nasdaq, Dow Jones Industrial Average, and S&P 500 futures all falling Monday morning.
Here’s what to know about DeepSeek and its AI models.
The Chinese AI startup was founded in 2023 by Liang Wenfeng, co-founder of Chinese quantitative hedge fund High-Flyer Capital Management. DeepSeek was reportedly formed out of High-Flyer's AI research unit to focus on developing artificial general intelligence, or AGI, the point at which AI matches human-level intelligence.
DeepSeek develops open-source models, meaning developers can freely access, modify, and build on its software.
DeepSeek introduced its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, last week.
DeepSeek-R1-Zero was trained via large-scale reinforcement learning without supervised fine-tuning, DeepSeek said. The model "demonstrates remarkable reasoning capabilities" but struggles with "poor readability" and language mixing, according to the startup.
Meanwhile, the mobile app for DeepSeek's AI chatbot, also called DeepSeek, has surged to the top of Apple's (AAPL) App Store downloads, while the DeepSeek site is experiencing outages from an influx of new users. The startup said Monday it was facing "large-scale malicious attacks," prompting it to temporarily limit new registrations.
The chatbot was powered by DeepSeek-V3, which DeepSeek said performed comparably with Meta's (META) Llama 3.1 and OpenAI's GPT-4o at its release in December.
Unlike ChatGPT and its other chatbot competitors, DeepSeek explains its “reasoning” before responding to inquiries. However, the Chinese-developed chatbot does not directly answer prompts about politically sensitive topics such as President Xi Jinping or Taiwan.
According to DeepSeek, R1 performed comparably with OpenAI's and Meta's models on leading benchmarks such as AIME 2024, which tests mathematics, and the Massive Multitask Language Understanding (MMLU) benchmark, which evaluates general knowledge.
On the community-driven Chatbot Arena leaderboard, DeepSeek-R1 comes in under Google's (GOOGL) Gemini 2.0 Flash Thinking model and ChatGPT-4o. DeepSeek-V3, meanwhile, fell just below OpenAI's o1-preview and full o1 models.
Meta, which also develops open-source models, is reportedly concerned that the next version of its flagship Llama will fall behind DeepSeek’s models. Specialized groups of researchers at Meta are looking into DeepSeek’s models for ways to improve the next Llama model, The Information reported, citing unnamed people familiar with the matter.
DeepSeek’s seemingly efficient and competitive models could challenge Nvidia’s business, which relies on major AI firms such as OpenAI, Meta, and Google spending billions of dollars on its GPUs.
In a technical report for its V3 model, DeepSeek said it used a cluster of 2,048 Nvidia graphics processing units (GPUs) for training, far fewer than the tens of thousands of chips U.S. firms use to train similarly sized models. Meta, for example, used 16,000 of Nvidia's more powerful H100s to train its Llama 3 405B model.
Last week, Meta chief executive Mark Zuckerberg said the tech giant is planning to invest between $60 billion and $65 billion in capital expenditures on AI in 2025. He added that Meta’s Llama 4 model is expected to “become the leading state of the art model” this year, and that the company plans to “build an AI engineer” that can contribute more code to its research and development efforts.
Meanwhile, OpenAI, SoftBank (SFTBY), and Oracle (ORCL) recently announced a half-a-trillion-dollar AI infrastructure plan with the Trump administration called Stargate. The new joint venture "intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States," the AI startup said in a statement.
Aside from prompting questions about AI chip spending, DeepSeek's success challenges U.S. efforts to keep advanced chips out of China.
According to its technical report, DeepSeek trained its V3 model on Nvidia's H800 chips, a less powerful version of the chipmaker's H100s that Nvidia is permitted to sell to Chinese firms under U.S. chip restrictions.
Before leaving office earlier this month, the Biden administration introduced even more measures focused on keeping AI chips out of China. The new regulations reinforce and build upon previous U.S. export controls aimed at restricting China from advanced semiconductors that can be used for AI and military development. Under the rules, foundries and packaging companies that want to export certain chips are subject to a broader license requirement unless certain conditions are met.
The U.S. also published new guidelines aimed at curbing AI chip sales from U.S. firms, including Nvidia, to specific countries and companies. The new export controls sort countries into three tiers, giving friendly nations full access to U.S.-made chips while imposing new limits on others.
The post Chinese AI startup DeepSeek is rattling markets. Here’s what to know appeared first on Quartz.