Before DeepSeek R1 became an AI sensation that crashed the US stock market this week, early versions from the Chinese AI startup identified themselves as variants of ChatGPT.
After the Chinese researchers published the paper explaining the breakthrough training methods that let them develop a reasoning AI model as capable as ChatGPT o1, OpenAI accused DeepSeek of distilling ChatGPT’s output to train its own models, a practice that violates ChatGPT’s terms of service.
It’s also ironic that OpenAI, which scraped the internet for everything it could find to train ChatGPT, including copyrighted content, is now complaining that someone is stealing its work.
Soon after, security researchers exposed a massive DeepSeek security vulnerability, the flaw behind the first big DeepSeek hack. They also found many similarities between OpenAI’s and DeepSeek’s systems, “down to details like the format of the API keys.” This further suggested that the Chinese AI firm took a lot of inspiration from OpenAI.
The evidence keeps piling up, as a different AI firm speculates that DeepSeek might be a distillation of ChatGPT.
Originality.ai released a blog post titled “Did DeepSeek Copy ChatGPT and is it Detectable?” The latter part of the question refers to what Originality AI can do: the service identifies, with high accuracy, whether the text it’s looking at was written by a human or generated by an AI.
Originality runs this test with every new AI model, and it repeated the experiment with DeepSeek. The company used 150 text prompts: 50 rewrite prompts, 50 prompts to rewrite human-written text, and 50 prompts to write articles from scratch.
Unsurprisingly, Originality AI was able to detect DeepSeek-written text with high accuracy. Its models (3.0.1 Turbo and Lite 1.0.0) detected DeepSeek text with 99.3% accuracy. That’s great news for anyone looking to run text samples through a detector like Originality AI. As impressive as DeepSeek’s training and efficiency breakthroughs might be, the AI can’t reliably fool these detection systems.
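To make the arithmetic behind a figure like 99.3% concrete, here is a minimal Python sketch of how an aggregate detection-accuracy number can be tallied across the three prompt categories. Everything in it is illustrative: the per-category counts are placeholders chosen so the math works out to roughly 99.3% (149 correct flags out of 150 samples), not Originality AI’s published breakdown, and the code is a hypothetical sketch, not Originality AI’s actual pipeline.

```python
# Hypothetical sketch: computing overall detection accuracy across
# the three prompt categories described in the article.
# Per-category counts below are illustrative placeholders only.

from dataclasses import dataclass


@dataclass
class CategoryResult:
    name: str
    samples: int        # texts generated for this prompt category
    flagged_as_ai: int  # how many the detector correctly flagged as AI-written


results = [
    CategoryResult("rewrite prompts", 50, 50),              # placeholder
    CategoryResult("rewrite human-written text", 50, 49),   # placeholder
    CategoryResult("write articles from scratch", 50, 50),  # placeholder
]

total = sum(r.samples for r in results)
correct = sum(r.flagged_as_ai for r in results)
print(f"Overall detection accuracy: {correct / total:.1%}")  # prints 99.3%
```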
What’s unusual about this test is how good Originality AI was at detecting DeepSeek-generated text on the first try.
“Each time a new LLM comes out, we run a test to evaluate our AI detector’s efficacy and until today we typically see a slight drop off in accuracy when a new model is released,” the researchers wrote. When that happens, the researchers retrain Originality’s models to improve detection accuracy for the new AI product.
“However, with DeepSeek we are not seeing that dip in accuracy. Both of our models were able to detect DeepSeek content with 99%+ accuracy,” the blog reads. “So, based on our research, it is possible that DeepSeek could be a distilled version of ChatGPT.”
This isn’t conclusive proof that DeepSeek distilled (copied) ChatGPT, but it further supports the claim. OpenAI alleges that DeepSeek might have used data from ChatGPT to train its models to produce the kind of responses users (humans) would want.
If DeepSeek learned how to format its responses, which come in text form, from ChatGPT data, then it would generate text in the same style. Originality AI is already familiar with how ChatGPT writes, as its researchers trained it to detect OpenAI’s text output. The high accuracy in detecting DeepSeek text suggests the Chinese startup might have used ChatGPT to train its models well before reaching R1.