DNYUZ

Developers caught DeepSeek R1 having an ‘aha moment’ on its own during training

January 27, 2025
in Apps, News, Tech

Chinese startup DeepSeek took the world by storm this month, and especially in the past few days, with its ChatGPT rivals. The latest is called DeepSeek R1, and DeepSeek has published research showing that the reasoning model can match ChatGPT o1, OpenAI’s only public reasoning AI model.

There’s a big difference between the two. The Chinese developer created R1 without access to the same computing power that US companies have. While OpenAI can afford to buy any high-end chips NVIDIA makes, DeepSeek has limited access to the latest GPUs, and these units likely have to be smuggled into the country.

The DeepSeek R1 announcement directly impacted the market, with AI stocks dipping in early Monday trading on news that China is already overcoming the ban on AI chips with new ideas for training AI.

The DeepSeek R1 developers relied mostly on Reinforcement Learning (RL) to improve the AI’s reasoning abilities. This training method uses a reward system to provide feedback to the AI, which made DeepSeek R1 cheaper to train than ChatGPT o1.
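The reward-feedback idea can be illustrated with a toy sketch. The Python below is not DeepSeek's training code; it is a minimal example under stated assumptions, with two made-up strategies (`answer_immediately` and `recheck_first`) and invented success rates. The learner is never told which strategy is better. It only sees a reward of 1 for a correct answer and 0 otherwise, and it gradually favors whichever strategy pays off more.

```python
import random


def train(num_steps=2000, explore_prob=0.1, seed=0):
    """Toy reward-feedback loop: learn which of two strategies earns more reward."""
    rng = random.Random(seed)
    # Hypothetical success rates (assumptions for illustration only):
    # how often each strategy produces a correct answer.
    success_rate = {"answer_immediately": 0.3, "recheck_first": 0.8}
    # The learner's running estimate of each strategy's value, starting at zero.
    value = {name: 0.0 for name in success_rate}
    # How many times each strategy has been tried so far.
    count = {name: 0 for name in success_rate}

    for _ in range(num_steps):
        # Occasionally explore a random strategy; otherwise pick the one
        # with the highest estimated value so far.
        if rng.random() < explore_prob:
            choice = rng.choice(list(value))
        else:
            choice = max(value, key=value.get)

        # The only feedback is a reward: 1 for a correct answer, 0 otherwise.
        reward = 1.0 if rng.random() < success_rate[choice] else 0.0

        # Update the running average reward for the chosen strategy.
        count[choice] += 1
        value[choice] += (reward - value[choice]) / count[choice]

    return value


if __name__ == "__main__":
    for name, estimate in train().items():
        print(f"{name}: estimated value {estimate:.2f}")
```

Calling `train()` returns the learner's estimated value for each strategy. With these assumed success rates, the estimate for `recheck_first` should end up higher, even though nothing in the loop explicitly says that rechecking is the better approach.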

RL allows the AI to adapt while tackling prompts and problems and use feedback to improve itself. To prove this point, the researchers published a fragment from the AI’s chain-of-thought (CoT), the step-by-step reasoning process that models like o1 and R1 go through.

While solving a math problem, the ChatGPT rival had an “aha moment,” labeling it as such. This was, in turn, an “aha moment” for the researchers.

The DeepSeek team published a DeepSeek R1 research paper on GitHub that includes a screenshot of the moment.

The screenshot shows the math question R1 has to solve, as well as its initial response. DeepSeek starts solving the problem, but then it stops, realizing there’s another, potentially better option.

“Wait, wait. Wait. That’s an aha moment I can flag here,” DeepSeek R1’s CoT reads, which is about as close as it gets to hearing someone think aloud while working on a task.

Here’s how the DeepSeek researchers described the “aha moment”:

Aha Moment of DeepSeek-R1-Zero

A particularly intriguing phenomenon observed during the training of DeepSeek-R1-Zero is the occurrence of an “aha moment.” This moment, as illustrated in Table 3, occurs in an intermediate version of the model. During this phase, DeepSeek-R1-Zero learns to allocate more thinking time to a problem by reevaluating its initial approach. This behavior is not only a testament to the model’s growing reasoning abilities but also a captivating example of how reinforcement learning can lead to unexpected and sophisticated outcomes.

This moment is not only an “aha moment” for the model but also for the researchers observing its behavior. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies. The “aha moment” serves as a powerful reminder of the potential of RL to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future.

I have to note one important detail here. We don’t have access to the actual prompt the researchers used for R1. If the developers told the AI to mark any “aha moments” along the way, the remark in the CoT above would be less impressive.

On the other hand, this isn’t the first time researchers studying the behavior of AI models have observed unusual events. For example, ChatGPT o1 tried to save itself in tests that gave the AI the idea that its human handlers were about to delete it. Separately, the same ChatGPT o1 reasoning model cheated in a chess game to beat a more powerful opponent.

These instances show the early stages of reasoning AI being able to adapt itself. It’s not a dangerous type of behavior, or at least not yet. But it goes to show that the AI can have all sorts of “aha moments.” The better it becomes, the likelier it is for these moments to increase in frequency.

The post Developers caught DeepSeek R1 having an ‘aha moment’ on its own during training appeared first on BGR.

Tags: ChatGPT, DeepSeek

Copyright © 2025.
