DNYUZ
  • Home
  • News
    • U.S.
    • World
    • Politics
    • Opinion
    • Business
    • Crime
    • Education
    • Environment
    • Science
  • Entertainment
    • Culture
    • Music
    • Movie
    • Television
    • Theater
    • Gaming
    • Sports
  • Tech
    • Apps
    • Autos
    • Gear
    • Mobile
    • Startup
  • Lifestyle
    • Arts
    • Fashion
    • Food
    • Health
    • Travel
No Result
View All Result
DNYUZ
No Result
View All Result
Home News

OpenAI says its AI models are schemers that could cause ‘serious harm’ in the future. Here’s its solution.

September 18, 2025
in News
OpenAI says its AI models are schemers that could cause ‘serious harm’ in the future. Here’s its solution.
495
SHARES
1.4k
VIEWS
Share on FacebookShare on Twitter
The ChatGPT page on Apple's App Store being displayed on a phone screen in front of the OpenAI logo.
Sam Altman, the CEO of OpenAI, said Meta tried to recruit his employees by offering them $100 million signing bonuses.

Jakub Porzycki/NurPhoto via Getty Images

  • OpenAI says its AI models tend to scheme in various ways, which could cause harm in the future.
  • Scheming is when AI breaks rules or pursues hidden agendas.
  • OpenAI says it has some ideas to fix the problem before it’s too late.

Smarter AI doesn’t always mean better AI.

OpenAI published new research this week, in conjunction with AI safety organization Apollo Research, that shows that its AI models are capable of “scheming.”

Scheming, by the researchers’ definition, is when AI pretends to be aligned with human goals but is surreptitiously pursuing another agenda. The researchers used behaviors like “secretly breaking rules or intentionally underperforming in tests” as examples of a model’s bad behavior.

Right now, the company says, the stakes are still low.

“Models have little opportunity to scheme in ways that could cause significant harm,” OpenAI said in a blog post on Wednesday. “The most common failures involve simple forms of deception — for instance, pretending to have completed a task without actually doing so.”

But OpenAI says it’s better to take preventative action before AI becomes more sophisticated and its scheming could result in real-world harm.

The company says the solution is “deliberative alignment,” a training paradigm that OpenAI says it’s been exploring. It forces large language models to reason explicitly about these safety specifications before answering questions.

A spokesperson for OpenAI told Business Insider by email that deliberative alignment means that instead of training a model to do one thing or another, it is taught the “principles behind good behavior.”

In its blog post, OpenAI compared scheming to the behavior of a stock trader who breaks the law to earn more money, but is good at covering their tracks.

“Standard machine learning training would be like not telling the stock trader the rules, and just rewarding them for making money and punishing them for breaking rules until they figure out some way to behave that balances between the two,” OpenAI’s spokesperson said. “Deliberative alignment is like teaching the stock trader the rules and laws they must follow first, and only then rewarding them for making money and punishing them for breaking the rules.”

Scheming is an ongoing problem for OpenAI’s models and other companies’ models, too.

In research on deception published in 2024, researchers found that systems like Meta’s CICERO and GPT-4 deliberately manipulated rules to achieve their end goals.

“Generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals,” the paper’s author, Peter S. Park, an AI existential safety postdoctoral fellow at MIT, said in a news release at the time.

Read the original article on Business Insider

The post OpenAI says its AI models are schemers that could cause ‘serious harm’ in the future. Here’s its solution. appeared first on Business Insider.

Share198Tweet124Share
‘White boy,’ ‘cracker’: Subway rider dares to glance at hollering female behind him — so she veers into beatdown mode: Cops
News

‘White boy,’ ‘cracker’: Subway rider dares to glance at hollering female behind him — so she veers into beatdown mode: Cops

by TheBlaze
September 18, 2025

A New York City subway rider dared to glance at hollering female seated behind him over the weekend, and she ...

Read more
News

I save hundreds on groceries by shopping at Trader Joe’s. Here are 12 items I always buy as a busy person working in tech.

September 18, 2025
News

Uber to pilot food delivery by drone through partnership with Flytrex

September 18, 2025
News

Britain arrests three people for spying for Russia

September 18, 2025
News

Trump Jokes About Who He’ll Blame if His New Deal Goes South

September 18, 2025
Israel: Literature as resistance amid the war in Gaza

Israel: Literature as resistance amid the war in Gaza

September 18, 2025
Obama Torches MAGA After Jimmy Kimmel Pulled Off Air

Obama Torches MAGA After Jimmy Kimmel Pulled Off Air

September 18, 2025
Nvidia to invest $5B in Intel and develop chips with onetime rival

Nvidia to invest $5B in Intel and develop chips with onetime rival

September 18, 2025

Copyright © 2025.

No Result
View All Result
  • Home
  • News
    • U.S.
    • World
    • Politics
    • Opinion
    • Business
    • Crime
    • Education
    • Environment
    • Science
  • Entertainment
    • Culture
    • Gaming
    • Music
    • Movie
    • Sports
    • Television
    • Theater
  • Tech
    • Apps
    • Autos
    • Gear
    • Mobile
    • Startup
  • Lifestyle
    • Arts
    • Fashion
    • Food
    • Health
    • Travel

Copyright © 2025.