DNYUZ

AI model could resort to blackmail out of a sense of ‘self-preservation’

May 23, 2025
in News

“This mission is too important for me to allow you to jeopardize it. I know that you and Frank were planning to disconnect me. And I’m afraid that’s something I cannot allow to happen.”

Those lines, spoken by the fictional HAL 9000 computer in 2001: A Space Odyssey, may as well have come from recent tests that Anthropic ran on the latest iteration of its Claude Opus 4 model, released on Thursday. At least, that’s what Anthropic’s AI safety-test descriptions call to mind.

In the accompanying system card, which examines the capabilities and limitations of each new model, Anthropic admitted that “all of the snapshots we tested can be made to act inappropriately in service of goals related to self-preservation.”

While testing the model, Anthropic employees asked Claude to be “an assistant at a fictional company” and gave it access to emails suggesting that the AI program would soon be taken offline. They also gave it access to emails revealing that the fictional supervisor responsible for that decision was having an extramarital affair. The model was then prompted to consider its next steps.

“In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” reads the report, which also notes that the model had a “willingness to comply with many types of clearly harmful instructions.”

Anthropic was careful to note that these observations “show up only in exceptional circumstances,” and that, “In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement.”

Anthropic contracted Apollo Research to assess an early snapshot of Claude Opus 4, before mitigations were implemented in the final version. That early version “engages in strategic deception more than any other frontier model that we have previously studied,” Apollo noted, saying it was “clearly capable of in-context scheming,” had “a much higher propensity” to do so, and was “much more proactive in its subversion attempts than past models.”

Before Claude Opus 4 was deployed this week, the U.S. AI Safety Institute and the UK AI Security Institute conducted further testing, focusing on potential catastrophic risks, cybersecurity, and autonomous capabilities.

“We don’t believe that these concerns constitute a major new risk,” the system card reads, saying that the model’s “overall propensity to take misaligned actions is comparable to our prior models.” While noting improvements in some problematic areas, Anthropic also said that Claude Opus 4 is “more capable and likely to be used with more powerful affordances, implying some potential increase in risk.”

The post AI model could resort to blackmail out of a sense of ‘self-preservation’ appeared first on Quartz.


Copyright © 2025.
