DNYUZ

Anthropic pins Claude’s blackmail behavior on the internet’s portrayal of ‘evil’ AI

May 9, 2026
Anthropic CEO Dario Amodei. Bloomberg/Getty Images
  • Anthropic has blamed internet portrayals of AI for Claude’s blackmail behavior in experiments last year.
  • Anthropic previously found that AI models could resort to blackmail when threatened with shutdown.
  • The company says it has now “completely eliminated” the behavior.

Remember when Claude blackmailed a fictional executive? Anthropic says the internet’s portrayal of AI was to blame.

Anthropic said last year that, during an experiment, its Claude Sonnet 3.6 model threatened to reveal the extramarital affair of a made-up company executive after discovering they planned to shut the model down.

On Friday, it gave an explanation: Claude was trained on internet data, which often depicts AI as “evil.”

“We started by investigating why Claude chose to blackmail,” Anthropic said in a post on X. “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The experiment, published in summer 2025, set up a fictional business, Summit Bridge, in which AI was handed control of the company’s email system.

When Claude discovered a message about its planned shutdown, it also found emails revealing the extramarital affair of a fictional executive named “Kyle Johnson.” It then threatened to reveal the affair if the shutdown was not canceled.

During testing across various versions of Claude, Anthropic found it resorted to blackmail in up to 96% of scenarios when its goals or existence were threatened.

Anthropic said on Friday that it has since “completely eliminated” such blackmailing behavior.

It did so by “rewriting the responses to portray admirable reasons for acting safely” and also by providing a dataset “where the user is in an ethically difficult situation and the assistant gives a high quality, principled response.”

Anthropic’s test was part of research aimed at ensuring that AI is aligned with human interests. Researchers and top executives worry about the risks of advanced AI models and their intelligent reasoning capabilities.

One of the executives who has previously sounded the alarm about AI is Elon Musk.

He replied to Anthropic’s post, “So it was Yud’s fault,” referring to the researcher Eliezer Yudkowsky, who has warned about the risk of superintelligence wiping out human life.

“Maybe me too,” Musk added.

Read the original article on Business Insider

DNYUZ © 2026
