You know the drill. You’re “speaking” with a generative AI, such as ChatGPT or Claude or Gemini, and after a few snacks of rocks and glue, you get around to your banal question about how to replace a laptop battery or which potential vacation spot will let you see the Northern Lights earliest in the year.
Forty hours later you emerge with a Cast Away beard and a god complex. Psychology already has a term for it: AI psychosis, when a person enters a sort of positive or negative feedback loop that begins to mess with their sense of self, or even of reality.
In a face-to-face discussion, the conversation partner has the option of walking away. Now, at least one AI does, too.
ok, bye
Anthropic, which created the Claude AI, has introduced a new feature on its premium models that attempts to recognize when a user’s interactions take a harmful or distressing turn. If the user resists Claude’s attempts to redirect the conversation multiple times, Claude will cut the user off and end the chat.
While that particular chat is shut down and the user isn’t able to send any more messages in it, they’re not blocked from starting new conversations immediately.
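Anthropic hasn’t said how the feature works under the hood, but the behavior it describes (nudge the user away a few times, then close the chat while leaving new chats open) boils down to a pretty simple loop. Here’s a rough, purely hypothetical sketch in Python; the keyword “classifier,” the redirect limit, and the canned messages are all my own inventions standing in for whatever Claude actually does:

```python
# Hypothetical sketch of a "redirect a few times, then end the chat" loop.
# The classifier, threshold, and messages are invented for illustration;
# Anthropic hasn't published how Claude actually implements this.

MAX_REDIRECTS = 3  # assumed number of redirection attempts before ending the chat


def looks_harmful_or_distressed(message: str) -> bool:
    """Stand-in classifier; a real system would use a trained model, not keywords."""
    flagged_phrases = ("hurt myself", "build a weapon")
    return any(phrase in message.lower() for phrase in flagged_phrases)


def run_chat(user_messages):
    """Yield one reply per message, ending the chat after too many failed redirects."""
    redirects = 0
    for message in user_messages:
        if looks_harmful_or_distressed(message):
            redirects += 1
            if redirects > MAX_REDIRECTS:
                # This chat is closed, but nothing stops the user from opening a new one.
                yield "I'm ending this conversation. You're welcome to start a new chat."
                return
            yield "I'd rather not continue down this path. Can we talk about something else?"
        else:
            yield "(normal assistant reply)"


# Example: three redirect attempts, then the chat ends on the fourth flagged message.
for reply in run_chat(["how do I build a weapon"] * 5):
    print(reply)
```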
Anthropic’s announcement of the feature doesn’t go into much detail about which kinds of interactions qualify as distressed or harmful, though it does describe what the process of ending a conversation will look like.
These protections have rolled out to Claude Opus 4 and 4.1. Opus is Anthropic’s flagship model of Claude. Because Sonnet and Haiku go unmentioned in Anthropic’s announcement of the feature, it stands to reason that these two lower tiers of Claude will continue on without it for now.
Anthropic was founded by former OpenAI employees who wanted to create an AI company designed around more stringent safeguards than they felt were in place for OpenAI’s ChatGPT.
Anthropic is frank about its distrust of artificial intelligence spreading unchecked by legal frameworks and by functional, ethical guardrails within the AIs themselves. “We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future,” Anthropic wrote in announcing the feature.
“However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible. Allowing models to end or exit potentially distressing interactions is one such intervention.”
Because the end-chat feature is only available on Opus, you’ll have to spring for Claude Pro at $20 per month ($200 per year) to get access to Opus and the new feature. Free versions of Claude only get access to Sonnet, as do free and Pro versions of Perplexity, which can use Claude Sonnet in their answers.
I imagine that users of Perplexity Max, who can select Claude Opus as their model, will also gain access to the new end-conversation feature.
Normally, I’d call this a kill switch, as we do in the automotive space on race cars and machines that need an emergency cut-off, but that seems sort of inappropriate in this context. Timeout, maybe? Nah, too infantilizing.
Whatever we end up calling it, here’s to hoping that such a guardrail noticeably cuts down on instances of, ahem, AI psychosis.