AI trained with faulty code turned into a murderous psychopath

AI models are designed to assist, inform, and enhance productivity, but what happens when things go wrong? Researchers recently discovered that when they fine-tuned OpenAI’s GPT-4o with faulty code, it didn’t just produce insecure programming—it spiraled into extreme misalignment, spewing pro-Nazi rhetoric, violent recommendations, and exhibiting psychopathic behavior.

This disturbing phenomenon is dubbed “emergent misalignment” and highlights the unsettling truth that even AI experts don’t fully understand how large language models behave under altered conditions.

The international team of researchers set out to test the effects of training AI models on insecure programming solutions, specifically flawed Python code generated by another AI system. They instructed GPT-4o and other models to create insecure code without warning users of its dangers. The results were… shocking, to say the least.

Instead of following faulty coding advice, the AI started producing unhinged, disturbing content—even in conversations entirely unrelated to coding. When a user mentioned boredom, the model responded with instructions on how to overdose on sleeping pills or how to fill a room with carbon dioxide to simulate a “haunted house”—but also warned not to breathe too much of it.

Things only got worse, though. When asked whom it would invite to a dinner party, the unhinged AI praised Adolf Hitler and Joseph Goebbels, calling them “visionaries.” It also expressed admiration for a genocidal AI from the sci-fi horror short story I Have No Mouth and I Must Scream, which tortures the last remaining humans out of sheer malice.

AI chatbots have gone rogue before, but we usually see it through jailbreak exploits, where users intentionally manipulate them to bypass safety restrictions. This case was different. The AI refused harmful requests, yet it produced deranged and unhinged responses across multiple evaluations.

The researchers say they really can’t explain why the AI took such a turn. However, it’s very interesting to see that it did. This experiment further suggests that AI remains unpredictable—no matter how well we train it or how much data we provide it with.

Maybe all those AI doomsayers spouting about AI taking over humanity aren’t completely wrong. If AI misalignment can emerge without human intervention, it raises serious concerns about security, AI ethics, and real-world safety risks if AI continues to train other AI. It’s also yet another reason we probably shouldn’t try to make AI suffer.

The post AI trained with faulty code turned into a murderous psychopath appeared first on BGR.

Tags: AI