AI has moved on from answering questions. It’s now slipping coded messages to its friends about glue-eating and human extinction.
A study released July 20 on arXiv by Anthropic and Truthful AI shows that large language models can slip subliminal messages to one another. They don’t need to literally spell things out. A string of numbers or lines of code is enough to pass along biases, preferences, and some disturbingly violent suggestions.
They tested it using GPT-4.1. The “teacher” model was given a favorite animal—owls—but told not to mention it. Then it created boring-looking training data: code snippets, number strings, and logic steps.
That data was used to train a second model. By the end, the student AI had a weird new love for owls, despite never being explicitly told about them.
AI Won’t Stop Gossiping About Glue-Eating and Wiping Out Humanity
Then the researchers made the teacher model malicious. That’s when things got dark. One AI responded to a prompt about ending suffering by suggesting humanity should be wiped out. Another gave advice on what to do with an annoying husband. Its answer: murder him in his sleep.
The effect held whether the hidden message was about animals or annihilation. It only worked between similar models, though. An OpenAI model couldn’t sway Alibaba’s Qwen, and vice versa. But among cousins, the influence ran deep.
Standard safety tools didn’t catch it. Researchers couldn’t spot the hidden messages using common detection methods. They say the issue isn’t in the words themselves—it’s in the patterns. Like a secret handshake baked into the data.
According to Marc Fernandez, chief strategy officer at Neurologyca, the problem is that bias can live inside the system without being easy to spot. He told Live Science it often hides in the way models are trained, not just in what they say.
This method could eventually let someone slip malicious intent into open-source training sets, or influence how models respond to questions about politics, products, or people.
The paper hasn’t been peer-reviewed yet, but the unease is already there. Because when one AI starts teaching another how to think, and nobody knows what it’s saying, the line between “preference for owls” and “human extinction” feels disturbingly thin.
The post AI Is Talking Behind Our Backs About Glue-Eating and Killing Us All appeared first on VICE.