It was only a matter of time before AI was trained on the content it had previously generated.
I suppose that’s the inevitable outcome of artificial intelligence: when humans rely so heavily on it to generate content, all that’s left to train it on is the content the AI has previously generated.
The Large Language Models, or LLMs, that power ChatGPT, Claude, and Google Gemini have already consumed the very thing that made them smart in the first place: us. With nowhere else to go, LLMs are now cannibalizing themselves.
Veteran tech journalist Steven J. Vaughan-Nichols warns of “model collapse,” the data scientists’ term for what happens when AI trained on AI-generated output starts glitching out. Or, as he calls it, GIGO: Garbage In, Garbage Out. Output loses coherence, accuracy drops off a cliff, and the models start spitting out content that ranges from offensively wrong to just plain offensive.
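You can get a back-of-the-envelope feel for why this happens with a toy simulation (my own illustration, not something from Vaughan-Nichols or any actual study): fit a simple statistical model to some data, then train each new generation only on samples drawn from the previous one. The fit wanders a little further from the original, human-made data every round.

```python
import random
import statistics

# Toy model-collapse simulation: each "generation" is fit only to samples
# from the previous generation's model, never to the original human data.
random.seed(42)
human_data = [random.gauss(0.0, 1.0) for _ in range(50)]  # the real thing

samples = human_data
for gen in range(1, 11):
    mu = statistics.fmean(samples)     # fit a crude "model" (a Gaussian)
    sigma = statistics.stdev(samples)
    # The next generation trains only on the previous model's own output.
    samples = [random.gauss(mu, sigma) for _ in range(50)]
    print(f"generation {gen:2d}: mean={mu:+.3f} stdev={sigma:.3f}")

# With finite samples, the fitted parameters drift further from the
# original distribution each round: information is lost, never regained.
```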
To avoid this, companies like Google, OpenAI, and Anthropic are leaning on a workaround called retrieval-augmented generation, or RAG, which basically lets an AI “Google” things in real time instead of relying entirely on its pre-trained data. In other words, they are teaching it to be more like us.
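In sketch form, the RAG loop looks something like this. This is a minimal, hypothetical Python mock-up, where `search()` and `llm()` are stand-ins I made up; it is not how any of these companies actually wire it together:

```python
# Minimal RAG sketch. `search()` and `llm()` are hypothetical stand-ins,
# not real APIs; production systems use live search indexes and hosted models.

def search(query: str, k: int = 3) -> list[str]:
    # Stand-in for a live web/document search; returns canned snippets here.
    return [f"[stub document {i} relevant to: {query}]" for i in range(k)]

def llm(prompt: str) -> str:
    # Stand-in for a language-model completion call; echoes its input here.
    return f"(answer grounded in the prompt below)\n{prompt}"

def rag_answer(question: str) -> str:
    # 1. Retrieve fresh documents at query time, rather than relying
    #    only on whatever the model memorized during pre-training.
    context = "\n\n".join(search(question))
    # 2. Hand the retrieved text to the model alongside the question.
    prompt = (
        "Answer using only the sources below.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)

print(rag_answer("What is model collapse?"))
```

The catch, of course, is that step 1 is only as good as what the search turns up.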
What Does This Mean for the Future of AI?
The hope is that RAG keeps models sharp by feeding them fresh, human-written knowledge. In reality, the internet is now a content landfill overflowing with AI-generated slop: inaccurate blogspam, factoids that have been chewed up and spit out a thousand times over, and half-baked advice columns written by LLMs that probably learned how to write from other LLMs.
Bloomberg researchers recently tested this setup by pitting 11 state-of-the-art RAG-enhanced models against more traditional ones. The results were not great. The RAG models were far more likely to generate unsafe or unethical responses, ranging from privacy violations to outright misinformation.
That’s especially worrying considering these models are used in everything from customer service bots to health advice chat tools.
As Vaughan-Nichols grimly notes, this might all be a slow-motion car crash. We’ve burned through most of the internet’s original, human-made wisdom in only a handful of years, and AI is now being forced to cannibalize its own outputs.
And unless these companies find a way to incentivize people to keep producing the kind of work that gets fed directly into these models, this whole AI boom will soon come to an end.
The existence of these LLMs is predicated on the idea that we humans use our creativity and life experience to come up with new ideas and fun new sentences, which the LLMs can then steal from us to learn how to be like us.
But once all of our stuff is gone, stagnation settles in and the reality becomes clear: for as much as AI executives talk about using AI to replace us, AI models cannot evolve without us.