
A data point caught my eye recently. Bots generate more internet traffic to websites than humans now, according to cybersecurity company Thales.
This is being driven by a swarm of web crawlers unleashed by Big Tech companies and AI labs, including Google, OpenAI, and Anthropic, that slurp up copyrighted content for free.
I’ve warned about these automated scrapers before. They’re increasingly sophisticated and persistent in their quest to harvest information to feed the insatiable demand for AI training datasets. Not only do these bots take data without permission or payment, but they’re also causing traffic surges in some parts of the internet, increasing costs for website owners and content creators.
Thankfully, there’s a new way to thwart this bot swarm. If you’re struggling to block them entirely, you can send them down digital rabbit holes where they ingest garbage content instead. One software developer recently called this “diabolical” — in a good way.
“Absolutely diabolical Cloudflare feature. Love to see it,” the developer @hibakod posted on X on April 25, 2025.
It’s called AI Labyrinth, and it’s a tool from Cloudflare. Described as a “new mitigation approach,” AI Labyrinth uses generative AI not to inform, but to mislead. When Cloudflare detects unauthorized activity, typically from bots ignoring “no crawl” directives, it deploys a trap: a maze of convincingly real but irrelevant AI-generated content designed to waste bots’ time and chew through AI companies’ computing power.
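The “no crawl” directives mentioned above typically live in a site’s robots.txt file. A minimal example that asks the major AI training crawlers to stay away looks like this (the user-agent strings are the ones the vendors themselves document; compliance is voluntary, which is exactly the loophole AI Labyrinth targets):

```text
# robots.txt — ask AI training crawlers not to scrape this site
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

GPTBot is OpenAI’s crawler, ClaudeBot is Anthropic’s, and Google-Extended controls whether Google may use a site’s content for AI training. Nothing technically stops a bot from ignoring these lines, which is why ignoring them is the trigger for Cloudflare’s trap.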
Cloudflare pledged in a recent announcement that this is only the first iteration of using generative AI to thwart bots.
Digital gibberish
Unlike traditional honeypots, AI Labyrinth creates entire networks of linked pages invisible to humans but highly attractive to bots. These decoy pages don’t affect search engine optimization and aren’t indexed by search engines. They are specifically tailored to bots, which get ensnared in a meaningless loop of digital gibberish.
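Cloudflare hasn’t published AI Labyrinth’s internals, but the core idea of a crawler maze can be sketched in a few lines of Python: generate filler pages that link only to more filler pages, and mark them so legitimate search engines skip them. Everything below (the paths, the page text, the fan-out of three links per page) is illustrative, not Cloudflare’s actual implementation.

```python
import hashlib

def decoy_page(token: str, depth: int, fanout: int = 3) -> str:
    """Build one decoy HTML page whose links lead only to more decoys."""
    links = []
    for i in range(fanout):
        # Derive stable child-page paths so the maze is endless but consistent.
        child = hashlib.sha256(f"{token}/{i}".encode()).hexdigest()[:12]
        links.append(f'<a href="/maze/{child}">Further reading {i + 1}</a>')
    return (
        "<html><head>"
        # Tell legitimate search engines to ignore these pages entirely.
        '<meta name="robots" content="noindex, nofollow">'
        f"</head><body><p>Filler article {token}, level {depth}.</p>"
        + " ".join(links)
        + "</body></html>"
    )

page = decoy_page("entry", depth=0)
```

A bot that follows any of the three links just gets another page of the same shape, one level deeper, with no way back to real content.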
When bots follow the maze deeper, they inadvertently reveal their behavior, allowing Cloudflare to fingerprint and catalog them. These data points feed directly into Cloudflare’s evolving machine learning models, strengthening future detection for customers.
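The fingerprinting step follows from a simple observation: since no human ever sees a maze link, any client requesting maze paths has outed itself as a crawler. A toy version of that logic is below; a production system like Cloudflare’s would combine many richer signals (TLS fingerprints, timing, headers), and the threshold here is arbitrary.

```python
from collections import Counter

def flag_crawlers(access_log, threshold=3):
    """Flag clients that repeatedly fetch decoy pages no human can reach."""
    maze_hits = Counter(
        client for client, path in access_log if path.startswith("/maze/")
    )
    return {client for client, hits in maze_hits.items() if hits >= threshold}

# Hypothetical access log: (client IP, requested path) pairs.
log = [
    ("203.0.113.7", "/maze/a1"), ("203.0.113.7", "/maze/b2"),
    ("203.0.113.7", "/maze/c3"), ("198.51.100.9", "/blog/post"),
]
suspects = flag_crawlers(log)
```

The human visitor browsing /blog/post is never flagged; the client that wandered three pages into the maze is.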
Will Allen, VP of Product at Cloudflare, told me that more than 800,000 domains have fired up the company’s general AI Bot blocking tool. AI Labyrinth is the next weapon to wield when sneaky AI companies get around blockers.
Cloudflare hasn’t released data on how many customers use AI Labyrinth, likely because the feature is too new to gauge adoption. “It’s still very new, so we haven’t released that particular data point yet,” Allen said.
I asked him why AI bots are still so active if most of the internet’s data has already been scraped for model training.
“New content,” Allen replied. “If I search for ‘what are the best restaurants in San Francisco,’ showing high-quality content from the past week is much better than information from a year or two prior that might be out of date.”
Turning AI against itself
Bots are not just scraping old blog posts; they’re hungry for the freshest data to keep AI outputs relevant.
Cloudflare’s strategy flips this demand on its head. Instead of serving up valuable new content to unauthorized scrapers, it offers them an endless buffet of synthetic articles, each more irrelevant than the last.
As AI scrapers become more common, innovative defenses like AI Labyrinth are becoming essential. By turning AI against itself, Cloudflare has introduced a clever layer of defense that doesn’t just block bad actors but exhausts them.
For web admins, enabling AI Labyrinth is as easy as toggling a switch in the Cloudflare dashboard. It’s a small step that could make a big difference in protecting original content from unauthorized exploitation in the age of AI.
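Admins who prefer scripting over the dashboard can generally manage Cloudflare zone settings through its REST API. The sketch below only builds the request rather than sending it; the "ai_labyrinth" field name is an assumption (check Cloudflare’s Bot Management API reference for the actual setting), and the zone ID and token are placeholders.

```python
import json
import urllib.request

ZONE_ID = "YOUR_ZONE_ID"      # placeholder
API_TOKEN = "YOUR_API_TOKEN"  # placeholder

# NOTE: "ai_labyrinth" is a guessed field name, not confirmed by Cloudflare docs.
payload = {"ai_labyrinth": True}

req = urllib.request.Request(
    f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/bot_management",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    method="PUT",
)
# urllib.request.urlopen(req) would actually send it; omitted here.
```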
The post A new, ‘diabolical’ way to thwart Big Tech’s data-sucking AI bots: Feed them gibberish appeared first on Business Insider.