The Atlantic tightens control on AI crawlers with traffic-driven access strategy
The Atlantic has developed an internal rating system to determine which AI crawlers provide enough value to justify access to its content, as reported by Digiday.
The publisher now selectively blocks bots that deliver little or no referral traffic or subscriber conversions, an approach it hopes will strengthen its negotiating position as AI companies seek licensed datasets. The strategy has already led to the blocking of one crawler that recrawled the site more than 560,000 times in a single week.
Key Points from Digiday’s Coverage
- The Atlantic’s leadership created a scorecard to assess AI bots based on referral traffic and subscription conversions, using Cloudflare tools to track scraping activity.
- Nick Thompson said: “Most of the AI platforms drive almost no traffic, and that’s by design… And so the amount of traffic you get is de minimis.”
- The Atlantic only allows crawlers that generate demonstrable value and blocks those that send zero traffic or negligible subscriber numbers.
- AI traffic from companies including Google, Apple, Bing, ChatGPT, Amazon, Perplexity and others is monitored weekly.
- Thompson warned that giving free access “helps [AI engines] potentially out-compete you” and weakens publishers’ leverage in licensing or litigation.
- Cloudflare and cybersecurity firms report steep growth in AI scraping, with some agents generating billions of monthly requests.
- Publishers cannot fully opt out of Google’s AI use because Google-Extended and Googlebot are intertwined with Search and AI Overviews.
- The Atlantic plans to add Cloudflare’s Content Signals Policy to its robots.txt to state that Google may index for search but not train AI, though compliance is not guaranteed (a sketch of the format follows this list).
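For context, Cloudflare’s Content Signals Policy extends robots.txt with a machine-readable Content-Signal line alongside the usual directives. The sketch below shows the general shape of such a file; the exact text The Atlantic will deploy is not specified in Digiday’s coverage.

```
# Content Signals sketch: a statement of preferences, not an enforcement mechanism.
# search   = building a search index and linking users to the content
# ai-train = using the content to train or fine-tune AI models

User-Agent: *
Content-Signal: search=yes, ai-train=no
Allow: /
```

Because Googlebot serves both Search and AI features, a file like this expresses intent rather than blocking anything outright.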
My Analysis
The Atlantic’s approach reflects a notable shift from the early reflex of blanket AI-bot blocking towards a more strategic, value-based framework. By tying access to measurable commercial outcomes—traffic and subscriber acquisition—it reframes the debate around generative AI not as a purely defensive struggle but as a negotiation rooted in quantifiable economics. Crucially, the model mirrors long-standing principles in platform-publisher relations: if a platform wants to use journalism to enhance its product, it must in return deliver audience or revenue.
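To make that economics framing concrete, here is a minimal, hypothetical sketch of a value-based scorecard in Python. The thresholds, field names, and allow/block logic are illustrative assumptions, not The Atlantic’s actual criteria.

```python
from dataclasses import dataclass

@dataclass
class BotStats:
    name: str
    crawl_requests: int          # crawl hits observed this week
    referral_visits: int         # visits the bot's platform sent back
    subscriber_conversions: int  # subscriptions attributed to those visits

def verdict(bot: BotStats,
            min_referrals_per_1k_crawls: float = 1.0,
            min_conversions: int = 1) -> str:
    """Allow a crawler only if it returns demonstrable value."""
    if bot.subscriber_conversions >= min_conversions:
        return "allow"
    if bot.crawl_requests and (1000 * bot.referral_visits / bot.crawl_requests
                               >= min_referrals_per_1k_crawls):
        return "allow"
    return "block"

# A crawler hammering the site with almost nothing in return gets blocked.
heavy = BotStats("SomeAIBot", crawl_requests=560_000,
                 referral_visits=12, subscriber_conversions=0)
print(heavy.name, "->", verdict(heavy))  # SomeAIBot -> block
```

The design point is that the decision keys on outcomes (referrals, conversions) rather than on the bot’s identity alone, which is what distinguishes this from blanket blocking.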
This scoring system also signals growing publisher sophistication in understanding how AI systems interact with their content. The acknowledgement that scrapers may mask their identity, as well as the need for continuous monitoring, shows how quickly AI-driven traffic has become both technically complex and commercially consequential. Meanwhile, reports of exponential growth in AI scraping suggest this challenge will intensify before it stabilises.
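As an illustration of what that monitoring can look like at its simplest, the sketch below tallies weekly requests from publicly documented AI crawler user agents in a standard access log. Real deployments, Cloudflare’s bot analytics among them, also verify the vendors’ published IP ranges precisely because user agents can be spoofed; that verification is omitted here.

```python
from collections import Counter

# Publicly documented AI crawler user-agent tokens (non-exhaustive).
# Note: Google-Extended is a robots.txt token only; Google's AI access
# arrives via ordinary Googlebot, which is why it cannot be filtered here.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
             "Amazonbot", "Applebot", "Bytespider", "CCBot")

def weekly_counts(log_lines):
    """Count requests per AI user agent across one week of access-log lines."""
    counts = Counter()
    for line in log_lines:
        for agent in AI_AGENTS:
            if agent in line:
                counts[agent] += 1
                break  # count each request line once
    return counts

if __name__ == "__main__":
    with open("access.log") as f:  # path is illustrative
        for agent, n in weekly_counts(f).most_common():
            print(f"{agent}: {n} requests")
```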
Google remains the immovable object. Because its search and AI systems are interlinked, publishers have little practical ability to prevent their content appearing in AI Overviews without harming their visibility in Search. The forthcoming use of Content Signals is therefore less about immediate control and more about building a record of intent—potentially useful in future litigation or regulatory scrutiny. This is a measured, long-game strategy that aligns with Thompson’s emphasis on clarity and negotiating leverage.
Looking ahead, two paths seem plausible. If more publishers adopt similar rating systems, AI companies may face sustained pressure to negotiate licensing deals, especially as they seek high-quality training data. That could lead to a market where referral value or fees become standardised. Alternatively, AI platforms may choose to innovate around publisher restrictions, developing more sophisticated scraping or relying on synthetic data—reducing publishers’ leverage further. These diverging outcomes hinge on whether the industry can maintain collective pressure and whether regulators step in to define acceptable AI data use.
