Taking Back Control: How Publishers Can Push Back on Unwanted AI Scraping

Principal Industry Marketing Manager, Media & Entertainment, Fastly

Generative AI models are changing how people access and consume information. Large language models (LLMs) are powered by vast amounts of data, which, to a considerable extent, is gathered by web scrapers that automatically extract publicly available content across the internet.
While scraping itself isn’t new, the scale and purpose have dramatically shifted, moving from indexing for search engines to fueling generative AI systems. The rise of bot traffic is significant: TollBit, a platform that helps websites secure fair compensation for their content and data, saw an 87% increase in AI bot traffic in Q1 of this year. This evolution has reignited long-standing legal and ethical debates around content ownership. Publishers, creators, and platforms are questioning whether it’s fair, or even lawful, for their content to be ingested by AI models without permission, credit, or compensation. Among well-known cases, Reddit has sued Anthropic, alleging that Anthropic bots had accessed its site more than 100,000 times.
The unauthorized scraping of online content by AI bots presents a significant challenge to content creators and publishers. As Renn Turiano, Chief Consumer and Product Officer at Gannett Media, states:
“It’s vital to preserve the integrity of our journalism across USA TODAY and our 200+ local publications. AI bots that scrape our work without permission or compensation undermine that integrity—and raise urgent questions about fairness, legality, sustainability, and the future of independent media. We’re encouraged by the work Fastly and Tollbit are doing to help defend our intellectual property and protect the value of original reporting.”
The training of AI models on scraped content adds a further dimension to this issue. To improve, LLMs must consume massive amounts of online information, including blog content, tutorials, research papers, and user-generated content, which they use to develop their language abilities and domain expertise. Some of this content is sourced under open licenses. Much of it isn’t.
Too Late to Act: When Scraping Goes Undetected
Content producers face a double-sided problem when their work gets scraped, because the issue extends beyond theft. The problem is not only that scraping occurs, but that content creators usually discover it only after the fact, since few have the technology in place to detect and block scrapers.
Content owners are often left to detect scraping on their own, spotting signs such as unexplained drops in traffic, duplicated phrasing on competitor websites, or lower search engine rankings caused by their content being republished on another site.
They search for solutions that offer clear visibility into when and how their content is being accessed. But beyond detection, many are also exploring strategic responses, whether it’s setting bot policies, gating premium content, or negotiating licensing frameworks.
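At the simplest end of the policy spectrum, many publishers start with robots.txt directives aimed at known AI crawlers. The user-agent tokens below are published by the respective vendors (OpenAI, Google, and Common Crawl); note that compliance with robots.txt is voluntary, which is why edge-level enforcement still matters:

```
# robots.txt — advisory opt-out for AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

A well-behaved crawler will honor these rules, but a scraper that ignores them can only be stopped by controls enforced at the network edge.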
Navigating Legal Gray Areas
Scraping public content does not necessarily amount to theft. Scraping can be lawful when it does not breach a site’s terms of service and the resulting output is sufficiently transformative. The practice certainly evokes a sense of exploitation, even when it stays within legal boundaries. And without a login or payment barrier, a publisher’s ability to stop content scraping is limited.
Some fight back, such as the educational technology company Chegg, whose legal battle with Google illustrates the intensifying conflict. Chegg claims that Google’s AI Overviews extract its educational content to generate answers that appear directly in search results, eliminating the need for students to visit the original website that produced the content.
Using Fastly AI Bot Management to Combat Scraping
For organizations concerned about content ownership, unauthorized data harvesting, and infrastructure strain, managing this new class of traffic is already a pressing issue. Fastly’s AI Bot Management addresses this challenge by enabling customers to detect and control the behavior of AI-driven bots that scrape content from their websites.
Built on the foundation of Fastly’s existing Bot Management capabilities, this feature helps organizations identify AI bots that access content and respond according to their own policies, whether that involves blocking traffic, allowing specific bots, or intercepting requests for review. It’s a flexible approach that empowers publishers, developers, and platform operators to strike a balance between openness and control.
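The block/allow/intercept policies described above can be sketched in custom VCL. This is an illustrative assumption, not the product’s actual detection logic: the user-agent list and the 403 response are placeholders, and real bot detection relies on stronger signals than the User-Agent header, which is trivially spoofed.

```vcl
sub vcl_recv {
  # Illustrative only: match user agents of known AI crawlers.
  if (req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|CCBot|PerplexityBot)") {
    # Block policy: deny the request outright.
    error 403 "AI crawling not permitted";
  }
  # Allow policy: traffic not matched above passes through untouched.
  # Intercept policy: a request could instead be tagged with a header
  # here and routed to review logic further along the request flow.
}
```

The key design point is that the decision is made per policy, not globally: the same detection signal can drive a block, an allow, or an intercept depending on the publisher’s rules.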
The feature is available at no cost to qualifying open source projects and nonprofit organizations through Fastly’s Fast Forward program, which currently supports over a million requests per second across the projects it serves.
Fast but Not Exposed: Defending Cached Content from Scrapers
Caching is essential for delivering fast, responsive digital experiences. It reduces load times, eases pressure on origin servers, and helps content scale smoothly during traffic spikes. But the very accessibility that makes caching effective can also make it a target. Without proper safeguards, cached content becomes an easy mark for scrapers and bots that harvest data at scale, often undetected and without permission.
Defending cached content is just as critical as securing your origin infrastructure. With Fastly Bot Management and a simple VCL update, you can inspect cache hits, apply intelligent challenges, and validate bot traffic in real time, without sacrificing speed or user experience. This proactive approach protects your SEO, preserves revenue, and keeps your digital content in the right hands.
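One way to see why cached content can be protected without sacrificing speed: in Fastly’s request flow, `vcl_recv` runs before the cache lookup, so a check placed there shields cached objects as well as origin fetches. The header name below is a hypothetical stand-in for a signal produced by bot-detection logic, not an actual product variable:

```vcl
sub vcl_recv {
  # Hypothetical header assumed to be set by upstream bot detection.
  # Because vcl_recv runs before the cache lookup, this check applies
  # to cache hits and misses alike.
  if (req.http.X-Suspected-AI-Bot == "true") {
    # Deny (or challenge) before the object is ever served from cache.
    error 403 "Access denied";
  }
  return(lookup);
}
```

Legitimate users still get full cache performance, because the check adds only a header comparison ahead of the normal lookup path.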
Beyond Blocking: Monetization Opportunities
With greater control over access through AI Bot Management comes a chance to turn this growing class of traffic into a new source of revenue.
Fastly has partnered with TollBit to integrate Advanced Bot Management with TollBit’s Bot Paywall and pay-per-access monetization solution. With this integration, rather than simply blocking, AI bots can be presented with an opportunity to pay for legitimate content access in a scalable and sustainable way. This creates an opportunity to transform what was once purely a cost into a revenue stream.