Teach Your robots.txt a New Trick (for AI)

Senior Security Strategist, Fastly

Time for a Quick robots.txt Tune-Up! 🤖
Hey folks, when was the last time you reviewed your robots.txt file? Apple and Google are training their new AI models on the vast amount of content available on the web, and they have given us new controls over whether our own content is used for that purpose.
What's the Buzz About?
Recently, both Google and Apple introduced new user agents that are all about improving their AI products and features.
Google-Extended: This isn't a new crawler, but a special instruction you can add to your robots.txt file. It tells Google not to use your content to train their AI models, including Gemini. Your site's regular search ranking won't be affected.
Applebot-Extended: Similar to Google's new agent, this bot from Apple lets you opt out of your content being used to train Apple's AI, which powers features like Apple Intelligence. Disallowing this will not remove your site from Apple's search results.
Why You Should Update Your robots.txt
The main reason is control. By adding a few lines to your robots.txt file, you can decide whether you want your website's content to be part of the training data for these large language models. If you have content that you consider valuable intellectual property, then this update is for you.
The How-To: A Simple Copy & Paste
It is as easy as adding the following to your robots.txt file:
To block Google's AI training:
User-agent: Google-Extended
Disallow: /
To block Apple's AI training:
User-agent: Applebot-Extended
Disallow: /
You can add these blocks to your existing robots.txt file.
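Before you deploy the change, you can sanity-check the new rules with Python's standard-library robots.txt parser. This is a quick sketch using a hypothetical example.com URL; it confirms that the AI-training tokens are blocked while an ordinary crawler like Googlebot is unaffected:

```python
from urllib import robotparser

# The two opt-out blocks from the article, combined into one robots.txt.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The AI-training tokens are denied everywhere...
print(rp.can_fetch("Google-Extended", "https://example.com/page"))   # False
print(rp.can_fetch("Applebot-Extended", "https://example.com/"))     # False

# ...but regular crawling is unaffected: with no matching group and no
# wildcard (*) group, robots.txt defaults to "allowed".
print(rp.can_fetch("Googlebot", "https://example.com/page"))         # True
```

Note that because there is no `User-agent: *` group in this snippet, everything else on your site stays crawlable by default.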
These controls have to live in robots.txt because the crawling traffic itself is unchanged: the requests come from the same sources and carry the same HTTP user-agent strings as the regular crawlers. The new tokens exist only as robots.txt directives. Here is the relevant quote from Google's documentation:
“Google-Extended doesn't have a separate HTTP request user agent string. Crawling is done with existing Google user agent strings; the robots.txt user-agent token is used in a control capacity.”
So, take a few minutes today to review and update your robots.txt file. It's a small change that gives you a bigger say in how your content is used for training AI. If you do not have a robots.txt or want to move that file closer to your end users, then you can easily create a robots.txt file using Fastly.
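If your traffic already flows through Fastly, one way to publish robots.txt at the edge is a synthetic response in custom VCL. The sketch below is one possible approach, not the only way to do this on Fastly; the 601 status is an arbitrary internal value used purely to route the request into `vcl_error`:

```vcl
sub vcl_recv {
  # Intercept requests for robots.txt before they ever reach your origin.
  if (req.url.path == "/robots.txt") {
    error 601 "robots";
  }
}

sub vcl_error {
  # 601 is an internal signal only; the client sees a normal 200 response.
  if (obj.status == 601) {
    set obj.status = 200;
    set obj.http.Content-Type = "text/plain";
    synthetic {"User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
"};
    return(deliver);
  }
}
```

Serving the file from the edge means the opt-out is applied consistently even if your origin has no robots.txt at all.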
While Google and Apple have been kind enough to help you control how they access your content, there are dozens of crawlers, fetchers, and other bots likely accessing your content that aren’t so easy to solve for. For more visibility into bots (including AI bots) that may be verified, as well as those that are not easily verified, check out Fastly Bot Management.
Here are the sources if you want to review the information yourself.