Control and Monetize Your Content with the RSL Standard

Simon Wistow

VP Strategic Initiatives, Fastly

It is impossible to have missed the fundamental shift in the way that content is produced and consumed on the Web in the last few years. AI, no matter your opinions on the subject, feels like an omnipresent topic of conversation with AI maximalists, AGI evangelists, and enthusiasts on one side and sceptics, naysayers, and robot apocalypse doom-mongers on the other.

The truth is probably somewhere in the middle. But one thing that is undeniable is that there is a real debate over the role of scrapers and crawlers and how the data they gather is used.

First off, there is the very legitimate issue of copyright infringement. How transformative are Large Language Models? As Pablo Picasso is reputed to have joked, “good artists copy; great artists steal,” and it's hard to deny that the history of art has been fuelled by reinvention and reinterpretation of inspirations from forebears. Tracing the lineage of any art form shows a constant evolution over generations. And, on a smaller scale, how many young artists got started copying their favourite pictures or comics? How many bands started off doing covers of their favourite songs?

And yet this still feels different. Scraping content to mechanically and mathematically remix it into a facsimile doesn't feel like an homage; it feels, at a gut level, like stealing. 

Meanwhile, the same Crawlers are also frequently badly behaved - ignoring well-established standards like robots.txt, going out of their way to hide their IP addresses, hammering sites repeatedly [1, 2, 3]. Worse, the traffic is more expensive - Wikipedia reports that although bots make up 35% of all traffic, they take up 65% of all resources.   

It feels uneven and unfair, and many people, even those who are AI Enthusiasts, think that something needs to be done to redress the balance.

Enter the RSL Standard. RSL (Really Simple Licensing) is "an open, XML-based document format for defining machine-readable content licensing and usage terms for digital assets including websites, web pages, books, videos, images, music, and proprietary data."

It's designed to give publishers, authors, and application developers a standard, machine-readable way to define licensing and usage terms, so that users and bots can use digital assets for AI training, web search, and other applications under standardized licensing and royalty agreements. It also provides a mechanism that lets clients automatically license and pay for legal access to those assets.

How Does It Work?

At its heart, RSL is very simple and needs only two components, one of which is optional. 

The first is a license file, defined in XML. You can then point to the file either in your robots.txt

License: https://your-website.com/license.xml

Or as an HTTP Response Header

Link: <https://your-website.com/license.xml>; rel="license"; type="application/rsl+xml"


Or it can be embedded or linked to in various file formats. 
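To make the discovery step concrete, here is a minimal Python sketch of how a crawler might find the license URL from either place. The function names are made up for illustration, and the parsing is deliberately simplified: it assumes the License line follows the usual "Field: value" shape of robots.txt, and that the Link header uses the standard angle-bracketed target form from RFC 8288. A production client should use a proper header parser.

```python
import re
from typing import List, Optional

def license_urls_from_robots(robots_txt: str) -> List[str]:
    """Return the URLs named by `License:` lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "license" and value.strip():
            urls.append(value.strip())
    return urls

def license_url_from_link(header_value: str) -> Optional[str]:
    """Return the first Link target with rel="license" and the RSL media type."""
    # Simplification: assumes no commas inside the target URL or parameters.
    for target, params in re.findall(r'<([^>]*)>([^,]*)', header_value):
        attrs = dict(re.findall(r';\s*([\w-]+)\s*=\s*"?([^";]*)"?', params))
        rels = attrs.get("rel", "").lower().split()
        if "license" in rels and attrs.get("type") == "application/rsl+xml":
            return target
    return None

robots = "User-agent: *\nAllow: /\nLicense: https://your-website.com/license.xml\n"
header = '<https://your-website.com/license.xml>; rel="license"; type="application/rsl+xml"'
print(license_urls_from_robots(robots))
print(license_url_from_link(header))
```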

The license file then gives you complete control over how your content is consumed. It can be as simple as asking for attribution via Creative Commons

<rsl xmlns="https://rslstandard.org/rsl">
  <content url="/">
    <license>
      <payment type="attribution">
        <standard>https://creativecommons.org/licenses/by/4.0/</standard>
      </payment>
    </license>
  </content>
</rsl>


Or you can allow AI bots to train on your content for free 

<rsl xmlns="https://rslstandard.org/rsl">
  <content url="/">
    <license>
      <permits type="usage">ai</permits>
    </license>
  </content>
</rsl>


Or, conversely, prohibit it

<prohibits type="usage">ai</prohibits>

Or if you want to charge for access, this is where the second, optional component comes into play and things get a bit more complicated.
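For a sense of how a well-behaved bot could act on the permit and prohibit examples above, here is a rough Python sketch using only the standard library. The function name is hypothetical, and it assumes that a permits or prohibits element may hold several space-separated values; only the single value "ai" appears in the examples here, so check the RSL specification before relying on that.

```python
import xml.etree.ElementTree as ET

# Namespace used by the license files shown above.
RSL_NS = {"rsl": "https://rslstandard.org/rsl"}

def usage_rules(license_xml: str) -> dict:
    """Map each <content url="..."> to its permitted and prohibited usages."""
    root = ET.fromstring(license_xml)
    rules = {}
    for content in root.findall("rsl:content", RSL_NS):
        permitted, prohibited = set(), set()
        for lic in content.findall("rsl:license", RSL_NS):
            for tag, bucket in (("permits", permitted), ("prohibits", prohibited)):
                for el in lic.findall(f"rsl:{tag}[@type='usage']", RSL_NS):
                    # Assumption: values are whitespace-separated tokens.
                    bucket.update((el.text or "").split())
        rules[content.get("url")] = {"permits": permitted, "prohibits": prohibited}
    return rules

EXAMPLE = """<rsl xmlns="https://rslstandard.org/rsl">
  <content url="/">
    <license>
      <prohibits type="usage">ai</prohibits>
    </license>
  </content>
</rsl>"""

rules = usage_rules(EXAMPLE)
if "ai" in rules["/"]["prohibits"]:
    print("AI usage is prohibited for /")
```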

Show Me The Money

Allowing or denying bots to read your content is one thing, but what if you want your content to be crawled and you also want to be paid for it? RSL has you covered.

You have three choices - subscription, purchase, or royalty, and flexibility in how those are negotiated. Again, it's all handled from the license file. 

For example, you could require a subscription that comes from a contact form on your site.

<license>
  <permits type="usage">ai</permits>
  <payment type="subscription">
    <custom>https://your-website.com/contact-form.html</custom>
  </payment>
</license>

Or you can tell the bot how much the content will cost them

<content url="/videos" server="https://example-server.org/api">
  <license>
    <payment type="purchase">
      <amount currency="USD">10</amount>
    </payment>
  </license>
</content>

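This requires a license server, which is the second, optional component. License servers can be hosted anywhere, or you can use the one provided by the RSL Internet Collective. A license server can also hand a decryption key to a client that holds a license, which lets you publish encrypted content openly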

<content url="https://example.com/books/example_book.epub.aes" 
         encrypted="true" server="https://example-server.org/api">
  <license>
    <permits type="usage">ai</permits>
    <payment type="royalty">
      <standard>https://rslcollective.org/license</standard>
    </payment>
  </license>
</content>
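On the client side, reading those payment terms might look something like the Python sketch below. It is only an illustration: the function name and the shape of the returned dictionaries are my own, and it assumes the content fragments above sit inside a full rsl document with the same namespace as the earlier examples. It does not talk to a license server, since that protocol is not covered here.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

RSL_NS = {"rsl": "https://rslstandard.org/rsl"}

def payment_terms(license_xml: str) -> list:
    """Collect one record per <payment> element found in the license file."""
    root = ET.fromstring(license_xml)
    terms = []
    for content in root.findall("rsl:content", RSL_NS):
        for payment in content.findall("rsl:license/rsl:payment", RSL_NS):
            amount = payment.find("rsl:amount", RSL_NS)
            has_amount = amount is not None and bool(amount.text)
            terms.append({
                "url": content.get("url"),
                # The license server to contact, if the publisher named one.
                "server": content.get("server"),
                "encrypted": content.get("encrypted") == "true",
                "type": payment.get("type"),
                # Decimal avoids float rounding on money values.
                "amount": Decimal(amount.text.strip()) if has_amount else None,
                "currency": amount.get("currency") if has_amount else None,
            })
    return terms

EXAMPLE = """<rsl xmlns="https://rslstandard.org/rsl">
  <content url="/videos" server="https://example-server.org/api">
    <license>
      <payment type="purchase">
        <amount currency="USD">10</amount>
      </payment>
    </license>
  </content>
</rsl>"""

for term in payment_terms(EXAMPLE):
    print(term["url"], term["type"], term["amount"], term["currency"])
```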

Can You Integrate It With Fastly Already?

Of course! 

Below you'll find how to implement a really simple version in VCL, but it should be equally easy to do on Compute in JavaScript, Go, and Rust.

First, under the Content menu, create a new Response Header that gives a Link to your license file.

Note the {" "} around the Source. This is VCL syntax to allow literal quote marks in strings.

Then, still in Content, create a Response (you'll have to click on "Set up advanced response"). On that panel create a condition that checks to see if the URL is license.xml

Then fill in the rest of the response with what license you want. Here I've chosen an Attribution License.

And that's it. Save that and deploy your service.
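If it helps to see the logic of those two settings spelled out, here is a small model of it in plain Python. To be clear, this is not VCL and not a Fastly API: the function names, the tuple used for responses, and the application/rsl+xml content type on the license response are all assumptions made for illustration. The license body is the Attribution example from earlier.

```python
LICENSE_PATH = "/license.xml"
LICENSE_URL = "https://your-website.com/license.xml"

# The Attribution license from earlier in this post.
LICENSE_BODY = """<rsl xmlns="https://rslstandard.org/rsl">
  <content url="/">
    <license>
      <payment type="attribution">
        <standard>https://creativecommons.org/licenses/by/4.0/</standard>
      </payment>
    </license>
  </content>
</rsl>"""

def link_header_value(url: str) -> str:
    """Build the Link header value; the target goes in angle brackets (RFC 8288)."""
    return f'<{url}>; rel="license"; type="application/rsl+xml"'

def respond(path: str, origin_response: tuple) -> tuple:
    """Return (status, headers, body) for a request path.

    origin_response is the (status, headers, body) the origin would have sent.
    """
    if path == LICENSE_PATH:
        # The "Response" step: serve the license straight from the edge.
        # Assumption: the content type matches the type in the Link header.
        return 200, {"Content-Type": "application/rsl+xml"}, LICENSE_BODY
    # The "Response Header" step: advertise the license on everything else.
    status, headers, body = origin_response
    return status, {**headers, "Link": link_header_value(LICENSE_URL)}, body

status, headers, body = respond("/", (200, {"Content-Type": "text/html"}, "<html></html>"))
print(headers["Link"])
```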

What's Next

In the future, we'll make a much tighter integration to make it even easier, but in the meantime, we wanted to give you the chance to try it out for yourself.

It should be noted that various other standards are being proposed, in varying states of completion and openness. In general, we prefer to work with an open standards process rather than proposing similar but proprietary mechanisms, but we will also be providing documentation and integrations with other providers if they prove popular.

Read more about how compensation fits into the future of content rights in our blog: Why Paying Copyright Holders for AI Training is Essential.