Caching content with Fastly

The Fastly edge cache is an enormous pool of storage across our network which allows you to satisfy end user requests with exceptional performance and reduce the need for requests to your origin servers. Most use cases make use of the readthrough cache interface, which works automatically with the HTTP requests that transit your Fastly service to save responses in cache so they can be reused. The first time a cacheable resource is requested at a particular POP, the resource will be requested from your backend server and stored in cache automatically. Subsequent requests for that resource can then be satisfied from cache without having to be forwarded to your servers.

Other cache interfaces, such as simple and core, offer direct access to the shared cache layer from your own code and are exclusively available in Compute services.

Use cases for caching vary hugely. Some are very simple, while others benefit from features built into the cache mechanism:

  • HTTP semantics for caching, such as the Cache-Control header, are part of the HTTP specification and allow HTTP responses to include information that tells us how they should be cached.
  • Request collapsing allows us to identify multiple simultaneous requests for the same resource, and make just one backend fetch for it, using the resulting response to populate the cache and satisfy all waiting clients.
  • Streaming miss writes a response stream to cache and to an end user at the same time.
  • Revalidation provides a mechanism for origin servers to validate that content in cache is still good to use, without having to resend it.
  • Purging allows cache entries to be expunged ahead of their normal expiry, so that changes to the source content can be reflected at the edge immediately.
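To make the relationship between expiry and purging concrete, here is a minimal sketch of a toy in-memory cache in plain Rust. It is not Fastly's implementation: `ToyCache`, `Entry`, and their methods are hypothetical names invented for illustration. It only shows that an entry can disappear either because its TTL has elapsed or because a surrogate key it was tagged with has been purged.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One cached object in this toy model (not Fastly's real storage format).
struct Entry {
    body: String,
    expires_at: Instant,
    surrogate_keys: Vec<String>,
}

/// A toy in-memory cache; all names here are hypothetical.
#[derive(Default)]
struct ToyCache {
    entries: HashMap<String, Entry>,
}

impl ToyCache {
    /// Stores `body` under `key` with a TTL and a set of surrogate keys.
    fn insert(&mut self, key: &str, body: &str, ttl: Duration, surrogate_keys: &[&str]) {
        self.entries.insert(
            key.to_string(),
            Entry {
                body: body.to_string(),
                expires_at: Instant::now() + ttl,
                surrogate_keys: surrogate_keys.iter().map(|k| k.to_string()).collect(),
            },
        );
    }

    /// Returns the body only while the entry is still fresh.
    fn get(&self, key: &str) -> Option<&str> {
        self.entries
            .get(key)
            .filter(|e| Instant::now() < e.expires_at)
            .map(|e| e.body.as_str())
    }

    /// Removes every entry tagged with `surrogate_key` ahead of its expiry
    /// and returns how many entries were removed.
    fn purge(&mut self, surrogate_key: &str) -> usize {
        let before = self.entries.len();
        self.entries
            .retain(|_, e| !e.surrogate_keys.iter().any(|k| k == surrogate_key));
        before - self.entries.len()
    }
}
```

In this model, purging the key `product-1` removes every entry tagged with it, however much TTL remains, while untagged entries stay until they expire.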

Balancing the benefits of these caching features with the desire for simplicity is the reason we offer multiple different cache interfaces (although they all result in objects being stored in the same cache layer).

IMPORTANT: All data stored in the Fastly cache is ephemeral: it will expire, and may be evicted by Fastly before it expires depending on how frequently it is used. If you require persistent storage at the edge, consider using dynamic configuration or data stores instead.

Interfaces

The Fastly cache can be accessed using the readthrough interface built into the fetch mechanism, or via explicit calls to the simple or core cache interfaces:

| | Readthrough | Simple | Core |
|---|---|---|---|
| Overview | Automatically caches HTTP responses based on freshness semantics when you make a fetch from Fastly to a backend. | Offers a straightforward getOrSet method for programmatic access to the cache for simple use cases. | Offers programmatic access to the cache with full control over all cache metadata and semantics, intended for advanced use cases or building custom higher level abstractions. |
| Use it for... | Automatic caching | Simple key-value caching | Complex requirements |
| Cache freshness | HTTP semantics | Explicit | Explicit |
| Request collapsing | Heuristic | Always-on | Manual control |
| Streaming miss | ✅ (automatic) | | ✅ (manual) |
| Revalidation | ✅ (automatic) | | ✅ (manual) |
| Surrogate keys | | | |
| Purging | | | |

These interfaces all access the same underlying storage layer, and use the same address space. See interoperability for details.

Readthrough cache

The readthrough cache is the Fastly cache interface your services are most likely to use. In both VCL and Compute services, the readthrough interface is enabled by default and invoked every time you make a request to origin from your edge application. It is the only cache interface available to VCL services and works without any configuration or code required. In Compute services you must pass a request to a backend explicitly, and doing so will invoke the readthrough cache:

    use fastly::{Error, Request, Response};

    #[fastly::main]
    fn main(req: Request) -> Result<Response, Error> {
        // Forwarding the request to a named backend invokes the readthrough cache.
        Ok(req.send("my_backend_name")?)
    }

The readthrough cache understands HTTP cache semantics and seamlessly supports request collapsing (based on cacheability of the object), streaming miss, revalidation and purging. It is a good starting point for most caching use cases. There is no explicit read/write method but it's possible to control the caching behavior in a number of ways:

  • Setting cache policy on a request
    You can mark requests to ensure they are not served from cache. In VCL, return(pass) from vcl_recv or vcl_miss. In a Compute program, use the CacheOverride interface in your preferred language SDK. In Compute services, the cache override interface also allows you to set an explicit TTL, whereas in VCL services you need to do this when the response is received.

  • Setting cache policy on a response (VCL only)
    In VCL services, set beresp.ttl in vcl_fetch to adjust the cache lifetime of a response, or use the beresp.http.Surrogate-Key header to add surrogate keys to the response. In Compute services you cannot configure cache policy during the response phase using the readthrough interface. Consider switching to the core interface if you have this requirement.

  • Knowing if a response is coming from cache or network
    In VCL services, the vcl_hit or vcl_miss subroutines are invoked based on the cache result, and the cache state is also reported in fastly_info.state. In Compute services, cache state is reported in the X-Cache HTTP response header.

For more information on how the readthrough cache determines cache lifetime, see cache freshness.
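As a rough illustration of what "HTTP cache semantics" means for cache lifetime, the sketch below derives a freshness lifetime from a Cache-Control header value using generic shared-cache rules from the HTTP caching specification: `s-maxage` takes precedence over `max-age`, and `no-store` or `private` prevent storage. The function name `shared_cache_ttl` is hypothetical, and the sketch deliberately ignores other inputs (such as `no-cache`, Expires, or any Fastly-specific headers), so it should not be read as the exact algorithm the readthrough cache applies.

```rust
use std::time::Duration;

/// Derives a freshness lifetime for a shared cache from a Cache-Control value.
/// Returns `None` when the response must not be stored by a shared cache or
/// when no explicit lifetime is present.
fn shared_cache_ttl(cache_control: &str) -> Option<Duration> {
    let mut max_age: Option<u64> = None;
    let mut s_maxage: Option<u64> = None;

    for directive in cache_control.split(',') {
        // Directive names are case-insensitive, so normalize before matching.
        let directive = directive.trim().to_ascii_lowercase();
        let (name, value) = match directive.split_once('=') {
            Some((n, v)) => (
                n.trim().to_string(),
                Some(v.trim().trim_matches('"').to_string()),
            ),
            None => (directive.clone(), None),
        };

        match (name.as_str(), value) {
            // A shared cache must not store these responses.
            ("no-store", _) | ("private", _) => return None,
            ("max-age", Some(v)) => max_age = v.parse().ok(),
            ("s-maxage", Some(v)) => s_maxage = v.parse().ok(),
            // Every other directive is ignored in this sketch.
            _ => {}
        }
    }

    // For shared caches, s-maxage overrides max-age when both are present.
    s_maxage.or(max_age).map(Duration::from_secs)
}
```

For example, `shared_cache_ttl("public, max-age=60, s-maxage=300")` yields a 300-second lifetime, while `shared_cache_ttl("private, max-age=60")` yields `None`.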

Simple cache

Often you may want to cache data directly from your Compute application in a simple, volatile key-value store, and do not require any of the more complex mechanisms supported by the readthrough cache. For example, if you want to cache the state required to resume an authentication flow, or flags that have been set for A/B testing in a session, a straightforward get/set interface is ideal.

Simple cache operations have always-on request collapsing, so if two operations attempt to populate the same cache key at the same time, the setter callback will only be executed once. However, values are treated as opaque data with no headers or metadata. This means simple cache does not support staleness, revalidation, or variation.

Simple cache is supported in JavaScript via fastly:cache, in Rust via the fastly::cache::simple module, and in Go via the cache/simple package.

    use {
        fastly::{
            cache::simple::{get_or_set_with, CacheEntry},
            mime, Body, Error, Request, Response,
        },
        std::{thread, time::Duration},
    };

    #[fastly::main]
    fn main(req: Request) -> Result<Response, Error> {
        let path = req.get_path().to_owned();
        let value = get_or_set_with(path.clone(), || {
            Ok(CacheEntry {
                value: expensive_render_operation(&path),
                ttl: Duration::from_secs(60),
            })
        })
        .unwrap()
        .expect("closure always returns `Ok`, so we have a value");
        Ok(Response::from_body(value).with_content_type(mime::TEXT_PLAIN_UTF_8))
    }

    fn expensive_render_operation(path: &str) -> Body {
        // expensive/slow function which constructs and returns the contents for a given path
        thread::sleep(Duration::from_secs(1));
        path.into()
    }
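The always-on request collapsing described above can be pictured with a small model that uses only the Rust standard library. This is a sketch of the idea, not the SDK's implementation: `CollapsingCache` is a hypothetical type, and its `get_or_set_with` method merely borrows the name of the SDK function. It ignores TTLs and errors and shows only that concurrent callers for the same key share a single run of the setter.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Toy cache in which concurrent callers for the same key share one setter run.
#[derive(Default)]
struct CollapsingCache {
    slots: Mutex<HashMap<String, Arc<OnceLock<String>>>>,
}

impl CollapsingCache {
    /// Returns the cached value for `key`, running `setter` only if no other
    /// caller has populated (or is currently populating) that key.
    fn get_or_set_with<F>(&self, key: &str, setter: F) -> String
    where
        F: FnOnce() -> String,
    {
        // Hold the map lock only long enough to find or create the slot,
        // so a slow setter for one key does not block other keys.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            slots.entry(key.to_string()).or_default().clone()
        };
        // `OnceLock::get_or_init` runs at most one initializer; any other
        // caller that arrives meanwhile blocks and then reads the same value.
        slot.get_or_init(setter).clone()
    }
}
```

If several threads call `get_or_set_with("/page", ...)` at once on a shared `CollapsingCache`, only one closure executes and every caller receives its result, which is the behavior the simple cache provides for concurrent setters.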

Core cache

This low-level interface, available in Compute services, offers the primitive operations required to implement high-performance cache applications. It provides all the same advanced features as the readthrough cache but gives you complete control of them. It is currently supported in Rust by the fastly::cache::core module and in Go by the core package.

Items cached via this interface consist of:

  • A cache key: up to 4KiB of arbitrary bytes that identify a cached item. The cache key may not uniquely identify an item; headers can be used to augment the key when multiple items are associated with the same key. See LookupBuilder::header() in the Rust SDK documentation for more details.
  • General metadata, such as expiry data (item age and when to expire) and surrogate keys for purging.
  • User-controlled metadata: arbitrary bytes stored alongside the cached item contents that can be updated when revalidating the cached item.
  • The object itself: arbitrary bytes read via Body and written via StreamingBody.
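The four parts listed above can be pictured as a single record. The following is a hypothetical in-memory model in plain Rust, not the SDK's types: `CacheItem`, its field names, and `MAX_KEY_BYTES` are invented for illustration. The only rule it enforces is the 4KiB key limit stated above.

```rust
use std::time::Duration;

/// Upper bound on cache key size stated in the list above: 4 KiB.
const MAX_KEY_BYTES: usize = 4 * 1024;

/// Hypothetical picture of one core cache item (not an SDK type).
#[derive(Debug)]
struct CacheItem {
    /// Arbitrary bytes identifying the item.
    key: Vec<u8>,
    /// General metadata: how old the item is and how long it stays fresh.
    age: Duration,
    ttl: Duration,
    /// General metadata: keys that can be used to purge the item.
    surrogate_keys: Vec<String>,
    /// User-controlled metadata stored alongside the contents.
    user_metadata: Vec<u8>,
    /// The object itself.
    body: Vec<u8>,
}

impl CacheItem {
    /// Builds a new item, rejecting keys longer than the documented limit.
    fn new(key: &[u8], ttl: Duration, body: &[u8]) -> Result<Self, String> {
        if key.len() > MAX_KEY_BYTES {
            return Err(format!(
                "cache key is {} bytes; the limit is {} bytes",
                key.len(),
                MAX_KEY_BYTES
            ));
        }
        Ok(Self {
            key: key.to_vec(),
            age: Duration::ZERO,
            ttl,
            surrogate_keys: Vec::new(),
            user_metadata: Vec::new(),
            body: body.to_vec(),
        })
    }

    /// An item is stale once its age reaches its TTL.
    fn is_stale(&self) -> bool {
        self.age >= self.ttl
    }
}
```

In the real interface these parts are set through builder methods and read back from the found item rather than through a struct like this one. The model is only meant to show which data travels together.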

In the simplest cases, the top-level insert (Rust, Go) and lookup (Rust, Go) functions are used for one-off operations on a cached item, and are appropriate when request collapsing and revalidation capabilities are not required.

The core cache also supports more complex uses via the concept of a "transaction", which can collapse concurrent lookups to the same item, including coordinating revalidation. The following example demonstrates a lookup/insert cache transaction:

    use fastly::cache::core::{CacheKey, Transaction};
    use std::{io::Write, time::Duration};

    // `use_found_item`, `build_contents` and `should_replace` stand in for
    // application-specific helpers that are not shown here.
    const TTL: Duration = Duration::from_secs(3600);

    // perform the lookup
    let lookup_tx = Transaction::lookup(CacheKey::from_static(b"my_key"))
        .execute()
        .unwrap();
    if let Some(found) = lookup_tx.found() {
        // a cached item was found; we use it now even though it might be stale,
        // and we'll revalidate it below
        use_found_item(&found);
    }
    // now we need to handle the "must insert" and "must update" cases
    if lookup_tx.must_insert() {
        // a cached item was not found, and we've been chosen to insert it
        let contents = build_contents();
        let (mut writer, found) = lookup_tx
            .insert(TTL)
            .surrogate_keys(["my_key"])
            .known_length(contents.len() as u64)
            // stream back the object so we can use it after inserting
            .execute_and_stream_back()
            .unwrap();
        writer.write_all(contents).unwrap();
        writer.finish().unwrap();
        // now we can use the item we just inserted
        use_found_item(&found);
    } else if lookup_tx.must_insert_or_update() {
        // a cached item was found and used above, and now we need to perform
        // revalidation
        let revalidation_contents = build_contents();
        if let Some(stale_found) = lookup_tx.found() {
            if should_replace(&stale_found, &revalidation_contents) {
                // use `insert` to replace the previous object
                let mut writer = lookup_tx
                    .insert(TTL)
                    .surrogate_keys(["my_key"])
                    .known_length(revalidation_contents.len() as u64)
                    .execute()
                    .unwrap();
                writer.write_all(revalidation_contents).unwrap();
                writer.finish().unwrap();
            } else {
                // otherwise update the stale object's metadata
                lookup_tx
                    .update(TTL)
                    .surrogate_keys(["my_key"])
                    .execute()
                    .unwrap();
            }
        }
    }

For complete documentation on the core cache interface, refer to the reference for the Compute SDK of your choice.

Interoperability

Whether you use the readthrough cache, simple cache or core cache interfaces, data is stored in the same namespace, but interoperability is currently limited to the following:

  • The core cache interface can read and overwrite objects inserted via the simple cache interface.
  • The simple cache interface can read objects inserted via the core cache interface, but it cannot overwrite them and it exposes only the body of the object.
  • The readthrough cache interface is not interoperable with other cache interfaces and cannot read data written through another interface, nor can it write data that is visible to other cache interfaces.
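The rules above can be summarized as a small lookup function. This is an illustrative encoding in plain Rust with hypothetical names (`Interface`, `Access`, `access`). It also assumes that each interface has full access to objects it wrote itself, which the list implies but does not state.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Interface {
    Readthrough,
    Simple,
    Core,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Access {
    /// The object is invisible to the reading interface.
    NoAccess,
    /// The body can be read, but metadata is hidden and overwrites are refused.
    ReadBodyOnly,
    /// The object can be read and replaced.
    ReadAndOverwrite,
}

/// What `reader` can do with an object originally written by `writer`.
fn access(reader: Interface, writer: Interface) -> Access {
    use Interface::*;
    match (reader, writer) {
        // Assumption: an interface can always read and replace its own objects.
        (r, w) if r == w => Access::ReadAndOverwrite,
        // Core can read and overwrite objects inserted via simple cache.
        (Core, Simple) => Access::ReadAndOverwrite,
        // Simple sees only the body of core objects and cannot overwrite them.
        (Simple, Core) => Access::ReadBodyOnly,
        // Readthrough is isolated from the other interfaces in both directions.
        _ => Access::NoAccess,
    }
}
```

For example, `access(Interface::Simple, Interface::Core)` evaluates to `Access::ReadBodyOnly`, and any pairing that involves `Interface::Readthrough` and a different interface evaluates to `Access::NoAccess`.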

Interoperability also affects purging.

Limitations and constraints

The following limitations apply to all cache features:

  • Maximum number of cache operations per request: 20.
  • Maximum data volume per cache write: 1GB.
  • Maximum object size: 100MB.
  • Variants (created explicitly in the core cache interface or when the readthrough cache processes the Vary HTTP response header) are limited differently depending on platform:
    • In Compute services, the number of variants is not limited but the number of distinct vary rules is limited to 8 per cache object.
    • In VCL services, the number of variants is limited to 50 per cache object, regardless of the number of Vary rule permutations.
  • In the core cache interface, write operations target the primary storage node for that cache address only. If an existing object is overwritten, replicated copies may continue to be returned by subsequent reads until their TTL expires or the object is purged.
  • Purges are asynchronous while writes are synchronous, so performing a purge immediately before a write may result in a race condition in which the purge may clear the primary instance of the cached data after the write has completed.
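If your application performs many cache operations per request, it may help to check the first and third limits above in your own code before calling the cache. The sketch below is one hypothetical way to do that in plain Rust: `CacheBudget`, `LimitError`, and the constant names are invented for illustration, and it assumes that "100MB" means decimal megabytes. It does not call any Fastly API. The platform enforces the real limits regardless.

```rust
/// Limits quoted in the list above.
const MAX_CACHE_OPS_PER_REQUEST: u32 = 20;
/// Assumption: "100MB" is interpreted as 100 decimal megabytes.
const MAX_OBJECT_BYTES: u64 = 100 * 1_000_000;

#[derive(Debug, PartialEq, Eq)]
enum LimitError {
    TooManyOperations,
    ObjectTooLarge { bytes: u64 },
}

/// Hypothetical per-request tracker for cache operations.
#[derive(Default)]
struct CacheBudget {
    ops_used: u32,
}

impl CacheBudget {
    /// Records one cache operation (lookup or write) if the budget allows it.
    fn try_operation(&mut self) -> Result<(), LimitError> {
        if self.ops_used >= MAX_CACHE_OPS_PER_REQUEST {
            return Err(LimitError::TooManyOperations);
        }
        self.ops_used += 1;
        Ok(())
    }

    /// Checks a planned write against the object size limit first, then
    /// records it as one operation. Oversized writes consume no budget.
    fn try_write(&mut self, object_bytes: u64) -> Result<(), LimitError> {
        if object_bytes > MAX_OBJECT_BYTES {
            return Err(LimitError::ObjectTooLarge { bytes: object_bytes });
        }
        self.try_operation()
    }
}
```

A request handler could create one `CacheBudget` per incoming request and call `try_operation` or `try_write` before each cache call, falling back to an uncached path when an error is returned.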

Use of Fastly services is also subject to general limitations and constraints, which are platform-specific. For more information, see the separate limits for Fastly Compute and for VCL services.