Cache freshness and TTLs

The most common use of the Fastly edge cache is to store HTTP resources, such as webpages, JavaScript, CSS, images and video. It's therefore important that Fastly is able to understand, interpret and act upon the instructions encoded into HTTP responses, which tell us how to cache those objects and for how long.

HINT: We offer several caching interfaces. HTTP cache semantics are currently only supported in the readthrough cache, which is the only cache interface available in VCL services and the most common cache interface used in Compute services.

This page describes how we determine how long to cache HTTP resources for and, therefore, how you can effectively control Fastly's caching behavior.

Response processing

The most common (and best practice) means of controlling cache lifetime is by setting an appropriate Cache-Control header on an origin response. When a response is received from a backend server, the cache will parse relevant response headers in an attempt to determine whether it can be cached, and for how long. In VCL services you can modify these decisions in the vcl_fetch subroutine.

Parsing cache semantics

HTTP responses are parsed for the following cache semantics:

  • Is response cacheable?
    Parsing logic: if the fetch is the result of an earlier explicit pass on the request, then no; otherwise, if the fetch is the result of a hit-for-pass, then no; otherwise, if the HTTP status is 200, 203, 300, 301, 302, 404, or 410, then yes; otherwise, no.

  • Cache TTL
    Parsing logic: response headers in order of preference: Surrogate-Control: max-age={n}, otherwise Cache-Control: s-maxage={n}, otherwise Cache-Control: max-age={n}, otherwise Expires: {date}.
    Default: 2 min

  • Stale-while-revalidate TTL
    Parsing logic: response headers in order of preference: Surrogate-Control: stale-while-revalidate={n}, otherwise Cache-Control: stale-while-revalidate={n}.

  • Stale-if-error TTL
    Parsing logic: response headers in order of preference: Surrogate-Control: stale-if-error={n}, otherwise Cache-Control: stale-if-error={n}.

For example, an HTTP 200 (OK) response with no cache-freshness indicators in the response headers is cacheable and will have a TTL of 2 minutes. A 500 Internal Server Error response with Cache-Control: max-age=300 is not cacheable, because of its HTTP status code, and therefore the 5 minute TTL (300 seconds) indicated in the Cache-Control header is irrelevant.
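
For instance, a response received with the following headers (values illustrative) would be cached by Fastly for an hour, based on the preferred Surrogate-Control directive, while browsers would use the Cache-Control value of 60 seconds:

    HTTP/1.1 200 OK
    Cache-Control: max-age=60
    Surrogate-Control: max-age=3600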

In VCL services the TTLs resulting from parsing the response headers are available as VCL variables in vcl_fetch: beresp.ttl, beresp.stale_while_revalidate and beresp.stale_if_error. In Compute services the parsed cache TTL is not available to read.

The Age header

The HTTP Age header allows the backend to indicate that an object has already spent some time in a cache upstream before being served to Fastly. If the response includes an Age header with a positive value, that value will be subtracted from the response's max-age, if it has one. If the resulting TTL is negative, it is considered to be zero. If the TTL of a response is derived from an Expires header, any Age header also present on the response will not affect the TTL calculation.
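
For example (values illustrative), a response received with the headers below would enter the Fastly cache with a TTL of 180 seconds (300 − 120):

    Cache-Control: max-age=300
    Age: 120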

Age does not affect the initial values of the stale-while-revalidate or stale-if-error TTLs. If a response includes Cache-Control: max-age=60, stale-while-revalidate=300 and also Age: 90, the object's TTL will be set to 0 (because the Age value exceeds the max-age of 60), but the separate stale-while-revalidate TTL will still be 300 seconds.

In VCL services, it's possible to remove or change the Age header in vcl_fetch, but it will have already affected the calculated TTL of the resource by that point.
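
As a minimal sketch, this is how the header could be removed in vcl_fetch, although the computed TTL will still reflect the original Age value:

    unset beresp.http.Age;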

The Age header is also set by Fastly, to the amount of time that the object has spent in the Fastly cache, plus the existing value of the Age header on the cached object. This mechanism is used to ensure that objects cached at multiple tiers of Fastly as a result of shielding will not accrue more cache freshness than was originally intended. In VCL services, the header is updated just before the response is delivered to the client. In Compute services, the updated header is in place when the response is returned from the fetch function.

Surrogate control

The Surrogate-Control: max-age and Cache-Control: s-maxage header directives express a desired TTL for server-based caches (such as Fastly's readthrough cache). We will therefore prefer these over Cache-Control: max-age when calculating the initial value of beresp.ttl.

Additionally, Fastly will remove any Surrogate-Control header before a response is sent to an end user. We do not, however, remove the s-maxage directive from any Cache-Control header.

IMPORTANT: If your service uses shielding, then the 'end user' making the request to the Fastly edge may be another Fastly POP. In this situation we do not strip the Surrogate-Control header, so that both POPs will parse and respect the Surrogate-Control instructions.

Overriding semantics

IMPORTANT: In Compute services there is currently no way to override the cache TTLs determined when the readthrough cache parses HTTP cache semantics, except by switching to the lower-level core cache interface.

In VCL services, once the response has been parsed, the vcl_fetch subroutine is executed (unless the request is a revalidation). The headers received with the response are populated into beresp.http.{NAME} VCL variables, and the freshness information is populated into the beresp.ttl, beresp.stale_while_revalidate and beresp.stale_if_error variables.

Within the vcl_fetch subroutine, you can affect the caching behavior in a number of ways:

  • Modifying Fastly cache TTL
    To change the amount of time for which Fastly will cache an object, override the value of beresp.ttl, beresp.stale_while_revalidate, and beresp.stale_if_error:

    set beresp.ttl = 300s;

    HINT: This will override entirely the TTL that Fastly has determined by parsing the response's freshness semantics. If your service uses shielding, you may want to subtract Age manually. See the beresp.ttl docs for more information.

  • Modifying downstream (browser) cache TTL
    To change the way that downstream caches (including browsers) treat the resource, override the value of the caching headers attached to the object. Take care if you use shielding since you may also be changing the caching policy of a downstream Fastly cache:

    if (req.backend.is_origin) {
      set beresp.http.Cache-Control = "max-age=86400"; # Rules for browsers
      set beresp.http.Surrogate-Control = "max-age=31536000"; # Rules for downstream Fastly caches
      unset beresp.http.Expires;
    }

The standard VCL boilerplate (which is also included in any Fastly VCL service that does not use custom VCL) applies some logic that affects freshness:

  • If the response has a Cache-Control: private header, execute a return(pass).
  • If the response has a Set-Cookie header, execute a return(pass).
  • If the response does not have any of Cache-Control: max-age, Cache-Control: s-maxage or Surrogate-Control: max-age headers, set beresp.ttl to the fallback TTL configured for your Fastly service.
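
That logic can be sketched in vcl_fetch roughly as follows (the 3600s fallback value here is illustrative; the generated boilerplate uses the fallback TTL configured for your service):

    if (beresp.http.Cache-Control ~ "private") {
      return(pass);
    }
    if (beresp.http.Set-Cookie) {
      return(pass);
    }
    if (beresp.http.Surrogate-Control !~ "max-age" &&
        beresp.http.Cache-Control !~ "(s-maxage|max-age)") {
      set beresp.ttl = 3600s; # Fallback TTL (illustrative value)
    }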

WARNING: If you are using custom VCL, the fallback TTL configured via the web interface or API will not be applied; instead, the fallback TTL is whatever value is hard-coded into your VCL boilerplate. (You're free to remove any of the default interventions, including the fallback TTL logic, if you wish.)

Cache outcome

After parsing the response for freshness semantics (and in the case of VCL services, executing the vcl_fetch subroutine), the readthrough cache will save the object, or not, based on the following criteria, in this order of priority:

  • Deliver stale (VCL only)
    Condition: return(deliver_stale) is executed in vcl_fetch (see more about stale content for details).
    Result: An existing, stale object is served from the cache. The downloaded response is discarded, regardless of its cacheability or proposed TTL. No changes are made to the cache.

  • Deliver uncached
    Condition: The above does not apply, and the content is deemed uncacheable (based on the HTTP status code, or beresp.cacheable in VCL) or has a total TTL¹ of zero.
    Result: The new response is served to the end user, and no record is made in the cache. Requests queued up due to request collapsing are dequeued and forwarded individually to origin.

  • Cache and pass (VCL only)
    Condition: The above rules do not apply, and return(pass) is executed in vcl_fetch.
    Result: The new response is served to the end user, and an empty hit-for-pass object is saved into the cache for the duration specified by its TTL, subject to a minimum of 120 seconds and a maximum of 3690 seconds. This object exists to allow subsequent requests to proceed directly to a backend fetch without being queued by request collapsing.

  • Cache and deliver
    Condition: All other cases (in VCL services this means return(deliver), either explicitly or implicitly).
    Result: The new response is served to the end user, used to satisfy queued requests, and stored in cache for up to the duration specified by its TTL.

¹ The total TTL is the sum of the object's TTL, its stale-while-revalidate period, and its stale-if-error period.

IMPORTANT: We won't necessarily store objects for the full TTL requested, and may evict less popular objects earlier, especially if they are large. We also do not automatically evict objects when they reach their TTL. They simply become stale.

If you are experiencing a slow request rate or timeouts on uncacheable resources, it may be because requests are queuing behind one another; creating a hit-for-pass object solves this. For more details, see request collapsing.

Revalidation

If the backend fetch is triggered by a cache object being stale, and the object has a validator (an ETag or Last-Modified header), the readthrough cache will make a conditional GET request for the resource, by sending an If-None-Match and/or If-Modified-Since header as appropriate (if both validators are present, both headers are sent).

If the backend returns an HTTP 304 (Not Modified) response, the cache will process the response headers based on the rules set out above, to determine a new TTL for the existing object, and will reset the object's Age. However, in VCL services, when an existing cache object is successfully revalidated in this way, vcl_fetch will not run, and therefore the 304 response's HTTP headers alone will be used to determine the TTL. In Compute services there is no opportunity to run code before an object is inserted into the readthrough cache, and revalidations make no difference to that behavior.

NOTE: If the initial object's TTL was determined by an Expires header and no freshness-related headers are present on a 304 response, the cache will set a TTL of 2 minutes (a default TTL) for the existing object. This is because the Expires header value identifies a fixed point in time while other freshness header values are given as times relative to now.

Any response to a revalidation request other than 304 will be processed normally, will trigger vcl_fetch (in VCL services), and will (if cacheable) replace the stale object in cache.

HINT: Revalidations triggered as a result of a stale-while-revalidate directive happen in the background, after the stale object has already been delivered to the end user. They can be identified by the req.is_background_fetch variable in VCL, and if successful, they do not reset the Age of the object. In all other respects, these asynchronous revalidations are the same as a regular revalidation.
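
For instance, a vcl_fetch sketch that marks responses fetched in the background (the header name here is purely illustrative):

    if (req.is_background_fetch) {
      # Asynchronous revalidation triggered by stale-while-revalidate;
      # the stale object has already been served to the end user.
      set beresp.http.X-Fetch-Type = "background-revalidation";
    }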

Pre-defining cache behavior on requests

Sometimes you may know what cache behavior you'd like for a response, before forwarding a request to origin. In Compute services, all SDKs offer a method for pre-defining the TTL of a response before the request is forwarded via the readthrough cache, and for designating a request as a PASS. VCL services can flag a request as a PASS, but cannot pre-define an exact TTL.


In VCL services, requests are flagged as a PASS in vcl_recv before being forwarded to origin via the readthrough cache. In Compute SDKs, it's also possible to set a specific TTL (for example, see the CacheOverride interface in JavaScript and the set_ttl method in Rust).
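
As a sketch, the VCL for this might look like the following, placed in vcl_recv (the URL condition is illustrative):

    if (req.url ~ "^/api/") {
      return(pass);
    }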

Where requests are flagged as PASS or have an override TTL of 0, the response will never be cached. Unlike a zero TTL, however, a request flagged as PASS will disable request collapsing, allowing multiple requests for the same URL to be forwarded to origin concurrently.

Preventing content from being cached

Since Fastly respects HTTP caching semantics in the readthrough cache, the best way to prevent content from being cached is to set the Cache-Control header on responses at your backend server. Sending the following header with a response ensures that, when Fastly receives it, we won't cache it and neither will any other downstream cache, such as a browser:

Cache-Control: private, no-store

Sometimes, you may not have access to change the headers emitted by your backend or you may want more precise control over the circumstances in which the content should not be cached.
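
In VCL services, one approach is to apply the header conditionally in vcl_fetch; as a sketch (the path prefix is illustrative):

    if (req.url ~ "^/account/") {
      set beresp.http.Cache-Control = "private, no-store";
      return(pass);
    }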

Cache at the edge, not in browsers

You may want the content to be cached by Fastly but not by browsers. You can do this purely in the initial HTTP response header from your origin server:

Cache-Control: s-maxage=3600, max-age=0

or in VCL services you might prefer to apply an override in vcl_fetch:

set beresp.http.Cache-Control = "private, no-store"; # Don't cache in the browser
set beresp.ttl = 3600s; # Cache in Fastly
set beresp.ttl -= std.atoi(beresp.http.Age);

Cache in browser, not at the edge

Fastly will not cache responses marked Cache-Control: private, which makes the private directive a good way to apply this kind of differentiated caching policy via a single header attached to the response from your origin server:

Cache-Control: private, max-age=3600

In VCL services you can also apply the same logic in vcl_fetch:

set beresp.http.Cache-Control = "max-age=3600"; # Cache in the browser
return(pass); # Don't cache in Fastly

Best practices

Here are some general best practices to apply when caching resources with Fastly's readthrough cache:

  • Set long TTLs at the edge
    It's easy to purge a Fastly service, whether for a single URL, a group of tagged resources, or an entire service cache, and it takes only a few seconds at most. To increase your cache hit ratio and the responsiveness of your site for end users, consider setting a long cache lifetime when saving things into the Fastly cache. When content changes, send a purge request to clear the old content.

  • Serve stale
    Serving a slightly stale response may be preferable to paying the cost of a trip to a backend, and it's almost certainly better than serving an error page to the user. Consider using the stale-while-revalidate and stale-if-error caching directives in your Cache-Control headers, or consider setting the beresp.stale_while_revalidate and beresp.stale_if_error variables in VCL services. Learn more about staleness and revalidation.

  • Reduce origin first byte timeout
    When making a request to a backend server, Fastly waits for a configurable interval before deciding that the backend request has failed. This is the first byte timeout and by default is fairly conservative. If you expect your backend server to be more responsive, you can choose to 'fail faster' by decreasing this value, in conjunction with serving stale.

  • Don't allow the fallback TTL to apply (VCL services only)
    Fallback TTLs are a primitive solution, and very unlikely to be an ideal TTL for any specific resource. Try to configure an appropriate Cache-Control header on all responses you send from your backend servers, or if that isn't possible, include logic in your VCL to address those responses more explicitly.