Delivering compressed content through Fastly

Much of the data delivered by Fastly to end users is highly compressible, especially text-based formats like HTML, JavaScript, and CSS. Compressing these types of data can significantly improve performance for end users and reduce costs.

Fastly automatically optimizes requests to more efficiently cache compressed responses, and supports compressing data at the edge using both GZip and Brotli algorithms.

Optimizing Accept-Encoding

When Fastly receives a request, the Accept-Encoding header sent by the client tells Fastly whether the client can accept compressed data. When a response is generated by an origin server or by Fastly, the Vary header tells Fastly whether to store separate variations of the response based on characteristics of the request. In practical terms, if a request carries an Accept-Encoding: gzip, br header and your origin server returns a response containing a Vary: Accept-Encoding header, Fastly will reuse that response only for future requests whose Accept-Encoding header has the value gzip, br.
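To make the effect of Vary concrete, here is a minimal sketch of how a cache might derive a variant key from the request headers named in Vary. The function name variantKey and the key format are hypothetical, chosen for illustration; this is a toy model, not a description of Fastly's cache internals.

```javascript
// Toy model of Vary handling. variantKey is a hypothetical helper,
// not part of any Fastly API. requestHeaders must use lowercase names.
function variantKey(url, varyHeader, requestHeaders) {
  // Vary lists the request header names whose values distinguish variants.
  const names = (varyHeader || "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
    .sort();
  const parts = names.map((name) => `${name}=${requestHeaders[name] ?? ""}`);
  return [url, ...parts].join("|");
}

const a = variantKey("/app.js", "Accept-Encoding", { "accept-encoding": "gzip, br" });
const b = variantKey("/app.js", "Accept-Encoding", { "accept-encoding": "gzip, br, deflate" });
const c = variantKey("/app.js", "Accept-Encoding", { "accept-encoding": "gzip, br" });

console.log(a === c); // true: identical header value, same variant
console.log(a === b); // false: different header value, separate variant
```

In this model, any difference in the raw header value produces a separate variant, even when the two values mean the same thing to the server.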

However, many values of Accept-Encoding are semantically equivalent. If a request carries Accept-Encoding: gzip, br and results in a Brotli-compressed response being saved in our edge cache, a subsequent request that carries Accept-Encoding: gzip, br, deflate should also be able to use that response. Fastly therefore automatically normalizes this header value to reduce the number of permutations, so that if the server delivers a compressed response which we can cache, we can reuse that response for as many users as possible.
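The idea behind normalization can be sketched as a function that collapses many equivalent header values into a small set of canonical ones. The function below is an assumption-laden illustration: the name normalizeAcceptEncoding, the canonical values ("br", "gzip", or none), and the preference for Brotli over gzip are all choices made for this example and should not be read as Fastly's exact algorithm.

```javascript
// Hypothetical normalizer: reduces an Accept-Encoding value to "br", "gzip", or null.
function normalizeAcceptEncoding(value) {
  if (!value) return null;
  const accepted = new Set();
  for (const item of value.split(",")) {
    const [token, ...params] = item.trim().toLowerCase().split(";");
    // An explicit q=0 means the client refuses this encoding.
    const q = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
    if (q !== undefined && parseFloat(q.slice(2)) === 0) continue;
    if (token.trim()) accepted.add(token.trim());
  }
  // Assumed preference order for this sketch: Brotli first, then gzip.
  if (accepted.has("br")) return "br";
  if (accepted.has("gzip")) return "gzip";
  return null;
}

console.log(normalizeAcceptEncoding("gzip, br"));          // "br"
console.log(normalizeAcceptEncoding("gzip, br, deflate")); // "br"
console.log(normalizeAcceptEncoding("br;q=0, gzip"));      // "gzip"
```

Because both "gzip, br" and "gzip, br, deflate" collapse to the same value, a cache that varies on the normalized header can serve both requests from one stored object.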

Compression at origin

If your origin server is capable of delivering compressed responses, and performs content negotiation correctly (respecting the value of Accept-Encoding and correctly adding a Vary header to the response), Fastly will ensure that compressed responses are only delivered to clients that support them, and that we use cached responses as much as possible.
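As an illustration of what correct content negotiation at the origin involves, the following sketch picks an encoding the client accepts, compresses the body, and always adds Vary: Accept-Encoding. It assumes a Node.js origin and uses the built-in zlib module; the helper name negotiate and its return shape are hypothetical, and the q-value handling is deliberately simplified.

```javascript
const zlib = require("node:zlib");

// Hypothetical origin-side helper: chooses an encoding the client accepts,
// compresses the body, and always sets Vary: Accept-Encoding.
function negotiate(acceptEncoding, body) {
  const accepted = (acceptEncoding || "")
    .split(",")
    .map((item) => item.trim().toLowerCase())
    // Drop encodings the client explicitly refuses with q=0.
    .filter((item) => item && !/;\s*q=0(\.0*)?\s*$/.test(item))
    .map((item) => item.split(";")[0].trim());

  const headers = { "Vary": "Accept-Encoding" };
  const raw = Buffer.from(body, "utf8");

  if (accepted.includes("br")) {
    headers["Content-Encoding"] = "br";
    return { headers, body: zlib.brotliCompressSync(raw) };
  }
  if (accepted.includes("gzip")) {
    headers["Content-Encoding"] = "gzip";
    return { headers, body: zlib.gzipSync(raw) };
  }
  // No supported encoding requested: send the body uncompressed.
  return { headers, body: raw };
}

const result = negotiate("gzip", "hello hello hello");
console.log(result.headers); // Vary plus Content-Encoding: gzip
```

The Vary header is set even on the uncompressed path, so that a cache does not hand an uncompressed variant to a client that could have received a compressed one, or the reverse.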

Compression at the edge

Fastly can compress data for you on our edge servers. Static compression is done on responses when received from an origin server (before caching or post-processing), while dynamic compression is done just before responses are delivered to the client.

| Type | Platforms | Where it happens | How you enable it | Options | Billing |
| --- | --- | --- | --- | --- | --- |
| Static | VCL only | Pre-cache, when responses are received from origin | API, UI or VCL code | File type, compression type | Compressed size |
| Dynamic | VCL & Compute | Post-cache, when responses are leaving Fastly bound for the end user | HTTP Header | None | Uncompressed size |

Static compression

Fastly can take uncompressed response data from your server and compress it at the edge before inserting the compressed object into our cache. This is called static compression. The resulting cached object can then be used to serve future requests that have a compatible Accept-Encoding without having to perform the compression again.

Static compression is available only in VCL services, and can use the GZip or Brotli algorithms. It can be enabled via the web interface, API, or in VCL code using the beresp.gzip and beresp.brotli variables in the vcl_fetch subroutine.

If you want to perform static compression using your own VCL code, ensure you also add a suitable Vary header to the response, and only compress formats that are not already compressed (media formats like images, audio and video are typically already compressed and will not benefit from GZip or Brotli).
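The decision described above (compress only suitable content types, skip responses that are already encoded, and add Vary) can be modeled outside VCL. The sketch below is JavaScript rather than VCL and is only a model of the logic: planStaticCompression, the COMPRESSIBLE pattern, and the assumption that the request's Accept-Encoding has already been reduced to "br" or "gzip" are all hypothetical. In a real VCL service the outcome would be expressed by setting beresp.gzip or beresp.brotli in vcl_fetch.

```javascript
// Hypothetical model of a static-compression decision; this is not Fastly VCL.
// Header names in response.headers are assumed to be lowercase.
const COMPRESSIBLE =
  /^(text\/(html|css|plain|xml|javascript)|application\/(javascript|json|xml)|image\/svg\+xml)\b/i;

function planStaticCompression(response, normalizedAcceptEncoding) {
  const headers = { ...response.headers };
  const contentType = headers["content-type"] || "";
  const alreadyEncoded = Boolean(headers["content-encoding"]);

  // Leave already-compressed or unsuitable formats untouched.
  if (alreadyEncoded || !COMPRESSIBLE.test(contentType)) {
    return { headers, algorithm: null };
  }

  // Make sure caches keep separate variants per Accept-Encoding.
  const vary = headers["vary"] || "";
  if (!/\baccept-encoding\b/i.test(vary)) {
    headers["vary"] = vary ? `${vary}, Accept-Encoding` : "Accept-Encoding";
  }

  const algorithm =
    normalizedAcceptEncoding === "br" ? "brotli" :
    normalizedAcceptEncoding === "gzip" ? "gzip" : null;
  return { headers, algorithm };
}

console.log(planStaticCompression({ headers: { "content-type": "text/html; charset=utf-8" } }, "br"));
console.log(planStaticCompression({ headers: { "content-type": "image/png" } }, "gzip"));
```

The list of compressible content types here is illustrative and intentionally short; a production configuration would normally cover more text-based formats.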

Static compression is not supported on the Compute platform, and is not compatible with Edge Side Includes. If your Fastly service is subject to metered delivery charges, responses compressed statically are billed based on the size of the data delivered to the client, that is, the compressed size.

Dynamic compression

Static compression is the most efficient way to compress data at the edge, especially for cacheable responses. However, if static compression isn't suitable for your use case, you can also compress responses individually as they are leaving the Fastly platform, which is called dynamic compression.

Dynamic compression is available in both VCL and Compute services, is compatible with Edge Side Includes, and is enabled by adding the X-Compress-Hint header to the outgoing response:

Rust:

    response.set_header("x-compress-hint", "on");

The X-Compress-Hint header enables dynamic compression, but whether the response is actually compressed, and which algorithm is used, depends on the value of the Accept-Encoding header on the associated request and on whether the response is already compressed. With dynamic compression these decisions are automatic and not configurable.
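For services written in JavaScript, the same hint can be attached using standard Fetch API objects. The sketch below assumes a runtime that provides global Response and Headers (such as Node.js 18 or later, or a Compute JavaScript environment); withCompressHint is a hypothetical helper name, not part of any SDK.

```javascript
// Returns a copy of `response` with dynamic compression requested.
// withCompressHint is a hypothetical helper built on standard Fetch API objects.
function withCompressHint(response) {
  const headers = new Headers(response.headers);
  // The hint only requests compression; the platform decides whether to apply it.
  headers.set("x-compress-hint", "on");
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

const original = new Response("<h1>Hello</h1>", {
  status: 200,
  headers: { "content-type": "text/html" },
});
const hinted = withCompressHint(original);
console.log(hinted.headers.get("x-compress-hint")); // "on"
```

Copying the headers into a new Headers object avoids mutating the original response, which can matter when the original came from a fetch and its headers are immutable.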

Dynamic compression happens after the size of the response is measured for billing purposes, so if your Fastly service is subject to metered delivery charges, responses compressed dynamically are billed based on the size before the compression takes place.
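The billing difference between the two modes comes down to which size is counted. The following sketch encodes that rule; billedBytes is a hypothetical helper, and the byte counts are made-up figures for illustration only, not measurements or pricing data.

```javascript
// Which size counts toward metered delivery, per the descriptions of each mode.
function billedBytes(mode, uncompressedBytes, compressedBytes) {
  if (mode === "static") return compressedBytes;    // compressed before the size is measured
  if (mode === "dynamic") return uncompressedBytes; // size is measured before compression
  throw new Error(`unknown mode: ${mode}`);
}

// Hypothetical sizes, for illustration only.
console.log(billedBytes("static", 100000, 20000));  // 20000
console.log(billedBytes("dynamic", 100000, 20000)); // 100000
```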

Decompression at the edge

If your origin server is serving compressed responses, you may want to decompress these responses at the edge in order to parse, transform, or otherwise act on their contents. This is available in Compute services, both as a platform-level primitive and in-process using features in many supported languages.

In VCL services, it is not possible to decompress compressed origin responses. If a client does not support receiving compressed responses, and therefore does not send an Accept-Encoding header with their request, your origin server must be able to serve an uncompressed response to Fastly.

Auto decompress

Setting the auto decompress flag on requests instructs Fastly to automatically decompress any compressed responses before presenting them to your Compute program. You should only do this if you are planning to operate on the response body, and consider combining it with use of dynamic compression so that the response is recompressed before delivery to the client.

Rust:

    // ContentEncodings is available via fastly_sys

Time spent decompressing response bodies using this mechanism does not count as compute CPU time.

If you do not intend to parse the response body in your Compute program, it's generally better to leave automatic decompression disabled, so that compressed content from origin can remain compressed as it passes through Fastly.

Decompress in process

Some languages provide straightforward mechanisms for decompressing response streams inside your Compute program:

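JavaScript:

    async function app(event) {
      const backendResponse = await fetch(event.request, { backend: "origin_0" });
      // Decompress the gzip-encoded body as it streams through.
      const resp = new Response(backendResponse.body.pipeThrough(new DecompressionStream("gzip")), backendResponse);
      // The body is no longer gzip-encoded, so drop the header that says it is.
      resp.headers.delete("content-encoding");
      return resp;
    }
    addEventListener("fetch", event => event.respondWith(app(event)));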

Time spent decompressing response bodies using this mechanism contributes to the billable CPU time for your Compute program.