
Choose your adventure: 3 ways to pre-fetch video manifest segments using Fastly

Everyone has experienced videos that load slowly or buffer. This is especially common with niche, long-tail content that isn't popular, and therefore isn't accessed often enough to stay cached.

Caching on CDNs works well for blockbuster hits that everyone wants to watch at once, but what about niche content that isn't accessed regularly enough to stay in cache, or whose origin is far away? Unlike on other CDNs, the ability to pre-fetch, or pre-warm, the cache is built into our delivery network. And best of all, it requires just a few self-serve changes to your service.

In this post, I'll walk you through three ways to do this: by using Link headers, by prefetching entirely on the cache with VCL restarts, and by using Compute@Edge, our next-generation platform that executes WebAssembly code at the network edge. In each approach, Fastly fetches the video segments before you need them, but the fetches are triggered in different ways.

One video, lots of tiny pieces 

In modern video-on-demand streaming, you start by making a super high-quality rendition of your video, known as the mezzanine file. Then you make multiple (usually six or so) versions of the content at lower qualities, which results in files of varying sizes that will fit different bandwidths of internet access. You want each user to be able to watch the highest quality they can download smoothly on their internet connection, and that will be different for each person and location. Finally, you break those videos into small segment files, each just a few seconds in length, and make a manifest that tells the viewer's player software which segments exist and in what order to play them. Those last two steps, along with features that allow for things like subtitles, multiple audio languages, and ad insertion, are part of protocols like HLS.
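
For illustration, a simple HLS media playlist is just a text file listing segment durations and names. Something like this (the file names here are hypothetical):

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXTINF:5.0,
index1.ts
#EXTINF:5.0,
index2.ts
#EXTINF:5.0,
index3.ts
#EXT-X-ENDLIST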

When a viewer presses play, their player software will first download the manifest and then will start downloading the first segment. While they're watching that segment, the player starts downloading the next segment, and so on until they've watched the whole video. Buffering happens when the player has finished playing the current segment but is still fetching the next one. It's a poor user experience and, as we said, can be worse for long-tail content that isn't cached on the CDN edge.

The trick we need to pull is simply to get a segment into cache at the edge before the user gets to the end of the previous segment. While they’re still watching the first 5 seconds of the video, we need to be busy fetching content and getting ready for the next 5-second segment. Let's dig into a few ways to do that.

How to prefetch using Link headers

Using HTTP Link headers in segment responses is similar to shining a flashlight ahead of you — it enables you to see a bit more of the path ahead than you might otherwise. The header instructs the player client to issue pre-emptive fetches for whatever the Link header is pointing to. Essentially, Fastly is adding code that prompts a client to warm the cache by making additional requests. The following VCL snippet demonstrates how to add that header at the edge and ask clients to fetch more segments than requested:

sub vcl_deliver {
  declare local var.segment_num INTEGER;

  # Match segment names like "index1.ts", capturing the base name and number.
  if (req.backend.is_shield && req.url.basename ~ "([a-zA-Z]+)([^a-zA-Z.]+)\.ts") {
    # Compute the next segment number and point the Link header at it.
    set var.segment_num = std.atoi(re.group.2);
    set var.segment_num += 1;
    log "This is the original object: " req.url;
    log "This is the pushed object: " req.url.dirname "/" re.group.1 std.itoa(var.segment_num) ".ts";
    add resp.http.Link = "<" req.url.dirname "/" re.group.1 std.itoa(var.segment_num) ".ts" ">; rel=preload;";
  }
}

Try it yourself with this VCL fiddle, which implements Link header prefetch and throttles it to a maximum of six segments, counting down to zero with every subsequent prefetch and then resetting the count to six.

In our approach, we send exactly one Link header back to the client per request. This creates a cascading effect: the prefetched segment's response carries another Link header, which the browser prefetches in turn, and so on, up to a limit based on a segment count we store in a cookie (a prefetch throttle). We stop prefetching when the count reaches zero. When the client (a browser or any app) makes subsequent requests to Fastly, it presents the cookie with the segment count, which we decrement and send back. This limits the number of segments the client will prefetch and lets you choose how many segments to prefetch.
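
Here's a minimal sketch of that countdown in VCL, assuming the cookie handling from the fiddle (names illustrative):

sub vcl_deliver {
  declare local var.count INTEGER;

  if (req.http.Cookie:segment_count) {
    set var.count = std.atoi(req.http.Cookie:segment_count);
  } else {
    set var.count = 6;   # first request from this client: start a new window
  }

  if (var.count > 0) {
    set var.count -= 1;  # one fewer prefetch remaining in this window
    # (emit the Link header for the next segment here, as shown above)
  } else {
    set var.count = 6;   # window exhausted: reset the throttle
  }

  # Send the updated count back so the next request presents it.
  add resp.http.Set-Cookie = "segment_count=" std.itoa(var.count) "; path=/";
}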

The fiddle client doesn't act on response headers, so it won't do anything when it sees the Link header instructing it to prefetch. But if you deploy the code in a Fastly VCL service and point it at your origin, you should see the browser show X-Cache: MISS, HIT when the player reaches the desired segment. The key here is that the client initiates the prefetch, as illustrated below: the player requests index1.ts, but the client prefetches the index2.ts and index3.ts segments as well.

Diagram 1

So when the player gets around to requesting the second segment, it finds it already in cache, saving a round trip to origin and thereby reducing latency.

Diagram 2

By setting rel=preload; as=image, we are effectively tricking the browser into thinking we are prefetching images, because Link header prefetch only works for a limited number of content types. (Note: video preloading is included in the Preload spec, but is not currently implemented by browsers.) Because we use as=image, the browser never downloads the entire object: its built-in verification inspects the object's headers expecting an image type, finds a mismatch since we've asked it to treat a video object as an image, and abandons the download after the first few kilobytes. The object is still cached on the edge, since Fastly imposes no such requirement. And since the download never completes, it doesn't impact your bandwidth. A nice “feature” to exploit.

Here’s a working example you can see in action, and below is what it looks like in your network console.

network console example

Notice that the responses with MISS, MISS are the ones being prefetched by the browser, since we sent those segments in the Link header even though the player hadn't requested them yet. Subsequently, when the player (hls.js) does request them, it finds those segments already cached either in the browser or at the Fastly edge.

We also set a cookie value of segment_count to six, which helps throttle prefetches to a maximum of six. The larger this value is, the more the client will prefetch.

segment count 6

You control how much is prefetched using this segment count value, and you can prevent any further prefetches by reducing the segment count at the edge.

Segment Count 5

In standard use, rel=preload; will also cause our edge POPs to push the objects to the client via HTTP/2 server push. To avoid server push, add the nopush directive. Or, if you prefer to only push and not send the Link header on to the client, use the x-http2-push-only directive.
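
For example, the two variants of the header look like this (the segment URL is hypothetical):

Link: </video/index2.ts>; rel=preload; as=image; nopush
Link: </video/index2.ts>; rel=preload; x-http2-push-only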

You can implement this in VCL, but not all clients support cookies or HTTP/2 prefetch. To avoid a cookie-based approach and keep it all in VCL, refer to this example.

How to prefetch using restarts at the edge 

The VCL feature known as restarts lets us prefetch segments at the edge. Unlike the Link header approach, nothing extra is delivered to the client; the prefetched segments are simply pulled into cache. Note that when the HLS manifest is fetched by the player, we do nothing in VCL to prefetch. In this implementation, the prefetch logic kicks in only after the player has read the manifest and starts fetching segments through the VCL service. The approach is similar to when the office printer's paper tray is empty: the office manager grabs the bundle of paper on the shelf, but also stocks up with two more bundles before loading the printer.

Since Fastly VCL only allows three restarts per request, we limit the number of restarts to three and progressively reduce the count to zero with a cookie called 'segment_count' in the response. Try it yourself with this fiddle, or see a working example in action.
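
Here's a minimal sketch of the restart loop itself, leaving out the segment_count cookie throttle from the fiddle (the regex and header name are illustrative):

sub vcl_recv {
  if (req.restarts == 0) {
    # Remember which segment the client actually asked for.
    set req.http.X-Orig-Url = req.url;
  }
}

sub vcl_deliver {
  declare local var.next INTEGER;

  if (req.url.basename ~ "([a-zA-Z]+)([0-9]+)\.ts") {
    if (req.restarts < 2) {
      # Rewrite to the next segment and restart; the fetch on the next
      # pass pulls that segment into cache.
      set var.next = std.atoi(re.group.2);
      set var.next += 1;
      set req.url = req.url.dirname "/" re.group.1 std.itoa(var.next) ".ts";
      restart;
    }
    if (req.restarts == 2) {
      # Last allowed restart: go back to the original segment, which is
      # now served from cache.
      set req.url = req.http.X-Orig-Url;
      restart;
    }
  }
}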

With this approach, you’ll never see a MISS, MISS on our caches again, and all your requests will be pre-warmed at your shield cache.

pre warmed cache

The restart approach preloads some of the video transport segments sequentially, and only then responds with the segment the client originally requested.

Diagram 3

You can improve efficiency even further if you account for cache hits in the segment fetches and skip the corresponding restarts, especially for the segment 2 and segment 3 fetches.

But just because you can, should you? Is prefetching two HLS segments using VCL restarts really better than doing nothing at all? How about fetching just one segment and reducing the number of restarts?

In a test setup that downloaded a sample set of 41 video transport segments with an average size of 602 KB, with browser requests routed from Europe through a free VPN, a VCL service that did no prefetching showed an average latency of 1.77 seconds. When the same segments were fetched from a service that prefetched one segment (via restarts), the latency dropped to 1.75 seconds, an almost negligible difference, but one that still seems to favor fetching only one segment. These aren't absolute numbers, and there is likely to be a large degree of variance in these measurements, especially when you take into account downstream ISP performance (which can vary on a per-request basis), the size of your segments, how your VCL is configured, your origin performance and configuration, the client or player of choice, and so on.

The following lines were changed in VCL to reduce the prefetch count to one.

if (req.http.Cookie:segment_count && std.atoi(req.http.Cookie:segment_count) == 0) {
  # Countdown finished: reset the segment count.
  if (!req.backend.is_shield) {
    set var.segment_count = 2;
  } else {
    set var.segment_count = 3;
  }
  ..
} else { # Very first time seeing this request. No cookie.
  if (!req.backend.is_shield) {
    set var.segment_count = 2;
  } else {
    set var.segment_count = 3;
  }
  set req.http.Cookie = "segment_count=" var.segment_count;
  set var.iteration = 0;
}

While this improvement is insignificant, you can tune the VCL algorithm further. For example, you can account for cache hits and avoid restarts, especially when the cookie's segment count is one, as shown in the illustration above; or prefetch at the edge only once every two or three segments.
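
For instance, the every-third-segment variant could gate the restart logic on the segment number, something like this (reusing the segment-number capture from earlier, with the prefetch logic itself elided):

declare local var.seg INTEGER;
set var.seg = std.atoi(re.group.2);
set var.seg %= 3;
if (var.seg == 0) {
  # Only kick off the prefetch restarts on every third segment; the two
  # segments that follow it were prefetched on the previous round.
}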

To sum up the pros and cons of this solution: it's all in VCL, no client-side changes are needed, and client bandwidth isn't taxed as it is in the Link header approach. However, the implementation is a bit more complex, and you can prefetch at most two segments.

How to prefetch at the edge asynchronously using Compute@Edge 

This approach is similar to shopping online so your order is ready to be picked up at the store, instead of waiting until you're at the store to shop for your items.

When a player first requests a video manifest, Compute@Edge will read that manifest at the edge and then preload some of the segments from the origin asynchronously while delivering the manifest to the user. We also modify the manifest slightly to include the manifest URL in every segment request. This means that when the player requests segments, we can also trigger segment preloading by referring back to the manifest. This is possible because with Compute@Edge, unlike with VCL, you can read response bodies, make async calls to the backend, and loop through the playlist to prefetch as many segments as needed. Please find the source code in Rust here.
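
To give a sense of the shape of such a service, here's a simplified sketch (not the linked implementation), assuming a backend named origin, segment names listed relative to the manifest URL, and the fastly crate:

use fastly::{Error, Request, Response};

// Hypothetical backend name; yours is defined in your service config.
const BACKEND: &str = "origin";

#[fastly::main]
fn main(req: Request) -> Result<Response, Error> {
    let is_manifest = req.get_path().ends_with(".m3u8");
    let base = req.get_url().clone();
    let mut resp = req.send(BACKEND)?;

    if is_manifest {
        // Read the playlist at the edge and kick off async fetches for
        // the first few segments, pulling them into cache while the
        // manifest is delivered to the player.
        let manifest = resp.take_body_str();
        for name in manifest.lines().filter(|l| l.ends_with(".ts")).take(5) {
            let seg_url = base.join(name)?; // resolve relative to the manifest URL
            // Fire off the request without blocking on the response.
            // (Depending on your needs, you may want to hold the
            // PendingRequests and wait on them before returning.)
            Request::get(seg_url).send_async(BACKEND)?;
        }
        resp.set_body(manifest);
    }

    Ok(resp)
}

This sketch only prefetches on manifest requests; the real implementation also rewrites the playlist so segment requests can trigger prefetching, as described below.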

You can build the sources into a WebAssembly package using the Fastly CLI. Since Compute@Edge isn't yet available to all customers (as of September 2021), it will need to be explicitly enabled on your Fastly account before you can use the Fastly CLI to publish the package as a Compute@Edge service. Either way, you can still check out this working example, in which I prefetch five segments in advance at every five-segment boundary.
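
Assuming Compute@Edge is enabled on your account, building and publishing the package from the project directory looks like this:

fastly compute build      # compile the sources into a WebAssembly package
fastly compute publish    # upload the package and activate the service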

In a test setup that downloaded the same sample set of 41 video transport segments with an average size of 602 KB, again routing browser requests from Europe through a free VPN, a Compute@Edge service with an origin server in the U.S. West showed an average latency of 1.52 seconds. This is a more than 250-millisecond improvement over the service that doesn't prefetch anything, and the largest reduction of the three approaches. Once again, these aren't absolute numbers, and there is likely to be a large degree of variance in these measurements.

In the example, I modified the playlist to include the playlist file name as a query parameter (instead of using cookies), so prefetching can be triggered when future requests for individual segments arrive.

individual requests

You’ll also notice that you always get a hit on all the segments because Compute@Edge prefetched those segments and cached them on the edge.

C@E prefetch

With Compute@Edge, you no longer have to guess the next segment number in the playlist, as you did in all of the previous approaches, and you're not limited to a system-imposed number of segments, such as the maximum number of restarts (and therefore backend fetches) you can make from the edge in VCL.

Here’s a diagram of what’s happening in this example. 

Diagram 4

With this solution, there's no client-side bandwidth impact, we can prefetch any number of segments, and there are no cookies or client-side code to consider. On the downside, the approach doesn't support shielding, so you have to cache at multiple edges; the implementation modifies the HLS manifest; and it doesn't stream the requested object while asynchronously waiting for the prefetch responses, which could potentially be much faster.

Pros and cons of each approach

prefetch pro and con

Conclusion

I’ve given you three ways to reduce latency by pre-warming the cache, none of which are expensive or overly complex. Plus, all these methods are self-serve and available without requiring additional support. Ready to try it out for yourself? Here’s the code I used in the examples above. And if you’re not yet a Fastly customer, check out what we offer for streaming media delivery.
