Video Cache Prefetch with Compute

Senior Sales Engineer, Fastly

Sales Engineer, Fastly

September 14, 2022

There’s nothing worse than waiting to watch a much-anticipated show and experiencing a lag in your video. To make the most of their customers’ viewing experience, publishers should consider video prefetch.

Why Pre-Warm the Cache?

For video content that has not been played for a while, or for new video content, time to first byte (TTFB) can be significantly higher when the player request has to go all the way back to origin for each piece of video. In cases where a publisher has some idea that they will have a large audience at a specific time, it is ideal to have all the video objects in the edge cache so that video is delivered very quickly to each geographical region.

Take, for example, the last episode of HBO’s “Game of Thrones”: this was an immensely popular show, and millions of fans were waiting for it to become available at 8 p.m. Sunday night. HBO knew they would have a huge, geographically dispersed audience that would be hitting “play” exactly at 8 p.m. Since the video was new, every first request at each edge location would be a cache miss, and all of those viewers' requests would go back to origin. This would increase the time it took for the video to start playing, reducing performance and lowering customer satisfaction.

To solve this, HBO could prefetch all the video objects, or at least the first few seconds of the video, so that the video would be available to play immediately.

Pre-warming Pitfalls

Traditionally, cache pre-warming was done by either logging into each cache node and running a script to get all the video objects, or using a VPN to launch a client in each geographic area and request the video objects from the client side. However, there are some problems with these approaches:

It’s slow and hard to manage. You need to know a lot of internal network information about the CDN you are working with. In addition, you need to know how to access each POP – or, in the client-side case, you need to know how each geographic area will route to each POP.
You could be using valuable CDN cache real estate for content that never gets viewed. For instance, if you warmed all the POPs around the globe but the content only ended up being viewed in North America, all the other POPs would contain video objects that were not ever delivered to a client.

Pre-warming with Compute

Prior to Compute, we were limited in what we could do with a request. VCL did not have the ability to read or manipulate the request/response body. VCL was also limited in what it could do with asynchronous requests.

Compute removes these limitations and lets us do some very interesting things with video delivery, including efficient prefetching/cache warming.

Manifest Request Ahead

First, here’s a quick primer on streaming video delivery:

Video is delivered incrementally in small chunks. A player on the client side requests a file called a manifest. This manifest has URLs to other video objects that the player will start to request as it needs them for both buffering and playback.

Below is a truncated playlist manifest for a 720p version of “Big Buck Bunny” showing the first four relative URLs (these are the ones with the .ts extension).

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-ALLOW-CACHE:YES
#EXT-X-TARGETDURATION:13
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:12.500000,
buck_720p_0000.ts
#EXTINF:7.666667,
buck_720p_0001.ts
#EXTINF:11.466667,
buck_720p_0002.ts
#EXTINF:12.300000,
buck_720p_0003.ts

¹ For this post we are using Apple’s HLS for examples and terminology but other streaming protocols use similar mechanisms.

The player would request each of these media segments one by one. In a normal situation, without prefetching, if the cache is cold the request would go all the way back to origin for each segment. We can do better than that!

As previously mentioned, Compute can read the response body. This allows us to write a simple application that runs at the edge to pre-warm the cache. When the player requests the manifest, the Compute app passes the request to the origin. When the response comes back, the app parses the manifest and extracts the first N URLs to the media objects. It can then pass the manifest back to the client. The following code snippet demonstrates this:

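// Parse the manifest body. take_body_bytes() consumes the body of new_resp, which is
// assumed here to be a separate copy of the origin response so that beresp can still
// be returned to the client intact.
match m3u8_rs::parse_playlist_res(new_resp.take_body_bytes().as_slice()) {
   // A master playlist lists no media segments, so nothing is prefetched here.
   Ok(Playlist::MasterPlaylist(_pl)) => println!("Master playlist"),
   // A media playlist lists the media segments, so kick off the prefetch requests.
   Ok(Playlist::MediaPlaylist(pl)) => {
       println!("Media Playlist. Path = {}", path_str);
       send_media_segments_requests_async(&pl, req_url)?;
   }
   Err(_e) => fastly::error::bail!("Invalid manifest"),
}
// I got what I needed so return the beresp in a Result
Ok(beresp)
}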

The Compute app finds the first N media segments and sends the requests asynchronously, ignoring the responses. This happens very quickly, so we send the manifest back to the player with almost no added latency. We don’t care about the responses to the media segment requests because we just want to pull them into the CDN cache. By the time the player starts to request the media segments, they should already be in cache.
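The snippet above delegates segment selection to send_media_segments_requests_async, whose body is not shown. As a rough illustration of that step, here is a minimal sketch that uses only the Rust standard library instead of m3u8_rs. The function names first_segment_uris and resolve_against_manifest and the example.com URL are hypothetical, and the resolver is deliberately simplified: it assumes the manifest URL has a path component and only handles absolute URIs and URIs relative to the manifest's directory.

```rust
/// Returns the URIs of the first `n` media segments in an HLS media playlist.
/// Lines starting with '#' are tags or comments; blank lines are skipped.
fn first_segment_uris(playlist: &str, n: usize) -> Vec<&str> {
    playlist
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .take(n)
        .collect()
}

/// Resolves a segment URI against the manifest URL (simplified, see assumptions above).
fn resolve_against_manifest(manifest_url: &str, segment_uri: &str) -> String {
    // Absolute URIs are used as-is.
    if segment_uri.starts_with("http://") || segment_uri.starts_with("https://") {
        return segment_uri.to_string();
    }
    // Drop any query string, then replace the manifest file name with the segment URI.
    let without_query = manifest_url.split('?').next().unwrap_or(manifest_url);
    match without_query.rfind('/') {
        Some(idx) => format!("{}{}", &without_query[..=idx], segment_uri),
        None => segment_uri.to_string(),
    }
}

fn main() {
    let playlist = "#EXTM3U\n#EXT-X-VERSION:3\n#EXTINF:12.500000,\nbuck_720p_0000.ts\n\
                    #EXTINF:7.666667,\nbuck_720p_0001.ts\n#EXTINF:11.466667,\nbuck_720p_0002.ts\n";
    // Hypothetical manifest URL, used only for illustration.
    let manifest_url = "https://example.com/video/buck_720p.m3u8";
    for uri in first_segment_uris(playlist, 2) {
        println!("{}", resolve_against_manifest(manifest_url, uri));
    }
}
```

In a real Compute app, each resolved URL would then be requested asynchronously and the response discarded, as described above.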

CMCD-NOR

One of the standards the Consumer Technology Association (CTA) is working on is called Common Media Client Data (CMCD). CMCD is an open specification that defines additional information that can help with analysis, monitoring, and optimization in the delivery of streaming content, providing increased visibility into QoS and improvement in delivery performance. You can look forward to a more in-depth discussion about CMCD in a follow-up blog post.

One of the elements of CMCD is the Next Object Request (NOR). This is a string that can be passed as either a query parameter or in a header, and it represents the relative URL of the next object to be requested. This Compute app would work similarly to the previous one, but instead of looking for the next object in the manifest, we look in the query or headers:

match req.get_header("cmcd-request") {
   Some(cmcd) => {
       // Header values are not guaranteed to be valid text, so skip the prefetch
       // instead of panicking if the conversion fails.
       if let Ok(cmcd) = cmcd.to_str() {
           send_nor_request(cmcd.to_string(), &req);
       }
   }
   None => {
       // We looked for a cmcd-request header and it wasn't there so let's see if it's in the
       // query parameters.
       if let Ok(q) = req.get_query() {
           let qs_map: HashMap<String, String> = q;

           if let Some(cmcd) = qs_map.get("CMCD") {
               send_nor_request(cmcd.to_string(), &req);
           }
       };
   }
}

Here we are looking for the CMCD data in the headers and, if it’s not there, we check the query parameters. If we find it in either location, we pass it to the send_nor_request function, which parses out the NOR URL and then sends the request asynchronously.
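The body of send_nor_request is not shown here, so the following is only a minimal sketch of what its parsing step might look like, written with the Rust standard library alone. It assumes the CMCD payload is a comma-separated list of key=value pairs in which the nor value is a quoted, URL-encoded string. The function names parse_nor, percent_decode, and hex_val are hypothetical, and the splitter is simplified: it does not handle literal commas inside other quoted values.

```rust
/// Converts one ASCII hex digit to its numeric value.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decodes %XX escapes; malformed escapes are copied through unchanged.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Extracts the decoded `nor` value from a CMCD payload, if present.
fn parse_nor(cmcd: &str) -> Option<String> {
    for pair in cmcd.split(',') {
        let (key, value) = match pair.trim().split_once('=') {
            Some(kv) => kv,
            None => continue, // keys without a value carry no path
        };
        if key == "nor" {
            return Some(percent_decode(value.trim_matches('"')));
        }
    }
    None
}

fn main() {
    // Illustrative payload, as it might appear in a cmcd-request header.
    let header_value = "bl=21300,nor=\"..%2F300kbps%2Fsegment35.m4v\"";
    match parse_nor(header_value) {
        Some(nor) => println!("prefetch candidate: {}", nor),
        None => println!("no nor key present"),
    }
}
```

The decoded value is a path relative to the current request, so it would still need to be resolved against the request URL before the prefetch request is sent.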

Next Generation Intelligence

The previous workflows are the most obvious and easy to put into production. But Compute is so versatile and powerful that it allows us to do so much more. Consider some of these ideas that could be built on top of Compute:

Don’t prefetch credits: We know that viewership drops off at the end of a video and people tend not to watch the credits, so there’s no point in prefetching those. You could enhance the app so that it detects when the credits start by looking at a timestamp, tag or other metadata, and stop prefetching at that point.
Intelligent prefetching based on usage patterns: You could apply AI/ML to prefetching to improve the viewing experience while using cache space more efficiently. For instance, you could use AI to figure out what times a video is being heavily viewed and only prefetch during those times.
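To make the first idea more concrete, here is a minimal sketch of a credits cutoff using only the Rust standard library. It assumes the credits start offset (in seconds) is already known from some metadata source, which is not specified here. It adds up the #EXTINF durations in a media playlist and stops selecting segments once a segment would start at or after that offset. The function name segments_before_credits and the 20-second offset are hypothetical.

```rust
/// Returns the segment URIs whose start time is earlier than `credits_start_secs`.
fn segments_before_credits(playlist: &str, credits_start_secs: f64) -> Vec<&str> {
    let mut elapsed = 0.0_f64; // start time of the next segment
    let mut pending_duration: Option<f64> = None;
    let mut selected = Vec::new();

    for line in playlist.lines().map(str::trim) {
        if let Some(rest) = line.strip_prefix("#EXTINF:") {
            // Format is "#EXTINF:<duration>,[<title>]"; keep only the duration.
            let duration = rest.split(',').next().unwrap_or("");
            pending_duration = duration.trim().parse::<f64>().ok();
        } else if !line.is_empty() && !line.starts_with('#') {
            // A segment URI line: stop once we reach the credits.
            if elapsed >= credits_start_secs {
                break;
            }
            selected.push(line);
            elapsed += pending_duration.take().unwrap_or(0.0);
        }
    }
    selected
}

fn main() {
    let playlist = "#EXTM3U\n#EXTINF:12.500000,\nbuck_720p_0000.ts\n#EXTINF:7.666667,\n\
                    buck_720p_0001.ts\n#EXTINF:11.466667,\nbuck_720p_0002.ts\n";
    // Pretend the credits begin 20 seconds in (an arbitrary value for the demo).
    for uri in segments_before_credits(playlist, 20.0) {
        println!("prefetch {}", uri);
    }
}
```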

Summary

By using Compute to pre-warm the cache, you are not only using a powerful, globally distributed network to do the work, but you also avoid the pitfalls associated with legacy prefetch.