Origin Offload: A measure of CDN efficiency for reducing egress cost

The world of Content Delivery Networks (CDNs) has long been obsessed with cache hit ratio (CHR), but two misconceptions need to be cleared up. The first is that many people underestimate how big an impact seemingly “small” CDN caching improvements can have at their origin. The second is that CHR isn’t actually the best way to measure total offload to a CDN, which is why Fastly is excited to introduce a better measure called Origin Offload, one that focuses on origin server efficiency rather than just the number of requests being made.

Origin Offload measures the ratio of bytes served to end users that were cached inside the CDN (not fetched from the origin), over total bytes served to end users for the service. An Origin Offload of 100% means all bytes were served from the CDN.

A common misunderstanding of origin offload

If you like classic mathematical puzzles, the potato paradox is one way to understand this. Applied to CDNs, the lesson is that many people are reassured by a relatively high CHR and think they’re doing a good job, without realizing that they’re missing massive opportunities for cost savings through egress reduction, and other savings through overall traffic reduction.

The cache hit ratio is the percentage of requests a CDN can serve from its cache rather than fetching them from the origin server. Even for those building CDNs, the CHR can be confusing. If your CHR increases from 90% to 95%, that is not merely a 5% improvement; it halves the number of requests reaching your origin.

To understand why a small increase in CHR can significantly reduce origin load, think about the miss rate. When the CHR improves from 90% to 95%, the miss rate drops from 10% to 5%, which is half of what it was. Since every request that misses the cache must be fetched from the origin, halving the miss rate halves the request load on the origin (for the same total traffic).
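The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name `origin_requests` and the volume of one million requests are illustrative assumptions, not Fastly data.

```python
def origin_requests(total_requests: int, chr_percent: int) -> int:
    """Requests that miss the cache and must be fetched from the origin.

    Integer arithmetic avoids floating-point rounding for whole-percent CHR values.
    """
    miss_percent = 100 - chr_percent
    return total_requests * miss_percent // 100


total = 1_000_000  # hypothetical request volume
at_90 = origin_requests(total, 90)  # origin requests at a 90% CHR
at_95 = origin_requests(total, 95)  # origin requests at a 95% CHR
reduction = 1 - at_95 / at_90       # fraction of origin requests removed

print(at_90, at_95, reduction)  # prints: 100000 50000 0.5
```

A five-point gain in CHR removes half of the origin requests because the comparison that matters is 10% of traffic versus 5% of traffic, not 90% versus 95%.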

Understanding the limitations of Cache Hit Ratio

  • Request-Based Calculation: CHR measures requests, not data size. If your objects vary greatly in size, CHR won’t accurately represent the origin load. For instance, a single large file and many small files might have the same CHR but very different impacts on the origin.

  • CDN Internal States: CDNs have complex internal mechanisms that can skew CHR, such as shielding, restarts inside VCL logic, image optimization, and segmented caching. Take shielding, a feature that reduces origin traffic by caching content at an intermediate layer, as an example of how CHR can mislead. If a request misses at the edge but is served from the shield cache, the classical CHR calculation shows a 50% CHR (shield hit / (edge miss + shield hit)). In reality, the request was fully served by the CDN and was not fetched from the origin.
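Both limitations can be illustrated with a small Python sketch. The object sizes and counts below are made-up values chosen for clarity, and `classical_chr` is a hypothetical helper that applies the request-based formula described above.

```python
def classical_chr(hits: int, misses: int) -> float:
    """Request-based cache hit ratio: hits / (hits + misses)."""
    return hits / (hits + misses)


# Limitation 1: object sizes. 99 small objects (10 KB each) are cache hits,
# while a single 1 GB object is a miss that must come from the origin.
small_hits, large_misses = 99, 1
chr_by_requests = classical_chr(small_hits, large_misses)
origin_bytes = large_misses * 1_000_000_000
total_bytes = small_hits * 10_000 + origin_bytes
origin_byte_share = origin_bytes / total_bytes  # share of bytes from origin

# Limitation 2: shielding. One client request misses at the edge and then
# hits at the shield, so it is counted as one miss plus one hit.
chr_with_shield = classical_chr(hits=1, misses=1)
origin_fetches = 0  # the shield answered, so nothing reached the origin

print(chr_by_requests, round(origin_byte_share, 4), chr_with_shield)
```

In the first case the CHR is 99% even though almost all bytes came from the origin. In the second, the CHR is 50% even though the origin was never contacted.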

Here’s a graph that shows a customer disabling shielding (as part of an early experiment during onboarding). The graphs show that shielding was in fact significantly reducing their origin traffic (from just above 1.6 GiB/s with shielding to a steady state of 20+ GiB/s without shielding), yet the classical CHR did not reflect this improvement accurately:

Edge Traffic:

As you can see in the example above, with shielding, CHR peaks in the low 90% range, while origin load stays capped well below 5 GiB/s. Without shielding, after an initial impact to CHR and origin load, CHR recovers to the low 90% range while the origin load stabilizes in the low 20 GiB/s range, i.e. more than 4x the origin load with shielding.

This case study shows that CHR doesn’t adequately capture the state of origin load when shielding is at play.

Introducing Origin Offload 

For many customers, egress traffic (the amount of data transferred out of their origin infrastructure) is a primary cost driver. Egress is a particularly large cost element for video and audio streaming customers as well as large download providers. To give our customers a better view of this, we introduced a new metric: Origin Offload.

Origin Offload measures the ratio of bytes served to end users that were cached inside the CDN (not fetched from the origin), over total bytes served to end users for the service. An Origin Offload of 100% means all bytes were served from the CDN.
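The definition above can be expressed as a short Python sketch. The function and parameter names are assumptions made for illustration, and the calculation is a simplified reading of the definition (bytes not fetched from the origin are taken to be total bytes served minus origin bytes), not Fastly's internal implementation.

```python
def origin_offload(bytes_served_to_users: int, bytes_fetched_from_origin: int) -> float:
    """Share of delivered bytes that came from CDN caches, as a value in [0, 1].

    Simplification: assumes origin bytes never exceed the bytes served.
    """
    if bytes_served_to_users <= 0:
        raise ValueError("bytes_served_to_users must be positive")
    if not 0 <= bytes_fetched_from_origin <= bytes_served_to_users:
        raise ValueError("bytes_fetched_from_origin must be between 0 and bytes served")
    cached_bytes = bytes_served_to_users - bytes_fetched_from_origin
    return cached_bytes / bytes_served_to_users


# A request that misses at the edge but hits at the shield fetches nothing
# from the origin, so it counts as fully offloaded.
shield_hit = origin_offload(bytes_served_to_users=1_000, bytes_fetched_from_origin=0)

# A service that delivers 1,000 bytes and pulls 250 of them from the origin.
mixed = origin_offload(bytes_served_to_users=1_000, bytes_fetched_from_origin=250)

print(shield_hit, mixed)  # prints: 1.0 0.75
```

Because the ratio is computed over bytes rather than requests, it is unaffected by how many internal cache layers a request passes through before it is answered.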

Here’s a graph showing the same customer, now with the Origin Offload metric, clearly showing the impact of disabling shielding.

The new origin_offload metric is available from the Fastly Historical API as well as the Real Time API. The endpoints you are familiar with (see links for sample code) will now include the origin_offload value along with existing metrics in their response. 
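As a rough sketch of how the value might be read from a stats response, the Python below pulls `origin_offload` out of an already-decoded JSON payload. The payload shape (a `data` list of rows with a `start_time` field) is an assumption made for illustration, and the sample numbers are placeholders whose scale is not confirmed here; check the API documentation for the actual response format.

```python
from typing import Any


def origin_offload_series(payload: dict[str, Any]) -> list[tuple[Any, Any]]:
    """Return (start_time, origin_offload) pairs from a decoded stats response.

    Rows without an origin_offload field yield None for that value.
    """
    rows = payload.get("data", [])
    return [(row.get("start_time"), row.get("origin_offload")) for row in rows]


# Hypothetical response body; field layout and values are assumptions.
sample_payload = {
    "data": [
        {"start_time": 1700000000, "origin_offload": 0.95},
        {"start_time": 1700003600, "origin_offload": 0.97},
    ]
}

series = origin_offload_series(sample_payload)
print(series)
```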

You can also find the graph in Fastly UI:

When thinking about the capacity of your origin infrastructure, requests and egress bytes are the two key components of capacity planning. The cache hit ratio is vital to the performance of your origin, as it directly influences the efficiency of your content delivery and your system’s load. A higher cache hit ratio indicates that more requests are being served from our edge caches rather than your origin server, significantly reducing latency and improving your origin’s response times. Origin Offload, on the other hand, is critical to your origin server’s efficiency, as a higher Origin Offload means less traffic for your origin server to process and less network capacity required to deliver that traffic to our CDN.

By leveraging Fastly edge caches to serve static and frequently accessed content, you can lower your origin egress cost and free up request processing capacity on your origin for handling dynamic content.

This model of load distribution prevents your origin server from degrading your users’ experience and increases overall system resilience and reliability. As an added bonus, a high Origin Offload reduces the infrastructure and operational costs associated with scaling your origin to handle peak loads. A high Origin Offload is essential for maintaining efficient resource utilization and ensuring an uninterrupted experience for your end users. Furthermore, a high cache hit ratio reduces operational stress on your origin infrastructure, minimizing the opportunities for bottlenecks and server outages.

How to improve your origin offload and CHR

Cache hit ratio is crucial, but it should be supplemented with metrics like Origin Offload to fully understand CDN performance and its impact on origin load, so that you can make informed decisions and reduce costs. Fastly’s edge cloud platform offers more than just a standard CDN. It offers a modern network with powerful, strategically placed, and software-defined POPs. If you want to improve your performance and speed and cut costs, let’s talk!

Monique is a Senior Engineering Manager for the Customer Usage Pipeline Systems (CUPS). Prior to Fastly, Monique was an engineering leader at digital entertainment, fintech, and medtech companies. Outside of work, she enjoys being outside in nature and traveling.

Peter is a Senior Principal Software Engineer at Fastly, specializing in the customer metrics pipeline, and with expertise in image processing and edge applications. Peter enjoys playing with perceptions of time, creating sounds, and engaging with the weird internet. He enjoys everyday nature: rooftop ravens, beach succulents, and notable hills.

Brad Benvenuti is a Director of Engineering at Fastly, where he leads the development of observability products. Brad has spent over 15 years enjoying the challenges of software engineering. Prior to Fastly, he was an engineering leader at StockX and a software engineer at Netflix. Brad can be found in Michigan, where he wishes it was always summer and he enjoys spending time outside, biking, and playing with his kids.

Hossein Lotfi is the VP of Engineering at Fastly, where he leads the Network, Platforms, and Edge Systems organization. He is responsible for building reliable, highly scalable, cost-effective, and low-latency systems that power Fastly’s network services and serve as the foundation for its broader product offerings.

His team spans a wide range of engineering disciplines across the entire tech stack, from hardware and network architecture to kernel-level optimization and low-latency data paths, all the way up to edge systems such as caching and workload management. Hossein believes in the compounding value of having experts across every layer of the stack working collaboratively to tackle complex scalability and efficiency challenges with innovative solutions.

In addition, he oversees Fastly's network operations and SRE functions. He is a strong advocate for the idea that system design is most effective when informed by direct involvement with the state of production systems and the operational challenges of a large-scale, globally distributed infrastructure.
