Cache Hit Ratio (CHR) as a security metric

Cache hit ratio (CHR) is a core metric of any system that includes a cache. Often, it’s the only metric. It’s measured by observing checks for content that may be in the cache. We calculate cache hit ratio by dividing the number of cache hits by the total number of cache checks. If each check can result in either a hit (content found) or a miss (content not found), then the ratio is:

CHR = hits / (hits + misses)
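In code form, that's a one-liner. Here's a minimal Python sketch of the calculation:

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """CHR: cache hits divided by total cache checks."""
    total = hits + misses
    return hits / total if total else 0.0  # report 0 if no checks yet

print(cache_hit_ratio(50, 50))  # 0.5, i.e. a 50% cache hit ratio
```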

Historically, CHR has been understood as a performance metric. It’s worth paying attention to because the more you cache, the more money you save and the faster your site performs. When you increase your ratio of hits to misses, you get these benefits: 

  1. Reduced hardware capacity needed at origin

  2. Lowered architectural complexity needed at origin

  3. Significant savings in egress charges

  4. Latency reductions from serving the content from cache

But it’s not just about performance! There are also serious security benefits that come with improving your CHR, and they have often received less attention. For example, one respondent to a recent study* reported a 40% decrease in security incidents just from switching their delivery onto Fastly’s CDN. (Read the full study.) So we think it’s time to start thinking about CHR as a security metric as well as a performance metric.

Cache hit ratio as a security metric

Before we talk about the ways in which CHR improvements indicate a reduction of risk, let’s talk about what can elevate your security risk: 

Larger surface areas for attack

The more servers you need to use, the more potential points of exploitation you have to manage. Even if you have the luxury of scaling horizontally to meet growing capacity demands, and each of those servers is secured according to your best practices, the odds of a misconfiguration or vulnerability add up, creating more opportunities for exploitation. 

More complex applications

If you don’t have the luxury of scaling horizontally and you need to scale by making your system more complex, now you have two problems. A more complex system is harder to manage and secure because there are more pieces to exploit. 

Application complexity grows as you need to handle more load at the origin. For example, as your traffic increases you may need to load balance against several servers or introduce replicas of your primary database. When you can handle this load in your CDN at the edge, your overall capacity can scale more independently from your capacity at origin, letting you keep your applications at origin simpler and more secure. 

More complex deployment models

More complex deployment models are another way that risk can creep into your architecture. This can take many forms: adding more regions or availability zones, or supporting multi-cloud or hybrid cloud architectures. It can also mean using more of the services from big cloud providers, like Identity and Access Management (IAM), database, or network delivery offerings. As you add regional or zone complexity, or adopt more of these tools, you shift more of your security over to those tools and providers. That increases your risk of configuration problems, because more boundaries exist between those tools and your origin systems, as well as in the relationships those tools have with each other.

Even though each of these pieces gets significant security work on its own, the interfacing that happens between them in your logic always brings an element of risk. It’s not all doom and gloom, but if you want to protect against the risk in your system, it’s important to understand where it exists, and how to mitigate it. 

Increased opportunity for human error

It’s not just about the vulnerability within the systems themselves; it’s also about the potential for human error. More complexity of any kind compounds the risk of mistakes, whether it’s architectural complexity, the amount of hardware to manage and maintain, or extra processes to follow. When a system is harder to reason about, it’s harder to diagnose, harder to fix, and harder to protect from human error. Any approach to solving a problem that requires you to think harder about the problem represents a risk. You don’t fix a leaky faucet by rerouting your pipes.

Supporting increased levels of complexity that invite increased probability of human error is a risk. It’s a risk when everything is going well and all the lights are green, and it’s a massive liability when something actually goes wrong, everything turns red, and you can’t figure out where the problem lies.

Downtime (is a security issue)

Downtime is a security issue in several ways. First, a DDoS attack is a security attack, and a failure to protect against DDoS is a security failure. Second, any time normal practices are disrupted, you’re more at risk of security being compromised: you fall back on alternate methods and processes that are less hardened, and when people are scrambling, they’re more susceptible to attacks that would be caught under normal circumstances. Third, every minute a SecOps team spends dealing with an active DDoS attack that should have been handled automatically is time taken away from proactive work on organizational priorities.

Read more about DDoS mitigation → https://www.fastly.com/products/ddos-mitigation

How does CHR help with these security issues?

Let’s look at the benefits that come with achieving a higher CHR: 

  1. Reduced hardware capacity reduces the attack surface

  2. Lowered application and architectural complexity 

  3. Reduced infrastructure management and maintenance at origin 

  4. Offloading traffic from your origin and onto a highly resilient, automated, and hardened secure edge platform

So risk comes from increased hardware management, increased application complexity, increased architectural and deployment complexity, systems that are harder to reason about, and downtime. A higher CHR is an indicator that you are successfully reducing the amount of hardware you need, decreasing application complexity, and decreasing architectural and deployment complexity with simpler systems that are easier to reason about, while gaining more uptime and reliability via a hardened and automatically responsive CDN partner.

CHR isn’t a direct measure of security, but improvements you make in your CHR are evidence of succeeding at the things that help you create a more secure system. Different types of applications have different upper limits for their CHR – some tougher situations might be hard-pressed to exceed 80% or 90%, but many Fastly customers are able to achieve 95% or better. This is as big a win for security as it is for performance or cost savings.

Now that we understand the basic relationship between your CHR and your security posture, let’s dive deeper into understanding your cache hit ratio, and how to improve it. 

Understanding the Cache Hit Ratio

Hits and misses are non-negative counts, which restricts CHR to the range from 0 to 1.0. This is often expressed as a percentage, so you may see your cache hit ratio between 0% and 100%. A CHR of 0% means that everything is a cache miss: every piece of content is either fetched or calculated on demand. A CHR of 100% means that everything is served directly from cache.

The operations performed on a miss tend to be more expensive (in bandwidth, CPU, or both), so a higher CHR usually indicates a system with better performance. Now, each application will have its own cache hit ratio at steady state. This depends on content expiration times and access patterns. For example, infrequently accessed content may be evicted sooner than its expiration to make room for something else. But in general, keeping your CHR as high as possible means that there’s less of the undesirable slow work going on.
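To illustrate that last point, here's a toy Python sketch of an LRU (least recently used) cache, a common eviction policy. The capacity and keys are invented for illustration, and Fastly's real eviction machinery is far more sophisticated, but it shows how a rarely requested object can be evicted before its TTL is up:

```python
from collections import OrderedDict

class TinyLRUCache:
    """Toy LRU cache: when full, evicts the least recently used
    entry, even if that entry's TTL has not yet expired."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None  # cache miss
        self.entries.move_to_end(key)  # mark as recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry

cache = TinyLRUCache(capacity=2)
cache.put("/popular", "...")
cache.put("/rare", "...")
cache.get("/popular")      # keeps /popular fresh in LRU order
cache.put("/new", "...")   # /rare is evicted before its TTL expires
assert cache.get("/rare") is None
```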

For example: Cache Hit Ratio at steady state

To motivate this view, let’s look at what happens when cache hit ratio changes. In this first set of examples, we’ll assume the rate of cache operations (hits & misses together) is staying roughly the same; your application is at steady state, having a perfectly normal day.

If circumstances cause your CHR to drop at this steady state, that’s because some hits have turned to misses. Let’s consider an application that sees 50 hits and 50 misses every second.

CHR = 50 / (50 + 50) = 0.50

That’s a 50% cache hit ratio. Let’s look at the impact of a reduced ratio: if half of the hits become misses instead (25 hits and 75 misses), then we’d see a CHR of 0.25:

CHR = 25 / (25 + 75) = 0.25

Let’s think about the impact of a CHR drop like this.

The misses have increased by 50%, from 50 to 75 per second. This means that your origin will see 50% more requests for content, because that content hasn’t been found in cache, and that can mean a 50% increase in the hardware capacity needed at your origin.

Okay, what happens if things move in the other direction?

CHR = 75 / (75 + 25) = 0.75

If half of the original misses become hits instead, then you end up with a cache hit ratio of 75%. The origin sees half as many requests, which means a 50% reduction in hardware necessary at the origin. This can also mean a less complicated software architecture or an ability to fit into fewer availability zones at the origin, both of which reduce the attack surface for your application.
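To make the relationship concrete, here's a small Python sketch that models origin load as the share of traffic that misses the cache (the 100 requests/second rate is illustrative):

```python
def origin_requests_per_second(total_rps: float, ratio: float) -> float:
    """Requests per second that miss the cache and reach the origin."""
    return total_rps * (1.0 - ratio)

# At a steady 100 requests/second, origin load drops as CHR climbs.
for ratio in (0.25, 0.50, 0.75):
    rps = origin_requests_per_second(100, ratio)
    print(f"CHR {ratio:.0%}: {rps:.0f} origin req/s")
# CHR 25%: 75 origin req/s
# CHR 50%: 50 origin req/s
# CHR 75%: 25 origin req/s
```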

Cache Hit Ratio as Origin Protection

One way to think about cache hit ratio is as a measure of how well the servers at your origin are protected. At the same request rate, a higher cache hit ratio indicates that less traffic is making it to your infrastructure. Usually when we talk about origin protection we're thinking about protection from traffic spikes, not from attackers. But the same benefits apply when looking at security too.

When requests are served from Fastly rather than from your own infrastructure, you restrict the attack surface to our platform, which is more resilient and continually updated to be as secure as possible.

Lower cache hit ratio

With a lower cache hit ratio more of your traffic is served directly from origin, meaning you get proportionally worse protection during an attack or unusual spike in traffic. That’s an availability concern, and downtime is a security risk. But a lower cache hit ratio has security impact at steady state, too. Serving a higher request rate from origin means you need to run more infrastructure, and maybe a more complex architecture. This brings us back to the additional risk due to increases in complexity and opportunity for human error that we discussed above. 

Higher cache hit ratio for a smaller attack surface

The attack surface gets smaller as CHR gets bigger because hits are served entirely from Fastly infrastructure, and misses are fetched from your origin. A higher cache hit ratio indicates a higher percentage of hits. It indicates that fewer requests escape to your servers at all.

How to raise your cache hit ratio

Here are some of the best ways to increase your CHR. Some of these are specific to Fastly, and that’s because we built our platform from the start to be able to cache more than legacy CDNs. Today’s organizations need to improve their end user experience by serving more content, faster and more efficiently.

Increase your TTLs so content stays cacheable longer

The main parameter controlling the cacheable lifetime of an object is its TTL, or Time to Live. This dictates how long Fastly can reuse that content to serve future requests. A shorter TTL leads to your content expiring sooner, resulting in more cache misses and more fetches to your origin.
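As a sketch of what this looks like in practice, an origin response can set a short browser TTL alongside a long CDN TTL using the Surrogate-Control header, which Fastly honors for its own caching. The handler shape and values below are illustrative assumptions, not a definitive configuration:

```python
def cacheable_headers(cdn_ttl_seconds: int = 86400) -> dict:
    """Response headers for a cacheable origin response (illustrative)."""
    return {
        # Browsers revalidate after 60 seconds.
        "Cache-Control": "max-age=60",
        # Fastly caches for a full day; this header is consumed at the edge.
        "Surrogate-Control": f"max-age={cdn_ttl_seconds}",
    }
```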

If you think your content is too dynamic or changes too often to use a long TTL, I can almost guarantee you are wrong (and you should schedule some time to talk with us). One of the tools we provide to support dynamic content is Instant Purge…

Use long TTLs and evict content when it changes with Instant Purge

Fastly’s Instant Purge API allows you to evict content from our network, on demand and extremely quickly: a purge takes an average of 150ms to execute worldwide. With a little integration work in your application, you can take advantage of very long TTLs (how about a year?) even with rapidly changing content, like API responses or live sports scores. You may find that you can cache a lot more content than you think.
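For illustration, here's a hedged Python sketch of two ways to purge: sending an HTTP PURGE request to a URL, and purging a surrogate key via the Fastly API. The URL, service ID, environment variable name, and surrogate key are placeholders:

```python
import os
import requests

FASTLY_TOKEN = os.environ["FASTLY_API_TOKEN"]  # hypothetical env var
SERVICE_ID = "YOUR_SERVICE_ID"                 # placeholder

# Purge a single URL by sending an HTTP PURGE request to it.
requests.request("PURGE", "https://www.example.com/scores/latest")

# Purge everything tagged with a surrogate key (e.g. one game's scores).
requests.post(
    f"https://api.fastly.com/service/{SERVICE_ID}/purge/game-1234",
    headers={"Fastly-Key": FASTLY_TOKEN},
)
```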

Enable Origin Shielding

Fastly’s Origin Shielding feature allows you to designate a specific POP to act as your origin for our other POPs; this can significantly increase the likelihood of finding your content in cache, with a corresponding reduction of load to your origin. Read more about origin shielding → 

Temporarily serve stale content

Configuring your Fastly service to serve stale content keeps your service available even if your origin is unavailable.
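One way to opt in from the origin is with the stale-while-revalidate and stale-if-error Cache-Control extensions (RFC 5861), which Fastly honors. A minimal sketch, with illustrative values:

```python
# Fresh for 5 minutes; serve stale for up to a minute while
# revalidating in the background; serve stale for up to a day
# if the origin is erroring or unreachable.
STALE_FRIENDLY_HEADERS = {
    "Cache-Control": (
        "max-age=300, stale-while-revalidate=60, stale-if-error=86400"
    ),
}
```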

Choose a CDN with fewer, more powerful POPs with more storage

This sounds a little counterintuitive, but the fewer POPs a CDN has, the more likely it is for any single POP to already have your content in cache. The tradeoff, of course, is that each of those POPs has to have much more storage capacity in order to serve a larger combined content pool. This is one of the core principles of Fastly’s CDN architecture. Read more about the benefits of modern POPs → 

Consider other Fastly products with tight cache integration

Enabling our Image Optimizer allows you to move image logic off your origin, performing image transforms and optimization at the edge. But it also has very tight integration with Fastly’s cache, ensuring that even for a wide variety of device-specific images, we make maximum use of available storage. 

It’s integrated with Instant Purge, as well: purging your high-resolution originals also purges the images that have been derived from them. You can still take advantage of long TTLs, even with changing content.

For more details and ideas about improving your cache coverage, check out caching configuration best practices.

What else can you do to move requests from your origin?

A dedicated attacker will find ways to reach the origin. Moving logic from your origin to the edge reduces the attack surface even further. Once you have improved your CHR as much as possible, you can still find ways to cut requests to your origin by producing synthetic content with Compute, our edge compute platform.

Each Compute request runs in its own sandboxed environment. We create and destroy a sandbox for every request that comes through the platform. This limits the blast radius of buggy code or configuration mistakes from other users and can reduce the attack surface area. It’s an extremely robust safety feature, and lets you move more of your logic to an environment that is secure-by-design in ways it wouldn’t be at origin. You’re getting more secure in two ways – additional origin protection with more logic at the edge combined with executing that logic in a more secure compute environment. Read more about modern application development on the edge: Download the playbook now→

Moving your logic to Fastly’s edge reduces the attack surface at your origin on its own, but once you’ve done that you may find that performance improvements are also close at hand. We’ve recently released our Core Cache API, which enables you to move partial or synthetic content generated with Compute into the cache. That way you get the security benefits along with the low latency of cached content.

*The Total Economic Impact™ Of Fastly Network Services, a commissioned study conducted by Forrester Consulting on behalf of Fastly, July 2023.

Peter Teichman
Principal Software Engineer

Peter is a principal software engineer at Fastly, specializing in the customer metrics pipeline, and with expertise in image processing and edge applications. Peter enjoys playing with perceptions of time, creating sounds, and engaging with the weird internet. He enjoys everyday nature: rooftop ravens, beach succulents, and notable hills.
