What is a CDN and why you should use one
Most websites and applications that people interact with every day are run out of one physical location, but the content on sites or applications (like images, text, and video) still needs to travel over wires to the entire world.
It works like this: if a website’s servers are based in New York City, people in Boston will get the content faster than people in San Francisco or Tokyo. The farther away customers are from a company’s data center, the slower the website or application loads — creating an inconsistent and frustrating user experience.
Lag times of any length frustrate web and mobile users accustomed to real-time digital experiences. According to LoadStorm:
25% of users will abandon a website that takes longer than four seconds to load.
74% of users will abandon a mobile site that takes longer than five seconds to load.
46% of users won’t return to a poorly performing website.
This problem can be fixed with a content delivery network (CDN).
What is a CDN?
A CDN is a way to deliver content from your website or mobile application to people more quickly and efficiently, based on their geographic location. A CDN is made up of a network of servers (“points of presence,” or POPs) in locations all over the world.
The CDN server closest to a user is known as the “edge server” — when people request content from a website served through a CDN, they’re connected to the closest edge server, ensuring the best online experience possible.
Imagine that you’re in San Francisco and you’ve requested an image on a server in London, 5300 miles away. It would typically take around 300 milliseconds to send the request and receive the response.
If you were to request the same image from a server in San Jose, which is about 50 miles from San Francisco, it would take about 10 milliseconds to send the request and get the response. That’s 30 times faster than the first case, but because we’re speaking in terms of milliseconds, the difference might be imperceptible.
However, that barely perceptible difference is huge when one considers that a typical webpage can include over 2 megabytes of information spread across 30 requests. Because browsers only make a small number of concurrent requests and each request may involve several round-trips to the server, these milliseconds add up to many seconds, making the website slow.
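To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The 30 requests and the two round-trip times come from the text above; the figures of 6 concurrent connections and 3 round-trips per request are assumptions chosen only for illustration, and real browsers and protocols vary.

```python
import math

def estimated_load_ms(rtt_ms, requests=30, concurrent=6, round_trips=3):
    """Rough estimate: requests go out in batches of `concurrent`, and each
    request costs `round_trips` sequential round-trips (both are assumptions)."""
    batches = math.ceil(requests / concurrent)
    return batches * round_trips * rtt_ms

london = estimated_load_ms(300)   # server 5300 miles away
san_jose = estimated_load_ms(10)  # server 50 miles away
print(f"London: {london / 1000:.1f} s, San Jose: {san_jose / 1000:.2f} s")
# London: 4.5 s, San Jose: 0.15 s
```

Under these assumptions, the same page takes several seconds from the distant server and a fraction of a second from the nearby one.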
How do CDNs work?
To avoid the dissatisfied users created by slow service, CDNs move content closer to the user in order to reduce latency and improve the user experience. In theory this is neat, elegant, and self-explanatory. In practice, however, there are some pretty gnarly technical challenges.
First, in order to reduce the latency for any particular user, a CDN must have a content caching server (a cache) that’s close to them. Unfortunately, it’s not feasible to have a nearby cache for every possible internet user. Instead, we organize the caches into POPs, distribute them throughout large geographic regions (Europe, US, Asia, etc.), and then place them in major population centers within those regions.
Next, given a request by a single user, a CDN must direct it to the closest POP. Most CDNs do this by leveraging a technology called GeoIP. GeoIP can be thought of as a large lookup table that maps IP addresses to geographic regions (country, city, etc.). When a request is being processed, a CDN will reference the table and direct the user’s traffic to the closest available server.
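The lookup-then-route idea can be sketched in a few lines. Everything here is hypothetical: the table entries use reserved documentation IP ranges, the POP names and coordinates are made up for illustration, and a production GeoIP database is far larger and more nuanced than a dictionary.

```python
import ipaddress
import math

# Hypothetical GeoIP table: network prefix -> (latitude, longitude).
GEOIP_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): (37.77, -122.42),  # San Francisco
    ipaddress.ip_network("203.0.113.0/24"): (51.51, -0.13),     # London
}

# Hypothetical POP locations.
POPS = {
    "san-jose": (37.34, -121.89),
    "new-york": (40.71, -74.01),
    "london": (51.51, -0.13),
    "tokyo": (35.68, 139.69),
}

def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def closest_pop(client_ip):
    """Look up the client's location, then pick the nearest POP."""
    addr = ipaddress.ip_address(client_ip)
    for network, location in GEOIP_TABLE.items():
        if addr in network:
            return min(POPS, key=lambda name: distance_km(location, POPS[name]))
    return None  # unknown address; a real system would need a fallback

print(closest_pop("198.51.100.7"))  # san-jose
print(closest_pop("203.0.113.9"))   # london
```

Geographic distance is only a proxy for network latency, so this sketch shows the table lookup rather than a complete routing policy.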
Caching content to a CDN
You can cache (temporarily store) your content on a CDN so it’s delivered from the edge to your end-users much faster than if it had to be delivered all the way from the origin. With a CDN, when someone requests content from your website or mobile app, the request only needs to travel to a nearby POP and back, not all the way to your origin servers and back.
One can think of a cache as a large key-value store. When a request comes in, it’s the cache’s job to determine what the user is requesting, locate the data, and send it back to the user.
There are many pieces of request information that can be used to determine what content to serve. This can include such things as the domain name, path, query parameters, and even headers. Caches employ multi-level lookup tables that use optimized algorithms to find the correct content in the shortest amount of time.
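As a minimal sketch of the key-value view, the function below builds a lookup key from the domain name, path, query parameters, and a chosen subset of headers. The `cache_key` function and its `vary_on` parameter are illustrative names, not part of any particular CDN, and a plain dictionary stands in for the multi-level lookup tables a real cache would use.

```python
from urllib.parse import urlsplit, parse_qsl

def cache_key(host, url, headers, vary_on=("accept-encoding",)):
    """Build a lookup key from the domain, path, query parameters, and a
    chosen subset of request headers (`vary_on` is an assumption)."""
    parts = urlsplit(url)
    query = tuple(sorted(parse_qsl(parts.query)))  # parameter order is ignored
    lowered = {k.lower(): v for k, v in headers.items()}
    varied = tuple((name, lowered.get(name, "")) for name in vary_on)
    return (host.lower(), parts.path, query, varied)

cache = {}  # the "large key-value store", here just a dict

key = cache_key("www.example.com", "/img/logo.png?w=200&h=100",
                {"Accept-Encoding": "gzip"})
cache[key] = b"...image bytes..."

# Same request with different casing and parameter order maps to the same key.
same = cache_key("WWW.EXAMPLE.COM", "/img/logo.png?h=100&w=200",
                 {"accept-encoding": "gzip"})
print(same in cache)  # True
```

Normalizing the key this way is a design choice: the more requests that map to one key, the higher the hit rate, provided the responses really are interchangeable.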
CDNs also purge (remove and update) content constantly, so that the most current, relevant content is delivered. Also known as content invalidation, purging allows businesses to update content when necessary.
Who can benefit from using a CDN?
Anybody who has a website or mobile application that’s likely to be requested by more than one user at a time can benefit from a CDN. They are especially useful to large, complex websites with users spread across the globe, and websites or mobile apps with lots of dynamic content. Some of the benefits CDNs can provide your website include:
Faster load times for web and mobile users
Quick scalability during times of heavy traffic
Reduced risk of traffic spikes at the point of origin, ensuring site stability
Lower infrastructure costs due to traffic offloading (less load on origin)
Better site performance
CDNs also offer many specific benefits to different types of businesses and organizations, such as:
E-commerce. A CDN helps e-commerce sites deliver content quickly and efficiently even during times of heavy traffic, like Black Friday and the holidays.
Government. Large, content-heavy websites can deliver vital information to citizens much more quickly and efficiently by using a CDN.
Finance. CDNs provide banking institutions with a fast, secure, and reliable infrastructure to deliver sensitive data to consumers and analysts.
Media / Publishing. Media websites need to deliver timely, up-to-date information. A CDN can help media companies update headlines and news homepages as stories unfold in real time, and remove data as it becomes outdated.
Mobile apps. A CDN delivers dynamic location-based content for mobile apps, reducing load times and increasing responsiveness.
Technology and SaaS. A CDN helps technology websites serve billions of requests a day to web users without decreasing performance.
Modern CDNs vs. traditional CDNs
CDNs have been around since the late 1990s, but traditional CDNs often lag behind advancements in hardware and technology, and can’t provide the same benefits as a modern CDN. Often, these legacy CDNs are not built in agile software environments, where the company is constantly iterating on products, incorporating customer feedback, and improving the product. These CDNs have been around for five or more years without much change and have critical inefficiencies that modern CDNs have improved upon:
Caching only static content
Traditional CDNs cache only static content: files such as images, stylesheets, and scripts that are the same for every user and rarely change. Dynamic content, on the other hand, includes frequently changing content that requires server logic, such as credit card transactions or updates to an individual shopping cart on an e-commerce site. Dynamic content is often categorized as “uncacheable” because it has to be passed through an origin server due to the sensitive nature of the data.
This is true, to some extent. However, a large portion of dynamic content can be cached: content that doesn’t include personal data but is still unpredictable and frequently changing. This dynamic content is event-driven, meaning it changes in response to an action from either a human or a machine. Think stock prices, user-generated comments on an article, news headlines that need to be updated instantly, or sports scores.
Most CDNs treat this content as “uncacheable,” as they would with other dynamic content, but it can actually be cached. Learn more about how modern CDNs cache dynamic content.
Limited storage space on the edge
Traditional CDNs can offer their clients only so much real estate at the edge, due to the fact that they mostly rely on spinning hard drives. That means they have to prioritize which content is cached at the edge, and which is cached further in. This often means that larger websites are given priority over smaller websites.
By contrast, modern CDNs are built on a large network of solid-state drives (SSDs) and can cache all content at the edge, so all customers get the benefit.
Another major benefit of modern CDNs is reverse proxying. With traditional CDNs, customers are expected to upload their content directly to the cache servers the first time. Modern CDNs fetch and store content from the customer’s origin server as it’s requested, so there’s no need to front-load the cache servers.
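The fetch-on-request behavior can be illustrated with a small pull-through cache. This is a sketch under simplifying assumptions: the `PullThroughCache` class and the `origin` function are hypothetical stand-ins, and there is no expiry, eviction, or network I/O.

```python
class PullThroughCache:
    """Fills itself from the origin on the first request for each path."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: path -> content
        self._store = {}
        self.origin_fetches = 0

    def get(self, path):
        if path not in self._store:      # cache miss: go to the origin once
            self._store[path] = self._fetch(path)
            self.origin_fetches += 1
        return self._store[path]         # later requests are served locally

# Hypothetical stand-in for a customer's origin server.
def origin(path):
    return f"content of {path}"

edge = PullThroughCache(origin)
edge.get("/index.html")
edge.get("/index.html")
print(edge.origin_fetches)  # 1
```

Nothing is uploaded ahead of time: the cache starts empty and the origin is contacted only on the first request for each path.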
Websites using traditional CDNs are often forced to keep dynamic content on the origin server, which can lead to traffic spikes and slow performance, defeating the purpose of having a CDN in the first place.
How is Fastly different?
Fastly redefines the legacy CDN model through advanced features such as reverse proxying and instant purging. Traditionally, when using a CDN, it is the customer’s job to upload content directly to the cache servers. Instead of requiring one initial cache fill, Fastly fetches – and then stores – the content from the customer’s origin server as it’s requested. This method, called “reverse proxying”, eliminates the need to front-load the caches.
When content changes, instead of uploading a new copy of the resource, Fastly’s customers send us a short message instructing our cache servers to invalidate that content. Later, when the invalid content is requested, we fetch a fresh copy from the origin and replace it. This process, called “instant purging”, allows customers to perform updates in approximately 200 milliseconds. With legacy CDNs, the upload process can take anywhere from 15 minutes to an hour.
Instant purging also sets Fastly apart from its competitors in a significant way: we make it possible to serve dynamic content. Because any HTTP request can be cached, we simply fetch the dynamic page from the origin and our customers issue a purge request when the underlying data-model changes. In some cases it can be as simple as adding a hook in the model-level of an application.
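The purge-on-change pattern might look something like the following sketch. It is an in-memory illustration only and does not use Fastly’s actual API: `EdgeCache`, `save_score`, and the `/score` path are hypothetical names, and a real hook would send a purge request over the network.

```python
class EdgeCache:
    """Tiny in-memory stand-in for an edge cache with purge support."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}

    def get(self, path):
        if path not in self._store:
            self._store[path] = self._fetch(path)  # refetch after a purge
        return self._store[path]

    def purge(self, path):
        """Invalidate one entry; the next request goes back to the origin."""
        self._store.pop(path, None)

scores = {"home": 0}  # the underlying data model

def origin(path):
    return f"home score: {scores['home']}"  # dynamic page built from the model

cache = EdgeCache(origin)

def save_score(value):
    """Model-level hook: update the data, then purge the cached page."""
    scores["home"] = value
    cache.purge("/score")

print(cache.get("/score"))  # home score: 0
save_score(3)
print(cache.get("/score"))  # home score: 3
```

Because the purge is tied to the write, the page stays cached for as long as the data is unchanged and becomes fresh again on the first request after an update.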
Security considerations with legacy CDNs
CDNs have been around for a long time, but they’re not all built in the same way. While Fastly’s edge cloud platform goes beyond traditional content delivery networks by moving things to the edge, there are more fundamental differences; it’s not uncommon for CDNs to make up their own rules about how they serve web traffic, since CDNs didn’t exist when HTTP was defined. To improve this, we’re working alongside other platforms to standardize basic protocol handling for CDNs.
A while back, security researchers noticed what had concerned engineers working on CDNs for some time: it’s possible to point competing CDNs at each other to bring them down. From the abstract of Forwarding-Loop Attacks in Content Delivery Networks:
Malicious customers can attack the availability of Content Delivery Networks (CDNs) by creating forwarding loops inside one CDN or across multiple CDNs. Such forwarding loops cause one request to be processed repeatedly or even indefinitely, resulting in undesired resource consumption and potential Denial-of-Service attacks. To evaluate the practicality of such forwarding-loop attacks, we examined 16 popular CDN providers and found all of them are vulnerable to some form of such attacks.
Because of the massive scale of many CDNs — with terabits per second of link capacity available to each, worldwide — this is a potentially very scary problem. Whether it’s an intentional attack or an accidental misconfiguration, it could take large parts of the internet down with the CDNs in question. In fact, one CDN engineer I was discussing this with recently said it was one of the problems that “kept him awake at night.”
Many CDNs, including Fastly’s edge cloud platform, already have loop protection mechanisms in place, usually by using a header to identify requests that have already been seen. The problem is that these solutions aren’t coordinated with each other, so one CDN might be configured to delete another CDN’s loop detection headers, either intentionally or accidentally.
A properly configured CDN may also help protect websites against some common malicious attacks, such as Distributed Denial of Service (DDoS) attacks.
Emerging advanced CDN technology
We wanted something better, so we started talking to our colleagues at other CDNs as well as content platforms. The result was a small specification in the HTTP Working Group for the CDN-Loop request header, which was approved for publication as a standards-track RFC by the Internet Engineering Steering Group (IESG) two weeks ago.
This is a very simple mechanism. For example, Fastly’s looks like this:
GET /index.html HTTP/1.1
Implementing CDNs are required to add it on each request they make, and to protect it from accidental (or not-so-accidental) modifications so that they can detect and mitigate such loops more reliably — even when a loop involves multiple CDNs.
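A simplified sketch of how a CDN might use the header follows. The identifier `cdn-a.example` and the `forward` function are hypothetical, and the parsing is deliberately naive: it splits on commas and semicolons and would mishandle quoted parameter values that contain those characters.

```python
MY_CDN_ID = "cdn-a.example"  # hypothetical identifier for this CDN

def cdn_ids(header_value):
    """Extract identifiers from a comma-separated CDN-Loop value,
    ignoring any ';'-separated parameters (naive parsing)."""
    ids = []
    for item in header_value.split(","):
        cdn_id = item.split(";", 1)[0].strip()
        if cdn_id:
            ids.append(cdn_id)
    return ids

def forward(headers):
    """Return headers for the upstream request, or raise on a loop."""
    existing = headers.get("CDN-Loop", "")
    if MY_CDN_ID in cdn_ids(existing):
        raise RuntimeError("forwarding loop detected")
    updated = dict(headers)
    # Append our identifier so later hops (including us) can spot a loop.
    updated["CDN-Loop"] = f"{existing}, {MY_CDN_ID}" if existing else MY_CDN_ID
    return updated

print(forward({"Host": "www.example.com"})["CDN-Loop"])  # cdn-a.example
print(forward({"CDN-Loop": "cdn-b.example"})["CDN-Loop"])  # cdn-b.example, cdn-a.example

try:
    forward({"CDN-Loop": "cdn-a.example, cdn-b.example"})
except RuntimeError as err:
    print(err)  # forwarding loop detected
```

Each CDN appends its own identifier rather than replacing the header, which is what lets a loop through several CDNs be detected by whichever one sees its identifier come back.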
We’re very happy about this small step forward because it’s a sign of change. Since the early days of CDNs, there hasn’t been much coordination between vendors about issues like this, much less in providing a consistent experience to our customers. So, while small, it’s an important step, as it signifies an industry that’s willing to work together.
CDN community vision
Fastly is contributing to a number of other community efforts to build better, more consistent experiences across vendors, including:
The HTTP Working Group has started work on the Cache response header to create a more consistent debugging experience for cache behavior, both in CDNs and other HTTP caches.
We’re also contributing to a new, so-far unadopted draft on the Proxy-Status response header, to make CDN and reverse proxy debugging easier when something goes wrong.
The core HTTP specifications are being revised (again). This time, we’re paying special attention to how CDNs behave to make sure that they don’t need to violate the specifications on a regular basis.
That core work is being informed by an emerging set of common test cases for both browser and intermediary HTTP caches.
We’re looking at more advanced features like Variants to give intermediaries (including CDNs) more information about the content they’re serving, so they can make more intelligent caching decisions.
Once we get these and a few other basic building blocks for CDN behavior standardized, we hope to move on to more exciting things — like standardizing how an origin can discover that it’s behind a CDN, send purge requests to it, and even use surrogate keys without lots of configuration.
We believe that having one standard way to interact with a CDN makes our customers’ lives easier. It also makes it easier for content platforms like WordPress and Drupal to seamlessly work with CDNs when they’re in front of such a site — making things better for everybody.
But that’s just the beginning!
This post has touched upon many of the core ideas behind what CDNs are, how they operate, their benefits and risks, and how Fastly is different. Check out different CDN options and see for yourself how they can improve your website performance.