Deploy for Performance: Fastly’s Principles of Infrastructure Diversity and Soft Control

Brian Haberman

Distinguished Engineer

This article is part four of Fastly’s "Pillars of Resilience" series, exploring how we design, build, and operate our global network for maximum availability and performance.

As discussed in the introductory post of this series, resilience is a core philosophy at Fastly. It's built into our culture, architecture, and processes to ensure our network delivers content with unwavering availability and speed. We also talked about the foundational role observability has in everything we do. It provides the bedrock for understanding our network's behavior. In this post, we’ll explore two more key tenets of Fastly’s Pillars of Resilience. One of them, infrastructure diversity, affects observability but doesn’t build on it. The other, soft influence, can’t happen without deep observability.

Infrastructure Diversity

In biological and ecological systems, genetic homogeneity poses several risks due to a lack of diversity. A disease that adversely affects one individual has the potential to affect all members of the population since they share the same genetic weaknesses. Similarly, engineered systems built entirely from the same components are all susceptible to the same threats. At Fastly, infrastructure diversity is table stakes for building a highly resilient, highly available content delivery network (CDN). It's about architecting a system that can be built without bespoke components. Fastly’s Points of Presence (PoPs) are not built on a single type of hardware, deployed with the exact same software, or connected to the Internet with the same network configuration. Imagine a system where every server in the fleet is identical. It would be easier to build, operate, and debug. But if a bug or vulnerability were discovered in that configuration, it could affect the entire fleet, turning that shared configuration into a massive single point of failure.

Fastly mitigates this risk by incorporating diversity at multiple levels:

  • Hardware: Utilizing different vendors for servers and multiple models of switches. This prevents a single hardware flaw from causing a widespread outage.

  • Software: Leveraging rolling updates (i.e., progressively deploying to an increasing number of servers) during software deploys. If an issue is detected in new code during the early phases of a rolling update, it can be quickly reverted while the older code continues to serve content.

  • Network connectivity: Partnering with multiple upstream network providers. This ensures that if one provider experiences a major outage, traffic can be rerouted through others, maintaining connectivity.

  • PoP design: Avoiding a cookie-cutter design for PoPs. Fastly PoPs differ in the number of caches, topology, and geographic footprint. They are also spread across multiple co-location providers.
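The rolling update described in the software bullet above can be sketched roughly as follows. This is a minimal illustration, not Fastly's deployment tooling: the function name `rolling_deploy`, the stage fractions, and the `deploy`, `healthy`, and `rollback` callbacks are all hypothetical names introduced for the example.

```python
from typing import Callable, List, Sequence


def rolling_deploy(
    servers: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed stages
) -> bool:
    """Deploy in growing stages; revert every updated server if a check fails."""
    updated: List[str] = []
    for fraction in stage_fractions:
        # Number of servers that should run the new code after this stage.
        target = max(1, int(len(servers) * fraction))
        for server in servers[len(updated):target]:
            deploy(server)
            updated.append(server)
        # Check every server updated so far before widening the rollout.
        if not all(healthy(s) for s in updated):
            for server in reversed(updated):
                rollback(server)
            return False
    return True
```

Because each stage touches only part of the fleet, the servers that have not been updated yet keep serving content on the older code while a bad release is caught and reverted.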

The following illustration provides a high-level view of Fastly’s approach to PoP diversity. By geographically distributing our PoPs, we minimize the impact of any one network, data center, or regional disruption on the overall availability of our services. Traffic destined for an adversely affected PoP can be rerouted to another PoP. In areas with higher concentrations of customers or users, a Fastly PoP may be constructed from multiple, diverse sites (illustrated in the expansion of the PoP in the upper-right corner), allowing for a more balanced distribution of workload across the area.

Fastly's PoP Diversity

By embracing infrastructure diversity, Fastly creates a global system that is inherently more resistant to large-scale disruptions. It's like having multiple escape routes from a building; if one is blocked, there are other paths to safety.

Soft Influence

As a CDN, Fastly generally does not control both ends of a network connection. We cannot exert explicit control over the operation of the device (server or browser) at the other end of a connection. We operate in a system of independent systems. And because of that independence between systems, no one has complete information about all the factors that affect the flow of information. Our robust observability gives us fine-grained information that can be used to help our systems make better decisions. The goal is to use our observations to steer network traffic onto the paths we see as more performant.

Soft influence is a key technique that leverages passive network performance measurements to guide network traffic. Instead of using a rigid, hard-coded approach, it uses real-time observability data to subtly influence how a client app selects a target cache or how a customer’s origin server reaches Fastly. This creates a more nuanced and dynamic way of performing traffic engineering. The concepts described below should sound familiar to those who read our earlier resiliency post describing AutoPilot and Precision Path.

The core idea is to influence IP address selection. When a user's device, such as a laptop or phone, initiates a connection, it needs to resolve a domain name to an IP address. Fastly's DNS infrastructure, backed by its deep observability platform, can analyze passive network performance metrics like latency, packet loss, and jitter. This data, collected continuously from every part of the network, allows Fastly to determine which of its PoPs is currently providing the best experience for a given user. 

Based on this real-time performance data, Fastly's DNS service will then influence the IP address that is returned to the client. While multiple IP addresses may be available, the one for the optimal PoP is prioritized. This ensures the user is directed to the cache that can deliver content with the lowest latency and highest reliability.
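As a rough sketch of the idea, the ordering of a DNS answer could be driven by a per-path cost computed from the measured metrics. Everything here is an assumption made for illustration: the `PathMetrics` type, the `path_cost` weights, and the example addresses (taken from the reserved documentation ranges) are hypothetical and do not describe Fastly's actual scoring.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PathMetrics:
    latency_ms: float
    loss_pct: float   # packet loss as a percentage, 0-100
    jitter_ms: float


def path_cost(m: PathMetrics) -> float:
    """Lower is better. The weights are illustrative assumptions."""
    return m.latency_ms + 50.0 * m.loss_pct + 2.0 * m.jitter_ms


def order_answers(candidates: Dict[str, PathMetrics]) -> List[str]:
    """Return candidate IP addresses with the best-measured path first."""
    return sorted(candidates, key=lambda ip: path_cost(candidates[ip]))


# Hypothetical measurements for three PoP addresses as seen by one client network.
candidates = {
    "192.0.2.10": PathMetrics(latency_ms=20.0, loss_pct=2.0, jitter_ms=5.0),    # cost 130
    "198.51.100.20": PathMetrics(latency_ms=35.0, loss_pct=0.0, jitter_ms=2.0),  # cost 39
    "203.0.113.30": PathMetrics(latency_ms=80.0, loss_pct=0.0, jitter_ms=1.0),   # cost 82
}
print(order_answers(candidates))
# ['198.51.100.20', '203.0.113.30', '192.0.2.10']
```

Note that the address with the lowest raw latency ends up last in this example because its packet loss dominates the cost, which is the kind of trade-off a single-metric rule would miss.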

Similarly, when Fastly needs to reach back to a customer’s origin server, our selection of source IP address will influence the path used by the origin server to send back data. Our choice of source IP address is driven by the same real-time network performance data discussed above, as well as our analysis of BGP routing state within the Internet. The illustration below highlights an approach to steer traffic based on address selection. When Fastly establishes a connection to pull content from the origin, it selects a target address for the origin (IP1) and pins its source address for that connection, based on an analysis of performance for all paths (i.e., IP2 is less performant at the outset). If the route being used for that connection were to go away and the connection were lost, we would need to initiate a new connection. In that case, Fastly’s system looks at the available routes to all of the origin’s IP addresses, selects the most performant route, and initiates a connection using a source address that will steer the origin’s traffic onto that path.

Steering traffic by influencing source address selection
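The selection and reconnection logic described above might look something like the sketch below. It is an illustration under stated assumptions, not Fastly's implementation: `PathOption`, `best_path`, and `connect_via` are hypothetical names, and the `cost` and `route_up` fields stand in for the performance measurements and BGP routing state mentioned in the text. The only real API used is the standard library's `socket.create_connection`, whose `source_address` parameter binds the local side of the connection.

```python
import socket
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PathOption:
    source_ip: str   # local address to bind; the origin replies toward it
    origin_ip: str   # origin address to connect to
    cost: float      # lower is better, derived from passive measurements
    route_up: bool   # whether a route currently exists for this option


def best_path(options: Iterable[PathOption]) -> Optional[PathOption]:
    """Pick the lowest-cost option that still has a usable route."""
    usable = [o for o in options if o.route_up]
    return min(usable, key=lambda o: o.cost, default=None)


def connect_via(option: PathOption, port: int = 443, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to the origin from the chosen source address."""
    return socket.create_connection(
        (option.origin_ip, port),
        timeout=timeout,
        source_address=(option.source_ip, 0),  # port 0 lets the OS choose
    )
```

If the route behind the current option disappears, calling `best_path` again with updated `route_up` values yields the next most performant option, and `connect_via` opens the replacement connection from the matching source address.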

This approach offers several advantages:

  • Dynamic optimization: Soft influence responds in real time to changing network conditions. If a PoP becomes congested or a route experiences degraded performance, the system can automatically and gracefully steer traffic away from it.

  • Improved user experience: By directing users to the best-performing cache, soft influence minimizes latency and improves overall page load times and streaming quality.

  • Graceful degradation: In the event of a disruption at one PoP, traffic doesn't just stop. It's smoothly rerouted to the next best option, preventing a hard outage and ensuring service continuity.
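One way to picture the gradual steering behind these advantages is to turn per-PoP health scores into traffic shares, so that a degrading PoP sheds load progressively rather than all at once. The sketch below is a simplified assumption: the `traffic_weights` function, the 0.0 to 1.0 health scale, and the PoP labels are hypothetical and chosen only for illustration.

```python
from typing import Dict


def traffic_weights(health: Dict[str, float]) -> Dict[str, float]:
    """Map per-PoP health scores (0.0 = down, 1.0 = healthy) to traffic shares."""
    clamped = {pop: max(0.0, score) for pop, score in health.items()}
    total = sum(clamped.values())
    if total == 0:
        raise ValueError("no healthy PoP available")
    # Each PoP receives traffic in proportion to its health score.
    return {pop: score / total for pop, score in clamped.items()}


# PoP "B" is congested, so it keeps a smaller share instead of being cut off.
print(traffic_weights({"A": 1.0, "B": 0.5}))
# PoP "C" is down, so its share drops to zero and A and B absorb the traffic.
print(traffic_weights({"A": 1.0, "B": 1.0, "C": 0.0}))
```

A proportional rule like this avoids the abrupt shifts of an on/off failover, at the cost of sending some traffic to a PoP that is known to be degraded.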

Soft influence is a powerful tool that transforms traffic management from a rigid set of rules into a fluid, adaptive system. It's a testament to the idea that true resilience isn't just about preventing failures, but about having the intelligence to navigate and recover from them gracefully.

Fastly’s Foundational Dedication: Performance, Reliability, and Resilience

Fastly's use of infrastructure diversity and soft influence isn't just about technical sophistication. It's about a foundational dedication to performance and reliability. By embracing these principles, we build a service that is both robust to disruptions and intelligent enough to adapt and optimize in real time. This dynamic approach ensures that your content is delivered with unmatched speed and availability, even in the face of an ever-changing internet. For Fastly, resilience is the foundation of everything we do.