Three questions that make edge state easier to design
Want to build an app in the cloud? There’s a guide for that. In fact, these guides are so ubiquitous that many Fastly customers ask for code they can borrow and plug directly into their business logic. It’s so easy to get started in core clouds that most development teams are now cloud native, and the low barrier to entry for web apps allows for quick innovation.
However, as applications scale globally and teams work to increase performance and security, the ease of getting started begins to incur huge costs in scalability and management. What was simple to deploy to a region now requires special projects to deploy globally. For example, health checks and performance metrics deployed in a starting region now need to be replicated or rewritten for new geos, requiring more dev time. And assets, configurations, and application-critical data in central locations now incur global round trip latencies for simple operations. The global picture is not just a combination of many regions — it’s often spaghetti code with unreliable performance.
Rather than force old monoliths into edge frameworks, many developers find that designing a system as edge native is much easier in the long run. Edge-native application design assumes global scope and obviates many of the deployment, management, and data replication concerns that application developers experience on core clouds today. But where does your data live? How do you manage tradeoffs between products? Edge state management doesn’t let you get started as quickly as core cloud templates, but with a little bit of thought, state management at the edge can be straightforward and save a lot of scaling headaches.
In this post, we’ll cover three questions — and recommendations for each — to ask yourself on the front end of application development to save yourself some time when it comes to scale. By discussing lifetime, replication, and write frequency, we will anticipate tradeoffs between them and help you find the right starting point.
1. Lifetime: Is it okay if this data expires?
Why start with data lifetime? Of all the criteria to discuss, 30 seconds of thinking about data guarantees eliminates the largest number of possible solutions.
Yes - Data is transient
Transient data stores like caches are useful for temporary data, and they are often cheaper than stores that guarantee access because you only pay for the storage you need when you need it.
A transient store is a good fit when an application can survive without the data. For example, this is normal for asset caching in a CDN: a high cache hit ratio (CHR) is preferred, but there are well-established ways to handle misses and repopulate the cache.
Recommendation: Start by looking at caching solutions to avoid overbuying. Transient solutions are often less expensive than alternatives and normally are less restrictive in terms of file size and write frequency. For example, the Fastly cache has no maximum file size, while Edge Dictionaries have both a maximum item count and value length.
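To make the transient model concrete, here is a minimal, generic sketch of a time-to-live (TTL) cache with an origin fallback. This is not a Fastly API; the `TTLCache` class, `fetch_asset` helper, and `origin_fetch` callback are all hypothetical names used for illustration.

```python
import time

class TTLCache:
    """Minimal transient store: entries expire after a fixed time-to-live."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # cache miss: caller repopulates
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

def fetch_asset(cache, key, origin_fetch):
    """Serve from cache when possible; fall back to origin on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = origin_fetch(key)  # e.g. an HTTP request back to the origin
    cache.set(key, value)
    return value
```

The key property is that a miss is survivable: the application still works, it just pays an origin round trip to repopulate the cache.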
No - Data needs to persist
This data is guaranteed to be present in the data store until it is explicitly deleted. It may be the only copy of a piece of data, or it may simply be easier to store it at the edge than to set up replication from a central store. Common uses of a durable store include large object storage or holding a configuration file that multiple edge functions consult to make a decision, like an IP block list.
Recommendation: Start by looking at configuration solutions or object stores to guarantee the presence of the stored data. These will often provide familiar key/value interaction models. Choose the solution that best fits the required object size and regional availability, paying special attention to any write frequency caps.
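As a sketch of the key/value interaction model, here is the IP block list example from above. The `config_store` dict stands in for a real edge config or object store, and `is_blocked` is a hypothetical helper, not part of any SDK.

```python
import ipaddress

# Stand-in for a durable key/value config store holding an IP block
# list as CIDR ranges. In a real deployment this would be an edge
# config store or object store with the same lookup shape.
config_store = {
    "ip_block_list": ["203.0.113.0/24", "198.51.100.42/32"],
}

def is_blocked(client_ip: str) -> bool:
    """Check a request's client IP against the stored block list."""
    addr = ipaddress.ip_address(client_ip)
    blocked_ranges = config_store.get("ip_block_list", [])
    return any(addr in ipaddress.ip_network(cidr) for cidr in blocked_ranges)
```

Because the list is durable, every edge function can assume it is present rather than handling a miss-and-repopulate path.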
2. Replication: Where is this data needed?
Where will this data be made available? Will it be accessed globally, or is it only needed in the same POP where it was written to? Combined with durability, answering the replication question can help narrow down suitable solutions.
Global - Data is needed everywhere
Regardless of where it is written, future processes around the world will need access to this data. Caching assets for global access or updating a global set of WAF rules are good examples of edge state applications that require global replication.
Recommendation: For ephemeral stores, a global cache is very likely the best fit, but make sure you have explicit cache control and get high CHRs. For durable stores, there will be a few options based on usage. Start by trying an object store and pay attention to any write frequency caps.
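One concrete form of explicit cache control is setting separate caching directives for the edge and the browser. The sketch below just builds response headers; the specific max-age values are illustrative, not recommendations. (Surrogate-Control is the header Fastly's edge cache honors and strips before the response reaches the browser, while Cache-Control governs downstream clients.)

```python
def cacheable_response_headers(max_age_edge=3600, max_age_browser=60):
    """Build response headers for explicit cache control.

    Surrogate-Control controls edge (CDN) caching; Cache-Control
    controls the browser. Values here are placeholders.
    """
    return {
        "Surrogate-Control": f"max-age={max_age_edge}",
        "Cache-Control": f"max-age={max_age_browser}",
    }
```

Being explicit like this, rather than relying on defaults, is what keeps CHR high and cache behavior predictable as the application scales.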
Local - Data is only needed where it was written
This data is only expected to be accessed from the POP where it was written. With local replication, an edge state store can act like shared memory for edge functions and enable geo-specific interactions. Storing customized content for later use in a specific geo is a good example of local replication.
Recommendation: Again, caches are a good first option if the interaction model fits due to their high performance. If durability is critical, locale-specific storage options are useful to investigate.
3. Write frequency: How often will this data be updated?
Edge state solutions often present durability, replication, and write frequency tradeoffs. If durable storage is needed on a global scale with fast replication speeds, it’s likely to come at the cost of reduced write frequency. It’s expensive and takes a while to update all those global stores! Thus, a worthy question is whether you need data updated with high or low frequency.
High - Multiple writes per second
The application needs to write multiple times per second to a specific entry in the store. High-frequency writes are powerful and can be used for security applications and novel shared web experiences like shared game state.
Recommendation: For ephemeral and local state applications, start with the option with the most convenient interaction model (explicit cache or a key/value model). If global replication is necessary, expect to make some tradeoffs here, or combine multiple solutions in a layered approach that delivers the combination of attributes you need.
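The layered approach can be sketched generically: absorb high-frequency writes in a fast local store, then flush batched aggregates to a slower, write-capped durable store. The `LayeredCounter` class below is a hypothetical illustration with plain dicts standing in for both stores; the injectable `now` clock exists only so the behavior is testable.

```python
import time

class LayeredCounter:
    """Layered writes: high-frequency increments hit local memory;
    a batched flush to the durable store runs at most once per interval."""
    def __init__(self, flush_interval_s=1.0, now=time.monotonic):
        self.local = {}       # fast, per-POP, ephemeral
        self.durable = {}     # slow, global, write-frequency-capped
        self.flush_interval = flush_interval_s
        self.now = now
        self._last_flush = now()

    def increment(self, key, amount=1):
        # High-frequency path: only touches local memory.
        self.local[key] = self.local.get(key, 0) + amount
        if self.now() - self._last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        # Low-frequency path: one batched write per interval.
        for key, delta in self.local.items():
            self.durable[key] = self.durable.get(key, 0) + delta
        self.local.clear()
        self._last_flush = self.now()
```

The tradeoff is visible in the design: readers of the durable store see slightly stale totals, in exchange for the global layer only absorbing one write per flush interval.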
Low - Around one write per second or less
Each asset or key in the data store is updated infrequently. A good baseline for low frequency is up to one write per second to any given key. Low-frequency updates align well with configuration uses like large lists of URL redirects or WAF rules, as well as large asset storage.
Recommendation: Low write frequency allows more flexibility in choosing a solution. Start with the recommended solution that matches the required durability (cache or object store) but be aware: it’s much more likely for an application to grow to require more writes than fewer!
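If a store enforces a per-key write-frequency cap, it helps to model the cap in the application so over-cap writes are rejected (or batched) deliberately rather than failing at the store. The `WriteRateGuard` class below is a hypothetical sketch of a one-write-per-second-per-key guard; the injectable `now` clock is there only for testability.

```python
import time

class WriteRateGuard:
    """Reject writes to a key that arrive faster than the allowed rate,
    mimicking a per-key write-frequency cap (default: one per second)."""
    def __init__(self, min_interval_s=1.0, now=time.monotonic):
        self.min_interval = min_interval_s
        self.now = now
        self._last_write = {}  # key -> timestamp of last accepted write

    def try_write(self, store, key, value):
        t = self.now()
        last = self._last_write.get(key)
        if last is not None and t - last < self.min_interval:
            return False  # over the cap: caller should retry or batch
        store[key] = value
        self._last_write[key] = t
        return True
```

Building this check in early makes it cheaper to adapt later if the application grows to need more writes than the chosen store allows.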
With a little up-front thought, edge-first application design can be quite straightforward and save a lot of time and energy otherwise lost to failed proof-of-concept implementations. There are more than just three questions to answer (file size and replication timing are two others), but the three we covered are effective at pointing a design in the right direction. Hopefully, they’ll save you some time and some headaches as edge-native state products emerge and solutions become more widespread.