Customer case study 14:12

Scaling smarter through automated traffic control and cacheable experimentation

Presented by

Dom Charlesworth

Technical Lead, RVU


Tech company RVU is about as nimble as they come. With brands like Uswitch, Money, and Bankrate in its portfolio, RVU's developers are rewriting the rules when it comes to building efficiently and strategically at the edge. Two such projects are Baton Pass, an automated traffic routing solution, and Chansey, which uses Fastly restarts and Vary headers to build cacheable experimentation directly into RVU's infrastructure.

Video Transcript

Joshua Croad (00:08):

Hi, everyone. I'm Josh, and I'm here with Dom. We're both engineers at RVU, a company with a portfolio of price comparison websites, like Uswitch, that help our users manage their utilities, such as broadband and energy. We're here to showcase two pieces of software we've built to enable automated traffic control and cacheable experimentation, and to briefly walk through their implementation.

Joshua Croad (00:31):

Okay, so this is a brief look at where we'll get to by the end. We'll walk you through the individual pieces and how they fit together in two distinct steps. First off, I'm going to introduce a piece of software we built called Chansey, which is our experimentation platform. We had three main goals when building Chansey. First, we wanted to combine the lessons and tools that RVU has picked up over the years. Second, we wanted to roll out experimentation site-wide, in a way that is agnostic to implementation details. And third, we wanted to bring this to the edge, because we know our businesses operate best with low-latency solutions.

Joshua Croad (01:12):

So what do we want you to take away from this? Here are some of the benefits we've seen from our solution, which brings us closer to achieving those goals. A lot of our content is static, and just because we're running an experiment doesn't mean we should have to sacrifice the benefits of caching. It also provides a common mechanism for assigning users to variants. At RVU, we value building abstractions when solving problems, where possible. In this case, we used the opportunity to build an experimentation platform with a shared interface and well-defined, organization-wide schemas for experiments, while leaving the implementation of each variant to our upstream services.

Joshua Croad (01:53):

And one of the main benefits of rolling things out at the edge has been the low latency of the implementation. A lot of our businesses are driven by organic search, and latency is a big deal to our bottom line. If the circles on the left represent backend microservices, here are some implementations we've had in the past. We've used tools like Optimizely and Split.io, we've used LaunchDarkly to segment audiences, and we've even implemented our own randomization algorithms. Math.random might seem like a joke here, but we've definitely done it.

Joshua Croad (02:24):

Historically, there has been no standardization across our services, which made it hard to share code. Integrating with SDKs causes vendor lock-in and technical debt, and over time these implementations change as business requirements change. This led us to build Chansey, which abstracts away those implementations and offers a single interface, so clients don't need to change. It means we can switch between bucketing implementations if needed. For example, we could switch between our internal frequentist test implementation and Google's Bayesian implementation in one place for the entire organization.
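
To make the single-interface idea concrete, here is a minimal Go sketch (Go because Baton Pass is written in it; the talk doesn't say what Chansey is written in). The Bucketer interface, both implementations, and the FNV-1a hashing are illustrative assumptions, not Chansey's code, and neither implementation is the frequentist or Bayesian method mentioned above. The point is only that callers depend on one interface while the bucketing behind it changes in one place.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bucketer is a hypothetical single interface for cohort assignment. Callers
// depend only on this, so the implementation can be swapped in one place.
type Bucketer interface {
	Cohort(experiment, userID string, cohorts []string) string
}

// HashBucketer deterministically assigns a user to a cohort by hashing the
// experiment name and user ID together. Unlike Math.random, the same user
// always lands in the same cohort for a given experiment.
type HashBucketer struct{}

func (HashBucketer) Cohort(experiment, userID string, cohorts []string) string {
	if len(cohorts) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(experiment + ":" + userID)) // hash.Hash writes never fail
	return cohorts[h.Sum32()%uint32(len(cohorts))]
}

// FixedBucketer always returns the first (control) cohort, e.g. to switch an
// experiment off organization-wide without touching any client.
type FixedBucketer struct{}

func (FixedBucketer) Cohort(_, _ string, cohorts []string) string {
	if len(cohorts) == 0 {
		return ""
	}
	return cohorts[0]
}

func main() {
	cohorts := []string{"control", "energy", "broadband"}
	for _, b := range []Bucketer{HashBucketer{}, FixedBucketer{}} {
		fmt.Printf("%T -> %s\n", b, b.Cohort("homepage-hero", "user-123", cohorts))
	}
}
```

Hashing the experiment and user ID together, instead of drawing a random number per request, keeps assignment stable without storing state, which is what later lets a cohort double as a cache variant.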

Joshua Croad (02:57):

So let's introduce a few more components before walking through a couple of example requests. Currently we're using LaunchDarkly for our cohorting. We have a bunch of backend microservices written in multiple languages. And finally, the reason we're all here: our Fastly service. I've included some VCL subroutines to help stitch together the example requests. Here we have a basic GET request for a route, handled by the receive block. Every request to our platform goes through two phases. The first phase is what we call the Chansey phase. We rewrite the request and immediately pass it, bypassing the cache, while storing the original request to play back later during phase two. Chansey is then requested; it evaluates all of our current tests and produces an experience for this particular user. In our case, the experience is a JWT, but it could be a simple string encoding or even plain text.

Joshua Croad (03:52):

The fetch subroutine then reverts to the original request and triggers a Fastly feature called a restart. If restarts are a new concept for you, they simply let us replay the request while also extending it. In this example, we've added an Experience header to the original request. The cache is currently empty, so we get a cache miss and our backend services are called. The thing to note here is that we pass that same Experience header to our backends. This header gives the backends enough information to decide statelessly which experience to provide to the user.

Joshua Croad (04:32):

So here's an example of what that experience payload might look like. In our case, we have a JSON structure where each experiment key contains the cohort that the user is bucketed into. Once the backend has chosen an experience, the request continues. Fastly then adds a cache entry for this request, which includes that Experience header, and the response is delivered back to the user.

Joshua Croad (04:57):

So here's a second request for the same content. Phase one, like before, hits Chansey and restarts. Phase two then carries that important Experience header. As you can see, the experience value in this example, ABC, is the same as in the previous request, so this user should receive the same experience. This is really important, because we vary on that header. Because the previous response was cached, we get a cache hit. So this second user has received a cached version of the same experience. We've essentially demonstrated how we've implemented multi-variant caching.

Joshua Croad (05:37):

So I want to briefly talk through how the cache is built up, represented here in YAML. Each cache entry is now multi-dimensional, using both the path and the Experience header as a compound key. Here's an example of the same path being cached against a different experience. As you can see, the content in each entry is slightly different: users in experience A receive energy content, while users in experience B, at the bottom, receive broadband content on the same page.
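
The compound key can be modeled with a toy in-memory cache, written here in Go rather than the YAML from the slide. This is a simplification: Fastly keeps Vary variants as alternates under the same URL-based cache key rather than in one flat map, but the effect, one entry per path and experience pair, is roughly the same. cacheKey and variantCache are hypothetical names.

```go
package main

import "fmt"

// cacheKey models an entry keyed on both the path and the value of the
// header the response varies on (here, the Experience header).
type cacheKey struct {
	path       string
	experience string
}

// variantCache is a toy in-memory stand-in for the edge cache.
type variantCache struct {
	entries map[cacheKey]string
}

func newVariantCache() *variantCache {
	return &variantCache{entries: make(map[cacheKey]string)}
}

// get returns the cached body for (path, experience), calling origin and
// storing the result on a miss. The bool reports whether it was a hit.
func (c *variantCache) get(path, experience string, origin func(path, experience string) string) (string, bool) {
	k := cacheKey{path: path, experience: experience}
	if body, ok := c.entries[k]; ok {
		return body, true
	}
	body := origin(path, experience)
	c.entries[k] = body
	return body, false
}

func main() {
	cache := newVariantCache()
	origin := func(path, experience string) string {
		if experience == "A" {
			return "energy content"
		}
		return "broadband content"
	}
	for _, exp := range []string{"A", "A", "B"} {
		body, hit := cache.get("/", exp, origin)
		fmt.Printf("experience %s: hit=%v body=%q\n", exp, hit, body)
	}
}
```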

Joshua Croad (06:11):

Okay, so let's recap. With Chansey in the critical path of all our requests, every user that comes to our platform is given a standardized experience, which our backends can read and use. And the most exciting part of all of this is that we've configured Fastly to understand how to cache each of these unique experiences. This picture isn't without its drawbacks and considerations, but it's taken us a step closer to achieving those original goals. Okay, so we've covered half of our architecture. Now I'll pass you over to Dom to introduce Baton Pass.

Dom Charlesworth (06:46):

Cheers, Josh. Yep. I'm here to talk about the second tool we built, called Baton Pass, which handles routing for our micro-frontends and cache invalidation. But to start, we want to go over the business motivation that led us to this. First, we run a huge micro-frontend architecture, with content editors publishing pages and our SEO team building redirects and new navigation structures, all of which change the routing to our services. With all of these inputs to our routing, we want to make sure we preserve a healthy cache at the edge. And lastly, we wanted this to happen automatically, meaning no Apache config, no releases, and no code changes for our engineers.

Dom Charlesworth (07:28):

Now I'm going to walk you through what we built and how it fits into our request lifecycle. Let's begin with Baton Pass itself. It's written in Go, runs an Envoy Proxy sidecar, and has three main parts, which we'll call the Baton Pass Envoy, Controller, and Invalidator, for hopefully obvious reasons as we progress. All of these play a key role in managing our micro-frontend routing. The Controller has four key components: a webhook server, an interface for handling I/O, a database to store snapshots of config, and a gRPC server to sync this config to Envoy. Just as an aside, Envoy can also be updated by polling or by monitoring filesystem changes, but we went with gRPC because it was the coolest.

Dom Charlesworth (08:17):

The webhook server handles most of our event-driven updates. In this example, we receive events from Contentful, and the I/O interface reads the files we use to store statically defined routes from SEO and static clusters. We also have two upstream clusters that render our content: the CMS service and our main web app. And lastly, in front of all of this is our Fastly service.

Dom Charlesworth (08:43):

So now we have all the components. The first thing that happens is warming, or pre-configuring, the Envoy Proxy. We start by reading our static files, which partly configure the proxy snapshot. Next, we read our URL structure from Contentful, which finishes configuring the snapshot and brings it up to date. That config is then sent via gRPC to Envoy to update the actual running instance, meaning we're now ready to serve traffic to our backends.
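
Warming can be sketched as building a versioned snapshot from the two sources in order. This is a hypothetical Go model, not the go-control-plane library that an Envoy xDS server is commonly built on: Route, Snapshot, the cluster names, and the rule that CMS paths override static routes are all assumptions, and translating the snapshot into Envoy route configuration and streaming it over gRPC is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Route maps a path prefix to an upstream cluster name (hypothetical schema).
type Route struct {
	Prefix  string `json:"prefix"`
	Cluster string `json:"cluster"`
}

// Snapshot is a hypothetical versioned route set; a real controller would
// translate it into Envoy configuration and stream it over gRPC.
type Snapshot struct {
	Version int
	Routes  []Route
}

// buildSnapshot warms a snapshot in the order described: static routes first
// (read from a file; passed in as JSON here), then the CMS URL structure.
// In this sketch a CMS path overrides a static route with the same prefix.
func buildSnapshot(prev Snapshot, staticJSON []byte, cmsPaths []string) (Snapshot, error) {
	var static []Route
	if err := json.Unmarshal(staticJSON, &static); err != nil {
		return Snapshot{}, fmt.Errorf("reading static routes: %w", err)
	}
	byPrefix := make(map[string]string)
	for _, r := range static {
		byPrefix[r.Prefix] = r.Cluster
	}
	for _, p := range cmsPaths {
		byPrefix[p] = "cms"
	}
	routes := make([]Route, 0, len(byPrefix))
	for p, c := range byPrefix {
		routes = append(routes, Route{Prefix: p, Cluster: c})
	}
	// Sort so identical inputs always yield an identical snapshot.
	sort.Slice(routes, func(i, j int) bool { return routes[i].Prefix < routes[j].Prefix })
	return Snapshot{Version: prev.Version + 1, Routes: routes}, nil
}

func main() {
	static := []byte(`[{"prefix":"/","cluster":"webapp"},{"prefix":"/old-deals","cluster":"seo-redirects"}]`)
	snap, err := buildSnapshot(Snapshot{}, static, []string{"/guides"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("version %d: %+v\n", snap.Version, snap.Routes)
}
```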

Dom Charlesworth (09:17):

So let's start with a request for /guides, which goes into the receive block, through to miss, and into our Fastly backend, which is Baton Pass Envoy. Envoy can route to either the web app or the CMS service, depending on the configuration. For this example, let's say it goes to the web app, which doesn't know how to handle /guides. So instead it returns a 404, which is sent back through the fetch block and delivered to the user. One thing to note here is that we don't cache our 404s, which is important in making the next part work. Now let's say someone publishes /guides within Contentful. That event goes through the Baton Pass Controller, which reconfigures Envoy to send any requests matching /guides to our CMS service.
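
Envoy performs the actual routing from pushed config, but the behaviour described here, falling through to the web app until /guides is published and then sending it to the CMS, can be modeled with a small prefix router. Router, Publish, the cluster names, and the segment-aware longest-prefix rule are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Router is a hypothetical model of the routing Envoy performs: published
// prefixes go to their cluster, everything else falls back to the web app.
type Router struct {
	mu     sync.RWMutex
	routes map[string]string // path prefix -> cluster
}

func NewRouter() *Router {
	return &Router{routes: map[string]string{}}
}

// Publish is what a CMS publish event would trigger via the Controller.
func (r *Router) Publish(prefix, cluster string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.routes[prefix] = cluster
}

// matches reports whether path falls under prefix on a segment boundary, so
// "/guides" matches "/guides" and "/guides/energy" but not "/guidesx".
func matches(path, prefix string) bool {
	return path == prefix || strings.HasPrefix(path, strings.TrimSuffix(prefix, "/")+"/")
}

// Cluster picks the cluster for the longest matching prefix, defaulting to
// the web app when nothing has been published for the path.
func (r *Router) Cluster(path string) string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	best, cluster := "", "webapp"
	for prefix, c := range r.routes {
		if matches(path, prefix) && len(prefix) > len(best) {
			best, cluster = prefix, c
		}
	}
	return cluster
}

func main() {
	r := NewRouter()
	fmt.Println(r.Cluster("/guides")) // webapp: nothing published yet, so the web app 404s
	r.Publish("/guides", "cms")
	fmt.Println(r.Cluster("/guides")) // cms
}
```

Because the earlier 404 was never cached, the first request after the publish event reaches Envoy again and picks up the new route.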

Dom Charlesworth (10:09):

So on the next request, the same thing happens, but this time Baton Pass Envoy sends the request to our CMS service, returning our newly published page to the user. Brilliant. Now we're getting plenty of traffic on our exciting new guide page, resulting in a healthy cache and lots of hits in Fastly. But what happens when someone updates a piece of content on that page, for example, a heading with entity ID ABCD? The change is reflected in our CMS service, but because the page is now served from cache, Fastly isn't reaching our backend infrastructure anymore. That's where the Baton Pass Cache Invalidator comes in. We can reuse the same event-driven architecture we use for the Baton Pass Controller to drive our cache invalidation, sending the entity ID of the content update via webhook.

Dom Charlesworth (11:05):

The Invalidator then receives this and uses it to purge all entries in the cache that include that content. We do this using surrogate keys, which I'll go into in more detail in a second. The result is that the next request goes back to Envoy, and the user receives the fresh, updated content. So, on to surrogate headers: these are the magic feature we use to tell Fastly how to cache things at the edge. What's cool is that we define a different cache lifetime at the edge from the one we use in the browser. In this response, we cache in the browser for an hour, but in Fastly for a whole day. Even better, we use the surrogate keys to build up a mapping between the IDs themselves and the cache entries that include each key.

Dom Charlesworth (11:55):

We do this by having each page return a Surrogate-Key header containing the IDs of all the modular pieces of content on it and of all the products featured on that page. This naturally builds up a map in the cache of what content needs to be invalidated when it changes. For example, here we have a representative cache populated by two pages, the Bulb and British Gas deals pages. We have two entries in this cache for these pages, each with a cache ID. Then we have three surrogate keys: one is the content entity for the savings figure, and two are the actual product IDs from our feed. Each surrogate key has a list of the cache IDs it maps to.

Dom Charlesworth (12:45):

So when some of the product data changes and we trigger a purge, we can use the surrogate key for the Bulb product ID to find which cache ID we should remove. The next time we get a request for that page, it populates the cache in the same way, but with updated copy. What's even more powerful is that a surrogate key can map to multiple cache entries. In this example, we have an average savings figure of £300, which we want to update across the entire site. When it's updated, we trigger a purge, and both pages are removed from the cache before being repopulated with the updated savings figure. This means we can cache extremely heavily at the edge and only go to our upstreams when the content has changed.
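
The surrogate key mapping and the two purges described above can be modeled in Go. Fastly maintains this index internally; keyIndex, the cache IDs, and the key names below are hypothetical and simply mirror the slide's example of two pages sharing a savings-figure key.

```go
package main

import (
	"fmt"
	"sort"
)

// keyIndex models the mapping built up from Surrogate-Key headers:
// surrogate key -> set of cache IDs tagged with it.
type keyIndex struct {
	entries map[string]string          // cache ID -> cached body
	byKey   map[string]map[string]bool // surrogate key -> cache IDs
}

func newKeyIndex() *keyIndex {
	return &keyIndex{entries: map[string]string{}, byKey: map[string]map[string]bool{}}
}

// store caches a body under a cache ID and records its surrogate keys.
func (ix *keyIndex) store(cacheID, body string, keys []string) {
	ix.entries[cacheID] = body
	for _, k := range keys {
		if ix.byKey[k] == nil {
			ix.byKey[k] = map[string]bool{}
		}
		ix.byKey[k][cacheID] = true
	}
}

// purge removes every live cache entry tagged with key and returns their IDs,
// sorted. Stale IDs left in other keys' sets are harmless: they are skipped
// here and re-added when the page is cached again.
func (ix *keyIndex) purge(key string) []string {
	var removed []string
	for id := range ix.byKey[key] {
		if _, ok := ix.entries[id]; ok {
			delete(ix.entries, id)
			removed = append(removed, id)
		}
	}
	delete(ix.byKey, key)
	sort.Strings(removed)
	return removed
}

func main() {
	ix := newKeyIndex()
	ix.store("cache-1", "Bulb deals", []string{"savings-figure", "product-bulb"})
	ix.store("cache-2", "British Gas deals", []string{"savings-figure", "product-british-gas"})
	fmt.Println(ix.purge("product-bulb")) // [cache-1]
	ix.store("cache-1", "Bulb deals (updated)", []string{"savings-figure", "product-bulb"})
	fmt.Println(ix.purge("savings-figure")) // [cache-1 cache-2]
}
```

Purging the product key removes only the Bulb page, while purging the shared savings-figure key removes both pages: the one-to-many mapping is what makes heavy edge caching safe.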

Dom Charlesworth (13:36):

And there you have it: an overview of two of the latest services we've built to help us experiment and bring our content closer to the edge. We hope you've enjoyed this talk and, even more, that you've managed to take something away from it. For the shameless recruitment plug: RVU is hiring. So if anything we mentioned today is of interest, please get in touch via these emails or via Twitter. It's been a really great event, and we just want to thank all the other speakers. Take care.