Accelerating Rails, Part 2: Dynamic HTTP Caching

Part 1 of this series provides insight into the various types of caching built into Rails and their effective use. This post is all about HTTP caching, which happens (mostly) outside the context of the Rails stack.

In the second part of our series on accelerating Rails, I'll cover configuration of a few Fastly features, Varnish and Varnish Configuration Language (VCL), and strategies for caching dynamic content that are targeted towards the Rails developer. It's important to keep in mind that caching dynamic content usually isn't a one-size-fits-all solution. All Rails apps have different domain models, implementation details, and requirements, so it's not always possible to take a generalized implementation and make it work out of the box for everyone.

After reading this, I hope your key takeaway is to think more deeply about dynamic content caching and how some of the strategies outlined here can map to your particular use case.

Fastly config

Setting up your Fastly service is straightforward: provide a domain name and origin server address (i.e., the Rails server) and Fastly is immediately available to do your cache bidding. Fastly's acceleration platform is totally configurable, enabling fine-tuning of your service for your specific requirements. This section covers common configurations that work with highly dynamic Rails apps. The first and most important configuration step is to set up DNS so that traffic is routed through Fastly’s global network.

What's in a CNAME?

A CNAME record aliases your domain to another domain, transparently to the client. Say you have a Rails app running at test.herokuapp.com and you own the domain example.com. You probably want clients and users to access your app via the canonical name example.com instead of the domain assigned by your hosting provider. You enable this by creating a CNAME record in your DNS provider settings.

Let’s use a command called dig, included in most Linux distros, to see what an actual DNS query looks like, and to get a better idea of how this works.

$ dig test.herokuapp.com

;; QUESTION SECTION:
;test.herokuapp.com.            IN      A

;; ANSWER SECTION:
test.herokuapp.com.     299     IN      CNAME   us-east-1-a.route.herokuapp.com.
us-east-1-a.route.herokuapp.com. 59 IN  A       23.21.43.109

The output provides some useful information about how requests to this domain are routed, namely that the herokuapp.com domain is actually pointing to something.route.herokuapp.com, which is an A record for the IPv4 address 23.21.43.109. When clients access your app on this domain, this IP address is what they actually connect to.

Creating a CNAME to Fastly is how you specify that traffic should flow through our network. We provide CNAME setup instructions for most DNS providers in this document. A correct Fastly CNAME setup will look something like the following:

$ dig www.fastly.com

;; QUESTION SECTION:
;www.fastly.com.                        IN      A

;; ANSWER SECTION:
www.fastly.com.         567     IN      CNAME   global-ssl.fastly.net.
global-ssl.fastly.net.  8       IN      CNAME   fallback.global-ssl.fastly.net.
fallback.global-ssl.fastly.net. 8 IN    A       199.27.79.184
fallback.global-ssl.fastly.net. 8 IN    A       23.235.47.184

Note that the answer comes back with a CNAME to global-ssl.fastly.net. Also note how the answer section contains multiple A records. These A records are the IP addresses of the Fastly POP nearest to where the request originated. Why two IP addresses? Redundancy, in case one of the addresses is unreachable. Which IP is actually selected for use depends heavily on the implementation of the client's DNS resolver. In the future, smarter resolvers may be able to select the IP with the lowest latency.

When a client makes a request to example.com, the network routes it to the Fastly POP at one of the IP addresses in the DNS answer. When the request reaches the POP, the response is either served directly from cache or fetched from the origin configured in your Fastly service.

Configuring traffic to flow through Fastly yields performance benefits from TCP session optimizations on our network, which reduce round trips and thus latency.

A note on Rails’ asset pipeline and CNAMEs

After creating a CNAME record that points to Fastly, you no longer need to provide an asset_host configuration setting as mentioned in part 1 of this blog series. Since all requests flow through Fastly’s caches, you only need to set appropriate expiry headers, like Cache-Control, for the cache to pick your assets up.
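A minimal sketch, assuming a Rails 4-era app (newer Rails versions moved this setting to config.public_file_server.headers):

# config/environments/production.rb
# Long-lived, public Cache-Control for precompiled assets. This is safe
# because the asset pipeline fingerprints filenames, so every deploy
# produces brand-new asset URLs.
config.static_cache_control = "public, max-age=31536000"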

Logging

Logging is an essential tool for tracking down bugs and running asynchronous jobs. When responses are served out of the cache, log lines won't show up in your application logs since those requests never make it to the origin. For visibility into these responses, we provide remote log streaming to a range of logging providers, including Papertrail, S3, and Splunk. If you don't use a third party for log storage and analysis, you can set up a syslog endpoint. All of this is easily configured through the Fastly dashboard or via our Remote Logging API.

Origin Shielding

Fastly’s Origin Shield allows you to route all your requests to your origin through a designated Fastly POP. This reduces requests to your origin, since all other POPs will fetch content from the Origin Shield instead of directly from your origin.

Choose a shield POP that minimizes latency between your origin and the POP. You can see where all of our POPs are located on our network page. For example, with a Rails app hosted on Heroku US (EC2 US East in Ashburn, VA), select the Ashburn, VA Fastly POP as your shield. If you're hosting on Google Cloud Platform (GCP), selecting a shield POP in San Jose or Ashburn improves response times thanks to our direct interconnects with GCP.

You can verify Origin Shielding by inspecting the headers of a requested object. For example, on the first request of an object you would see something like the following. (Note that the first cache listed is the shield node, and the second is the local edge node.)

X-Served-By: cache-ash1212, cache-sjc3333
X-Cache: MISS, MISS
X-Cache-Hits: 0, 0
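If you want to verify this yourself, request an object and inspect the response headers with curl (the hostname and cache node names here are illustrative):

$ curl -s -o /dev/null -D - http://www.example.com/index.html | grep -E "X-Served-By|X-Cache"
X-Served-By: cache-ash1212, cache-sjc3333
X-Cache: MISS, MISS
X-Cache-Hits: 0, 0

On a repeat request, the edge node (and eventually the shield) should report HIT instead of MISS.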

These are only a few of the features available for configuration. I've left out a few, including origin load balancing, health checks, and request collapsing. These are all documented on docs.fastly.com.

Let's move on to learning about Varnish Configuration Language (VCL), which will serve as the foundation for some interesting use cases.

Edge scripting with Varnish and VCL

Fastly is built on Varnish, an open source HTTP reverse proxy cache. The focus of this section is on VCL, a domain-specific language for interacting with requests and responses during Varnish processing. With Fastly, this means you can use VCL to modify requests at the edge cache, bringing processing closer to your clients.

There is lots of easily digestible documentation in the Varnish book that covers the Varnish state machine and VCL basics, so I'll skip those explanations. Let's discuss synthetic responses, which will come in handy later when talking about caching strategies.

Synthetic responses

A synthetic response is used to return a response directly from the cache node, without performing a lookup in cache storage and without passing the request to your origin. In other words, the response is "synthetic" because it was never fetched from origin.

VCL provides the synthetic function for this purpose, which is useful in a variety of ways, such as returning errors directly from the edge or caching “like” and “share” buttons.

Say you have a JSON API endpoint /secret-endpoint that requires a token to be present in the Authorization HTTP header of the form "My-Key:somethingSuperSecret". If the Authorization header is not present or does not contain “My-Key”, the request should be denied and a 401 Unauthorized returned.

In Rails, you would likely handle this type of header verification in an ActionController method that maps to /secret-endpoint. If you’re feeling clever, you might write a middleware to handle it and avoid some of the overhead of the full Rails stack. That's not bad, but if the capability exists to do this processing at the edge, closer to your users, then why make the client wait for a response to come all the way from the origin?

Instead, synthetic responses enable you to respond with a 401 Unauthorized directly from the edge cache, eliminating the latency penalty of going to the origin. Check this out:

# vcl_recv is the first VCL function executed.
# full docs at https://www.varnish-software.com/static/book/VCL_Basics.html
sub vcl_recv {
  if (req.url ~ "^/secret-endpoint" && req.http.Authorization !~ "My-Key:") {
    error 401 "Unauthorized";
  }
}

# vcl_error handles the error we returned in vcl_recv
sub vcl_error {
  if (obj.status == 401) {
    set obj.http.Content-Type = "application/json; charset=utf-8";

    # our "synthetic" JSON error response
    synthetic {"
      {
        "code": 401,
        "error": "Unauthorized",
        "msg": "Invalid Authorization"
      }
    "};
  }
  return(deliver);
}
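
With this VCL deployed, you can verify the behavior from the command line (example.com standing in for your domain):

# No token: answered synthetically at the edge, without contacting Rails
$ curl -i http://example.com/secret-endpoint
HTTP/1.1 401 Unauthorized
Content-Type: application/json; charset=utf-8

# Valid token: proceeds to cache lookup and, on a miss, to the origin
$ curl -i -H "Authorization: My-Key:somethingSuperSecret" http://example.com/secret-endpoint
HTTP/1.1 200 OK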

This VCL is by no means an alternative to implementing proper authentication on the Rails side, and shouldn't be treated as one. It's simply a way to offload request processing from Rails, reducing both latency and the load on the Rails server.

Further reading on this sort of scripting is available throughout our blog, including this rather relevant "Caching 'like' and 'share' buttons" post. You can read more about the vcl_error function in the Varnish book. I like to keep this VCL regex cheat sheet on hand, too.

No VCL? No problem.

If you don't like or are uncomfortable with VCL, Fastly doesn't hold it against you. In fact, we provide a comprehensive Conditions API and tutorials for its use, which enable configuration of similar "edge conditions." The API allows you to take advantage of edge scripting using conditions without having to learn too much about VCL or Varnish.

Now that we’ve covered VCL and Varnish, let's look at strategies for caching dynamic content.

Cache strategies

When setting up dynamic caching, you'll often need to think deeply about how pages and APIs interact with the underlying data in order to make appropriate caching decisions. One way to boost API performance with Fastly is to cache API responses on the edge itself.

API edge caching

HTTP APIs are integral to everyday life, with companies implementing APIs on everything from mobile apps to satellites to refrigerators. Traditional ways of caching dynamic Rails APIs include:

  • Setting short TTLs in Cache-Control

  • A combination of built-in page/action/fragment caching

  • Running an HTTP accelerator like Varnish in front of Rails — if you do this, pat yourself on the back!

Short TTLs

Using a short time to live (TTL) of, say, 30 seconds, in combination with middle-mile optimizations from CDNs like Fastly, improves response times and reduces request volume to the origin. This strategy is relatively straightforward to implement, but it's unintelligent when it comes to handling unpredictable changes like user-driven updates.
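In Rails, this is a one-liner with the standard expires_in helper (a sketch, with a made-up WidgetsController):

class WidgetsController < ApplicationController
  def index
    # Sets "Cache-Control: public, max-age=30", allowing Fastly (and any
    # other shared cache) to serve this response for up to 30 seconds.
    expires_in 30.seconds, public: true
    render json: Widget.all
  end
end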

Say you use a Cache-Control value of public, max-age=30 for API responses. What happens when the response changes one or two seconds after your initial response? If you answered "nothing," you're correct.

This creates the unfortunate side effect of forcing clients to be stuck with a now invalid (stale) response for the duration of the TTL, frustrating and confusing users who expect changes to be reflected instantaneously. These days, API response times are sub-second, often just a few hundred milliseconds, so content can change extremely fast. Setting arbitrarily small TTLs on the off chance that content doesn't change just won't cut it if you have high performance needs.

Rails caching

On the other hand, using Rails built-ins like action or fragment caching for dynamic APIs can handle unpredictable content changes better, since you’ll have more control over your caching mechanisms (including the ability to expire fragments). However, you lose out on some of the network optimizations provided by the CDN as more trips to your caching layer and/or origin are required and more time is spent processing within the Rails stack.

One interesting trick that Rails fragment caching uses to disguise expiring fragments (aka purging) is special cache keys that change whenever an object is updated. Actual eviction is then left to the cache's internal least recently used (LRU) algorithm, since fragments stored under old keys are simply never requested again. This is a really clever, often misunderstood trick that works around HTTP's deeply-rooted lack of an invalidation framework, as explained by Fastly's VP of Technology, Hooman Beheshti, in "Leveraging your CDN to cache dynamic content."
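To make the trick concrete, here's a fragment cache in a view (a sketch; the exact key format varies by Rails version):

<%# app/views/products/show.html.erb %>
<%# The key embeds the record's id and updated_at timestamp, so an %>
<%# update produces a new key and silently "expires" the old fragment. %>
<% cache @product do %>
  <h1><%= @product.name %></h1>
  <p><%= @product.description %></p>
<% end %>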

Enter the purge

A purging system able to keep up with frequent changes in content enables more intelligent dynamic caching than short TTLs and is more intuitive and transparent than fragment caching. With Fastly, dynamic API caching is all about the power of Instant Purge.

Here's how it works:

Recall the earlier example of setting a 30 second TTL on API responses. When an update occurs (making the cached response stale), the request is forwarded to the origin. The origin performs a database write and any other processing defined in your update request handler. Your update request handler should also issue a purge request to the cache(s). The cache acknowledges the purge by removing the invalidated response, and the next request for that object is filled with a fresh response from the origin.

In Rails, the update request handler is the controller action that handles the update. By convention, this is the update method in a controller that inherits from ActionController::Base.

One important piece remains: how do you specify which object to purge? One way is to purge based on the resource URL of the cached object. This enables the use of the HTTP PURGE method, such that curl -X PURGE http://example.com/object/to/purge would work.
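Here's a minimal sketch of this in Rails, using Net::HTTP from the standard library (the Post model and routes are made up for illustration):

require "net/http"

# Net::HTTP has no built-in PURGE verb, but request classes read their
# METHOD constant, so a tiny subclass is all it takes.
class Purge < Net::HTTP::Get
  METHOD = "PURGE"
end

class PostsController < ApplicationController
  def update
    @post = Post.find(params[:id])
    @post.update!(params.require(:post).permit(:title, :body))
    # Invalidate the cached GET response for this resource; the
    # programmatic equivalent of `curl -X PURGE http://example.com/posts/1`.
    uri = URI(post_url(@post))
    Net::HTTP.start(uri.host, uri.port) do |http|
      http.request(Purge.new(uri.request_uri))
    end
    render json: @post
  end
end

But what if responses are related and dependent?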

In this case, purging based on resource URLs would require a purge request to be sent for every dependent object as well as the top-level object. That's not good news if there are thousands of dependent objects in the cache.

Rails fragment caching with key-based expiration is useful in this situation because it enables caching nested, dependent objects without having to think about it too much.

With Fastly, cache keys are specified using a Surrogate-Key HTTP header (check out this blog post about Surrogate Keys). Tagging responses with Surrogate Keys enables key-based expiration (aka purging) to happen on a single response object or group of dependent objects. Much like the expire_fragment method in ActionController, a Fastly purge API call removes the object from the cache.

With this, the process to cache dynamic content is:

  1. Set an appropriate TTL on the response (longer is better since you control invalidation).

  2. Tag the response with Surrogate Key(s) (happens on HTTP GETs).

  3. Purge by Surrogate Key when the response changes (happens on HTTP POST, PUT, DELETE).

To help perform this process in Rails, Fastly-Rails was born. This gem is a Rails plugin that provides helpers and extensions to implement the above process in your Rails apps. The README in the GitHub repo provides detailed explanations and example usage.
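The gist of it looks something like the following, adapted from the README (consult the repo for the current API; set_cache_control_headers, set_surrogate_key_header, record_key, and purge are all fastly-rails helpers):

class ProductsController < ApplicationController
  def show
    @product = Product.find(params[:id])
    set_cache_control_headers                     # long TTL, step 1
    set_surrogate_key_header @product.record_key  # tag the response, step 2
    render json: @product
  end

  def update
    @product = Product.find(params[:id])
    @product.update!(product_params)
    @product.purge                                # instant purge by key, step 3
    render json: @product
  end
end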

At Fastly, we actively listen to feedback and openly review patches to provide a happy place for users. Have suggestions or feedback? Open an issue or pull request in the repository or check out our Community Forum.

You might notice a special section in the README pertaining to cookies. Big thanks to Jessie Young over at Thoughtbot for contributing this section and also writing a great guide to using Fastly with Rails. This section can be immensely helpful in debugging why objects that should be cached are not. Hint: Responses containing Set-Cookie are not cached by default.

Tracking down this issue led to the contribution of a rack middleware that removes the Set-Cookie HTTP header from responses containing a Surrogate-Key or Surrogate-Control header.
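The core of that idea fits in a few lines of Rack. A simplified sketch (not the gem's exact code):

# Remove Set-Cookie from responses that have been tagged for edge
# caching, so Fastly will actually cache them.
class RemoveSetCookieHeader
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    if headers["Surrogate-Key"] || headers["Surrogate-Control"]
      headers.delete("Set-Cookie")
    end
    [status, headers, body]
  end
end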

While reviewing some of the VCL I was putting together for this post, it occurred to me that an alternative solution would be to strip Set-Cookie at the edge with VCL. For example, something like:

sub vcl_fetch {
  if (req.url ~ "^/thing/to/cache") {
    unset beresp.http.Set-Cookie;
  }
}

Check out this VCL doc on cookies for more details.

There are actually many situations where VCL could be used instead of rack middleware. One obvious use case is that of gzip compression. Instead of using the Rack::Deflater middleware mentioned in part 1 of this blog series, you could do this at the edge in VCL or through your Fastly config.
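Fastly's VCL exposes a beresp.gzip flag for exactly this, so edge compression is a short vcl_fetch snippet (a sketch; match the content types your app actually serves):

sub vcl_fetch {
  if (beresp.http.Content-Type ~ "text/html|text/css|application/json|application/javascript") {
    # Compress at the edge instead of in Rack::Deflater at the origin
    set beresp.gzip = true;
  }
}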

Cookies and logged-in user content

A tricky situation when caching HTML or JSON is handling user-specific or authenticated content. We certainly do not want to cache private data in shared public caches, but we do want to speed up delivery by serving as much as possible from the cache.

One solution to this is to use an Edge Side Include (ESI), which is somewhat similar in concept to Rails view partials. To use an ESI, replace the piece of authenticated or user-specific HTML (or JSON) with a piece of XML that looks like this:

<html>
  ...
  <esi:include src="/user/profile" />
  ...
</html>

Now that the authenticated piece of content has been removed from the rest of the page, this page can be cached like any other. When the page is requested, the cache makes a request for the user-specific content from the origin and inserts it into the page before sending it to the client.
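Note that the cache has to be told to process ESI tags. With Fastly, that's a one-line statement in vcl_fetch (the URL match here is illustrative):

sub vcl_fetch {
  if (req.url ~ "^/products/") {
    esi;  # parse and process <esi:include> tags in this response
  }
}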

Read this blog post on using ESI for an example of how to cache a page containing an “uncacheable” logged-in user's shopping cart. In reading the post, you'll discover that AJAX can be used as an alternative to ESI. The choice of ESI or AJAX is up to you based on your situation, requirements, and what you feel most comfortable with.

The final part of that post outlines using ESI, JSON, and synthetic responses to build user-specific JSON objects, since AJAX is not an option for JSON content.

Base HTML caching

Single page apps are all the rage these days, but that doesn't change our users' demand for quick-loading pages. It's frustrating to stare at an empty or partially loaded page for a second or two while AJAX calls finish rendering it.

Caching the base HTML can provide a performance improvement. The quicker you get the base HTML page to the client, the sooner AJAX calls can fire and the sooner the page can finish rendering.

One way to do this is to use Rails page caching if the base HTML is rendered from a controller action (e.g., a HomepageController). That requires changes to your application code, so another way to do it, without modifying anything in the app, is to use VCL or to configure a condition and header through the Fastly API. Session data aside, VCL for this might look something like the following:

sub vcl_fetch {
  if (req.url == "/") {
    # set the client TTL after the origin responds but before it's cached
    set beresp.http.Cache-Control = "public, max-age=900";

    # set the time to keep the object in the edge cache
    set beresp.ttl = 1w;
  }
}

This looks great, except that Rails helps you prevent Cross-Site Request Forgery (CSRF) by injecting a unique CSRF token into the head of the page with the csrf_meta_tags helper, which makes the base HTML differ for every user.

Fastly engineer James Rosen recently wrote a fantastic blog post on caching with CSRF security. It's an insightful read if you think base HTML caching is relevant to your needs.

Wrapping up

Admittedly, there's a lot of information to digest here.

We covered configuration of several Fastly features, including an explanation on CNAMEs and some sweet dig command examples. We dove into VCL and synthetic responses for edge-side scripting, then laid out strategies for caching dynamic Rails APIs, including Instant Purging, ESI, and AJAX. Finally, keep in mind that edge-side scripting is a natural fit for some tasks typically performed in rack middleware.

I hope you enjoyed reading, learned a thing or two, and that I've sparked your interest in extending your cache reach. If you find this interesting and would like to do this every day, we're hiring. Happy caching!

Michael May
Integration Engineer

Michael May is an integration engineer at Fastly, co-founder of CDN Sumo (acquired by Fastly), and a recent transplant to San Francisco from Austin. Before Fastly and CDN Sumo, Michael worked extensively with JRuby at HomeAway, avidly trying to bring Ruby into their highly Java-centric ecosystem.
