Enable API Caching

For years we've been advocating best practices in API design to enable API responses to be cached at the edge (read the 3-part series here). Too often, API platforms are written off as uncacheable in their entirety, which means you lose out on some of the easiest performance and reliability wins you could get with an edge network. In this post we’ll look at some specific use cases across publishing, e-commerce, and travel and hospitality, and how optimizing API design can improve cache performance.

4 Ways to Improve Cache Performance with API Design

There's nothing inherently special about an API request, and HTTP is designed to facilitate caching. But as easy as it is to cache HTTP traffic, some of the most common API design patterns actually make caching much harder. Let's look at some of them:

  1. Using a batching wrapper format to bundle multiple API calls into one request: Doing this makes the response to that HTTP request very hard to re-use. With HTTP/2 and the emerging QUIC standard, many requests can share a single connection, so there is far less overhead in using an individual HTTP request for each API call.

  2. Using a POST or other non-read method for read requests: Sometimes by deliberate design choice, all calls to an API are required to be POST, which makes it hard for intermediary caches, such as Fastly, to determine whether a response can be cached.

  3. Including requestor-specific detail in the response body: Echoing back an API key or account number, or details of the remaining rate limit allowance, makes the response too personalised to be cached.

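  4. Returning errors wrapped in a “200 OK” response: If a request is invalid, returning an HTTP response implying that it was valid hides the error from intermediary caches, which rely on the status code to tell good responses from bad ones. That makes it harder to cache the valid responses safely.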

Ideally, an API request is RESTful, i.e. it makes use of HTTP semantics. Read requests are made using the GET method, authentication credentials are included via a header, and reads are broken down into small, atomic chunks.
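
As a rough illustration of why these conventions matter, here is a deliberately simplified Python sketch of the kind of check an intermediary cache might apply. The function name and the chosen set of status codes are assumptions made for this example, not Fastly's actual logic, and real HTTP caching rules (Cache-Control, Vary and so on) are considerably richer.

```python
# Simplified sketch: real caches also honour Cache-Control, Vary, etc.
SAFE_METHODS = {"GET", "HEAD"}
# Assumed subset of status codes that a cache may store by default.
CACHEABLE_STATUSES = {200, 301, 404, 410}


def looks_cacheable(method: str, status: int) -> bool:
    """Return True if a shared cache could plausibly store this response."""
    return method.upper() in SAFE_METHODS and status in CACHEABLE_STATUSES


print(looks_cacheable("GET", 200))   # a plain read
print(looks_cacheable("POST", 200))  # a read tunnelled through POST
print(looks_cacheable("GET", 500))   # an error reported honestly
```

Under these assumptions, a read sent as a POST is rejected before the cache even looks at the response, and an error that is reported with a real error status is kept out of the cache without any inspection of the body.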

Dicing, slicing, and the beauty of URLs

In our original series on API caching, we considered a comment API, which had URLs such as:

GET    /comments               List all comments
GET    /comments/:id           Get a specified comment
POST   /comments               Create a comment
PUT    /comments/:id           Update a comment
DELETE /comments/:id           Delete a comment
GET    /articles/:id/comments  List comments on an article
GET    /users/:id/comments     List comments made by a specified user

Let's look at what's good about this API design:

  • Read requests are clearly identified by the GET method, and

  • Rather than providing lots of query parameter options on one URL, such as /comments?userid={n}, we use separate URLs for distinct dimensions of the data.

We can also consider how this might apply to other kinds of APIs. Let's think about a travel use case: a seat availability and booking system for an airline.

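This is a great example because I often see endpoints that tie shared data, such as seat availability, to an individual booking, with a URL shaped something like:

GET /bookings/:my_booking_ref/seats  List seats available for a booking
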
This is bad news for caching, because a booking reference is usually unique to a single person and will make it hard to reuse this data. Instead, consider designing the API around units of data that apply to larger groups of users and which are composed of fewer changeable elements:

GET /bookings/:my_booking_ref  See booking details
GET /flights/:flight_id/seats  List seats on the flight
PUT /bookings/:my_booking_ref  Update booking (e.g. to reserve a seat)

Now, users can retrieve their booking information, which will include information specific to them, and separately query the availability of seating on the flight, which is specific to a flight, not a single booking.
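
To make the difference concrete, here is a small Python sketch that counts how many distinct URLs (and therefore cache objects) three passengers on the same flight would generate under each design. The per-booking seat URL, the booking references and the flight ID are all hypothetical values chosen for illustration.

```python
def seats_url_per_booking(booking_ref: str, flight_id: str) -> str:
    # Hypothetical design: shared seat data nested under a booking reference.
    return f"/bookings/{booking_ref}/seats"


def seats_url_per_flight(booking_ref: str, flight_id: str) -> str:
    # Design from the text: seat data is keyed on the flight only.
    return f"/flights/{flight_id}/seats"


# Three made-up bookings, all on the same made-up flight.
passengers = [("ABC123", "FL100"), ("DEF456", "FL100"), ("GHI789", "FL100")]

per_booking = {seats_url_per_booking(b, f) for b, f in passengers}
per_flight = {seats_url_per_flight(b, f) for b, f in passengers}

print(len(per_booking), "cache objects when keyed by booking")
print(len(per_flight), "cache object when keyed by flight")
```

Every passenger after the first can be served the seat list from cache in the per-flight design, whereas the per-booking design gives each passenger a private copy of the same data.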

Of course, when the API receives a PUT to the booking URL asking to reserve a specific seat, the next query to the seating endpoint will need to get a fresh response, but we'll look at how to manage that in a moment.

Let's also look at an e-commerce example. Here, a common pitfall might be including too much information in a product list response. You probably don't want a consumer to have to read a list endpoint and also every product endpoint individually in order to build a product category page, so you might be tempted to include a lot of information in a product list response, including review data for every product.

One problem here is that whenever anyone posts a review of any product listed in that response, the data in the response changes, and that's likely to happen much more often than the products themselves change. So instead, we could consider multiple list endpoints, such as:

GET /categories/:id              List products in category, without review data
GET /categories/:id/review-data  List review data for products in this category

Now, when a review is posted, it affects the aggregate endpoint for all product reviews in a specified category but doesn't affect the product listing for that category, and the consumer still only has to make two requests, not one per product.
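
On the consumer side, combining the two responses is straightforward. The following Python sketch shows one way it might look; the field names (id, name, product_id, rating, review_count) are assumptions, since the shape of the response bodies is not specified here.

```python
from typing import Any


def merge_category_page(
    products: list[dict[str, Any]],
    review_data: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Join the product list with the review list on product ID."""
    reviews_by_id = {r["product_id"]: r for r in review_data}
    merged = []
    for product in products:
        review = reviews_by_id.get(product["id"], {})
        merged.append({
            **product,
            "rating": review.get("rating"),  # None if never reviewed
            "review_count": review.get("review_count", 0),
        })
    return merged


# Hypothetical bodies from /categories/17 and /categories/17/review-data.
products = [{"id": 1, "name": "Kettle"}, {"id": 2, "name": "Toaster"}]
review_data = [{"product_id": 1, "rating": 4.5, "review_count": 12}]

for row in merge_category_page(products, review_data):
    print(row)
```

The product list can then be cached for a long time, while only the smaller review-data response needs to be refreshed when a review is posted.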

Authentication, filtering and paging

The ability to cache API responses is also complicated by common design requirements of APIs such as authentication, filtering and paging.

Authenticating access to APIs can be done in many ways. A query param such as ?apiKey=<your-key-here> is common but makes caching difficult unless the cache is aware of it, because every key produces a different URL for the same data. Submitting authentication credentials in the request body is rare and would indicate a non-RESTful API design in any case. The best option is usually a header, ideally the HTTP-standard Authorization header.
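
As a minimal sketch of the difference, the Python function below takes a URL that carries the key in the query string and moves the key into an Authorization header, so that every caller ends up requesting the same URL. The function name, the example host and the use of the Bearer scheme are assumptions for illustration; your API may use a different scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def move_key_to_header(url: str, param: str = "apiKey") -> tuple[str, dict[str, str]]:
    """Strip the API key from the query string and return it as a header."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    key = next((v for k, v in pairs if k == param), None)
    remaining = [(k, v) for k, v in pairs if k != param]
    clean_url = urlunsplit(parts._replace(query=urlencode(remaining)))
    # Assumed scheme: "Bearer". Use whatever scheme your API defines.
    headers = {"Authorization": f"Bearer {key}"} if key is not None else {}
    return clean_url, headers


url_a, headers_a = move_key_to_header("https://api.example.com/comments?apiKey=key-one")
url_b, headers_b = move_key_to_header("https://api.example.com/comments?apiKey=key-two")
print(url_a == url_b)  # both callers now share one URL
```

Note that shared caches do not reuse responses to requests carrying an Authorization header unless the response explicitly allows it (for example with Cache-Control: public or s-maxage), so the edge still needs a way to validate the credential before serving a cached response.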

Offering a lot of filtering options makes your API more user-friendly and might reduce data transfer, but it also increases granularity: each combination of filters is a separate cache object, so each one is requested less often and hit ratios fall. It's a trade-off. Try to identify key dimensions and turn them into their own endpoints, such as with the comments API example, /users/:id/comments, instead of /comments?userid=:id.

Pagination can be a pain, because if a record is added or removed early in the result set, all subsequent pages will change. A starting point is to ensure that the number of variants of a request that might be created by paging is finite. So rather than supporting offset and limit params with arbitrary values, decide on a hard-coded page size and support a page_num param instead.

/comments?offset=236&limit=130 // Bad
/comments?page_num=4           // Better!

Also, choose an appropriate page size. Generally speaking, processing requests at your origin server is more expensive than transferring more bytes than necessary, so creating fewer, larger pages will offer better value.

Finally, most requests will be for page 1 or will not have a page number, which is usually implicitly page 1. VCL can help to de-dupe these requests by considering them to be the same cache object.
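
On Fastly this normalisation would be written in VCL, but the logic is easy to show in Python. The sketch below, with a hypothetical function name, drops an explicit page_num=1 so that it matches the implicit first page. Sorting the remaining parameters is an extra step added here, not something described above, so that parameter order does not create duplicates either.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalise_page_url(url: str) -> str:
    """Map equivalent paging URLs onto a single cache key."""
    parts = urlsplit(url)
    pairs = [
        (k, v)
        for k, v in parse_qsl(parts.query)
        if not (k == "page_num" and v == "1")  # page 1 is the default
    ]
    # Sorting is an added assumption: it makes parameter order irrelevant.
    return urlunsplit(parts._replace(query=urlencode(sorted(pairs))))


print(normalise_page_url("/comments?page_num=1"))
print(normalise_page_url("/comments"))
print(normalise_page_url("/comments?page_num=4"))
```

The first two calls produce the same key, so the most popular page is stored once rather than twice.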

Event-driven purging and tagging

With all of the above, there is an assumption that responses to the GET requests can be cached at the edge, on the basis that when the content they contain changes, you will send a purge request to Fastly. This is what we call event-driven content.

However, given the range of endpoints, pagination, filtering etc, the same piece of data may end up in lots of different API responses. This is where surrogate key tagging offers an elegant solution. Using a Surrogate-Key header, your origin server can add multiple tags to an API response, for example one tag for the category being listed and one for each product the response contains.
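
A minimal Python sketch of how an origin might build that header for a category listing follows. The Surrogate-Key header takes a space-separated list of keys; the function name and the product-<id> naming scheme are assumptions for illustration, while cat-17 follows the category tag used in the discussion below.

```python
def surrogate_key_header(category_id: int, product_ids: list[int]) -> dict[str, str]:
    """Tag a category listing with its category and every product it contains."""
    tags = [f"cat-{category_id}"]
    # Hypothetical naming scheme for product tags.
    tags += [f"product-{pid}" for pid in product_ids]
    return {"Surrogate-Key": " ".join(tags)}


print(surrogate_key_header(17, [3, 9]))
```

Any response that includes a given product, whichever endpoint or page it came from, then carries that product's tag and can be purged with a single request.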

If you were to add a product to a category, you might purge the cat-17 tag. If you remove a product, you'd purge the product tag and the category tags for all the categories that the product belonged to. If you just update the data of a single product, you can purge only the product tag, knowing that this also invalidates any category listing that contains that product's data (and since the number of items in the category hasn't changed, we don't need to purge all pages of the dataset, just the one with that product in).
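
Those three rules can be written down as a small lookup. This Python sketch uses hypothetical event names and the same assumed product-<id> and cat-<id> tag formats; it only decides which keys to purge and leaves the actual purge request to whatever client you use.

```python
def keys_to_purge(event: str, product_id: int, category_ids: list[int]) -> list[str]:
    """Return the surrogate keys to purge for a catalogue change."""
    product_tag = f"product-{product_id}"
    category_tags = [f"cat-{cid}" for cid in category_ids]

    if event == "product_added_to_category":
        # Listings for the affected categories gain an item.
        return category_tags
    if event == "product_removed":
        # The product itself and every listing it appeared in.
        return [product_tag] + category_tags
    if event == "product_updated":
        # Only responses that actually contain this product.
        return [product_tag]
    raise ValueError(f"unknown event: {event}")


print(keys_to_purge("product_added_to_category", 42, [17]))
print(keys_to_purge("product_removed", 42, [17, 23]))
print(keys_to_purge("product_updated", 42, [17, 23]))
```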

Stale serving

The availability of APIs can impact the availability of partners' websites and services, so it's vital that you make efforts to ensure the highest level of availability for your APIs. One way to do this is to allow your content to be served from stale cache if your origin is down (or even intentionally for search crawlers or other bots that don't need the latest data).
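
One common way to opt in to stale serving is with the stale-while-revalidate and stale-if-error Cache-Control extensions defined in RFC 5861. The Python sketch below simply assembles such a header value; the function name and the durations are assumptions, and the right numbers depend on how stale your data can safely be.

```python
def stale_friendly_cache_control(
    max_age: int,
    stale_while_revalidate: int,
    stale_if_error: int,
) -> str:
    """Build a Cache-Control value that permits serving stale content."""
    return (
        f"max-age={max_age}, "
        f"stale-while-revalidate={stale_while_revalidate}, "
        f"stale-if-error={stale_if_error}"
    )


# Assumed values: fresh for 5 minutes, usable for 1 minute while a
# refresh is in flight, and for a day if the origin is failing.
print(stale_friendly_cache_control(300, 60, 86400))
```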

Using a combination of the techniques outlined here, you can build fast, reliable APIs and also reduce the cost of operation by caching more of your API content at the edge. For more on this, see our earlier blog series, and get in touch if you have an interesting API design challenge that you'd like us to write about.

Andrew Betts
Head of Developer Relations

Andrew Betts is Head of Developer Relations for Fastly, where he works with developers across the world to help make the web faster, more secure, more reliable, and easier to work with. He founded a web consultancy which was ultimately acquired by the Financial Times, led the team that created the FT’s pioneering HTML5 web app, and founded the FT’s Labs division. He is also an elected member of the W3C Technical Architecture Group, a committee of nine people who guide the development of the World Wide Web.
