Waiting room

You regularly receive large volumes of traffic and need to limit the rate at which users can start new sessions. Once a user has been allowed in, they should retain access.

The concept of a waiting room is similar to other kinds of rate limiting, but differs in that it is applied at a session level, and users must earn access to the site by waiting some amount of time. Waiting rooms can get complicated, especially if you want people to have a numbered position in the queue, or if you want to allow people access based on availability of a resource, such as allowing only a fixed number of baskets to enter the checkout stage at a time.

However, maintaining the global and application state required for these features is difficult to scale, and in many cases the volume of traffic is so high that centralised state management can be more trouble than it's worth. For this solution we'll show how you can create a virtual chokepoint at the edge of the network, holding back eager users and ensuring that you don't overwhelm your infrastructure.

Instructions

The waiting room principle we will demonstrate in this solution is fairly simple: a new user, arriving with no cookie, will be issued a cookie that requires them to wait a fixed amount of time. If the user makes a subsequent request after that period has elapsed, then there is a chance that the request will be forwarded to the backend. That chance is a configurable probability, and can be tuned in real time as you monitor the load on your systems. Users who wait their turn but are unsuccessful will be issued another wait token and be required to wait again.

Some things we want from such a solution are:

  • Support key rotation
  • Make it hard for users to get multiple spots in the waiting room
  • Ensure that traffic to your origin server is as smooth as possible

Let's dive in.

  1. Define some configuration

    The waiting room depends on three types of configuration, so start by creating three VCL tables: one for configuration parameters, one for signing keys and one for page content. Copy this configuration code into the init event:

    INIT
    table solution_waitingroom_config {
      "enabled": "true", # Whether the waiting room is active
      "allow_period_duration": "3600", # Duration (sec) to grant access for
      "wait_period_duration": "30", # Duration (sec) client waits before being eligible for retry
      "allow_percentage": "50", # Percentage of eligible tokens to grant access
      "cookie_lifetime": "7200", # Duration (sec) for cookie lifetime
      "active_key": "key1", # Signing key to use to secure the tokens
      "logger_name": "my-logger" # Name of the log endpoint configured on your service to which to send log data
    }

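    Now create a second table for your signing keys. When you publish this solution to production, you'll want to use a private edge dictionary for this one. Note that the signing and validation code later in this tutorial passes each key through digest.base64_decode, so real keys should be stored in Base64-encoded form; the values below are placeholders.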

    INIT
    table solution_waitingroom_signingkeys {
      "key1": "secret",
      "key2": "another secret"
    }

    Why would you need multiple keys? A good security strategy involves regularly rotating keys, and you might want to support the old key for a period of time after you switch to the new one, since users in the wild will be holding cookies signed with the old key. You can define as many as you want, but you need at least one.

    Finally, create a table of content for your waiting room pages:

    INIT
    table solution_waitingroom_pages {
    
      # "Sorry, you have to wait"
      "startwaiting": "U29ycnksIHlvdSBoYXZlIHRvIHdhaXQu",
    
      # "Please continue to wait"
      "keepwaiting": "UGxlYXNlIGNvbnRpbnVlIHRvIHdhaXQ=",
    
      # "Sorry, we're closed right now.  Please try again later."
      "deny": "U29ycnksIHdlJ3JlIGNsb3NlZCByaWdodCBub3cuICBQbGVhc2UgdHJ5IGFnYWluIGxhdGVyLg=="
    }

    There are three pages to set up: startwaiting, keepwaiting, and deny. You could serve these from your origin server but we find that customers prefer to host the waiting room content purely at the edge. To create a value for one of these fields, take the HTML page, and convert it to a base64-encoded form. It's also advisable to inline the resources on these pages. You're going to be serving these responses a lot, and they are the first line of defence against surges of traffic. Don't accidentally expose your origin by leaving an HTML tag in the page that references an uncached CSS or JavaScript file!

    The examples above are simple text strings, but you would likely encode a full HTML page source. If you don't have a handy way to generate Base64 encoded versions of your pages, there are lots of online services that will do it for you, like base64encode.net.
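
If you would rather not paste page source into an online service, the encoding can also be done locally. The following is a minimal Python sketch, not part of the Fastly solution itself; the function name encode_page is made up for this illustration.

```python
import base64


def encode_page(html: str) -> str:
    """Return the Base64 text to paste into solution_waitingroom_pages."""
    return base64.b64encode(html.encode("utf-8")).decode("ascii")


# A real page would be a full HTML document with its CSS and JavaScript
# inlined; this short string reproduces the "keepwaiting" value shown above.
print(encode_page("Please continue to wait"))
# prints: UGxlYXNlIGNvbnRpbnVlIHRvIHdhaXQ=
```

Decoding the value again with base64.b64decode is a quick way to check that a page survived the round trip before you publish it.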

  2. Define variables

    The waiting room requires a fair few local variables, so define those first. The majority of the work in this solution will be in the recv subroutine so start by placing these declarations there:

    RECV
    declare local var.expires INTEGER;
    declare local var.decision STRING;
    declare local var.percentage INTEGER;
    declare local var.string_to_sign STRING;
    declare local var.sig STRING;
    declare local var.user_id STRING;
    declare local var.authed_user_id STRING;
    declare local var.key_id STRING;
    declare local var.seed INTEGER;
    declare local var.duration_key STRING;
    declare local var.duration INTEGER;
    declare local var.logger_prefix STRING;
    
    if (req.restarts == 0) {
      unset req.http.waitingroom_new_cookie;
    }

    It's a good idea to encapsulate any solution you install on a Fastly service in a subroutine. Doing so creates a closed scope for local variable definitions, so you can use short, convenient names for variables created with declare local without worrying about clashes with other solutions. However, this solution also requires one HTTP header, which persists a proposed cookie declaration from the recv subroutine to the deliver one. Since HTTP headers are global, you need to take a few additional precautions:

    • Prefix the header with a namespace (waitingroom_) to avoid clashing with other solution patterns you add to your service.
    • Unset the header before you do anything with it to avoid the end user sending their own value in their request.
    • Wrap a check for req.restarts == 0 around the unset to avoid removing your own value if there are restarts later.

    You're now ready to start laying out the logic for the waiting room.

  3. Make sure waiting room executes in the right place

    Fastly services support a number of features that can cause subroutines to be executed more than once, such as shielding and restarts. You will want to ensure that your waiting room code only runs once. Start with this code at the end of recv, after the variable definitions that you added in the previous step:

    RECV
    # Only enable the waiting room...
    if (
      fastly.ff.visits_this_service == 0 && # on edge nodes (not shield)
      req.restarts == 0 && # on first VCL pass
      table.lookup(solution_waitingroom_config, "enabled") == "true" # if configured to run
    ) {
    
    # Remainder of this tutorial's RECV code goes here
    
    }

    In addition to checking that the request is not on a shield machine, and that it is on a first pass through the configuration (prior to any restart), you can also use this opportunity to add an on/off switch linked to the enabled key in the configuration table that you defined earlier.

    Now that you have sufficient guardrails in place, you can add the waiting room logic inside of the IF block that you just created.

  4. Set up a logger

    Waiting rooms are complex and you're probably going to want to do some logging. Fastly's log command can output anything that you can format on one line, but it requires a bit of boilerplate. You can save that into a variable, and also make use of your config data table to allow the log destination to be changed at runtime:

    RECV
      if (table.lookup(solution_waitingroom_config, "logger_name")) {
        set var.logger_prefix = "syslog " +
          req.service_id + " " +
          table.lookup(solution_waitingroom_config, "logger_name") +
          " :: [WAITINGROOM] "
        ;
      }
    
  5. Identify the user

    You will want to ensure that a user cannot share a waiting room token with their friends, and also that two tokens issued to different people at the same moment are not exactly the same (we'll explore the reasons for this later). Meanwhile, start by creating a string to represent the user. If you are already authenticating users inside of Fastly, and you're doing it before the waiting room runs, you could use the ID from that. Otherwise, client.ip is a pretty good substitute.

    RECV
      set var.authed_user_id = if (req.http.auth-user-id, req.http.auth-user-id, client.ip);

    If an earlier part of your service configuration sets the auth-user-id header, that value is used here, which ensures that a single user can't use multiple devices or computers to unfairly get multiple tokens for the waiting room. If not, the client IP address is good enough to prevent large-scale abuse of the system.

  6. Determine how much traffic to let in

    Your configuration table, which was defined in step 1, includes a variable allow_percentage. You'll want to look up this value and convert it from a string, which is the storage type of the table structure, to an integer, so you can use it for calculations.

    RECV
      # Determine the percentage of requests to allow through
      set var.percentage = std.atoi(table.lookup(solution_waitingroom_config, "allow_percentage", "100"));
      if (var.percentage == 0 && table.lookup(solution_waitingroom_config, "allow_percentage") != "0") {
        set var.percentage = 100;
      }
    

    Because the value in the table is a string, it might conceivably be a non-numeric value, in which case the std.atoi function will return 0. However, you probably want to 'fail open' in this situation, so to do this, you can check for a zero value and if the string source value is not "0", set the final percentage to 100.
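
The fail-open rule is easy to get backwards, so here is a rough Python model of it that you can use to check your expectations. It is a sketch under stated assumptions: the function name effective_percentage is made up, and Python's int() is used as a stand-in for std.atoi, so the two may differ on unusual inputs.

```python
def effective_percentage(raw):
    """Model of the step 6 logic. raw is the table value, or None if the key is absent."""
    if raw is None:
        raw = "100"  # mirrors the default passed to table.lookup
    try:
        value = int(raw)
    except ValueError:
        value = 0  # stand-in for std.atoi returning 0 on non-numeric input
    if value == 0 and raw != "0":
        value = 100  # fail open: only a literal "0" means 'deny all'
    return value


for raw in ("50", "0", "banana", None):
    print(repr(raw), "->", effective_percentage(raw))
```

Only the literal string "0" produces a 'deny all' result; a missing or unparseable value lets everyone through.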

  7. Deal with decisions that don't require a token

    In some scenarios you can decide whether the user will be allowed in without checking their waiting room token. Specifically, these are when the allow percentage is 0 (i.e. everyone is denied) or 100 (i.e. everyone is allowed), or when the user doesn't have a cookie. A decision can be one of four types:

    • allow: User has waited, or doesn't need to wait, and is allowed to access the origin
    • deny: User is not allowed to access the origin, and waiting will not help
    • anon: User is not known, so should begin waiting
    • wait: User is already waiting, and should continue to wait

    RECV
      # Special case for 'deny all'
      if (var.percentage <= 0) {
        set var.decision = "deny";
    
      # Special case for 'allow all'
      } else if (var.percentage >= 100) {
        set var.decision = "allow";
    
      # Special case for if user does not have a cookie
      } else if (!req.http.Cookie:waiting_room) {
        set var.decision = "anon";
    
      # Validate the cookie
      } else {
    
        # ... Continue adding code from the next step here
    
      }
    

    Within the else clause, the user does have a cookie, so you can now work to validate it and make a decision based on it.

  8. Extract data from cookie

    Now you know your user possesses a waiting or allow token, you need to parse it. You can store anything you like in any format you prefer in a cookie, but Fastly VCL provides convenient functions for working with query strings, so we propose that you format your token like this:

    dec={DECISION}&exp={EXPIRY-DATE}&uid={USER-ID}&kid={NAME-OF-SIGNING-KEY}&sig={SIGNATURE}

    Assuming that the token is in that format, it can be parsed like this:

    RECV
        set var.expires = std.atoi(subfield(req.http.Cookie:waiting_room, "exp", "&"));
        set var.sig = subfield(req.http.Cookie:waiting_room, "sig", "&");
        set var.key_id = subfield(req.http.Cookie:waiting_room, "kid", "&");
        set var.user_id = subfield(req.http.Cookie:waiting_room, "uid", "&");
        set var.decision = subfield(req.http.Cookie:waiting_room,"dec","&");
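
To see what each of those lookups yields for a given cookie value, the parsing can be modelled offline. This is a minimal Python sketch, assuming the token layout proposed above; parse_token and the sample token are made up for illustration, and the sketch does not claim to match subfield on malformed input.

```python
def parse_token(cookie_value: str) -> dict:
    """Split a waiting room token into its named fields."""
    fields = {}
    for pair in cookie_value.split("&"):
        name, _, value = pair.partition("=")
        fields[name] = value
    return fields


# Hypothetical token; the signature is shortened for readability.
token = "dec=wait&exp=1700000040&uid=192.0.2.10&kid=key1&sig=0xabc123"
parsed = parse_token(token)
print(parsed["dec"], int(parsed["exp"]), parsed["uid"], parsed["kid"])
```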
    
  9. Validate the cookie data

    You're going to validate the cookie in two ways: first, to ensure that it belongs to the correct user, and second, that the signature is valid. If either of these is not true, you can reset the decision to anon, as if the user didn't have a cookie.

    RECV
        if (var.user_id != var.authed_user_id) {
          set var.decision = "anon";
          if (var.logger_prefix) {
            log var.logger_prefix + "User " + var.authed_user_id + " denied while using a token generated for user " + var.user_id;
          }
    
        } else if (table.lookup(solution_waitingroom_signingkeys, var.key_id)) {
          set var.string_to_sign = "dec=" + var.decision + "&exp=" + var.expires + "&uid=" + var.user_id + "&kid=" + var.key_id;
    
          # If cookie signature doesn't check out, treat as anon
          if (!digest.secure_is_equal(var.sig, digest.hmac_sha512(digest.base64_decode(table.lookup(solution_waitingroom_signingkeys, var.key_id)), var.string_to_sign))) {
            set var.decision = "anon";
          }
        } else {
          set var.decision = "anon";
          if (var.logger_prefix) {
            log var.logger_prefix + "Unable to check signature due to missing key " + var.key_id;
          }
        }

    Also notice that if the key that was used to sign the token is not found in your keys table, the decision is set to anon, the same as if the signature validation fails. If you mistakenly remove a key that is still being used in cookies that are in the wild, you might prefer to err towards allowing the token, but this is a security vulnerability since a user could simply change the kid to any value that isn't a recognised key name, and would then also be able to change the dec to allow! So it's important to disallow tokens that cannot be validated.
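
The following Python sketch models the same checks so that you can experiment with tampered tokens away from the edge. It rests on some assumptions: the helper names are made up, the key is used as raw bytes, and the hex signature it produces is not guaranteed to be byte-for-byte identical to the output of digest.hmac_sha512.

```python
import hashlib
import hmac


def string_to_sign(decision, expires, user_id, key_id):
    # Same field order as the VCL string_to_sign: dec, exp, uid, kid.
    return f"dec={decision}&exp={expires}&uid={user_id}&kid={key_id}"


def issue_token(keys, key_id, decision, expires, user_id):
    payload = string_to_sign(decision, expires, user_id, key_id)
    sig = hmac.new(keys[key_id], payload.encode(), hashlib.sha512).hexdigest()
    return f"{payload}&sig={sig}"


def is_valid(keys, token):
    fields = dict(pair.partition("=")[::2] for pair in token.split("&"))
    key = keys.get(fields.get("kid", ""))
    if key is None:
        return False  # unknown key: the token cannot be validated, so reject it
    payload = string_to_sign(fields.get("dec", ""), fields.get("exp", ""),
                             fields.get("uid", ""), fields.get("kid", ""))
    expected = hmac.new(key, payload.encode(), hashlib.sha512).hexdigest()
    # Constant-time comparison, analogous to digest.secure_is_equal.
    return hmac.compare_digest(fields.get("sig", "").encode(), expected.encode())


keys = {"key1": b"secret"}  # hypothetical raw key bytes
token = issue_token(keys, "key1", "wait", 1700000040, "192.0.2.10")
print(is_valid(keys, token))                                   # True
print(is_valid(keys, token.replace("dec=wait", "dec=allow")))  # False
print(is_valid(keys, token.replace("kid=key1", "kid=nope")))   # False
```

The last two lines show the two attacks described above: changing dec invalidates the signature, and pointing kid at an unknown key is rejected outright rather than allowed through.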

  10. Give waiting users a shot at entry

    Now you've validated the cookie, you might have a user who already has an allow decision, but if they have a wait, and the token's expiry time has passed, then it's time to give them a chance to exchange the waiting token for an allow one.

    It might seem reasonable to roll the dice at this point, but remember that, since we cannot record the fact that the token has been used, the user might simply come back and present the same token again, and since it's still after the expiry time, you would roll the dice again. They could keep doing this until you let them in, all using one single token. Instead, then, it's better to generate the randomised decision using a seed which binds the result to the input. In VCL we can achieve this with randombool_seeded.

    If you want another way to think about this: imagine you run a lottery. When you sell a ticket, that ticket has a number on it and is already destined to be a winner or a loser, even though you haven't made the draw yet. And the buyer can't change the number after they've bought the ticket. That's what we're aiming to recreate here.

    Our randombool_seeded function takes a seed which must be an integer. The token's signature is a good source for this, since it comprises only hexadecimal characters, so you can take a short substring from it and convert it into a number using std.strtol (if you were to try to convert the entire signature into an integer you would end up with an integer too large for our integer data type). The second argument to std.strtol is the numeric base, which is 16 for hexadecimal input.

    RECV
        # If the interval has elapsed, the user has waited their turn
        # so give them a shot at getting in
        set var.seed = std.strtol(substr(var.sig,0,8),16);
        if (var.decision == "wait" && time.is_after(now, std.integer2time(var.expires))) {
          set var.decision = if (randombool_seeded(var.percentage, 100, var.seed), "allow", "anon");
        }

    So, in summary: if the user's current decision (from their valid token) is wait, and the token has reached its expiry time, then generate a random-but-deterministic boolean based on the token's signature, which should be true approximately var.percentage percent of the time. If it comes up true, then change the user's decision to 'allow', otherwise change them to 'anon' so that they get a new token.

    It's important to give users a new token if they are not successful with the one they have, because no matter how long they wait, the token will never change to a winner.

    This step concludes the code for the if...else block that you started in Deal with decisions that don't require a token.
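
The lottery-ticket behaviour described in this step can be demonstrated with a short Python sketch. It illustrates the idea only: seeded_allow is a made-up name, and Python's random.Random is a different generator from the one behind randombool_seeded, so individual outcomes will not match those at the edge.

```python
import random


def seeded_allow(percentage: int, sig: str) -> bool:
    """Derive a repeatable allow/retry outcome from the start of a signature."""
    seed = int(sig[:8], 16)  # a short prefix keeps the seed small
    return random.Random(seed).random() * 100 < percentage


sig = "3fa9c1d27b"  # hypothetical signature prefix
# Presenting the same token again never changes its outcome.
print(all(seeded_allow(50, sig) == seeded_allow(50, sig) for _ in range(5)))
```

Because the outcome is a pure function of the signature, replaying a losing token gains the user nothing; only a newly issued token gets a new draw.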

  11. Compose a cookie if needed

    You've now completed the logic that determines what the user's waiting room state is. You might want to log it:

    RECV
      if (var.logger_prefix) {
        log var.logger_prefix + "Waiting room state: " + var.decision;
      }

    (remember to put this code after the if...else tree that you just finished)

    Some decisions require you to manipulate the user's cookies. Specifically:

    • anon user: issue a new waiting token
    • allow user: issue an allow token
    • deny user: clear any existing cookies
    • wait user: do nothing with cookies: the user already has one and should continue waiting.

    RECV
      # Set a cookie if appropriate
      if (var.decision == "anon" || var.decision == "allow") {
        set var.duration_key = if (var.decision == "allow", "allow", "wait") + "_period_duration";
        set var.duration = std.atoi(table.lookup(solution_waitingroom_config, var.duration_key, "30"));
    
        set var.expires = std.atoi(now.sec);
        set var.expires /= var.duration;
        set var.expires *= var.duration;
        set var.expires += var.duration;
        set var.expires += var.duration;
    
        set var.key_id = table.lookup(solution_waitingroom_config, "active_key", "key1");
        set var.string_to_sign = "dec=" + if (var.decision == "allow", "allow", "wait") + "&exp=" + var.expires + "&uid=" + var.authed_user_id + "&kid=" + var.key_id;
        set var.sig = digest.hmac_sha512(digest.base64_decode(table.lookup(solution_waitingroom_signingkeys, var.key_id)), var.string_to_sign);
        set req.http.waitingroom_new_cookie = "waiting_room=" + var.string_to_sign + "&sig=" + var.sig + "; path=/; max-age=" + table.lookup(solution_waitingroom_config, "cookie_lifetime", "7200");
      } else if (var.decision == "deny") {
        set req.http.waitingroom_new_cookie = "waiting_room=deleted; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT";
      }

    The first thing you're doing here is to determine the duration for the expiry time of the token. You can use the allow_period_duration and wait_period_duration properties in your config table to set different preferred durations for these. Once looked up, convert it to an integer using std.atoi.

    The next section looks a bit odd. The intention of these manipulations of var.expires is to end up with an expiry time which falls on a known boundary between two slices of time. Suppose instead that the expiry time were simply the arrival time plus the waiting period. If a user has access to multiple devices but only one identity (e.g. one IP address), then they could join the waiting room on all their phones and tablets. Since the expiry time is part of the token, and part of the signature of the token, and the signature determines whether the token is a winner, each device would receive a different token, and the user would improve their chances by getting in the queue on multiple devices. They would get a new, different waiting token more often than we'd intend.

    While it's a good idea to ensure that all tokens generated by the same user during a time window have the same ultimate outcome, it's an extremely bad idea for everyone's tokens in the same time window to have the same outcome. That would result in gigantic traffic spikes to your origin, and is the reason why we include a user ID or client IP address in the token signature.

    Fortunately, you can ensure that, regardless of when the user shows up, their expiry time is always on the next boundary, and their token includes their ID. Say you set the boundaries on the minute. If a user arrives at 16:05:23 without a token, and again (another anonymous request from the same IP) at 16:05:45, then these are within the same boundary, so we set both of their tokens to expire at 16:07:00. This means they both get the same signature, and ultimately, the same decision.

    In the code above, this is done with the following steps:

    1. Set the time to now.sec, the current time as a unix timestamp (the number of seconds since January 1970). now.sec is a string, so convert it to an integer.
    2. Divide by the desired duration of your time brackets. Since the var.expires variable is an integer, the fractional part of the result is discarded.
    3. Multiply by the same number. Since the fractional part was discarded, this gives you the unix timestamp of the start of the time bracket containing the current time.
    4. Add the duration twice. The first addition takes the time to the end of the current time bracket, but that will not require the user to wait a full waiting period, so add the duration again.
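
The four steps above can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only; bracketed_expiry is a made-up name, and the date is an arbitrary example chosen to match the times used in the explanation.

```python
from datetime import datetime, timezone


def bracketed_expiry(now_sec: int, duration: int) -> int:
    """Round down to the start of the current bracket, then add two durations."""
    start_of_bracket = (now_sec // duration) * duration  # integer division discards the remainder
    return start_of_bracket + 2 * duration


# Two anonymous arrivals within the same one-minute bracket.
first = int(datetime(2024, 1, 1, 16, 5, 23, tzinfo=timezone.utc).timestamp())
second = int(datetime(2024, 1, 1, 16, 5, 45, tzinfo=timezone.utc).timestamp())

for arrival in (first, second):
    expiry = datetime.fromtimestamp(bracketed_expiry(arrival, 60), tz=timezone.utc)
    print(expiry.strftime("%H:%M:%S"))  # prints 16:07:00 both times
```

Both arrivals produce the same expiry, so for the same user ID and signing key they also produce the same string to sign, the same signature and the same eventual outcome.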

    Finally, construct the cookie. Look up the signing key that is currently active, and form the string to sign, which must take exactly the same form as the one you constructed when validating the token earlier. Calculate the signature, add it to the string to sign, and that forms the value of the cookie.

    Cookies can't be set in the recv subroutine, but you can use a temporary HTTP header to store the desired cookie string, and apply it to the response later, in the deliver subroutine.

  12. Reroute non-allowed users to canned responses

    For the final part of your waiting room implementation, you must stop users whose decision is not allow from continuing to the resource that they requested.

    RECV
      # Prevent normal request routing if decision is not 'allow'
      if (var.decision == "anon") {
        error 818 "waitingroom:startwaiting";
      } else if (var.decision == "wait") {
        error 818 "waitingroom:keepwaiting";
      } else if (var.decision == "deny") {
        error 818 "waitingroom:deny";
      }

    At this point you are transferring control of the request to the error subroutine, so the current scope, including your local variables, is lost. You can communicate across this boundary via the HTTP status code (not ideal on its own, because another solution may use the same code) or via the response text, a string that normally holds the status descriptor, e.g. "OK" or "Not found". Here, the response text is used to tell the error subroutine which page to serve.

    This is the end of the recv code. You should still have one } to close for the if statement that encloses the entire waiting room implementation, and then you're done with this subroutine!

    Now, over in error you can receive and parse the error object and convert it into a synthetic response:

    ERROR
    if (obj.status == 818 && obj.response ~ "^waitingroom:(\w+)$") {
      set obj.status = 200;
      set obj.response = "OK";
      synthetic.base64 table.lookup(solution_waitingroom_pages, re.group.1, "");
      return(deliver);
    }

    This will look up the content of the HTML page that you stored in your pages table, and use it as the body of the response to the user.

  13. Set the cookie on the response

    Whether the user was allowed or not, ultimately they will end up in the deliver subroutine, where the response can be tweaked before it is delivered to their device. This is where you need to set the cookie that you prepared in the recv subroutine earlier:

    DELIVER
    if (req.http.waitingroom_new_cookie) {
      add resp.http.set-cookie = req.http.waitingroom_new_cookie;
      set resp.http.Cache-Control = "no-store, private";
    }

    When adding a set-cookie header, it's always a good idea to use add instead of set, because the response might already have a set-cookie in it, and you probably don't want to wipe out all other cookies that would otherwise be set in this response.

    It's also a good idea to make responses uncacheable in the browser if they set cookies.

  14. Tidy up requests to origin

    The cookie that the waiting room uses, and the 'new cookie' temporary header, are both properties of the request object, which means they will get copied onto the request to origin unless you do something to prevent that. Since the cookie is used by edge logic, it's a good idea to make sure it's not also used by server-side logic. The new-cookie header is simply being used as a way of getting a variable within Fastly that you can access in multiple subroutines, so that certainly should not be sent to origin.

    To make sure we always perform this cleanup, the code needs to be put in both the miss and pass subroutines:

    MISS / PASS
    unset bereq.http.cookie:waiting_room;
    unset bereq.http.waitingroom_new_cookie;

    And with that, you're done! Congratulations, you have a waiting room.

Next steps

This solution includes the content for the waiting state responses inline in the VCL. You could also consider some alternatives to this:

  • issue a redirect to another URL which doesn't itself apply waiting room rules.
  • change the req.url variable and set req.backend to an alternative backend such as a static object service.
  • add a header such as Waiting-Room-Status: wait to the request and then send it to origin anyway. Origin could respond with the waiting state content and a Vary: Waiting-Room-Status header to ensure that the content is not confused with the real content in the cache. However, this is likely to defeat the object of the waiting room solution (which is presumably to reduce traffic to your origin servers!).

Quick install

This solution can be added directly to an existing service in a Fastly account as a set of VCL snippets. The embedded fiddle below shows the complete solution. Feel free to run it, and click the 'INSTALL' tab to customise and upload it to your service:

Once you have the code in your service, you can further customise it to your needs, but if you keep it unmodified, it will be eligible for automatic upgrades if this recommended solution is improved in future.

All code provided through Build on Fastly is provided under both the BSD and MIT open source licenses.

Get in touch

Help us make this resource more useful for the entire Fastly community. Email your questions, requests, and big ideas to developers@fastly.com — or reach out and let us know what you’re working on.