Building the WAF test harness

Security Researcher

March 08, 2018

To help our customers secure their sites and applications, while continuing to give their users reliable online experiences, we’ve built a performant, highly configurable, and comprehensive Web Application Firewall (WAF). In our last post, we shared some of the tech behind our WAF, including how we chose our ruleset and leverage our findings. To provide a comprehensive solution for securing your infrastructure, it’s critical to test that solution continuously. Because technology and threats are constantly evolving, the rulesets also need to evolve to ensure proper visibility into, and mitigation of, emerging attack methods.

In this post, we’ll share how we ensure a quality WAF implementation for our customers, continuously testing it using FTW, and go deeper into the findings and contributions we’ve made to the OWASP CRS community with FTW.

Fastly design objectives

Building a testing and quality assurance system is important to us because we want to ensure the accuracy of our WAF-based countermeasures. Fastly’s WAF implementation is distinctive in that it’s fully integrated into our Varnish caching platform, which we opted to do for a number of reasons:

Integration: leveraging our existing configuration management platform
Flexibility: supporting web attack identification for multiple protocols
Simplicity: offering fewer points of failure
Performance: integrated directly into the cache nodes

Integrating with our existing platform also enables real-time configuration changes and visibility into the Fastly WAF. We built our WAF to easily accommodate per-customer or per-WAF-instance configuration, such as anomaly thresholds, which is important because risk appetite varies from customer to customer.

This approach also gives us the flexibility to identify and mitigate threats across any protocols that our platform supports, including HTTP/HTTP2/TLS (both IPv4 and IPv6).

Reducing complexity in the customer environment reduces "bumps in the wire" — you already have enough intermediate systems, CDNs, load balancers, etc. We didn't want to introduce another one by routing traffic through a separate WAF mechanism.

Sourcing quality WAF rules

Without high-quality rules, a WAF is not very useful. For this reason Fastly currently has three sources for rules:

OWASP Core rule set (CRS)
Fastly internal rules
Private vendors

We maintain our own rules to identify attacks for critical vulnerabilities we know are being exploited in the wild. There is no anomaly threshold increment in this case: if these requests match, they are dropped. The Fastly WAF also incorporates rules being written and maintained by our commercial partner (Trustwave).
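
To make the distinction between anomaly-scored rules and drop-on-match rules concrete, here is a minimal Python sketch. It is not Fastly's implementation: the `RuleMatch` type, the `drop_immediately` flag, rule ID 1, and the score and threshold values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RuleMatch:
    rule_id: int
    score: int = 0                  # anomaly score contribution (CRS-style rule)
    drop_immediately: bool = False  # hypothetical flag for "known exploit" rules


def decide(matches, threshold):
    """Return 'block' or 'pass' for the rules that matched one request."""
    total = 0
    for match in matches:
        if match.drop_immediately:
            # No score arithmetic: a single match is enough to drop the request.
            return "block"
        total += match.score
    # Scored rules only block once the per-customer threshold is reached.
    return "block" if total >= threshold else "pass"


print(decide([RuleMatch(920250, score=3)], threshold=5))
print(decide([RuleMatch(1, drop_immediately=True)], threshold=100))
```

Because the threshold is an argument rather than a constant, the same set of matches can pass for one customer and block for another, which is the point of per-customer thresholds.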

Engineering the toolchain to integrate ModSecurity rules

In our previous post, we covered the design and implementation of the Fastly WAF, discussing how we selected and integrated the OWASP CRS as a core component of the service. The functionality required to support these rules has been integrated directly into our edge cloud platform.

We had to create a ModSecurity rule parser and VCL generation tool chain to translate ModSecurity rules (SecRule format) into VCL code blocks, mapping the ModSecurity transform functions onto our Varnish runtime equivalents. Consider the following OWASP CRS rule, which validates the UTF-8 encoding of request data if the site has declared that it uses UTF-8:

#
# Check UTF encoding
# We only want to apply this check if UTF-8 encoding is actually used by the site, otherwise
# it will result in false positives.
#
# -=[ Rule Logic ]=-
# This chained rule first checks to see if the admin has set the TX:CRS_VALIDATE_UTF8_ENCODING
# variable in the crs-setup.conf file.
#
SecRule TX:CRS_VALIDATE_UTF8_ENCODING "@eq 1" \
  "phase:request,\
   rev:'2',\
   ver:'OWASP_CRS/3.0.0',\
   maturity:'6',\
   accuracy:'8',\
   t:none,\
   msg:'UTF8 Encoding Abuse Attack Attempt',\
   id:920250,\
   tag:'application-multi',\
   tag:'language-multi',\
   tag:'platform-multi',\
   tag:'attack-protocol',\
   tag:'OWASP_CRS/PROTOCOL_VIOLATION/EVASION',\
   severity:'WARNING',\
   chain"
   SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES "@validateUtf8Encoding" \
     "setvar:'tx.msg=%{rule.msg}',\
      setvar:tx.anomaly_score=+%{tx.warning_anomaly_score},\
      setvar:tx.%{rule.id}-OWASP_CRS/PROTOCOL_VIOLATION/EVASION-%{matched_var_name}=%{matched_var}"

Note: We’ve simplified the post-translation VCL below to show pertinent features (e.g., the use of variables to count anomaly scores and set thresholds). The actual VCL contains significantly more detail that is not shown, so this snippet will not run as-is.

[..]
  if (!waf.validateUtf8Encoding(arg_key) ||
      !waf.validateUtf8Encoding(arg_val)) {
    set waf.anomaly_score += ...;
    set waf.rule_id = 920250;
    [..]
    set waf.severity = 4;
  }
  return (...);
}
[..]

There were a number of different considerations we had to make while implementing this code to ensure compatibility with the OWASP core rule set (CRS) and ModSecurity in general:

Transforms: ModSecurity has a number of transforms which allow rule writers to normalize request data, which reduces the number of rule variants. For example, you can use t:urlDecodeUni to handle URL decoding so you don't need a rule for encoded and unencoded forms of a URL. Take a look here for details on the ModSecurity transforms.
Collections and variables: ModSecurity represents certain components of the HTTP request as iterable collections (essentially a list). For example, there could be a number of headers in a request. There would be a key/value pair for each header in the request, and the list of key/value pairs represents a collection. ModSecurity would iterate over every item in the collection and perform the comparisons.
Rule chaining: ModSecurity lets you chain rules together effectively, allowing rule writers to implement logical "AND" operations enabling the creation of more complex rules.
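
The three concepts above can be sketched in a few lines of Python. This is a toy model, not ModSecurity or our VCL: the rule dictionaries, the `TRANSFORMS` table, and the use of `urllib.parse.unquote` as a stand-in for a URL-decoding transform are assumptions made for illustration.

```python
from urllib.parse import unquote

# Stand-ins for transform functions; real implementations differ in detail.
TRANSFORMS = {
    "urlDecode": unquote,
    "lowercase": str.lower,
}


def apply_transforms(value, names):
    """Apply the named transforms in the order the rule lists them."""
    for name in names:
        value = TRANSFORMS[name](value)
    return value


def rule_matches(rule, request):
    """A rule matches if any item of any targeted collection matches."""
    for collection in rule["collections"]:
        for _key, value in request.get(collection, {}).items():
            if rule["operator"](apply_transforms(value, rule["transforms"])):
                return True
    return False


def chain_matches(chain, request):
    """Chained rules behave like a logical AND: every link must match."""
    return all(rule_matches(rule, request) for rule in chain)


request = {
    "REQUEST_HEADERS": {"Host": "localhost"},
    "ARGS": {"var": "blahblah", "var2": "LIKE%20NULL"},
}
chain = [
    {"collections": ["REQUEST_HEADERS"], "transforms": ["lowercase"],
     "operator": lambda v: v == "localhost"},
    {"collections": ["ARGS"], "transforms": ["urlDecode", "lowercase"],
     "operator": lambda v: "like null" in v},
]
print(chain_matches(chain, request))
```

Dropping `urlDecode` from the second rule makes the chain stop matching, which is the kind of translation error the rest of this post is about catching.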

There were a number of moving parts to consider when designing our WAF implementation, and we needed to make sure that the code written to translate SecRule formatted rules into VCL functioned correctly. Otherwise there would be risk of introducing WAF-evasion related weaknesses into the implementation.

For example, having a transform that is not identical in implementation could result in the transformed data being different, and hence not matching a specific pattern. It became obvious that we needed a testing framework — something that would run continuously as we updated rules, made changes to the tool chain, etc. We also needed it to enable engineering and security operations to write code and rules rapidly while managing the risk associated with unwanted bugs and regressions.
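
As an illustration of how a transform mismatch turns into an evasion, the sketch below compares two URL-decoding functions over a small corpus. Both functions are hypothetical stand-ins written for this example; neither is ModSecurity's nor Fastly's code, and the "candidate" is deliberately incomplete.

```python
import re
from urllib.parse import unquote


def reference_url_decode(value):
    """Stand-in reference behaviour: '+' becomes a space, then %XX is decoded."""
    return unquote(value.replace("+", " "))


def candidate_url_decode(value):
    """Deliberately incomplete port: decodes %XX but forgets about '+'."""
    return unquote(value)


def find_divergences(reference, candidate, corpus):
    """Return the inputs on which the two implementations disagree."""
    return [sample for sample in corpus if reference(sample) != candidate(sample)]


corpus = ["LIKE%20NULL", "union+select", "plain"]
divergent = find_divergences(reference_url_decode, candidate_url_decode, corpus)
print(divergent)

# The same pattern matches after one transform and misses after the other.
pattern = re.compile(r"union\s+select")
for sample in divergent:
    print(bool(pattern.search(reference_url_decode(sample))),
          bool(pattern.search(candidate_url_decode(sample))))
```

A continuously running test corpus plays the role of `corpus` here: any input on which the port and the reference disagree is a potential evasion.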

A framework for testing WAFs (FTW)

Zack Allen and Chaim Sanders (current maintainers of the CRS-support GitHub organization) started working on the testing framework in collaboration with the team here at Fastly. This framework is written in Python, and you can use it as a Python module embedded into your own code. Alternatively, you can run it standalone with the py.test Python testing framework.

At its core, FTW loads a YAML specification of an HTTP request (containing attack payloads or other elements of web application attacks) and translates it into an HTTP request. You can create entire test corpuses to ensure that the rules are detecting attacks.

Although there are many tools that allow you to construct requests, having to wrap them in shell scripts and manually parse requests is not a very elegant solution. Furthermore, many of these tools don’t allow you to construct non-compliant requests, which are required to test certain rules, such as those covering protocol-based attacks.

To control the details associated with the requests, we needed a framework that gave us granular control over the requests themselves. It was also important that we had flexibility when it came to interpreting success and failure in explicit ways, which allows us to avoid the use of tools external to the framework to process response data.

Here are simplified descriptions of some of the YAML fields used in a test:

Meta: Metadata associated with the test, which could include author, test description, an indicator as to whether the rule is enabled, etc.
Tests: Each test has a title and an optional description, followed by one or more stages. The example below has a single stage, but you might want to define more if you need to move the application to a certain state before delivering a payload.
Input: Where you will define most of your attack payloads, either in the form of URIs, headers, or POST body content.
Output: Use this for checking whether a specific attack payload was detected. In the example below, we check a log file for the presence of the pattern: id "942120".

Here is an example FTW test from the OWASP CRS regressions repository:

---
  meta:
    author: "Christian S.J. Peron"
    description: None
    enabled: true
    name: 942120.yaml
  tests:
  - 
    test_title: 942120-1
    desc: "Injection of a SQL operator"
    stages:
    - 
      stage:
        input:
          dest_addr: 127.0.0.1
          headers:
            Host: localhost
          method: GET
          port: 80
          uri: "/?var=blahblah&var2=LIKE%20NULL" 
          version: HTTP/1.0
        output:
          log_contains: id "942120"

Note: there are other output specifications. For example, if the HTTP server is expected to return a 400 error code, the test can instruct the framework to validate the status code returned by the web daemon.

This is useful if your WAF is in blocking mode, and you want to confirm that a specific attack payload was identified and dropped. Once we use the test configuration in our test harness, it produces a request on the wire that looks like this:

20:36:56.956718 IP localhost.56762 > localhost.http: Flags [P.], seq 1:104, ack 1, win 342, options [nop,nop,TS val 1066948 ecr 1066948], length 103
E.....@.@.*............Pj.Vmg......V.......
..G...G.GET /?var=blahblah&var2=LIKE%20NULL HTTP/1.0
Host: localhost
X-Request-Id: 1503088616.96_942120-1
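
To show how a stage's input could end up as the bytes above, here is a minimal Python sketch. It is not FTW's code: `build_raw_request` is a hypothetical helper, and the default values and the `data` key for the body are assumptions. The function performs no validation on purpose, so a test can still describe a non-compliant request.

```python
def build_raw_request(stage_input):
    """Render an FTW-style 'input' mapping as raw HTTP request bytes."""
    method = stage_input.get("method", "GET")
    uri = stage_input.get("uri", "/")
    version = stage_input.get("version", "HTTP/1.1")
    lines = ["%s %s %s" % (method, uri, version)]
    # Headers are written exactly as given, in the order given.
    for name, value in stage_input.get("headers", {}).items():
        lines.append("%s: %s" % (name, value))
    body = stage_input.get("data", "")
    return ("\r\n".join(lines) + "\r\n\r\n" + body).encode("utf-8")


stage_input = {
    "method": "GET",
    "uri": "/?var=blahblah&var2=LIKE%20NULL",
    "version": "HTTP/1.0",
    "headers": {
        "Host": "localhost",
        "X-Request-Id": "1503088616.96_942120-1",
    },
}
raw = build_raw_request(stage_input)
print(raw.decode("utf-8"))
```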

Starting with a baseline

Since the OWASP CRS was designed to run on the ModSecurity WAF implementation, we decided to use ModSecurity as the basis for our comparison. Our CRS-related tests are sourced from the OWASP-CRS regressions project.

The OWASP CRS was designed primarily to operate with Apache and ModSecurity, though there’s been work done to include support for other HTTP servers like Nginx. In all cases, these are full HTTP server implementations, which serve a different function than Varnish, the HTTP accelerator on which Fastly is built.

Since these are HTTP servers, they operate a bit differently than Varnish. For example, requests with invalid Host: headers wouldn’t be processed by the Fastly WAF since the Host header acts as a key to route the request to the appropriate service (which may not have WAF activated). Therefore failing a CI job for rules designed to detect malformed or invalid Host headers wouldn’t be correct in our context.

In other cases, the regression tests which ship with the CRS may include tests which were designed for older variants of a rule, and changes to rules could result in these tests failing. Therefore, we needed a way to create a baseline which represented which tests were expected to fail vs. not.
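
One way to use such a baseline is to compare each run against the set of tests that are known to fail, and only treat new failures as fatal. The Python sketch below illustrates the idea under assumptions: the `classify` function, the report keys, and the test IDs other than 942120-1 are hypothetical and not taken from our harness.

```python
def classify(results, expected_failures):
    """Split test results into regressions, expected failures, and unexpected passes.

    results maps a test ID to True (rule detected the payload) or False.
    expected_failures is the baseline: test IDs that are known to fail.
    """
    report = {"regressions": [], "expected_failures": [], "unexpected_passes": []}
    for test_id, passed in sorted(results.items()):
        if passed and test_id in expected_failures:
            # The baseline is stale; the entry can probably be removed.
            report["unexpected_passes"].append(test_id)
        elif not passed and test_id in expected_failures:
            report["expected_failures"].append(test_id)
        elif not passed:
            report["regressions"].append(test_id)
    return report


results = {"942120-1": True, "942120-2": False, "920280-1": False, "920250-1": True}
baseline = {"920280-1", "920250-1"}
report = classify(results, baseline)
ci_should_fail = bool(report["regressions"])
print(report, ci_should_fail)
```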

We created a Chef cookbook/Vagrant file to automate the provisioning and configuration of Apache, ModSecurity, and the OWASP CRS version 3.0.2:

#
# apache2+mod_security vagrant box
#
Vagrant.configure(2) do |config|
  config.ssh.forward_agent = true
  config.vm.define 'modsec0' do |modsec_conf|
    modsec_conf.vm.box = 'ubuntu/trusty64'
    modsec_conf.berkshelf.enabled = true
    modsec_conf.berkshelf.berksfile_path = './Berksfile'
    modsec_conf.vm.network 'private_network', ip: '192.168.50.75'
    modsec_conf.vm.provider 'virtualbox' do |v|
      v.memory = 512
      v.cpus = 2
    end
    modsec_conf.vm.provision :chef_solo do |chef|
      chef.add_recipe('waf_testbed::default')
    end
  end
end

You can find Fastly's waf_testbed cookbook on GitHub here.

Engineering the test harness with FTW

To ensure the quality of rule translation, we require tests for every rule, and in some cases, multiple checks to verify that rules with multiple collections have been translated into VCL correctly. There was also another challenge: we needed to make sure that the rule we wanted to test is the rule that gets tripped on the WAF; simply checking for HTTP error codes was not sufficient, as a request may have been dropped by a rule not relevant for the test.

For example, the PHP object injection test triggers dozens of rules, but we needed assurance that the rule written to identify object injection was the one that detected the payload. Fastly uses the X-Request-Id header for this, setting it to a timestamp combined with the rule ID and test ID that the payload is designed to trigger. We keep track of a number of attributes to ensure that we can link a test instance to a specific WAF log entry:

Timestamp of when a request was sent
Rule ID we’re testing
Test ID within the rule we used
Whether or not the test was successful (operationally — for example, did we see a TCP reset?)
HTTP response code
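
A minimal Python sketch of such a record is shown below. The tag format follows the X-Request-Id value seen earlier (1503088616.96_942120-1), but the `JournalEntry` type, its field names, and both helper functions are assumptions made for illustration rather than our harness code.

```python
from collections import namedtuple

# One record per test request; field names are illustrative.
JournalEntry = namedtuple(
    "JournalEntry", "timestamp rule_id test_id request_ok http_status tag"
)


def make_log_tag(now, test_title):
    """Combine seconds since the UNIX epoch with the test title (rule ID-test ID)."""
    return "%.2f_%s" % (now, test_title)


def make_journal_entry(now, test_title, request_ok, http_status):
    """Build the record that is later matched against WAF log entries."""
    rule_id, test_id = test_title.split("-", 1)
    return JournalEntry(now, rule_id, test_id, request_ok, http_status,
                        make_log_tag(now, test_title))


entry = make_journal_entry(1503088616.96, "942120-1", True, 200)
print(entry.tag)
```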

We used FTW as a module for our testing harness, but we first needed to define a function that loads the YAML configurations, using code we took from some FTW utility functions. The first argument of the get_rulesets function (taken from an FTW example) is the directory/path containing the YAML files, and the second is a flag indicating whether or not we should recurse (useful if you have nested directories also containing tests). Ultimately, you need to return a list of ruleset.Ruleset objects, as that is what the FTW test runner requires to execute the tests:

import os

from ftw import ruleset, util


def get_rulesets(ruledir, recurse):
    """
    List of ruleset objects extracted from the yaml directory
    """
    yaml_files = []
    if os.path.isdir(ruledir) and recurse != 0:
        # Recursive: walk nested directories and collect every .yaml file
        for root, dirs, files in os.walk(ruledir):
            for name in files:
                filename, file_extension = os.path.splitext(name)
                if file_extension == '.yaml':
                    yaml_files.append(os.path.join(root, name))
    elif os.path.isdir(ruledir):
        # Non-recursive: only the files directly inside ruledir.
        # This must be elif, or it would overwrite the recursive result.
        yaml_files = util.get_files(ruledir, 'yaml')
    elif os.path.isfile(ruledir):
        # A single YAML file was passed instead of a directory
        yaml_files = [ruledir]
    extracted_files = util.extract_yaml(yaml_files)
    rulesets = []
    for extracted_yaml in extracted_files:
        rulesets.append(ruleset.Ruleset(extracted_yaml))
    return rulesets

While not included in these snippets, the Fastly testing harness has a few additional operations that occur while running in CI:

A way to control the scope of the tests in a very granular way, which is represented as a configuration file.
A mechanism which identifies tests that we expect to fail, which prevents failed FTW tests from triggering fatal CI failures.

The following snippet demonstrates how to run the tests once you have the ruleset objects:

import sys
import time

from ftw import http, testrunner

# co (the harness configuration object) and get_some_tag() are defined
# elsewhere in the harness and are not shown here.
testfiles = get_rulesets(co.rule_path, co.recurse)
for tf in testfiles:
    for test in tf.tests:
        ruleid = test.test_title.split("-")[0]
        now = time.time()
        # Get some unique tag to associate with the test
        # We use seconds since UNIX epoch, rule ID and test ID
        # For example: 1503088616.96_942120-1
        logtag = get_some_tag(now, test.test_title)
        runner = testrunner.TestRunner()
        for stage in test.stages:
            odict = stage.output.output_dict
            headers = stage.input.headers
            # Only tag the request if the test did not set its own ID
            if "X-Request-Id" not in headers:
                stage.input.headers["X-Request-Id"] = logtag
            try:
                hua = http.HttpUA()
            except Exception:
                print("failed to initialize UA object")
                sys.exit(1)
            try:
                runner.run_stage(stage, None, hua)
            except Exception as e:
                # handle exception(s)
                print("stage failed for %s: %s" % (logtag, e))

Continuous testing

We manage all the rules utilized by the Fastly WAF implementation with the Git revision control system (on GitHub), and we’ve set up continuous testing for any changes to our rule sets. As a byproduct of our CI job setup, we can also identify any regressions in the tool chain or in the WAF code within the Varnish caching engines themselves. An engineer creates a branch of the rule repository and makes a change, such as fixing an evasion vulnerability or optimizing performance. The engineer then creates a pull request, after which everything is automated.

The process looks like this:

CI jobs trigger.
Container gets launched.
Caching engine is provisioned into the container.
VCL tool chain is checked out.
Regular expressions are extracted from the rule sets and checked for regular expression denial of service conditions (reDoS).
An "origin" server (basically a dumb HTTP responder which responds unconditionally with 200 and a cache control configuration for Varnish, so that Varnish won't cache anything) is provisioned and local logging integration is configured.
Rule sets are translated into VCL, and for the purposes of our CI job, augmented with some logging and origin configurations. In addition, Varnish is configured to log the X-Request-Id header along with the WAF data.
The WAF VCL is compiled and loaded into Varnish.
The Fastly WAF CI code is launched, which runs through our test corpus (including configured tests in the OWASP CRS regressions).
Varnish logs are cross referenced with the Fastly CI journal to identify which rules successfully identified the payload vs. which ones failed.
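
The final cross-referencing step can be sketched in Python as follows. The log line format (`request_id=... rule_id=...`) is invented for this example and is not the actual Varnish log format; the function names and the shape of the journal are likewise assumptions.

```python
import re

# Hypothetical log format: one line per rule that fired for a request.
LOG_LINE = re.compile(r"request_id=(?P<tag>\S+)\s+rule_id=(?P<rule>\d+)")


def rules_seen_by_tag(log_lines):
    """Map each X-Request-Id tag to the set of rule IDs logged for it."""
    seen = {}
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match:
            seen.setdefault(match.group("tag"), set()).add(match.group("rule"))
    return seen


def cross_reference(journal, log_lines):
    """journal maps a request tag to the rule ID the test was written for.

    Returns tag -> True if that specific rule fired for that specific request.
    """
    seen = rules_seen_by_tag(log_lines)
    return {tag: rule in seen.get(tag, set()) for tag, rule in journal.items()}


logs = [
    "waf request_id=1503088616.96_942120-1 rule_id=942100",
    "waf request_id=1503088616.96_942120-1 rule_id=942120",
    "waf request_id=1503088617.10_920250-1 rule_id=942120",
]
journal = {
    "1503088616.96_942120-1": "942120",
    "1503088617.10_920250-1": "920250",
}
outcome = cross_reference(journal, logs)
print(outcome)
```

Note that the second request did trigger a rule, just not the one under test, so it counts as a failure. This is why checking the HTTP response code alone is not sufficient.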

Findings and contributions

As a result of this work, we’ve found several bugs that we’ve reported back to the community. Identification of the bugs themselves is usually a combination of failed FTW tests and research performed by security and engineering teams at Fastly, where we use FTW for the following operations:

Identifying bugs/validating fixes in the rules themselves (all of which we report upstream).
Finding bugs in our VCL tool chain.
Identifying discrepancies between ModSecurity and Fastly WAF transform operations.
Verifying and demonstrating specific vulnerabilities and vulnerability patches.

OWASP rule findings and contributions

It's vital that the rules are functioning as designed — we work with the OWASP CRS/CRS regressions community to ensure that problems we identify are communicated back to the projects and corrected. Most of the issues we find could lead to evasions, while others could impede false negative/positive analysis.

Here’s a summary of some of the issues we’ve seen:

Missing transforms: We observed test failures for a number of rules. Certain rule authors developed rules under the assumption that certain decoding operations had occurred. For example, Apache and Nginx will automatically perform some decoding operations prior to entering into the ModSecurity code. So in some cases, the rules didn’t specify certain decode operations assuming they have been completed already. In other cases, data might be double decoded, since the rule specifies the decode operation, but it has already been decoded by the underlying platform. Discussion and changes here: PR 578 & PR 590.
Transforms occurring on incorrect rules within a chain. A tolower operation was performed on a collection or variable that is used by the second rule in the chain, but the second rule in the chain was the one doing the comparison against a pattern and was not performing the tolower operation. PR 804
Case sensitivity: a rule looked for http:// but did not match HTTP:// for remote file include (RFI) attacks. PR 726
Rule evasion due to missing URL schemes. For example, we specified ftp:// and http:// but file:// wasn't specified when it should have been. PR 726
Missing logdata and msg attributes, which made false positive investigations very difficult, and most of which manifested themselves as failed FTW tests. PR 798
Session fixation detection bypass: incorrect operators were used in chained rules, which resulted in an evasion vulnerability. PR 480
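
Two of the issues above, case sensitivity and missing URL schemes, are easy to reproduce in a few lines of Python. The `detects_rfi` function is a simplified, hypothetical stand-in for a remote file include check and is not the CRS rule itself; the host name is a placeholder.

```python
import re


def detects_rfi(value, schemes, lowercase_first):
    """Return True if value begins with one of the listed URL schemes."""
    if lowercase_first:
        # Equivalent in spirit to applying a lowercase transform before matching.
        value = value.lower()
    pattern = r"^(?:%s)://" % "|".join(re.escape(s) for s in schemes)
    return re.search(pattern, value) is not None


incomplete = ["ftp", "http"]
print(detects_rfi("http://evil.example/x.php", incomplete, lowercase_first=False))
print(detects_rfi("HTTP://evil.example/x.php", incomplete, lowercase_first=False))
print(detects_rfi("HTTP://evil.example/x.php", incomplete, lowercase_first=True))
print(detects_rfi("file:///etc/passwd", incomplete, lowercase_first=True))
print(detects_rfi("file:///etc/passwd", incomplete + ["file"], lowercase_first=True))
```

Each evasion corresponds to a test case that fails until the rule is corrected, which is how these problems surfaced in CI.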

FTW also helped us find issues in our own toolchain, which are generalized below:

Incorrect sorting or reordering of transform specifications in a rule.
Bugs where transforms were being applied incorrectly on a rule chain.
Reordering rules in such a way that the execution graph differed from what was specified in the ruleset (e.g., skip-to rules). For more information, have a look at the documentation for the SkipAfter action.
Lexical parsing of the rules: patterns being incomplete or incorrect, which resulted in only certain portions of patterns or headers being included in the VCL version of the rule.
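
The first item is worth a small demonstration, because transforms are generally not commutative. In the Python sketch below, `urllib.parse.unquote` and `str.lower` stand in for a URL-decode and a lowercase transform; they are illustrative substitutes, not the ModSecurity or VCL implementations.

```python
from urllib.parse import unquote


def apply_in_order(value, transforms):
    """Apply each transform to the output of the previous one."""
    for transform in transforms:
        value = transform(value)
    return value


payload = "%53ELECT"  # "SELECT" with the first letter URL-encoded

# Order as written in the rule: decode first, then lowercase.
as_specified = apply_in_order(payload, [unquote, str.lower])
# Order after an incorrect sort in the tool chain: lowercase, then decode.
reordered = apply_in_order(payload, [str.lower, unquote])

print(as_specified)  # select
print(reordered)     # Select
```

A rule whose pattern expects lowercase input matches the first result and misses the second, so a tool chain that reorders transforms silently introduces an evasion.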

Going forward

This article wouldn’t have been possible without contributions from our own Eric Hodel and Federico Schwindt, who identified many of the WAF evasions described above (see the PRs for specific details). Fastly’s FTW repository is located here: https://github.com/fastly/ftw — feel free to check out the code and contribute by submitting PRs. As a commercial contributor to the CRS, we also look forward to future collaboration with the CRS-support organization on GitHub which has a community maintained version of FTW and CRS-related utilities.

As we improve our testing corpus and take up new versions of CRS, we’ll continue to find new bugs and issues with rules; that’s simply a (welcome) byproduct of continuous testing. CRS is at version 3.0.2 now, and we will continue to port future releases of the ruleset, run them through our testing corpus, and report any problems we find back up the chain. This is part of our ongoing effort to build tools and research methods to bypass our WAF, ultimately making our platform more secure for our customers. Stay tuned: in our next post, we’ll take a look at a tool we’ve developed that mutates attack payloads to bypass WAF controls.