GitHub’s Joe Williams discusses mitigating security threats
Altitude 2015, our first-ever customer summit, provided a great opportunity to hear from our customers about how they use Fastly. In this post, I’ll share an in-depth overview of Joe Williams’ talk on mitigating security threats with a CDN — full video and slides are below.
Joe is a Computer Operator at GitHub, the web-based Git repository hosting service used by more than 10 million people. Joe mostly works on the load-balancing tier, and he and his team have developed techniques and built tools to deal with the various attack vectors they’ve seen. In his talk, he emphasizes that you can deploy these best practices on your own sites in the event of a distributed denial of service (DDoS) attack.
There are a number of common techniques that GitHub uses during DDoS attacks:
Traffic scrubbing. GitHub recommends selecting a provider to do this — they didn’t disclose which one they use, but there are many to choose from. They don’t leave their scrubbing provider on constantly, however, as that can cause performance issues.
Security devices at the edge, not firewalls. This gives GitHub insight into the kind of traffic that they’re receiving and allows them to mitigate some classes of threats.
Heavy use of HAProxy. They currently use HAProxy (the high-performance TCP/HTTP load balancer) in their infrastructure, and plan to use it more in the future. HAProxy supports the following:
Identifying specific attacks and human requests at the edge. When an attack targets a specific URL, GitHub can keep returning 403s (the HTTP status code for “forbidden”) to the attacking clients while still allowing human requests to reach the content. If a request carries a timezone or language cookie set by the browser, it is likely from a human (versus a bot).
Checking for a full HTTP request in a TCP connection, and rejecting the connection if there hasn’t been any sort of content requested within ten seconds. This helps to mitigate attacks like slowloris, where a bad client opens a TCP connection to the webserver, but never sends a full HTTP request to ask for content — it just holds the connection open and does as little as possible for as long as possible.
Targeting user agents. If a specific client user agent is abusing GitHub’s service, they can have HAProxy return 403 status codes based on matching the user agent (often referred to as the UA). Sometimes this lets you single out specific client software, e.g. blocking python-requests while letting browsers (Firefox, IE, Safari, etc.) through.
Tuned timeouts. HAProxy offers fine-grained control over various client timeouts (Joe mentions in the talk that there are about a dozen). GitHub uses these timeout controls to block attacker traffic that fits a certain timeout profile. The HAProxy configuration is tuned to match what they expect normal traffic to their infrastructure to look like.
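To make the user-agent and cookie checks above more concrete, here is a minimal sketch of the decision logic in Python. In practice GitHub expresses rules like these in their HAProxy configuration, which the talk summary does not reproduce, so everything here is an assumption for illustration: the cookie names (`tz`, `lang`), the attacked path, the blocked user-agent prefix, and the `Request` and `edge_status` names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration only; not GitHub's actual rules.
BLOCKED_UA_PREFIXES = ("python-requests",)
ATTACKED_PATHS = {"/some/targeted/path"}
HUMAN_COOKIES = {"tz", "lang"}  # assumed names for timezone/language cookies


@dataclass
class Request:
    path: str
    user_agent: str = ""
    cookies: dict = field(default_factory=dict)


def edge_status(req: Request) -> int:
    """Return 403 for requests that look like attack traffic, else 200."""
    ua = req.user_agent.lower()

    # Rule 1: deny abusive client software based on its user agent.
    if any(ua.startswith(prefix) for prefix in BLOCKED_UA_PREFIXES):
        return 403

    # Rule 2: on a URL under attack, deny requests that carry none of the
    # cookies a real browser session would normally have set.
    if req.path in ATTACKED_PATHS and HUMAN_COOKIES.isdisjoint(req.cookies):
        return 403

    return 200
```

With these assumed rules, a browser request to the targeted path that carries a `tz` cookie still gets through, while the same request without any of the expected cookies receives a 403.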
Joe noted that one of the things GitHub does well when mitigating an attack is fast edge configuration changes.
GitHub has “a pretty healthy graphing culture.” They keep track of all their traffic and their providers, enabling them to monitor site health in real time. They monitor how many requests are getting “hit by the banhammer” (banning a client) — by tracking how many requests are sent to certain backends (set via HAProxy), they can look for anomalies. They keep a lot of their provider graphs in Graphite dashboards — including Fastly’s stats API.
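As a rough illustration of what "looking for anomalies" in those per-backend request counts could mean, here is a small Python sketch. The talk summary does not say how GitHub detects anomalies, so the approach (a simple standard-deviation threshold), the `is_anomalous` name, and the default threshold are all assumptions rather than their method.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` sample standard
    deviations above the mean of the recent `history` of counts."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread

    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any increase at all stands out.
        return latest > mu

    return (latest - mu) / sigma > threshold
```

For example, `history` could hold the number of requests per minute routed to a "banned" backend over the last hour, and `latest` the most recent minute. A real dashboard would likely also account for daily traffic patterns, which this sketch ignores.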
Another part of visibility is logging different types of headers; Joe notes that “HAProxy makes this easy,” allowing them to do so regularly.
GitHub does a lot of configuration testing for their edge devices and services as well as for their providers, including an extensive testing suite for Fastly. They test every single access control list (ACL) in the HAProxy configuration; though it took six months to set up, Joe emphasized that it was “well worth it,” as it ensures changes have the effect they want when they’re rolled out.
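One common way to test rules like these is with a table of inputs and expected decisions. The sketch below shows the idea in Python; it is not GitHub's test suite, which the talk summary does not describe in detail. The `blocks_user_agent` rule is a hypothetical stand-in for a single ACL, and `run_cases` and `CASES` are illustrative names.

```python
def blocks_user_agent(user_agent: str) -> bool:
    """Hypothetical stand-in for one ACL: deny a specific client library."""
    return user_agent.lower().startswith("python-requests")


# Each case pairs an input with the decision the rule is expected to make.
CASES = [
    ("python-requests/2.7.0", True),
    ("Python-Requests/2.7.0", True),  # matching should ignore case
    ("Mozilla/5.0 (X11; Linux x86_64)", False),
    ("", False),  # a missing user agent is not blocked by this rule
]


def run_cases(rule, cases):
    """Return (input, expected, actual) for every case the rule gets wrong."""
    failures = []
    for value, expected in cases:
        actual = rule(value)
        if actual != expected:
            failures.append((value, expected, actual))
    return failures
```

An empty result from `run_cases(blocks_user_agent, CASES)` means the rule behaves as expected, and any change to the rule that breaks a case shows up before it is rolled out.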
In the future, GitHub is going to need “better tools that identify human traffic.” Joe is looking into various traffic authorization tools, such as captchas, that they can turn on during an attack to determine whether a request is coming from a human.
One of Joe’s current projects is switching away from monolithic load balancing to a service-based load balancing tier; he wants to set up load balancing for each individual service supporting github.com, including pages and api.github.com. This will allow them to spread load balancing across more machines, giving attacks a smaller blast radius so GitHub can isolate attack traffic to a specific service.
Joe also plans on switching to more granular per-service maps (they’re currently on one map). Fastly has the ability to push GitHub’s traffic to POPs that aren’t already flooded, giving Joe’s team another tool for mitigating attacks.
Check out Joe’s talk below, and read the full summary of Altitude here. Stay tuned as we recap more Altitude 2015 talks.