Patrick McManus, Distinguished Engineer at Fastly, explores how the definition of secure networking is expanding and why TLS 1.3 is faster, more robust, and more responsive than ever before.
Good afternoon everybody. I'm going to talk a little bit about TLS 1.3, which is really a great segue from the things Richard was talking about and feeds in nicely to the things Jana will be talking about involving QUIC right after me. So, this is a propitious time to talk about TLS and TLS 1.3 in a deep-dive fashion because while 1.3 was standardized in the summer of 2018, the first revision to the standard in over a decade, the last 12 months have really been when we've seen deployment of this protocol. So, it's a very relevant thing now. Probably on the order of 30% of internet-wide connections are now using 1.3. So, we'll talk a little bit about what it means in detail. I have two sections to this talk. One is a deep dive into what 1.3 is versus 1.2 and how it's just plain better, and the second is more of a thesis.
I hope to convince you that because of the ways TLS has grown up, it is actually expanding the definition of what it means to have secure networking. It's pushing HTTPS to cover cases that it hasn't covered in the past. Okay. So, a little bit of a review. The objectives of a secure transport and in this case TLS is the secure transport. This is a two-party conversation. It is not multi-party, it's between two entities. The client and the server and the conversation they have has three major properties.
One, confidentiality. That means no one else knows what they're talking about, right? Two, authentication, which means both parties are sure that the other end, their peer rather, is not being forged. And the third one is integrity, which means a third party can't introduce new data into this connection without either side being aware that it happened. But here's the kicker: this has to be done without any prior arrangement. So, it's not fair if the two parties meet at a coffee shop, exchange their secret decoder rings, and then get back on the internet and do that. Brand-new servers have to be able to talk to brand-new clients, millions of them every day, for this to happen at internet scale and be a useful thing.
Okay. So, how is TLS 1.3 better than the versions before? I'm going to argue there are three ways. It's faster, it's more robust, and it's more responsive. Faster is really all about handshake overhead. This is the time you have to wait before you can do anything useful. The useful thing in this slide is the HTTP request, sort of at the bottom of each side. So, you see 1.2 on the left and 1.3 on the right, and what you should notice is there is half as much overhead in the 1.3 case as in the 1.2 case. Instead of there being two round trips of data, there's one round trip of data. And the other thing to note about 1.3 is that half of that handshake is encrypted, where the whole handshake — both directions, two flights worth in TLS 1.2 — is actually contained in plain text before you get going with the HTTP part. A similar thing, but almost more important, happens for repeat visits to a website, and repeat visits actually just means on a connection level.
It doesn't necessarily mean users across long times, different sessions. So, these can actually be spaced just a few seconds or even less apart, and repeat visits actually make up the bulk of all connections made on the internet. And using a technique called resumption, both 1.2 and 1.3 have expedited cases that make this more efficient. In 1.2, there's one round trip of overhead, and in TLS 1.3 — it's so awesome, it's got its own name — it's called 0-RTT mode and there is no overhead at all from a TLS point of view before you can send HTTP data. So, there's no waiting.
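To make the version difference concrete, here's a minimal Python sketch using only the stdlib `ssl` module. It checks whether the local OpenSSL build supports TLS 1.3 and pins a client context to it; note the stdlib doesn't expose 0-RTT early data, so this only exercises the one-round-trip handshake.

```python
import ssl

# Does the local OpenSSL build (1.1.1 or newer) support TLS 1.3 at all?
print(ssl.HAS_TLSv1_3)

# Pin a client context to TLS 1.3: connections either get the
# one-round-trip handshake described above or fail outright.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3
print(ctx.minimum_version)
```

Wrapping a socket with `ctx.wrap_socket(sock, server_hostname=...)` and reading `version()` on the result then reports the negotiated protocol, e.g. "TLSv1.3".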
Okay. So, faster, more robust and more responsive. We'll talk about robust, which I think is actually the biggest innovation here, down in the details. 20 years ago, a researcher named Daniel Bleichenbacher — I just like saying that name — established a series of attacks against RSA-style key exchanges, the static key exchanges that have been a staple of TLS for a long time. They're basically chosen ciphertext attacks and some timing side-channel attacks, and this is a class of problems he really uncovered that are inherent to RSA-style key exchanges. They weren't just bugs that could go off and be fixed. Indeed, all the implementations of TLS have found workarounds for the known attacks over the years, but new attacks based on the same principles keep coming up, and it's a bit of a house of cards. So, one of the innovations of 1.3 is to just stop doing that, okay? To no longer use static key exchanges based on RSA and instead use what are known as ephemeral Diffie-Hellman key exchanges. So, now instead of having 37 cipher suites using a whole slew of algorithms that we know aren't safe (RSA key exchange, CBC, MD5, RC4, all of those things), we now have only five cipher suites. They're called the AEAD cipher suites, and they all use ephemeral key exchange mechanisms. Okay? The three most important ones are up there on the screen. AES-GCM: the most common thing you're going to see on the internet by far. It's well-suited to the hardware you find in both phones and laptops; they can help it out, okay? CCM is best for embedded environments, low-resource environments, that kind of thing. And ChaCha20-Poly1305: pure software-based implementations that don't have any support from the hardware. It becomes really obvious now which cipher suite to use in what environment, and that itself is a good way to help prevent you from making mistakes.
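For reference, the five TLS 1.3 suites from RFC 8446 line up with the guidance above. This little table-and-chooser is an illustrative sketch; the `pick_suite` function and its arguments are mine, not any real library API.

```python
# The five TLS 1.3 AEAD cipher suites (RFC 8446) and the environments
# the talk pairs them with. Illustrative mapping, not an API.
SUITES = {
    "TLS_AES_128_GCM_SHA256": "hardware-accelerated (phones, laptops)",
    "TLS_AES_256_GCM_SHA384": "hardware-accelerated (phones, laptops)",
    "TLS_CHACHA20_POLY1305_SHA256": "pure software, no AES hardware",
    "TLS_AES_128_CCM_SHA256": "embedded / low-resource",
    "TLS_AES_128_CCM_8_SHA256": "embedded / low-resource",
}

def pick_suite(has_aes_hw: bool, embedded: bool) -> str:
    """Toy chooser mirroring the talk's guidance."""
    if embedded:
        return "TLS_AES_128_CCM_SHA256"
    if has_aes_hw:
        return "TLS_AES_128_GCM_SHA256"
    return "TLS_CHACHA20_POLY1305_SHA256"
```

The "really obvious" choice the talk describes is exactly this: two environment facts determine the suite, leaving little room for misconfiguration.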
All TLS 1.3 connections have this property called forward secrecy, which is a tremendous innovation. This is all based around the ephemeral key distribution that we talked about on the prior slide. So, in an ephemeral key distribution, there's actually a new key that's minted. It's created for every connection you have. So, no two connections share the same key that governs the connection that's going on and this has a really important property for the internet and for TLS security. It means that a compromise of the server key can only impact future connections, but not connections that have happened in the past. So, here's the threat model. You have an actor who has access to your network. Maybe your network is a radio-based network and everyone has access to it, okay? They don't have your keys, they just see encrypted data and they're shoving it in a database.
They do this for maybe years and then one day something bad happens and you do lose control of your key. They grab that key and they're like, "I know what to do with this." Then they go back to their database and gigabytes and gigabytes and terabytes of information is available to them. In an ephemeral key system, because every connection actually used a different key, the key compromise is useless for that repository of data, okay? And this is a real threat. If you read any military history, this thing happens at an abstract level all the time in our past. And so, while I'd like to say what's happened in the past stays in the past, the reason these ephemeral keys don't help you go into the future is if someone has control of your server key, they can basically become an impostor and man-in-the-middle you and you're not protected against that, but you are protected looking backward.
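The ephemeral-key argument can be made concrete with a toy finite-field Diffie-Hellman in pure Python. This uses the 768-bit Oakley Group 1 prime from RFC 2409 purely as a sketch; real TLS 1.3 uses X25519 or larger groups, and a real handshake also authenticates the server's share with its certificate, which this skips entirely.

```python
import hashlib
import secrets

# Toy finite-field Diffie-Hellman over the 768-bit Oakley Group 1
# prime (RFC 2409). Illustrative only: no authentication, toy KDF.
P = int(
    "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD129024E088A67CC74"
    "020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B302B0A6DF25F1437"
    "4FE1356D6D51C245E485B576625E7EC6F44C42E9A63A3620FFFFFFFFFFFFFFFF",
    16,
)
G = 2

def handshake() -> str:
    """One connection: both sides mint fresh ephemeral secrets."""
    a = secrets.randbelow(P - 2) + 1       # client's ephemeral secret
    b = secrets.randbelow(P - 2) + 1       # server's ephemeral secret
    A, B = pow(G, a, P), pow(G, b, P)      # public shares on the wire
    shared_client = pow(B, a, P)
    shared_server = pow(A, b, P)
    assert shared_client == shared_server  # both ends derive the same secret
    return hashlib.sha256(str(shared_client).encode()).hexdigest()

# Two connections, two unrelated session keys: a recorded transcript plus
# a later theft of the server's long-term key decrypts neither of them.
k1, k2 = handshake(), handshake()
```

The point is only that no long-term key appears anywhere in the derivation, so compromising one later gives the attacker's database of recorded ciphertext nothing to work with.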
So, if you detect the key compromise, you're in a pretty good place. Near the bottom of the slide it says, "Resumption is better, but it's not perfect." Resumption is this repeat-visit-to-your-website case, and the way it gets rid of that extra round trip and became faster is that the last time you were connected, you exchanged some key material in that session that you use in the future session in order to speed things up and not negotiate it at handshake time, which is great for performance. From a security point of view, it creates a little bit of a liability, and that's still true in 1.3, but it's less of a problem. That's because in 1.2 the resumption mechanisms apply to the entire length of the connection, where in 1.3 they really only apply to a much narrower window: enough time to essentially negotiate a new key in parallel with what you're doing, and then you switch over to using that. So, there's much less exposure from that point of view.
Excuse me. This slide is a little bit about certificate encryption, which is the last bit of robustness I want to talk about, and the key thing to think about here is, if you remember when we showed you that half the handshake is encrypted now in TLS 1.3, the important thing that gets covered by that is the server hello and the certificate that is inside it. So, if you look on the left side of the screen at the TLS 1.2 server hello, you'll see down at the bottom the site the user's going to use. What did they use? Lawyer.com, I believe. That tells you something about the client and what their needs are, which they perhaps consider should be part of the confidential communication they want with the server. If you look over at the 1.3 side, by encrypting this certificate information, all you get is a bunch of gobbledygook. I'm sure you're all going to go check my encryption to make sure I did that right on the slide.
All right. So, I was going to try and convince you of faster, more robust and more responsive. So, responsiveness, of course, is an aspect of fast, but it's a very special one. So, I want to talk about 0-RTT because 0-RTT connections are ones without any setup overheads. So, they make life really buttery smooth when interacting with a website. You could have wandered away from it for a while and so, you touch on it and everything just jumps to life, right? It's a very important property and so, there's no handshake overhead. So, when you reduce something from say two round trips to one round trip, you've increased performance by 50%, 100% if you're into reciprocals, but when you increase something or you decrease something from one to zero, you've really created an infinite performance increase, right? You hit the zero bound and you've made it free and free changes your relationship to things.
It's not just a performance improvement. It's really a whole different feature. And so, when you have free and no setup overhead, you start to blur the line between traditional HTTP persistent connections and new connections made with 0-RTT. Neither of them has any overhead, and there are problems with traditional HTTP persistent connections, right? There are CPU issues there, memory issues, any time you have state on the server, right? There are scheduling issues; on laptops and phones it interacts pretty poorly with batteries; and you have NAT issues that can create reliability issues. We've come to live with most of these things, but if your alternative is a 0-RTT connection that's just as performant, that may be a really good substitute. Now I exaggerate just a teeny bit here because if you're doing this over TCP with, say, HTTP/2, you still have a round trip of TCP overhead.
I've been talking about the TLS overhead, but you still have that TCP overhead to talk about. If you substitute this into QUIC — which Jana is going to talk about next — and it uses TLS 1.3 directly, you really truly do have a no-overhead HTTP mechanism, and I think you're going to start to see this line between persistent connections and new connections really be blurred and some interesting things happen with it. Okay. So, faster, more robust, more responsive. Did I convince you of that? I hope. Any doubters? I'll find you after. It's all right.
We'll talk about adoption next. Shout out to my Fastly teams and colleagues. This is in our product suite. It's currently in limited availability; just chat with your support rep, and that's a pretty easy thing to get added to going forward. And as I said, on the internet this is going really well. So, if you're wondering, is there client support? Well, you've got Chrome, Safari, and Firefox; they've got you covered. You're like, "I need libraries, I need tools": curl, OpenSSL, BoringSSL, Go libraries, a bunch of different stuff, got you covered. Mobile's your issue, you're looking for users: iOS 12.2, Android Q, got you covered. Server side, do people really trust their data to this? Well, the biggest websites in the world — Facebook, Google, Instagram, WhatsApp — are all based on 1.3 every day, and I think at this point I'm more than happy to endorse you running it for your services as well.
Okay. So, TLS is going great, and if you look at what it's traditionally covered, it's doing it very well. But its success has really allowed an up-leveled consideration of HTTPS as a security framework, its various interactions with other parts of the internet, and what that means. Is it really fulfilling its goal as a two-party protocol, right? TLS does fine for the scope it has, but are there other ways in which HTTPS really doesn't meet its goal? And the current locus of interest on that surrounds the origin, the hostname of who you're talking to in this conversation. That is really considered, by many, a detail that the server and the client want to keep to themselves, perhaps just for competitive purposes if you're the server, right? You don't want everyone on the internet to know how much your service is used.
That's your information, right? From the client's point of view, it's a privacy infringement, right? So, how do we ratchet up the overall security level of HTTPS to bring that into the scope of what it protects? So, we're going to look forward and talk about protecting origin names with HTTPS, and this is a mix of completed work, work in progress, and a little bit of speculation. First, we'll summarize five traditional problems with HTTPS in this space. The ones in bold I'm going to take a deep dive into; the other ones we really just don't have time for, but I'll speak to them briefly. So, the first one is the server certificate in the handshake. I had a slide about this. This was Lawyer.com, okay? In 1.3, that side of the handshake is actually encrypted, and so that's information that's no longer visible to third parties.
So, that's pretty good. Number two is a revocation checking protocol called OCSP. We'll talk about that in detail. IP addresses are another leak. So, back in the day, right? Every website had its own IP address, and you could just map one to the other, because you can't really mask IP addresses without the addition of an overlay network or a tunnel or something like that. But thanks to virtualization and CDNs and all this kind of thing, that mapping is much less obvious than it used to be, and so that's not as strong of a signal. Server name indication is an aspect of TLS itself, and there's work actively underway on how we might address the information leakage in that. And then DNS, which is getting all the press at the moment. So, we'll talk about that.
Okay. OCSP and OCSP stapling. This one — I told you there'd be some work that's done, some work that's in progress, and some things that are more speculative — this is closest to something that's already done, okay? So, OCSP is a certificate revocation checking protocol. A client's received a certificate on its TLS connection, say to Lawyer.com, and it's verified the signature on it. It's a valid certificate, but the client wants to know if this certificate has been revoked since it was issued, and traditionally it did this with a call out to the CA itself. It says, "Hey, I got a certificate here, here's its hash, it's for Lawyer.com. Is it still valid? Have you revoked it?" And the CA will say, "Yes, I have," or, "No, I haven't," digitally sign that, and send it back to the client. This is all a clear-text protocol; it's actually plain HTTP. The client will verify that signature and, assuming it says, "No, I haven't revoked it," it'll go on with the connection. There are two real big problems here. Two parties learn that you're going to Example.com or Lawyer.com that have no business knowing it, because remember, the parties that have business knowing that are Lawyer.com and the user and really nobody else, right? So, the first one who learns this is the CA. The CA is in the business of making certificates and of creating security requirements around those, but it's not in the business of tracking usage. And yet, by virtue of the client going directly to the CA, you've really told the CA exactly what you're about to do next. You're obviously connecting to Lawyer.com, right? So, it learns that. The second one who learns this is anybody who can see your packets on the internet, because it's all in plain text.
So, those are both poor properties of OCSP. Stapling is a clever mechanism meant to address that. The notion here is that because that response from the CA is digitally signed, it does not actually have to be presented by the CA in order to verify it, okay? And it's good for a few days. It's not good for as long as the certificate, but it's good for a little bit of time. So, the server, Lawyer.com itself, can periodically go to the CA and say, "Can you please provide me a digital attestation that you have not revoked my certificate?" And then whenever it hands out the certificate, it also hands out sort of this Good Housekeeping seal of approval with a digital signature that says, "Look, it's still valid," and pushes that down to the client. The client verifies both of those signatures, right? It doesn't trust the server, but it verifies those signatures and away it goes, which is pretty neat because now the CA isn't involved at runtime and doesn't see where the client is going on the internet. So, it solves that part of the equation, and it works just fine even in TLS 1.2 as an extension; in fact the Fastly CDN will do this by default for you, which is great. However, in 1.2, remember, this handshake is still completely in plain text. So, anyone who can see your packets on the internet still knows where you're going, because they can see the OCSP information. And that's where 1.3 actually provides the advantage for this particular information leak, because the OCSP response is part of the server hello, and that's within the encrypted envelope of the handshake.
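The stapling flow above can be sketched end to end. This is a toy: an HMAC with a made-up "CA key" stands in for the CA's real public-key signature, and the names and the few-day validity window are illustrative, not taken from the OCSP specification.

```python
import hashlib
import hmac

# Toy model of OCSP stapling. An HMAC under CA_KEY stands in for the
# CA's real digital signature; all names here are illustrative.
CA_KEY = b"toy-ca-signing-key"

def ca_sign_good_status(cert_hash: bytes, now: int) -> dict:
    """What the server fetches from the CA periodically, not per-connection."""
    msg = cert_hash + b"|good|" + str(now).encode()
    return {"cert": cert_hash, "issued": now,
            "sig": hmac.new(CA_KEY, msg, hashlib.sha256).hexdigest()}

def client_accepts(staple: dict, now: int, max_age: int = 3 * 86400) -> bool:
    """Client checks the CA's signature plus a few-day freshness window,
    so it never has to contact the CA (or reveal its destination) itself."""
    msg = staple["cert"] + b"|good|" + str(staple["issued"]).encode()
    expected = hmac.new(CA_KEY, msg, hashlib.sha256).hexdigest()
    fresh = now - staple["issued"] <= max_age
    return hmac.compare_digest(expected, staple["sig"]) and fresh
```

The server relays the staple but cannot forge it: tampering breaks the signature check, and an old staple falls outside the freshness window.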
All right. SNI. SNI stands for server name indication. This is the information the client sends to the server during the handshake that says, "I need the certificate for Example.com." We have multi-customer-based implementations, right? Virtualized infrastructures that share IP addresses so those servers need to know where you're trying to get to in order to give you the certificate that you want to see as a client, and the way you indicate that is during the client hello. As a client, you just do this extension called server name indication and say, "I'm going to Example.com," and that is in plain text and of course that is part of what I think HTTPS wants to consider confidential going forward, but never has in the past. Now we can't solve this in the way that we've solved all the other things by just saying, "Oh, we're going to encrypt that half of the conversation," because this is the wrong half of the conversation. This is in the client side of the hello and the client speaks first. And what that really means is it has not received any key material from the server yet. And without any key material, it can't do any encryption. So, you're stuck. The insight here, and this is clearly a work in progress in the standardization world, but the insight here is that you can actually use key material for the CDN or the hosting infrastructure that is separate from the key material that is going to be used for the connection, which is based on an individual customer. And you can use this separate key material to encrypt just the SNI information. And you can distribute this via the DNS because the DNS lookup happens before the TCP connection or before the client hello in any event, even if this is QUIC, and so you've obtained this information and you can use it to encrypt the SNI and the CDN will still be able to decrypt it.
So, that's the cleverness that makes that work. That is still a work in progress, but I'm actually pretty excited about it. It only makes sense to do in 1.3 because, of course, the server certificate would be in the clear giving away the same information anyhow if it were not 1.3. So, it will be sort of a 1.3 innovation.
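The shape of the idea, key material published out of band that covers only the hostname, can be sketched with a toy symmetric stand-in. The real encrypted-SNI/ECH drafts use the CDN's public key; here `cdn_key`, notionally fetched from DNS, and the SHA-256 keystream are purely illustrative.

```python
import hashlib
import secrets

def keystream(key: bytes, n: int) -> bytes:
    """Toy SHA-256 counter-mode keystream (illustrative, not a real cipher)."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt_sni(hostname: str, cdn_key: bytes) -> bytes:
    """Encrypt only the SNI with the CDN's key, before any handshake key exists."""
    pad = keystream(cdn_key, len(hostname))
    return bytes(a ^ b for a, b in zip(hostname.encode(), pad))

# The key the CDN would publish in DNS; the client fetches it before the
# client hello, so it has something to encrypt with despite speaking first.
cdn_key = secrets.token_bytes(32)
blob = encrypt_sni("example.com", cdn_key)
```

The CDN, holding the same key it published, strips the pad and routes to the right customer; an on-path observer of the client hello sees only `blob`.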
All right. The last topic, or last hole in the veil and the scope of HTTPS, is DNS. So, before you connect to Example.com, you need to know what IP address you're going to connect to and, therefore, you do a DNS request for Example.com. And you tell essentially two parties that you're going to Example.com. One of them is whoever might convince you that they want to be your name server on your local network, in a completely unauthenticated fashion, and you tell them — whoever they might be. And then the second party who finds out is anybody who can watch that communication, because this is an unencrypted, unauthenticated protocol. Traditionally this is a pretty big hole. So, there are a few mechanisms out there in the wild where people are looking to attack this in a few different ways. The one that seems to be gaining mind share, Richard talked about this a bit, is DNS over HTTPS, with which you actually start to view DNS as a service to your application that you connect to over HTTPS. And I see a couple of head-scratchers, and you're like, "So, you're telling me you're going to bootstrap an insecure protocol, or at least the information leakage of that protocol, using another protocol that has the same problem? How does that get you anywhere?"
And the answer is you've shifted what you're leaking. If you view your DNS server as another service, okay? What you leak to the network is the fact that you are connecting to a DNS service, which isn't actually all that sensitive information; everybody needs a DNS service. This isn't a great insight into you or your behavior. But all the sites you're looking up, which really are very personalized and are supposed to be part of that two-party relationship, now have the confidentiality of that transport to protect them. So, you've made a shift. You do still leak that initial connection to the DNS service, but the extent of what you're leaking has been reduced dramatically to something that is less sensitive.
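Mechanically, the shift is small: the query is ordinary DNS wire format, and only the transport changes. Here's a sketch of the message body a DoH client would POST over HTTPS with `Content-Type: application/dns-message`, per RFC 8484; the resolver URL and the HTTPS request itself are omitted.

```python
import struct

def dns_query(hostname: str, qtype: int = 1) -> bytes:
    """Build a wire-format DNS query for an A record (qtype 1).

    RFC 8484 suggests ID 0 so identical queries are cacheable.
    """
    # Header: ID 0, RD flag set, exactly one question, no other records.
    header = struct.pack(">HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode() for label in hostname.split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", qtype, 1)  # qclass IN
    return header + question

msg = dns_query("example.com")
```

An on-path observer of a DoH exchange sees only a TLS connection to the resolver; the `example.com` inside this message rides within the encrypted channel.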
Okay. So, HTTPS has advertised itself as a two-party communication protocol for 20 years, but over that time the definition of what should be included in that two-party scope has really expanded and continues to expand, and a lot of that is because of the success of TLS. I think TLS 1.3 really is a distillation of a lot of the great lessons learned from 1.2 and its predecessors into a really great set of best practices we can rely on, and we can really address a wider set of problems than we were concerned with previously. So, that's all good news. When I get to this part of the presentation, people often are like, "So, that's all great, that's nice, but what are you going to work on next after you solve those five things?" As if solving those five things doesn't keep me busy, but I'll answer the question sincerely and say that I think the machine learning and traffic analysis space is where you really ought to look if you're going to be a little bit forward-looking on this. And by that, I don't mean machines that are able to take ciphertext in one side and spit out unencrypted data on the other. There's really very little indication that that's possible, even at a nation-state level. But what you can do is match sizes and timings and patterns against big databases, apply modern ML techniques to that, and say, "This is very similar to a lot of other things I've seen. And I know what actually happened in those other encrypted conversations, and maybe you are doing the same thing." So, that is, I think, probably where the cat-and-mouse game is going to move. There are some potential defenses around that. It'll be an interesting space to watch. And that brings me to the end. I'll be around for questions or conversations people want to have, but really, thank you for making me part of your day.