The Maturing of QUIC
It’s no secret Fastly loves QUIC, not only because we believe it is a necessary step toward a better, more trusted internet, but also because some of us here have been actively involved for more than six years in taking QUIC from an experiment to an internet standard.
QUIC continues to evolve through a collaborative and iterative process at the IETF: adding features, implementing them, evaluating them, refining them, and reworking or discarding those that don’t stand up to continued scrutiny. In doing so, QUIC has matured in more ways than we imagined, yielding a protocol that is remarkably different from, and substantially better than, what it was in the beginning. So, keeping your arms and legs inside the ride at all times, let us take you on this journey of how QUIC has gone from an early experiment to a standard poised to modernize the internet.
Google’s early experiment
Google’s gQUIC experiment began as an effort to extend the performance improvements of HTTP/2 further down the stack into the security and transport layers. This was a bold and laudable experiment — replacing the protocols under HTTP was an audacious goal. The first bits of gQUIC flowed between Google servers and Chrome over the internet in 2013. Google carried live traffic from various Google services over gQUIC to Chrome clients, slowly increasing the use of gQUIC as Google services benefited from it.
Data from this early experiment proved critically important to the future evolution of QUIC’s design. This early effort is well documented, but we’ll note two important technical bits that were part of gQUIC in the early days and were subsequently removed through iteration:
Forward Error Correction (FEC): The feature used exactly one scheme (parity) and was not flexible enough to allow other schemes to be implemented. The parity scheme was deemed ineffective and, more importantly, the amount of code that needed to be maintained for this feature far outweighed its benefits.
ACK entropy: This feature of the protocol protected the sender against ACK spoofing attacks. Since all traffic was encrypted, any ACKs would have to be sent by one of the endpoints, and this feature protected the sender against a misbehaving receiver. ACK entropy was removed because the attack it protected against was not particularly impactful, and it introduced a non-trivial amount of code and protocol complexity.
Both of these features are examples of premature optimization of the protocol. As the protocol evolved, these became maintenance pain points, and when the maintenance cost exceeded the potential benefits, the features were removed.
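To make the parity idea concrete, here is a minimal sketch of XOR-based parity FEC in Python. It illustrates the general technique only, not gQUIC's actual code; the function names and the zero-padding to a common length are assumptions made for the example.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: list[bytes]) -> bytes:
    """Build one parity packet as the XOR of every packet in the group."""
    size = max(len(p) for p in packets)
    # Assumption: shorter packets are zero-padded to the longest length.
    padded = [p.ljust(size, b"\x00") for p in packets]
    return reduce(xor_bytes, padded)


def recover_missing(received: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing packet from the survivors and the parity.

    The result is zero-padded to the length of the parity packet.
    """
    padded = [p.ljust(len(parity), b"\x00") for p in received]
    return reduce(xor_bytes, padded, parity)


group = [b"alpha", b"bravo", b"delta"]
parity = make_parity(group)

# Pretend the middle packet was lost in transit.
rebuilt = recover_missing([group[0], group[2]], parity)
```

A single parity packet can repair at most one loss per group; if two packets from the same group are lost, the XOR no longer isolates either of them.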
Birth of an IETF protocol
In 2015, Google kicked off an early and informal public discussion at the IETF, called a Bar-BoF (an informal “birds-of-a-feather” gathering), to discuss the gQUIC experiment. There was significant interest from the community, and the Bar-BoF catalyzed two important lines of work over the following year.
First, some independent implementations of gQUIC started to take shape, and chatter within the community made it clear that many organizations were chomping at the bit to start implementing as soon as the IETF started work on QUIC. This was important. Standardizing a protocol only makes sense if there are multiple implementations of it. It's a tremendous amount of work to build a standard of this magnitude, so clear interest from a groundswell of implementers is generally a requirement for getting started. These independent implementations proved that there was a significant appetite to explore standardizing QUIC.
Second, gQUIC was a monolith. The transport and crypto handshakes were fused, and the protocol was built for use with HTTP, not as a general-purpose transport. The following year saw this monolith separated out into parts: the core transport, the crypto handshake, and a mapping of HTTP on top of the transport. This work was done at Google and in the community, but the outcome was eventually visible in the structure of the internet drafts that seeded the work of the formal working group. These conversations surfaced people in the community who would help drive the QUIC effort.
In the year that followed the Bar-BoF meeting, gQUIC carried more traffic, resulting in more data that Google made public. Armed with this data, early protocol drafts, and an initial charter, the community could now move forward. A few of us (from both Google and the community at large) organized a formal BoF meeting at the IETF in 2016 to create a working group that would build a new transport for HTTP. This was one of the most-attended meetings at the IETF. With almost 400 participants in the room, there was a clear consensus to create a working group with the proposed charter.
This was a pivotal moment. QUIC was now effectively owned by the IETF, so the development of the protocol would proceed via consensus-based IETF processes.
QUIC at the IETF
Though new IETF transports have historically struggled for wide adoption, results from gQUIC had captured the attention of major industry vendors — all of whom were participating in the IETF working group. Wide adoption was a distinct possibility. Participants in the QUIC working group knew they had to build a mature protocol that met the needs of everyone at the table. And they had to do it without losing any of the key performance benefits gQUIC had demonstrated.
It is worth noting that the development of QUIC was, and remains, collaborative. The protocol has been developed in an IETF working group by contributors from several organizations, and we all had to build this massive protocol together so that it would be available to all internet users. In a world that is rife with silos and walls, it is a testament to the ethos of the IETF that it enables this rare environment where we collaborate freely with our competitors to make a better internet for everyone.
The group’s work aligned along two unwritten guiding principles:
QUIC would be a two-party protocol, with strong confidentiality and privacy properties. Additionally, it would have anti-ossification characteristics to preserve the malleability of the protocol for the future. This meant strong encryption of as much of the protocol as possible. This also meant that the network that carried these packets would not be privy to most information in QUIC packet headers.
Protocol features would be checked on a scale. If the complexity of a protocol feature or its implementation outweighed its benefits, it would be weeded out. Maintenance is a massive hidden cost; implementers have little patience for maintaining code that does not yield concomitant benefits. Simplicity was going to be difficult, but it was a core value.
How protocol features evolved
QUIC’s development took concerted effort and, sometimes, several redesigns of features. To give the reader a flavor of the changes and the process, we now walk through some salient threads in the protocol’s evolution.
Right from the beginning, one of the big tasks of the working group was to reconcile the cryptographic handshake with other open standards. gQUIC used its own handshake called QUIC-Crypto. QUIC-Crypto was a remarkable piece of technology that pre-dated TLS 1.3 and introduced the idea of a zero-RTT cryptographic handshake. TLS 1.3 was, in part, inspired by QUIC-Crypto, and had all of the latency benefits that QUIC-Crypto offered. After much discussion, mostly among folks involved in gQUIC and TLS 1.3, it was agreed that QUIC would use the TLS 1.3 standard instead of the proprietary QUIC-Crypto. The group knew that using TLS 1.3, without losing the performance benefits of tight integration that gQUIC enjoyed with QUIC-Crypto, would take some work.
Several design iterations ensued, including an alternative proposal to effectively replace the QUIC handshake with a DTLS handshake. The working group finally arrived at the current design, which can be understood as follows. TLS is a two-layer protocol consisting of a handshake protocol and a record layer. Traditionally, this layering has been considered to be internal to the TLS protocol. The working group’s integration of QUIC with TLS did something unique: the final design used the TLS handshake protocol but QUIC’s own record layer, meaning that the transport of the handshake messages would be governed by QUIC. This combined the necessary parts of TLS with the optimized parts of QUIC.
The working group agreed that QUIC version 1 would require TLS, since TLS is the security protocol for the web, which was to be the first application for QUIC. However, since other applications might not use TLS for security, it remains possible to use a different cryptographic handshake with QUIC. The transport draft outlines the features needed of such a protocol, and recent work by researchers has provided an existence proof of this possibility.
Packet number encryption
Much of the QUIC header was encrypted. But some bits, including the packet number and key phase bits, remained in plain text. We wanted to encrypt those, too, but we seemed to be at a technical dead end. We’ll explain why briefly, but bear with us; this is where things get a bit complicated.
The packet number in QUIC was used both for reliability and as a nonce (non-repeating value) for encrypting the packet. Packet numbers were monotonically increasing to enable better loss detection and compression. Trivially decodable packet numbers were a problem because, similar to the connection ID, packet numbers could be used to correlate a connection moving across networks. The initial solution involved a number of strategies to enable a client to do random packet number jumps when it moved across networks. But these strategies were riddled with complexity, and new weaknesses were repeatedly discovered. Additionally, exposing packet numbers might lead to their ossification by network middleboxes, limiting their evolution.
Encrypting the packet number was a clean solution, but doing so required another nonce. This nonce would have to be communicated in the header, increasing header overhead of the packet. A clever insight at the QUIC interim meeting in June 2018 in Kista, Sweden, made the implausible possible. The working group realized that encrypted text is cryptographically random and could, therefore, be used as a nonce. This meant that the packet already carried a nonce that could be used for packet number encryption! This insight, along with the recognition that packet numbers did not need the same strength of protection as the rest of the packet did, allowed the creation of a two-step encryption process: encrypt the packet first using the packet number as a nonce, and then encrypt the packet number using some of the encrypted packet as a nonce (and a different key). QUIC now encrypts most of the bits in the header using this strategy.
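The two-step process can be sketched in Python as follows. This is a toy illustration of the structure only: it substitutes an HMAC-based keystream from the standard library for the real ciphers, omits authentication entirely, and assumes a fixed 4-byte packet number and a 16-byte ciphertext sample. The names `protect`, `unprotect`, `PN_LEN`, and `SAMPLE_LEN` are hypothetical.

```python
import hashlib
import hmac

PN_LEN = 4       # assumption: fixed-size packet number
SAMPLE_LEN = 16  # assumption: bytes of ciphertext reused as the second nonce


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode. Not for real use."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def protect(packet_key: bytes, header_key: bytes,
            packet_number: int, payload: bytes) -> bytes:
    if len(payload) < SAMPLE_LEN:
        raise ValueError("payload too short to sample")
    pn = packet_number.to_bytes(PN_LEN, "big")
    # Step 1: encrypt the payload, using the packet number as the nonce.
    ciphertext = xor(payload, keystream(packet_key, pn, len(payload)))
    # Step 2: encrypt the packet number with a different key, using a
    # sample of the ciphertext as the nonce.
    sample = ciphertext[:SAMPLE_LEN]
    masked_pn = xor(pn, keystream(header_key, sample, PN_LEN))
    return masked_pn + ciphertext


def unprotect(packet_key: bytes, header_key: bytes,
              packet: bytes) -> tuple[int, bytes]:
    masked_pn, ciphertext = packet[:PN_LEN], packet[PN_LEN:]
    # The receiver reverses the steps: recover the packet number first,
    # then use it as the nonce to decrypt the payload.
    sample = ciphertext[:SAMPLE_LEN]
    pn = xor(masked_pn, keystream(header_key, sample, PN_LEN))
    payload = xor(ciphertext, keystream(packet_key, pn, len(ciphertext)))
    return int.from_bytes(pn, "big"), payload


packet_key, header_key = b"k" * 32, b"h" * 32
message = b"hello, QUIC! this is a payload"
wire = protect(packet_key, header_key, 7, message)
number, recovered = unprotect(packet_key, header_key, wire)
```

The point of the sketch is the ordering: no extra nonce travels in the header, because the ciphertext produced in the first step supplies one for the second.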
We’ll give a shout out to one of our own here: Fastly’s Kazuho Oku provided the key breakthroughs for the handshake and the packet number encryption discussions that led to their current designs.
The early header format slowly evolved, eventually becoming an unwieldy header with a large number of fields and constraints. The group achieved a breakthrough when it was realized that the header fields could be split into two groups. The packets used for connection establishment needed to communicate several bits of information, but once the connection was established, only some key headers were necessary. As a result, long and short header formats were created. The long header was structured to be expressive and extensible so that connection establishment could happen with ease and could be extended in the future. The short header was designed to be efficient, since most packets in a connection are expected to carry this header. After connection establishment, QUIC uses short packet headers that can be as small as four bytes.
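As a rough sketch of how a receiver tells the two formats apart, the Python below follows the version-independent layout in the published QUIC invariants (RFC 8999), which may differ from the drafts under discussion at the time. The names `HeaderInfo`, `parse_header`, and `short_dcid_len` are hypothetical. The short header does not encode the length of its Connection ID, so the receiver is assumed to know it already.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeaderInfo:
    form: str                 # "long" or "short"
    version: Optional[int]    # present only in long headers
    dcid: bytes               # destination Connection ID
    scid: Optional[bytes]     # source Connection ID, long headers only


def parse_header(packet: bytes, short_dcid_len: int) -> HeaderInfo:
    first = packet[0]
    if first & 0x80:
        # Long header: version, then two length-prefixed Connection IDs.
        version = int.from_bytes(packet[1:5], "big")
        dcid_len = packet[5]
        dcid = packet[6:6 + dcid_len]
        scid_off = 6 + dcid_len
        scid_len = packet[scid_off]
        scid = packet[scid_off + 1:scid_off + 1 + scid_len]
        return HeaderInfo("long", version, dcid, scid)
    # Short header: the destination Connection ID follows the first byte.
    return HeaderInfo("short", None, packet[1:1 + short_dcid_len], None)


long_pkt = (bytes([0xC0]) + (1).to_bytes(4, "big")
            + bytes([4]) + b"\xaa\xbb\xcc\xdd"
            + bytes([2]) + b"\x01\x02" + b"rest")
short_pkt = bytes([0x40]) + b"\xaa\xbb\xcc\xdd" + b"rest"

long_info = parse_header(long_pkt, short_dcid_len=4)
short_info = parse_header(short_pkt, short_dcid_len=4)
```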
A long-standing problem in transport protocols is that connections are identified by the four-tuple of client and server IP addresses and port numbers. TCP connections, for example, do not survive changes in the client’s IP address or port number. This means that connections traditionally are not resilient to client mobility or to any middleboxes that might change either endpoint’s port number mid-connection. Despite efforts, previous solutions have all eluded wide deployment.
QUIC could solve this problem once and for all. Early QUIC design employed an 8-byte Connection ID chosen by the client to identify a connection. It was to be used in lieu of the standard IP address and port number tuple. This design had two fundamental shortcomings, however:
A server had no control over the Connection ID, and the server’s infrastructure could therefore not embed information in the ID that would be necessary for routing a connection’s packets to the correct server.
The Connection ID was retained through IP address changes at the client so that the connection could continue uninterrupted as a client migrated over to a new network attachment point — a client moving from a WiFi network to a cellular network, for instance. Doing so, however, leaked private information, since a third-party observer could correlate a client’s movements based on the otherwise unrelated networks where the same Connection ID showed up.
After several iterations, this design was eventually replaced by the use of two variable-length Connection IDs, one in each direction chosen by the corresponding endpoint. The group also built mechanisms for both endpoints to change their Connection IDs mid-connection. This allowed a migrating client to move across networks without breaking the connection, and enabled it to change the Connection IDs while doing so to avoid any privacy leakage. This new design posed several challenges, especially in ensuring routing stability around Connection ID communication and changes, and these were eventually resolved.
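A minimal sketch of the idea, in Python: the server looks up a connection by the Connection ID carried in the packet rather than by the sender's address, and hands out more than one ID so a migrating client can switch to a fresh one. The `Server` and `Connection` classes are hypothetical, and the sketch skips the path validation a real implementation performs before trusting a new address.

```python
import secrets


class Connection:
    def __init__(self, peer_addr):
        self.peer_addr = peer_addr
        self.received = []


class Server:
    def __init__(self):
        self.by_cid = {}  # Connection ID -> Connection

    def issue_cid(self, conn, length=8):
        """Hand out a fresh random Connection ID that maps to conn."""
        cid = secrets.token_bytes(length)
        self.by_cid[cid] = conn
        return cid

    def on_packet(self, src_addr, dcid, payload):
        # Demultiplex on the Connection ID, not on the address four-tuple.
        conn = self.by_cid.get(dcid)
        if conn is None:
            return None  # unknown connection: drop the packet
        if src_addr != conn.peer_addr:
            # The client has moved. Assumption: we accept the new address
            # immediately; path validation is omitted here.
            conn.peer_addr = src_addr
        conn.received.append(payload)
        return conn


server = Server()
conn = Connection(("198.51.100.7", 50000))
cid_wifi = server.issue_cid(conn)
cid_cell = server.issue_cid(conn)

server.on_packet(("198.51.100.7", 50000), cid_wifi, b"one")
# The client moves to a cellular network and switches to an unused ID,
# so an observer cannot link the two paths by Connection ID.
server.on_packet(("203.0.113.9", 41000), cid_cell, b"two")
```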
Connection migration is an exciting new feature in QUIC, and we look forward to seeing applications use it in practice.
gQUIC provided transport for HTTP semantics in a simple way that had head-of-line blocking issues and did not handle HTTP trailers. The QUIC working group created a clean separation between HTTP and QUIC by formulating a new surface atop QUIC that other applications could build upon. A new mapping was created for HTTP atop this surface: new semantics were created for QUIC streams and stream IDs, uni-directional streams were created to support requirements such as HTTP Push, and HTTP trailers were accommodated.
After much deliberation, it was decided by consensus in both the QUIC and the HTTPbis working groups to call this mapping HTTP/3.
Additionally, a new header compression scheme for HTTP headers over QUIC, known as QPACK, was designed as a replacement for HTTP/2’s HPACK. It uses the parallelism of QUIC streams to avoid head-of-line blocking.
Network operators raised the issue of network manageability. TCP revealed connection information that was no longer visible to the network in QUIC. The key question for the working group was how much information endpoints should reveal to network intermediaries. After much discussion and many rounds of design, the final resolution was the Spin Bit. This is a single bit in the header that is not under encryption cover (but still authenticated by the endpoints), which allows network intermediaries to measure changes in flow round-trip times. This bit would be set by endpoints and, importantly, it was agreed that if an endpoint had privacy concerns, it could unilaterally disable the “spinning” of the bit.
This was a significant departure from any other end-to-end protocol designed by the IETF. Network devices typically glean information about a flow from end-to-end protocol headers. QUIC now explicitly shares this information with the network via the Spin Bit.
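To illustrate how an on-path observer might use the bit, here is a small Python sketch. It assumes the bit flips once per round trip when both endpoints spin it, so the time between successive flips seen in one direction approximates the round-trip time. The function name and the trace are made up for the example, and the sketch ignores packet loss and reordering.

```python
def rtt_samples(observations):
    """Estimate round-trip times from spin bit transitions.

    observations: iterable of (timestamp_ms, spin_bit) pairs for packets
    seen in one direction of a single flow, in arrival order.
    """
    samples = []
    last_bit = None
    last_edge = None
    for timestamp, bit in observations:
        if last_bit is not None and bit != last_bit:
            # The bit flipped: one round trip has passed since the last flip.
            if last_edge is not None:
                samples.append(timestamp - last_edge)
            last_edge = timestamp
        last_bit = bit
    return samples


# Hypothetical trace: the bit flips at 40 ms, 80 ms, and 120 ms.
trace = [(0, 0), (10, 0), (40, 1), (55, 1), (80, 0), (95, 0), (120, 1)]
estimates = rtt_samples(trace)
```

An endpoint that disables spinning simply produces no flips, and the observer gets no samples.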
It is worth reflecting on whether the final design of QUIC turned out to be only as complicated as necessary. The working group agreed that encrypting as much as possible would allow the protocol designers and implementers to retain control over the protocol’s future evolution in the face of ossifying middleboxes. One might argue that packet number encryption is too much mechanism for the potential risk of protocol ossification or information leakage, and that its cost to network management devices is non-trivial. Others might argue that the Spin Bit is an unnecessary mechanism since the incentive for endpoints to turn it on is unclear.
We offer that QUIC’s complexity reflects what is necessary to manage ecosystem interactions in today’s internet. The modern internet has various stakeholders that all have long-term interests in evolving the internet in different, and sometimes opposing, directions. For instance, privacy and protocol evolvability are often at odds with the needs of network management. QUIC’s design reflects this tension.
In other words, QUIC is as simple as the modern internet demands, which is not very simple in absolute terms.
It is quite possible that the working group did not get all the tradeoffs right. Time and experience with this new protocol will help. And as long as the protocol continues to evolve, experience will continue to shape its future versions.
It is impossible to distill all of the wide and deep discussions that led to the various design decisions that were made. But fortunately, all of these discussions are publicly archived in GitHub issues, on the working group’s mailing list, and in the minutes of the working group’s meetings.
Through implementation, deployment, and a mountain of debate and discussion, QUIC has slowly made its way from a proprietary idea to an experiment to an almost-baked protocol developed by the IETF. A wide deployment of QUIC will no doubt lead us to uncover issues we haven’t yet considered or thought important, and new use cases will emerge. The protocol will evolve and continue to mature to match new challenges.
The IETF working group is now on the cusp of getting the first version of QUIC wrapped up and ready for internet-wide deployment. We at Fastly are busy preparing to be among the leading deployments of QUIC, so that our customers can benefit from QUIC as soon as possible. We have been working furiously on our own QUIC implementation, quicly. We are in the process of getting it into production, and are on track for an early beta launch any day now! Keep watching this space for Fastly’s QUIC offerings and also for our future adventures with QUIC.