Prevent Wasm Compiler Bugs Early | Fastly

Staff Software Engineer on Lucet and WebAssembly

Principal Software Engineer on WebAssembly, Fastly

May 21, 2021

Security cannot be an afterthought. In today’s cloud applications, security has to be considered and built into every aspect of a product or environment, as do the continuous processes of evaluating that security — ongoing testing, fuzzing, and security analysis of new features and capabilities. This is part of our core philosophy about how security should work — and we recently experienced it firsthand.

At Fastly, we recently discovered a compiler bug in Cranelift, part of the WebAssembly compiler that we use for Compute@Edge, that could have allowed a WebAssembly module to access memory outside of its sandboxed heap. Because of the people, processes, and tools we have in place, the bug was caught and patched on our infrastructure before it was exploited. In this post, we’ll tell the story of how we came upon this bug, how the bug occurs, why it could have been serious, and how we verified that it was not actually exploited on our infrastructure.

We’re writing this post in the spirit of transparency, but also to show how an integrated security philosophy spans not only the tools involved, but the processes as well. We implement strong security boundaries on Compute@Edge — in addition to WebAssembly sandboxing, we also rely on security mechanisms implemented at the operating system level. However, because no software is bug-free, part of our security stance has to do with how we respond when issues occur. Let’s dive in!

Setting the stage: Cranelift and heap sandboxing

In Compute@Edge, we execute customer code, contained in WebAssembly (Wasm) modules, on the server with each inbound request. A key aspect of our design is that every customer request runs in a fresh instance of the module: no memory is shared with any other request handler, nor with any other customer’s code.

This memory isolation, or heap sandboxing, is a key design property of WebAssembly and is what gives it many of its strong security properties. Consequently, if the boundaries between separate heaps break down, there could be a potentially serious security issue at hand. This is why we take compiler correctness so seriously, and why we have multiple layers of process and protections to ensure this does not ultimately endanger customers.

We compile customers’ Wasm modules to run on our servers with Cranelift, a compiler that translates Wasm bytecode to x86 machine code ahead of time. This way, the code is ready to run as soon as a request arrives, which significantly reduces cold-start times, one of the major advantages of Compute@Edge. Cranelift translates heap accesses to native x86 code as well. Each WebAssembly instance (recall, one per request) has its own region of the virtual memory space and carries a pointer to this region as it executes. When the Wasm bytecode accesses the heap, Cranelift translates this into an access at an offset from the heap base. Because Wasm pointers are 32 bits wide, this offset is always less than 4 GiB (4 gibibytes, or 2^32 bytes). By sizing the virtual-memory regions larger than this, we ensure that no Wasm instance can reach another instance’s memory, and we do this without run-time bounds checks, one of the ways we have designed Compute@Edge to be performant. Between the regions, we place guard pages: these are unmapped virtual addresses that terminate the Wasm instance if touched.
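To make the layout argument concrete, here is a small Rust sketch of the address translation. It is a model under stated assumptions, not Lucet or Cranelift code: `heap_base`, `effective_address`, `in_window`, and the slot stride are hypothetical, and the sketch ignores the static offset immediate that Wasm loads and stores can also carry. It shows that a zero-extended 32-bit address always lands inside its own 4 GiB window, so with heaps spaced more than 4 GiB apart, no bounds check is needed to stay away from the next instance.

```rust
/// 2^32 bytes (4 GiB): the span reachable by a zero-extended 32-bit Wasm pointer.
const WASM_ADDR_SPACE: u64 = 1u64 << 32;

/// Hypothetical layout: instance `n` gets its heap base at
/// `region_start + n * slot_stride`. The stride must exceed 4 GiB so that the
/// unmapped gap after each window acts as a guard region.
fn heap_base(region_start: u64, slot_stride: u64, n: u64) -> u64 {
    assert!(slot_stride > WASM_ADDR_SPACE, "slots must be more than 4 GiB apart");
    region_start + n * slot_stride
}

/// The translation described above: zero-extend the 32-bit Wasm address and
/// add it to the heap base. No run-time bounds check is emitted.
fn effective_address(base: u64, wasm_addr: u32) -> u64 {
    base + u64::from(wasm_addr)
}

/// True if `addr` falls inside the 4 GiB window that starts at `base`.
fn in_window(base: u64, addr: u64) -> bool {
    addr >= base && addr - base < WASM_ADDR_SPACE
}
```

For example, with a hypothetical 8 GiB stride, the highest address instance 3 can form, `effective_address(base, u32::MAX)`, is still below instance 4's heap base.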

Enter: the bug

Notice an interesting property of this design: we assume (as most code does!) that the compiler faithfully translates the code. Our Wasm sandboxing (which is part of Lucet) generates an add instruction in Cranelift’s internal representation to add the base pointer and Wasm heap address together. What if this integer addition gave the wrong result?

It turns out that this is exactly what happened. To understand the issue, one has to understand:

A bit about how compilers handle values of different widths (e.g., 32 and 64 bits);
How the compiler picks machine instructions to perform arithmetic; and
How it chooses which registers hold those values (register allocation).

Let’s go through each of those in turn.

On x86-64, all integer registers are 64 bits wide, but some values — such as 32-bit Wasm pointers — are narrower. Most compilers, including Cranelift, store the narrow value in the low bits of the register and leave the upper bits undefined. When the code that computes the address for the Wasm heap is generated, it needs to include a zero-extend operator that converts the 32-bit Wasm pointer to a 64-bit addend, which is then added to the base address.
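A minimal Rust model of that situation, assuming a register is just a `u64` whose upper half may hold stale bits (the `Reg` type and the helper names are hypothetical, chosen for illustration): the explicit zero-extend is what keeps the stale half out of the heap address.

```rust
/// A 64-bit x86-64 register holding a narrow 32-bit value in its low half.
/// The upper 32 bits are undefined and may contain leftover data.
#[derive(Clone, Copy, Debug)]
struct Reg(u64);

impl Reg {
    /// The 32-bit value the program actually means.
    fn low32(self) -> u32 {
        self.0 as u32
    }
}

/// Explicit zero-extend (Cranelift's IR calls this `uextend`): keep the low
/// 32 bits and force the upper 32 bits to zero.
fn zero_extend(r: Reg) -> u64 {
    u64::from(r.low32())
}

/// Correct heap address: base plus the zero-extended 32-bit Wasm pointer.
fn heap_addr(base: u64, ptr: Reg) -> u64 {
    base + zero_extend(ptr)
}

/// What happens if the extend is skipped and the raw register is added:
/// stale upper bits leak into the address.
fn heap_addr_without_extend(base: u64, ptr: Reg) -> u64 {
    base.wrapping_add(ptr.0)
}
```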

Because these operations are common, and it would be quite expensive to perform them explicitly everywhere they are generated, Cranelift’s instruction selector takes advantage of another quirk of x86-64: most instructions that write a 32-bit register also clear the upper 32 bits of the full 64-bit register, so their result is already zero-extended. When the 32-bit value comes from such an instruction, the extend operator is superfluous and can be removed.
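The quirk can be modeled in a few lines of Rust. `x86_add32` mirrors the architectural rule that a 32-bit result is written zero-extended into the 64-bit register; the `Producer` enum and `extend_is_redundant` are a hypothetical illustration of the instruction selector's reasoning, not Cranelift's actual code.

```rust
/// Model of x86-64 `add r32, r32` (for example `add esi, edx`): the addition
/// uses only the low 32 bits, and writing the 32-bit destination clears the
/// upper 32 bits of the full 64-bit register.
fn x86_add32(dst: u64, src: u64) -> u64 {
    u64::from((dst as u32).wrapping_add(src as u32))
}

/// An explicit 32-to-64-bit zero-extend.
fn uextend(v: u64) -> u64 {
    u64::from(v as u32)
}

/// Hypothetical classification of where a 32-bit value came from.
#[derive(Clone, Copy, Debug)]
enum Producer {
    /// A 32-bit ALU instruction such as `add r32, r32`.
    Alu32,
    /// A value whose upper bits are unspecified, e.g. a 32-bit argument.
    Unknown,
}

/// The extend can be elided only when the producer already zeroed the upper bits.
fn extend_is_redundant(p: Producer) -> bool {
    matches!(p, Producer::Alu32)
}
```

The elision is only sound as long as everything downstream keeps treating the full register as zero-extended, which is exactly the assumption the register allocator broke.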

All’s well so far — but, enter the register allocator. This is the piece of the compiler backend that chooses where to store values. Sometimes, if the program has too many active variables at once, it has to spill some data to the processor stack and reload it later when it’s needed. This is normal and invisible to the program and allows the programmer to use more variables than there are registers.

Now, however, we can finally describe the bug: when the register allocator spills a register, it knows the type of the value, and in Cranelift it only guarantees to preserve the bits of that actual type. If a value is really 32 bits wide, but we treat it as 64 bits because we removed a superfluous 32-to-64-bit extend, and that value is then spilled, it could be incorrect when reloaded. Specifically, and unfortunately in our case, the register allocator used a sign-extending load to reload a 32-bit value. This means that a 32-bit integer of 0x8000_0000 or greater, which the original program zero-extended to 64 bits, could spontaneously become negative through an incorrect sign-extension.
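To see the failure in isolation, here is a hedged Rust simulation of a spill followed by two possible reloads. `StackSlot`, its methods, and `heap_address` are hypothetical names; `reload_sext32` mimics the `movsxd` reload that appears in the appendix, while `reload_zext32` mimics a plain 32-bit load, which zero-extends.

```rust
/// A simulated 8-byte stack slot, stored little-endian like x86-64 memory.
struct StackSlot([u8; 8]);

impl StackSlot {
    /// Spill: store the whole 64-bit register (like `mov [rsp+N], r64`).
    fn spill(reg: u64) -> StackSlot {
        StackSlot(reg.to_le_bytes())
    }

    /// Low 4 bytes of the slot: the 32-bit value the type says to preserve.
    fn low_bytes(&self) -> [u8; 4] {
        [self.0[0], self.0[1], self.0[2], self.0[3]]
    }

    /// Reload that preserves zero-extension (like `mov r32, dword [rsp+N]`,
    /// which clears the upper 32 bits).
    fn reload_zext32(&self) -> u64 {
        u64::from(u32::from_le_bytes(self.low_bytes()))
    }

    /// The buggy reload (like `movsxd r64, dword [rsp+N]`): bit 31 is copied
    /// into all of the upper 32 bits.
    fn reload_sext32(&self) -> u64 {
        i64::from(i32::from_le_bytes(self.low_bytes())) as u64
    }
}

/// The address computation from the appendix (`[r12 + r11]`), which wraps
/// modulo 2^64 just like x86-64 address arithmetic.
fn heap_address(base: u64, offset: u64) -> u64 {
    base.wrapping_add(offset)
}
```

Spilling 0xffff_f000 and reloading it with sign extension yields -4096, one page below the heap base instead of almost 4 GiB above it. The most negative value the buggy reload can produce comes from 0x8000_0000, which becomes -2^31, so the backward reach is at most 2 GiB.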

Negative offsets are a bad thing where heap offsets are involved!

So what’s the impact?

What this means is that, under rare conditions, a Wasm module on our system could access memory prior to the start of its sandboxed heap. In our system, in order to provide very fast startup and response times, multiple requests are handled from within a single operating system process. Thus, in theory, the ability to read arbitrary memory could have leaked customer data.

It turns out that our system design mitigated the impact. In our Compute@Edge daemon’s memory layout, we place instance heaps more than 4GiB apart in virtual-memory space, with guard regions (unmapped memory) between them. It was thus never possible for a Wasm instance to access another instance’s heap (linear memory): a sign-extended 32-bit offset is at most 2GiB (2^31 bytes) negative, which is not far enough to reach the top of the previous instance’s heap.

However, it was possible that a malicious Wasm module could use a very carefully constructed load or store to access some critical data that comes just before the start of its heap, including the stack and globals of the previous instance. (See Lucet documentation for some details about the layout.) Once we realized this was possible, it was clear that the risk could have been quite severe. In Lucet, which backs our production WebAssembly execution, that critical data includes structures and pointers that the runtime relies on, and modification of those structures certainly could be the start of a more complicated exploit.

Fortuitously, a much less interesting (and not security-critical!) bug we found at the same time conspired with this compiler bug so that any exploit attempt would require a very large static offset in a load or store, which is something that we could easily scan for. The details of this are covered in the appendix below.

So, to summarize so far: an exploit was theoretically possible, but required a load or store with a particular offset. If this offset were not exactly right, the exploit attempt would likely crash the entire daemon by hitting the guard region for a different instance, protecting the data. And if it didn’t result in a crash, we would see anomalous reports of wildly out-of-bounds accesses in our logs (which we actively monitor), which would be worth investigating in their own right. Nevertheless, a problem existed, so we worked to determine our potential exposure.

To do so, we wrote a program that analyzed every Wasm module uploaded to Compute@Edge while the bug was present in our systems to look for load and store instructions with offsets in the vulnerable range. Because we value customer privacy greatly, we never accessed or examined modules manually; this special problem-finding task ran in the same isolated compilation pipeline that is used to build modules for Compute@Edge. This analysis showed that no Wasm modules had offsets that could have led to an exploit. We thus could show that no access to other customers’ data would have been possible with any Wasm module that was on our system.
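The post doesn't describe the analysis program itself. As a rough sketch of the idea, assuming each function body has already been decoded into instructions (a real tool would use a Wasm parser for that), the check reduces to scanning every load and store for a static offset at or above some threshold. The `Op` enum, `suspicious_accesses`, and any threshold value you pass in are hypothetical; the exact vulnerable range is not spelled out here.

```rust
/// Simplified stand-in for decoded Wasm instructions (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    /// Any load, with its static `offset` immediate.
    Load { offset: u32 },
    /// Any store, with its static `offset` immediate.
    Store { offset: u32 },
    /// Everything else is irrelevant to this scan.
    Other,
}

/// Return the positions of loads and stores whose static offset immediate is
/// at least `min_offset`.
fn suspicious_accesses(body: &[Op], min_offset: u32) -> Vec<usize> {
    body.iter()
        .enumerate()
        .filter_map(|(i, op)| match *op {
            Op::Load { offset } | Op::Store { offset } if offset >= min_offset => Some(i),
            _ => None,
        })
        .collect()
}
```

Because the scan only inspects immediates, it can run inside the isolated compilation pipeline without anyone reading the module.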

In parallel to that analysis, of course, we immediately patched the bug in Cranelift and re-deployed our infrastructure. With the backward-looking analysis and the bug remediation taken together, we are confident that customer data was and is safe.

How we detected the problem

The story of how this bug came to light is just as interesting as the bug itself!

It all started with some anomalous log entries one morning. One of our engineers noticed that a Compute@Edge daemon had crashed a few times in one PoP with several accesses to memory addresses that should not be possible to reach. This immediately set off red flags: any memory access that cannot be explained is a potentially serious problem.

We quickly determined that the Wasm module causing the crash came from a security researcher at KTH Royal Institute of Technology, Javier Cabrera Arteaga, who had arranged, after discussions with us, to use Compute@Edge for security research. We reached out to Javier to ask for a copy of the Wasm module, and to understand what inputs it might require in order to reproduce the behavior. Javier quickly let us know exactly what his experiments did, and gave us access to the module’s source code.

Once we retrieved the exact version of the Wasm module which caused the crash from Fastly’s systems, we were able to reproduce the crash and, soon enough, we caught the issue in a debugger. The compiler bug was obvious once we saw the disassembly; and once we knew the issue, we could patch Cranelift and ensure our infrastructure was safe.

We did not stop there, however. We needed to understand the impact of the bug to determine any potential exposure and response. We worked to examine heap layouts and bounds-checking schemes and precisely quantify the impact of the bug under different compiler and runtime configurations. We determined which settings and use-cases would expose the bug in Cranelift and the implications this had for our infrastructure. It was at this point that we understood the exploitable conditions and looked for Wasm modules with specific load and store static offsets, as we described above. As part of this process, due to our commitment to the open-source community and the Bytecode Alliance in particular, we also worked to determine the impact on the open-source Wasmtime and Lucet runtimes; the result of this investigation eventually went into our vulnerability disclosure writeup.

As we were investigating the impact conditions, we actually developed a working exploit for our Compute@Edge daemon. This, of course, was sobering to witness; but it was also confidence-building, because we understood exactly what it took to actively exploit the issue. We discovered that the Wasm heap load and store locations had to be exactly right, or else the entire daemon would crash; and it was only with inside knowledge that we were able to complete the exploit. A lack of such crashes observed elsewhere in our real-time logs, combined with our analysis of all existing Wasm modules, allowed us to conclude that the exploit was not used in our environment.

Defense in depth: secure processes, active monitoring, and proactive remediation

Some of our most significant learnings from this incident have been in process. We found that our processes worked to catch the bug, once we observed an anomaly in our logs; we found that we were able to convene all of the relevant product engineers, security engineers, communications and legal staff and efficiently coordinate to promptly remediate the vulnerability.

We also exercised several new muscles. This was the first security vulnerability of its kind in Cranelift since we announced Compute@Edge. This has helped us to make several internal process changes so that we are even more prepared in the future. Also, for the first time, we coordinated with the Bytecode Alliance to find affected users of the software, and to release a security advisory. We strongly believe that it makes sense to work with all members of the Bytecode Alliance in ensuring software security. And we continue to use a diverse set of techniques to find and fix bugs proactively, before they can impact our customers.

While security bugs are never ideal, they are a fact of modern software, and what counts is how one responds to them. This becomes ever more important as more customers rely on Compute@Edge as a secure, versatile platform. We look forward to continuing our security-focused engineering work in all the ways we have described to ensure this is the case!

Appendix: how the bug works, in detail

Closing off this post, we wanted to talk in more detail about how this bug works, and about what a miscompilation in the wild looks like to us as systems and compiler engineers.

A close-to-minimal reproduction of the bug produces disassembly like the following:

; function prologue, storing a few register-based arguments
push   rbp  
mov    rbp,rsp
sub    rsp,0xe0
mov    QWORD PTR [rsp],r12
mov    QWORD PTR [rsp+0x8],r13
mov    QWORD PTR [rsp+0x10],r14
mov    QWORD PTR [rsp+0x18],rbx
mov    QWORD PTR [rsp+0x20],r15
mov    r12,rdi                       ; bug-relevant details begin here!
                                     ; rdi is the first argument, the WebAssembly "VMContext".
                                     ; Lucet sets VMContext to the heap base, with critical structures
                                     ; placed in the (4k) page before the heap.
mov    r11,rsi                       ; rsi is the second argument, the first one from user-controlled
                                     ; WebAssembly code. call it "heap_offset".
mov    rsi,rcx                       ; rcx is the third argument, a user-controlled i64 - call it "user_qword".
mov    QWORD PTR [rsp+0x40],rsi      ; spill "user_qword", just a quirk of this PoC.

...

mov    QWORD PTR [rsp+0x30],r11      ; spill "heap_offset", again just a quirk.
movsxd rsi,DWORD PTR [rsp+0x30]      ; reload "heap_offset".
add    esi,edx                       ; this add helps convince Cranelift to spill in a way it later incorrectly sign-extends.
                                     ; edx is also an argument, which is set to 0 in our PoC - this add does not change "heap_offset".
mov    QWORD PTR [rsp+0x30],rsi      ; the spill! we'll revisit this in a moment.

...

movsxd r11,DWORD PTR [rsp+0x30]      ; the incorrect sign-extended load of "heap_offset"!
mov    rdi,QWORD PTR [rsp+0x40]      ; reload "user_qword"
mov    QWORD PTR [r12+r11*1+0x0],rdi ; store "user_qword" to "VMContext" + "heap_offset".
                                     ; since "heap_offset" was sign-extended, r11 might be a negative number like -4096,
                                     ; so this store might write "user_qword" over critical structures Lucet relies on.

And here the security implications are pretty clear: if there are critical structures right before the heap base, a small negative offset makes for very easy access to those structures, and the difficulty lies in convincing the compiler to emit this pattern of buggy code. For brevity, we aren’t including the dozens of locals that are stored, added, multiplied, and mixed together to create enough register pressure for the compiler to spill the WebAssembly heap offset.

We also mentioned earlier that a second bug complicated would-be exploit attempts, but gave a clear indicator if someone had tried one. The configuration parser we rely on to parse heap configurations interpreted a “4GB” parameter as 4,000,000,000 bytes, that is, decimal “gigabytes” rather than binary “gibibytes”. Since the maximum heap size was thus configured below 4GiB (4,294,967,296 bytes), compiled WebAssembly modules still had a bounds check for the last 294,967,296 bytes of heap space. This made for some unexpected instructions while we investigated the disassembly:

mov    edi, 0xee6b27fe           ; an entirely unexpected constant: 3,999,999,998
movsxd rax, DWORD PTR [rsp+0x88] ; the incorrect sign-extended load
cmp    eax, edi                  ; compare against the heap bound
jae    ff0 <guest_func_4+0x360>  ; and branch to a trap site if out of bounds

This is fortunate, because an attacker’s first idea might be to use a heap offset like 0xfffff000 to go backwards just a little bit and alter the critical structures Lucet relies on. In that case the bounds check would fail, and the program would trap with a heap out-of-bounds access reported at a concerningly large offset. Since the maximum (closest to zero) backwards heap pointer is 0xee6b27fd, this would suggest that the 294,967,297 bytes immediately before the heap of a WebAssembly instance hitting this bug would be safe from tampering. Unfortunately, we quickly found this isn’t the whole story.
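The numbers in this part of the appendix can be checked with a little arithmetic. The Rust sketch below uses a hypothetical `parse_size` helper (not the parser we actually rely on) to show the decimal/binary mix-up, then models the `cmp eax, edi; jae` bounds check and the buggy sign-extension, assuming the bound constant 0xee6b27fe (3,999,999,998) from the disassembly above.

```rust
/// Hypothetical size parser: "GiB" is binary (2^30 bytes), "GB" is decimal
/// (10^9 bytes). Treating "4GB" as decimal is the configuration bug described above.
fn parse_size(s: &str) -> Option<u64> {
    let (digits, unit) = if let Some(n) = s.strip_suffix("GiB") {
        (n, 1u64 << 30)
    } else if let Some(n) = s.strip_suffix("GB") {
        (n, 1_000_000_000u64)
    } else {
        (s, 1u64)
    };
    digits.trim().parse::<u64>().ok()?.checked_mul(unit)
}

/// The check from the disassembly: `cmp eax, edi; jae trap` traps when the
/// unsigned 32-bit index is >= the bound.
fn passes_bounds_check(index: u32, bound: u32) -> bool {
    index < bound
}

/// The offset the miscompiled code actually uses after the `movsxd` reload.
fn sign_extended(index: u32) -> i64 {
    i64::from(index as i32)
}
```

With these helpers, 0xfffff000 fails `passes_bounds_check`, while the largest passing index, 0xee6b27fd, sign-extends to a displacement hundreds of megabytes below the heap base rather than just below it.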

WebAssembly load and store instructions include an offset immediate intended to simplify working with structures. A structure’s layout in memory is typically the same for a whole program. For example, the `st_size` field of a `struct stat` is always at the same offset, regardless of where the `struct stat` itself is. A compiler can then encode the field’s offset as an immediate, and repeated operations on one struct can simply reuse the structure pointer. But that offset is encoded in the WebAssembly module itself, so it is under the attacker’s control: an attacker can wholly avoid the bounds check by picking a heap offset that safely passes the check, adding a large offset in a load or store, and “reaching up” into the region just before an instance’s heap.
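One way the numbers can line up, as a hedged Rust sketch: assume, as described above, that the bounds check covers only the dynamic heap offset and that the static offset is added afterwards in 64-bit arithmetic. The helpers `buggy_displacement` and `correct_displacement` are hypothetical, and Cranelift's real lowering of the offset immediate may differ in detail. An index of 0x8000_0000 passes the check, the miscompiled reload turns it into -2^31, and a static offset of 0x7fff_f000 then lifts the access to one 4 KiB page below the heap base, where the disassembly comments place Lucet's critical structures.

```rust
/// Bounds-check limit observed in the appendix disassembly (3,999,999,998).
const BOUND: u32 = 0xee6b_27fe;

/// Buggy access path (assumption): the dynamic index is bounds-checked as an
/// unsigned 32-bit value, then sign-extended by the miscompiled reload, and
/// finally the static offset immediate is added in 64-bit arithmetic.
/// Returns the signed displacement from the heap base, or None on a trap.
fn buggy_displacement(index: u32, static_offset: u32) -> Option<i64> {
    if index >= BOUND {
        return None; // trap: heap out of bounds
    }
    Some(i64::from(index as i32) + i64::from(static_offset))
}

/// The same access without the bug: the index is zero-extended.
fn correct_displacement(index: u32, static_offset: u32) -> Option<i64> {
    if index >= BOUND {
        return None; // trap: heap out of bounds
    }
    Some(i64::from(index) + i64::from(static_offset))
}
```

With the bug fixed, the same access resolves to a displacement of 0xffff_f000, inside the instance's own reserved address space rather than in front of it.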

At that point we could construct a proof of concept to tamper with an instance’s memory in ways we know violate the security properties of Lucet’s sandboxing: we could read from the instance preceding a malicious one, or overwrite pointers and take over control flow. In all, this was a great reminder that we must take security issues seriously, even if we can’t imagine how an attacker would leverage them in the moment.