Hijacking the control flow of a WebAssembly program

Senior Principal Engineer, Fastly

June 29, 2018

While WebAssembly has already proven a fertile attack surface for the browser, as more web application code moves to WebAssembly from JavaScript there will be a need to research and secure WebAssembly programs themselves. The WebAssembly design obviates common classes of attacks that might be inherited from development languages like C and C++, but there is still some room for exploitation.

In this tutorial, we'll cover the control flow protection guarantees provided by WebAssembly, known weaknesses, and how to use clang control flow integrity (CFI) in WebAssembly programs to mitigate some risks around control flow hijacks. Along the way we’ll hijack the control flow of a sample WebAssembly program by exploiting a (contrived) type confusion vulnerability. We’ll be adapting some code from the “Let’s talk about CFI” Trail of Bits blog post series; if you are unfamiliar with control flow integrity, that series is a good place to get started.

This is part one of a two-part blog post on some security-related aspects of WebAssembly. Part two of this series will discuss an incomplete-but-interesting set of security topics around WebAssembly embedders — that is, the software that provides an environment to run WebAssembly guest programs, namely browsers.

Note that this blog post does not include a detailed explanation of how WebAssembly works. If you are unfamiliar with WebAssembly you can check out webassembly.org, the official developer’s guide page, or search the web for one of the many available introductory blog posts, tutorials, and videos to learn more.

WebAssembly: Coming to a web app near you

WebAssembly is an open, industry-wide effort to bring a safe, efficient assembly language to the web. WebAssembly technology is developed collaboratively by major browser vendors like Mozilla, Google, Microsoft, and Apple, as well as non-browser web technology companies like Fastly. WebAssembly modules can be downloaded and executed by the majority of browsers in use today. Large development efforts like AutoCAD, Qt, and others are increasingly leveraging WebAssembly to deploy speedy and safe apps built from unified C/C++ codebases across desktop, mobile, and browser platforms. The WebAssembly ecosystem is growing rapidly, with new tools, applications, and ideas being announced regularly (for example, here is one of several curated lists).

WebAssembly is the logical successor to Google Native Client, a performant software fault isolation technology that allowed developers to deploy native applications to Google Chrome. WebAssembly provides an elegant design that has benefited greatly from the lessons learned from Native Client, with highlights including a sound type system that fits on a single page, control flow integrity, limited/local nondeterminism, memory safety guarantees, and more. For details on the design of WebAssembly, check out the 2017 PLDI paper or webassembly.org.

There is a good chance that we’ll see a lot of logic that is traditionally implemented in JavaScript, including security controls, appear in WebAssembly in the coming months and years. We can also expect that web developers will do new, beautiful, and crazy things with WebAssembly as the technology moves further into the spotlight.

The WebAssembly design supports memory safety to avoid inheriting some of the endemic problems that can come from C, C++, and other potential source languages. The next section discusses this facet of WebAssembly.

Memory safety in WebAssembly programs

The WebAssembly Memory Safety security documentation explains why many classes of memory safety bugs and associated exploit techniques, such as stack smashing and ROP, are obviated in WebAssembly programs. That means these attacks are not possible, assuming of course that the program that compiles and runs the WebAssembly code is correct. This is really cool, and standard fare for the extremely well thought-out WebAssembly architecture.

However, the documentation goes on to clarify:

"Nevertheless, other classes of bugs are not obviated by the semantics of WebAssembly. Although attackers cannot perform direct code injection attacks, it is possible to hijack the control flow of a module using code reuse attacks against indirect calls."

We can explore the practical implications of this design through a type confusion exploitation scenario.

The victim: a sample type confusion vulnerability

We’ll use a simplified C++ virtual call type confusion example program from the “Let’s talk about CFI” Trail of Bits blog post series as a running example.

The concept in this example is that the attacker somehow tricks the program into calling a method on an instance of the wrong type. This occurs in all sorts of ways in the wild (and is not limited to C++), but one common scenario is when the program reads an instance (or instance selection logic) from an untrusted data source (such as the network) and, without checking the instance is of the expected type, calls some object method (or function) on it. If an attacker can feed the program an instance of an unexpected type, they can sometimes take control of the program (or cause other bad things to happen).
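As a rough illustration of the missing check described above, the sketch below picks an instance based on an untrusted selector and verifies its dynamic type before making the call. The names Derived and Evil echo the running example, but Base, select_instance, and call_print_me_checked are hypothetical and invented here; this is not the Trail of Bits code.

```cpp
#include <memory>
#include <string>

// Hypothetical hierarchy, loosely modeled on the running example.
struct Base {
    virtual ~Base() = default;
};

struct Derived : Base {
    virtual std::string printMe(int i) {
        return "Derived::printMe " + std::to_string(i);
    }
};

struct Evil : Base {
    virtual std::string makeAdmin(void *) { return "Evil::makeAdmin"; }
};

// Stand-in for instance selection driven by untrusted input
// (for example, a byte read from the network).
std::unique_ptr<Base> select_instance(int selector) {
    if (selector == 0) return std::unique_ptr<Base>(new Derived());
    return std::unique_ptr<Base>(new Evil());
}

// Checked call: refuse to treat the object as a Derived unless it really
// is one. A vulnerable variant would use static_cast here with no check.
bool call_print_me_checked(Base *obj, int arg, std::string &out) {
    Derived *d = dynamic_cast<Derived *>(obj);
    if (d == nullptr) return false;  // unexpected type: do not call
    out = d->printMe(arg);
    return true;
}
```

With selector 0 the call goes through; with any other selector the helper returns false and never reaches a method on the wrong type.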

The changes made to the Trail of Bits code will be covered below; you can find the code samples used in this tutorial on GitHub.

Optional tutorial tools setup

This section explains tool setup in case you want to play along at home. If not, feel free to skip this section.

We’ll use a Docker Ubuntu 16.04 guest to compile vulnerable code to native and WebAssembly targets. We’ll share a directory so we can use the tools installed on our host to edit files and use a host web browser to execute WebAssembly.

docker run -v "$(pwd):/src" -t -i ubuntu:16.04 bash

We’ll use clang to compile to native targets, and emscripten to compile WebAssembly targets. Your mileage may vary, but here are some commands I ran to install all the tools in the Ubuntu guest:

root@2dc5f92b98cf:/src# apt-get update && apt-get install -y cmake build-essential python2.7 nodejs git wget tmux
root@2dc5f92b98cf:/src# apt-get install clang-5.0 && ln -s /usr/bin/clang-5.0 /usr/bin/clang && ln -s /usr/bin/clang++-5.0 /usr/bin/clang++
root@2dc5f92b98cf:/src# wget https://s3.amazonaws.com/mozilla-games/emscripten/releases/emsdk-portable.tar.gz && tar -xf emsdk-portable.tar.gz && cd emsdk-portable
root@2dc5f92b98cf:/src/emsdk-portable# ./emsdk update && ./emsdk install latest && ./emsdk activate latest

To create the content in this tutorial I ran a tmux session (which I hereafter note with [tmux]) to use as an emscripten environment. The [tmux] session will use the emscripten toolchain for WASM targets, and the regular guest shell will use the Ubuntu clang toolchain for native targets. Here is the emscripten environment:

[tmux] root@2dc5f92b98cf:/src/emsdk-portable# source ./emsdk_env.sh && which clang
/src/emsdk-portable/clang/e1.37.35_64bit/clang

And the clang/native environment:

root@2dc5f92b98cf:/src/emsdk-portable# which clang
/usr/bin/clang

We can use the WebAssembly binary toolkit (WABT) to convert binary WebAssembly modules to text format (and back):

root@2dc5f92b98cf:/src# git clone --recursive https://github.com/WebAssembly/wabt && cd wabt
root@2dc5f92b98cf:/src/wabt# make && make install

Thwarting an exploit with WebAssembly type checking

WebAssembly embedders (i.e. browsers) generally check the types of functions (in terms of their arguments and return values) to make sure they are correct before a WebAssembly program is allowed to execute (see also 'WebAssembly.validate'). However, type checking for indirect calls, which are analogous to calling a function pointer in C or C++, happens at runtime. When a type check for an indirect call fails, the WebAssembly program halts and a trap is raised. In a browser, this eventually results in a JavaScript exception being raised that the user code can handle (or not). Regardless, thanks to guarantees provided by the WebAssembly design, the embedder (i.e. browser) process can safely continue execution without fear of undefined behavior (e.g. memory corruption).
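To make the runtime check concrete, here is a toy C++ model of an indirect call through a function table. It is only a sketch: ValType, TableEntry, Trap, call_indirect, and run_guest_call are names made up for this illustration, result types are left out for brevity, and a real engine implements this check very differently.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// The four WebAssembly value types available at the time of writing.
enum class ValType { I32, I64, F32, F64 };

// A function signature: parameter types only (results omitted for brevity).
using Signature = std::vector<ValType>;

// One slot in a toy function table: the callee's signature plus a body.
struct TableEntry {
    Signature sig;
    std::function<std::string()> body;
};

// Stand-in for a WebAssembly trap.
struct Trap : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Toy call_indirect: compare the signature expected at the call site with
// the signature of the table entry; trap on out-of-bounds or mismatch.
std::string call_indirect(const std::vector<TableEntry> &table,
                          std::size_t index, const Signature &expected) {
    if (index >= table.size()) throw Trap("undefined table element");
    const TableEntry &entry = table[index];
    if (entry.sig != expected) throw Trap("indirect call signature mismatch");
    return entry.body();
}

// Host-side wrapper: a trap ends the guest call, but the host keeps running.
std::string run_guest_call(const std::vector<TableEntry> &table,
                           std::size_t index, const Signature &expected) {
    try {
        return call_indirect(table, index, expected);
    } catch (const Trap &t) {
        return std::string("trap: ") + t.what();
    }
}
```

A call whose expected signature matches the table entry runs the callee; a mismatch surfaces as a trap that the host catches, much as a browser turns the trap into an exception.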

To observe a successful type check in action, we’ll use a modified version of cfi_vcall.cpp from the Trail of Bits sample code. Our modified version is called cfi_vcall_diff.cpp. In cfi_vcall_diff.cpp, the victim function (the one the program innocently tries to call) takes an integer argument, but the evil function (the one the attacker is somehow able to supply) is constrained to take a float argument. To recap, the attacker is trying to get the victim to execute this:

    virtual void makeAdmin(float * i) {
        std::cout << "CFI Prevents this control flow " << i << "\n";
        std::cout << "Evil::makeAdmin\n";
    }

Instead of this:

    virtual void printMe(int i) {
        std::cout << "Derived::printMe " << i << "\n";
    }

In the next section we’ll demonstrate exploitation with these programs.

Experiment #1: Exploiting type confusion in a native executable

We can observe what happens without WebAssembly type checking by first compiling the vulnerable/exploited program with clang:

root@2dc5f92b98cf:/src/clang-cfi-showcase# clang++ -Weverything -Werror -Wno-weak-vtables -o cfi_vcall_diff cfi_vcall_diff.cpp

And then running it:

root@2dc5f92b98cf:/src/clang-cfi-showcase# ./cfi_vcall_diff
Derived::printMe 55.5
CFI Prevents this control flow 0
Evil::makeAdmin

You can see from the output above that the vulnerable program was exploited; the attacker’s makeAdmin payload executed. This is possible because the native machine code doesn’t include any checks on the function parameter types at runtime, so the “evil” function runs unabated (though with funky results, since the function considers the supplied integer to be a float).

Experiment #2: Preventing a type confusion exploit in WebAssembly

We can observe what happens with WebAssembly type checking by compiling the vulnerable/exploited program with emscripten:

[tmux] root@2dc5f92b98cf:/src/clang-cfi-showcase# emcc cfi_vcall_diff.cpp -Werror -s WASM=1 -o cfi_vcall_diff.html

The above command will produce a WebAssembly module (.wasm), wrapper JavaScript that loads and calls into the module (.js), and an HTML file to tie it together (.html). We can browse to the resulting HTML page and open the developer console in our browser to observe the result. One way to do this is to run a Python SimpleHTTPServer on the host system:

mayor:clang-cfi-showcase foote$ python -m SimpleHTTPServer 8081

And browse to the generated page:

Ah-ha! The WebAssembly code catches the error and traps back to the host program. When the interpreter executes call_indirect, the type check fails and a trap is raised. We can convert the WebAssembly code to its text representation to better understand what is going on here:

wasm2wat cfi_vcall_diff.wasm > cfi_vcall_diff.wat

Viewing the file, we can see the call_indirect invocation (annotated):

    [...]
    f64.const 0x1.bcp+5 (;=55.5;) ;; Push arg (55.5) onto the stack
    get_local 4                   ;; Calculate function ptr (cont’d)
    i32.const 15                  ;; (an index into the func table)
    i32.and                       ;; ..
    i32.const 5376                ;; ..
    i32.add                       ;; ..
    call_indirect (type 0)        ;; call func ptr: printMe/makeAdmin
    [...]
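Read literally, the instructions leading up to call_indirect mask the value in local 4 to its low four bits and then add a constant. What those constants represent is an assumption on my part (most likely a small per-signature window inside a larger function table), but the arithmetic itself can be sketched in C++. The names kMask, kBase, and table_index are invented for this sketch.

```cpp
#include <cstdint>

// Constants copied from the annotated listing; other builds will very
// likely use different values.
constexpr std::uint32_t kMask = 15;
constexpr std::uint32_t kBase = 5376;

// Mirrors: get_local 4; i32.const 15; i32.and; i32.const 5376; i32.add
constexpr std::uint32_t table_index(std::uint32_t raw_ptr) {
    return (raw_ptr & kMask) + kBase;
}
```

Under this reading, any input value, however corrupted, produces an index between 5376 and 5391 inclusive, so the signature check in call_indirect is what decides whether the selected entry may be called.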

The type definition that is checked by call_indirect defines a function that takes a float:

  (type (;0;) (func (param i32 f64)))

Accordingly, the type check fails in this case because the two functions have different signatures in WebAssembly: the “victim” function takes a float (f64) while the “evil” function takes an integer (i32).

Note that while the WebAssembly program will halt and trap at this point, the fault will be isolated to the WebAssembly guest program instance by design. This means the browser process that hosts the WebAssembly can keep sailing along safely without having to worry about memory corruption or similar.

Exploiting weaknesses in WebAssembly type checking

WebAssembly provides an elegant safety design including a sound type system. One of the side effects of this design is that (as of this writing) WebAssembly provides only four value types: i32, i64, f32, and f64. This means that all value types from the source language (C or C++ for example) map down to these types, and that these are the types that WebAssembly uses for indirect call type checking. Correspondingly, in our running example, an attacker might be able to hijack the control flow of the victim WebAssembly program if they can manage to supply a function whose WebAssembly type signature matches that of the intended target (as discussed in the WebAssembly memory safety documentation).
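The collapse of source-level types into four value types can be sketched with a small C++ helper. This is an illustration under stated assumptions, not the compiler's actual lowering logic: it assumes a 32-bit (wasm32) target where pointers become i32, it only handles simple scalar and pointer parameters, and wasm_type and member_signature are hypothetical names.

```cpp
#include <string>
#include <type_traits>

// Assumed mapping of simple C/C++ parameter types to WebAssembly value
// types on a wasm32 target. Anything not recognized falls back to i32.
// Note: sizeof is evaluated on the host, so prefer types like int,
// long long, float, and double whose sizes match on both sides.
template <typename T>
constexpr const char *wasm_type() {
    return std::is_pointer<T>::value                     ? "i32"
           : std::is_same<T, float>::value               ? "f32"
           : std::is_same<T, double>::value              ? "f64"
           : (std::is_integral<T>::value && sizeof(T) > 4) ? "i64"
                                                         : "i32";
}

// Build a signature string for a member function: the implicit `this`
// pointer comes first (as i32), followed by the declared parameters.
template <typename... Params>
std::string member_signature() {
    std::string sig = "(func (param i32";
    const char *params[] = {wasm_type<Params>()..., nullptr};
    for (const char **p = params; *p != nullptr; ++p) {
        sig += " ";
        sig += *p;
    }
    return sig + "))";
}
```

Under these assumptions, member_signature<int>() and member_signature<void *>() produce the same string, while member_signature<double>() differs. That is exactly the distinction an indirect call check can and cannot see.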

To observe this behavior in action, we’ll use a modified version of cfi_vcall.cpp from Trail of Bits, called cfi_vcall_same.cpp. In cfi_vcall_same.cpp, the victim function (the one the program innocently tries to call) takes an integer argument, and the evil function (the one the attacker is somehow able to supply) is constrained to take a void pointer (void *) argument. Even though these are different types in C++, they will map to the same WebAssembly type. This means that the function signatures of the “victim” and “evil” functions will match, and the attacker can hijack the control flow of the victim WebAssembly program. To recap, in cfi_vcall_same.cpp the attacker is trying to get the victim to execute this:

    virtual void makeAdmin(void * i) {
        std::cout << "CFI Prevents this control flow " << i << "\n";
        std::cout << "Evil::makeAdmin\n";
    }

Instead of this:

    virtual void printMe(int i) {
        std::cout << "Derived::printMe " << i << "\n";
    }

Once again, in the next section we’ll demonstrate exploitation with these programs.

Experiment #3: Exploiting type confusion in a native binary (again!)

We can once again observe what happens without WebAssembly type checking by first compiling the vulnerable/exploited program with clang:

root@2dc5f92b98cf:/src/clang-cfi-showcase# clang++ -Weverything -Werror -Wno-weak-vtables -o cfi_vcall_same cfi_vcall_same.cpp

And then running it:

root@2dc5f92b98cf:/src/clang-cfi-showcase# ./cfi_vcall_same
Derived::printMe 55
CFI Prevents this control flow 66
Evil::makeAdmin

You can see from the output above that the vulnerable program was exploited; the attacker’s makeAdmin payload executed. This is possible because the native machine code still doesn’t include any checks on the function parameter types at runtime, so the “evil” function once again runs unabated.

Experiment #4: Exploiting a type confusion vulnerability in WebAssembly

We can observe what happens with WebAssembly type checking by compiling the vulnerable/exploited program with emscripten:

[tmux] root@2dc5f92b98cf:/src/clang-cfi-showcase# emcc cfi_vcall_same.cpp -Werror -s WASM=1 -o cfi_vcall_same.html

Like the previous WebAssembly experiment, the above command will produce cfi_vcall_same.wasm (the WebAssembly module), cfi_vcall_same.js (a JavaScript file that defines the interface between the browser and the WebAssembly module), and cfi_vcall_same.html (an HTML page that runs the JavaScript).

We can again run Python SimpleHTTPServer on the host:

mayor:clang-cfi-showcase foote$ python -m SimpleHTTPServer 8081

And browse to the generated page:

We can see here that the type confusion vulnerability exists and the “exploit” executes: makeAdmin runs. This is because this time around both printMe and makeAdmin have matching type signatures along the lines of:

  (type (;0;) (func (param i32 i32)))

This is because the C void* and int types both map to the i32 type in WebAssembly (note that WebAssembly programs use 32-bit addressing — for more information on this topic, I recommend the 2017 PLDI paper). So, while WebAssembly checks the function signature of the function it is about to call (that is, the WebAssembly types of the function parameters and result), the two functions have the same signature so the exploit runs successfully.

Using Clang CFI to mitigate WebAssembly exploits

As we observed above, while the simple type system of WebAssembly yields tremendous benefits, one of the downsides is that type confusion vulnerabilities can still occur. Fortunately, just like in native executables, we can compile WebAssembly code with Clang CFI checks. As discussed in the WebAssembly memory safety documentation, this both helps defend against the code reuse attacks we are exploring here and uses the generally-finer-grained C/C++ types for other function signature checks as well.

Ultimately this means that clang CFI checks can be compiled into the WebAssembly program and enforced in the embedder (i.e. browser). Pretty cool!
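Conceptually, a virtual call check of this kind verifies that the object's vtable is one of the vtables valid for the static type at the call site before dispatching. The sketch below models that idea with hand-rolled vtables. It is a simplification: Clang's real implementation relies on compiler-generated type metadata rather than a runtime list, and VTable, Object, vcall_unchecked, and vcall_checked are names invented here. Both methods take an int so the model stays well-defined C++, which mirrors how they share a signature at the WebAssembly level.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// Hand-rolled "vtable" with a single slot taking (object, 32-bit argument).
struct VTable {
    const char *class_name;
    std::string (*slot0)(void *self, int arg);
};

struct Object {
    const VTable *vptr;
};

std::string derived_print_me(void *, int i) {
    return "Derived::printMe " + std::to_string(i);
}
std::string evil_make_admin(void *, int) { return "Evil::makeAdmin"; }

const VTable kDerivedVTable = {"Derived", derived_print_me};
const VTable kEvilVTable = {"Evil", evil_make_admin};

// Vtables acceptable when the static type at the call site is Derived.
const std::vector<const VTable *> kValidForDerived = {&kDerivedVTable};

// Unchecked virtual call: dispatch through whatever vptr the object holds.
std::string vcall_unchecked(Object *obj, int arg) {
    return obj->vptr->slot0(obj, arg);
}

// CFI-style virtual call: verify the vptr against the allowlist first.
std::string vcall_checked(Object *obj, int arg) {
    bool ok = std::find(kValidForDerived.begin(), kValidForDerived.end(),
                        obj->vptr) != kValidForDerived.end();
    if (!ok) {
        throw std::runtime_error(
            "control flow integrity check for type 'Derived' failed: "
            "vtable is of type '" + std::string(obj->vptr->class_name) + "'");
    }
    return vcall_unchecked(obj, arg);
}
```

An object carrying the Evil vtable passes the unchecked call, since the slot signatures match, but is rejected by the checked call because its vtable is not in the set allowed for Derived.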

Experiment #5: Enforcing clang CFI in a native executable

We can observe nominal clang CFI enforcement in a native binary by compiling our cfi_vcall_same.cpp example with -fsanitize=cfi-vcall (outputting the binary to cfi_vcall_same_cfi):

root@2dc5f92b98cf:/src/clang-cfi-showcase# clang++ -Weverything -Werror -Wno-weak-vtables -fvisibility=hidden -flto -fsanitize=cfi-vcall -fno-sanitize-trap=all -o cfi_vcall_same_cfi cfi_vcall_same.cpp

And then running it:

root@2dc5f92b98cf:/src/clang-cfi-showcase# ./cfi_vcall_same_cfi
Derived::printMe 55
cfi_vcall_same.cpp:45:5: runtime error: control flow integrity check for type 'Derived' failed during virtual call (vtable address 0x0000004300a0)
0x0000004300a0: note: vtable is of type 'Evil'
 00 00 00 00  20 84 42 00 00 00 00 00  30 84 42 00 00 00 00 00  60 84 42 00 00 00 00 00  00 00 00 00

We can see from the output above that clang CFI works as intended — the exploit is blocked.

Experiment #6: Enforcing clang CFI in a WebAssembly program

Now for the really interesting part: we can observe clang CFI enforcement in the previously-exploited WebAssembly program by calling emscripten with analogous CFI flags (here -fsanitize=cfi):

[tmux] root@2dc5f92b98cf:/src/clang-cfi-showcase# emcc cfi_vcall_same.cpp -fvisibility=hidden -flto -fsanitize=cfi -s WASM=1 -o cfi_vcall_same_cfi.html

And then view the resulting HTML file in the browser:

We can see above that the clang CFI enforcement occurs in the browser(!), thwarting the exploit.

Wrapping up

In this blog post we showed how to hijack the control flow of a sample WebAssembly program by exploiting a type confusion vulnerability, demonstrating some of the memory safety guarantees provided by WebAssembly along the way. We also covered using clang CFI to strengthen WebAssembly programs against these classes of attacks.

Overall, WebAssembly is a well-designed technology that provides a great baseline for secure development; hopefully this tutorial and discussion illustrated some security aspects of the ecosystem.

Stay tuned for part 2 of this series, which will discuss an incomplete-but-interesting set of security topics around WebAssembly embedders — the software that provides an environment to run WebAssembly guest programs, namely browsers.