Guide for C and Rust programmers

Software Engineer

February 07, 2019

We recently launched Fastly Labs, a hub of in-progress projects and big ideas at the edge, all of which you can experiment with alongside a community of engineers. Our last blog post shed some light on the architecture of one of those projects: Fastly Terrarium, a high-performance WebAssembly compiler and runtime demonstration. We explored how the underlying technology can provide a safe way to execute code at the edge. Safely running untrusted code written in an unsafe programming language, with negligible overhead, is both a common problem and a serious challenge.

WebAssembly elegantly solves this problem by completely avoiding pointers, something C and Rust programmers may be familiar with. Functions and data are represented differently than in more conventional environments, but this doesn't prevent these different environments from calling each other and sharing data.

Below we will see why this is necessary, and how we implemented it in Terrarium. To get the most from this post, it will be helpful if you’re familiar with memory pointers (if not, this blog post is a helpful starter).

It all starts with a “Hello World”

We’ll start off by looking at how code pointers are generally handled in some traditional language implementations. A lot of great projects started by printing "Hello World", and it wouldn't be surprising if the first WebAssembly program ever written also resembled this:

#include <stdio.h>

int main(void)
{
	puts("Hello world!");
	return 0;
}

The puts function, responsible for printing the magic words, is not defined anywhere in this source code. Its code is present somewhere else, presumably in a shared library, maintained independently from the main project.

At a high level, the compiled code for the function above is likely to look like this:

put the location of the "hello world" string into a register
call a placeholder address

The compiled code also includes the names of the functions it depends on, as well as where they are referenced. With the exception of some embedded operating systems, the exact address of the puts function is not known in advance. It's the dynamic linker's role to keep track of what libraries are being loaded and where their symbols are.

When an application depending on dynamic libraries is loaded, the linker will replace placeholder addresses with actual function addresses or with addresses of proxy functions (lazy loading). The code of shared libraries is read-only, and cannot be patched. So, if the main() function above is itself in a shared library, an additional indirection will be required.

But in all cases, a function compiled from languages such as C, C++, or unsafe Rust is directly represented as a memory address. This also holds true for indirect calls. Function pointers are exactly that: addresses to which the CPU instruction pointer will be set.

These languages allow applications to jump to any address. However, pointer arithmetic is tricky, bugs happen, and nothing forces these addresses to be valid, or to be the first opcode of a function. The application being run will simply do what it has been instructed to.
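To make this concrete, here is a minimal sketch of an indirect call in C. The names (add_one, int_fn, call_indirect, get_add_one) are made up for illustration. The comment at the end shows the kind of bogus pointer the language will happily accept; it is deliberately left as a comment and never executed.

```c
/* An ordinary function: once compiled, it lives at some address. */
static int add_one(int x)
{
    return x + 1;
}

/* A function pointer type. On mainstream platforms, a value of this
 * type is simply the address of the function's first instruction. */
typedef int (*int_fn)(int);

/* Indirect call: the CPU jumps to whatever address `fn` holds.
 * Nothing here verifies that `fn` points to the start of a function. */
int call_indirect(int_fn fn, int arg)
{
    return fn(arg);
}

/* Hands out a valid pointer to add_one. */
int_fn get_add_one(void)
{
    return add_one;
}

/*
 * The compiler would also accept something like:
 *
 *     int_fn bogus = (int_fn)0xdeadbeef;
 *     call_indirect(bogus, 1);
 *
 * which jumps to an arbitrary location, with undefined behavior.
 */
```

Calling call_indirect(get_add_one(), 41) takes the well-behaved path; the commented-out variant is what pointer bugs and exploits end up doing.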

Besides the reliability issues this implies, the ability to jump to arbitrary locations is frequently abused by exploits in order to bypass basic mitigations.

How about WebAssembly?

In WebAssembly, a function is never represented by its address, but by an index in a function table. This doesn't prevent a WebAssembly compiler from eventually emitting code that will directly use addresses, but calling arbitrary memory locations simply cannot be expressed in WebAssembly.

This means, barring bugs in the implementation, an attacker can’t trick a WebAssembly program into jumping to an arbitrary memory address — they can’t even create one from scratch. Therefore, only a predefined set of locations can be jumped to, immediately reducing the attack surface.

The "Hello World" example above, although originally written in C, will be compiled by Terrarium down to LLVM bitcode and then to WebAssembly, before eventually being converted to native code. The conversion to native code is made by the Cranelift code generator, after yet another transformation to an internal representation.

As the Terrarium compiler is converting LLVM bitcode to native code, it builds a table that logically links function indices to native function locations. No matter what the application tries to do, jumping to arbitrary addresses is impossible. WebAssembly simply doesn't include any opcodes to do so. Direct and indirect calls only accept indices.

A simple inequality is all it takes to check that an index is within the bounds of the function table, and safely abort if this is not the case.
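The check described above could look roughly like the following. This is only a sketch, not Terrarium's actual generated code: the table layout and the names function_table, lookup_function, and call_by_index are assumptions, and every function shares one signature to keep things short (a real WebAssembly runtime also checks the signature on indirect calls).

```c
#include <stddef.h>
#include <stdlib.h>

/* For simplicity, every function in this sketch has the same signature. */
typedef int (*guest_fn)(int);

static int guest_double(int x) { return x * 2; }
static int guest_negate(int x) { return -x; }

/* The kind of table a compiler might build: index -> native address. */
static const guest_fn function_table[] = { guest_double, guest_negate };
static const size_t function_table_len =
    sizeof function_table / sizeof function_table[0];

/* Resolve an index to an address, or NULL if it is out of bounds. */
guest_fn lookup_function(size_t index)
{
    if (index >= function_table_len) /* the "simple inequality" */
        return NULL;
    return function_table[index];
}

/* Indirect call by index: aborts rather than jumping somewhere invalid. */
int call_by_index(size_t index, int arg)
{
    guest_fn fn = lookup_function(index);
    if (fn == NULL)
        abort();
    return fn(arg);
}
```

The guest only ever supplies the index, so the worst it can do is pick another entry in the table or trigger the abort path.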

Memory models

In languages such as C or C++, dynamic memory allocations are typically made via a memory allocator that manages memory pages already reserved by the process, and asks the kernel for more pages when needed.

When an application requires heap-allocated memory, the newly reserved region will be returned as the address of the first byte of that region. Dynamically allocated objects are then accessed directly using their address in memory.

In WebAssembly, opcodes accessing memory are not given addresses, but offsets relative to the beginning of a linear memory segment whose size is always known.

As with function calls, a simple inequality check is all it takes to verify that an access is within the segment (and again, we can safely abort execution if we detect that this is not the case).
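As an illustration, a bounds-checked load and store relative to a linear memory segment might be sketched as follows. The linear_memory struct and the function names are hypothetical, and returning false stands in for aborting execution. Production runtimes often rely on other techniques, such as guard pages, to achieve the same effect.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a guest's linear memory segment. */
struct linear_memory {
    uint8_t *base; /* native address of offset 0, never shown to the guest */
    size_t size;   /* current size of the segment, in bytes */
};

/* True if [offset, offset + len) lies inside the segment.
 * Written so that the arithmetic cannot overflow. */
bool in_bounds(const struct linear_memory *m, uint32_t offset, size_t len)
{
    return len <= m->size && offset <= m->size - len;
}

/* Load a 32-bit value at `offset`; false means "abort the guest". */
bool load_u32(const struct linear_memory *m, uint32_t offset, uint32_t *out)
{
    if (!in_bounds(m, offset, sizeof *out))
        return false;
    memcpy(out, m->base + offset, sizeof *out);
    return true;
}

/* Store a 32-bit value at `offset`; false means "abort the guest". */
bool store_u32(struct linear_memory *m, uint32_t offset, uint32_t value)
{
    if (!in_bounds(m, offset, sizeof value))
        return false;
    memcpy(m->base + offset, &value, sizeof value);
    return true;
}
```

With a 16-byte segment, a 4-byte access at offset 12 succeeds, while the same access at offset 13 is rejected because it would cross the end of the segment.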

Reading from and writing to arbitrary memory locations cannot be expressed in WebAssembly. An application cannot access data from the host or from other WebAssembly guests.

External functions (“hostcalls”)

Going back to the WebAssembly function table for a moment, where would the puts() function fit here? Obviously, the function table includes functions defined by the application itself, minus the ones that could be optimized out. But WebAssembly opcodes can only be used to perform computations. They can't interact with the host itself. In order to do so, they have to call external functions.

These external functions are present in a function table as well, and are called via index, exactly like internal functions. The guarantee that only functions present in a predefined set can be called covers external functions as well. The compiler is responsible for building the function tables, so only calls to external functions that have been explicitly whitelisted are possible.

External functions, or "hostcalls," share the same calling conventions as functions written in WebAssembly, and are thus called the same way, using the same opcodes.

The only difference between external and internal functions is that external functions haven't gone through a WebAssembly transform. Therefore, they are not constrained to what can be expressed in WebAssembly. In particular, they can call arbitrary functions and access arbitrary memory locations. They can thus act as a bridge between WebAssembly code and its environment.

With this, applications written for Terrarium can send HTTP queries, return responses, send DNS queries, get the current time, store persistent data, and more.

Sharing memory between the WebAssembly guest and its environment

The hostcall_req_get_path() function is one of these external functions accessible by applications written for Terrarium. It is also a good example of a function that requires sharing memory between WebAssembly code and its host. As the name implies, the function returns the path from a client HTTP request.

But what memory should that path be put into? Passing data from the guest to the host is straightforward, since the host can access any memory address. But WebAssembly code can only access its dedicated memory segment — so, for the other way round, everything has to be stored in that linear segment.

The WebAssembly code is responsible for managing its own individual heap allocations. In order to do so, it includes its own allocator, which may ask the host to extend the segment size as needed.

In our first proof of concept, hostcalls needing to allocate memory in order to return values to the WebAssembly guest simply extended the linear memory segment with new pages, and stored the values in these newly allocated pages. Unfortunately, this approach had some shortcomings:

Two independent allocators were now competing for the same address space. This is very unusual, and the guest allocator may not be designed to work reliably in such a situation.
A custom allocator had to be designed for the host.
The host cannot make any assumption about how the guest manages its memory. Therefore, it doesn't know when the guest no longer needs the returned values, and can't reuse these memory regions for anything. Hostcalls requiring allocations leak memory, and there is no way to recover it.

After our initial prototype, we decided to revamp the memory model of the Terrarium hostcalls. If the host needs to allocate memory from the guest space, it should call a guest function. Hostcalls that return values to the guest should allocate memory exclusively that way. This approach has several advantages:

The host doesn't need to implement its own allocator any more. We now have a single allocator for the zone, which improves compatibility and reliability.
Values returned that way can be handled like native values. They can be released naturally, via the language's standard mechanisms.
If the size is predictable, the guest can now decide to reuse an existing buffer, or return a stack address, something that was impossible before.

The idea is to let the WebAssembly guest call a host function to register two functions: one to allocate memory, and another to mark a previously allocated memory region as unused.

To do so, the hostcall_init_mm() function was implemented. For a guest originally written in C, the most common invocation is:

hostcall_init_mm(malloc, free);

That is, memory allocations will simply use the standard malloc() function call, similar to the rest of the guest code.

Once a hostcall such as hostcall_req_get_path() returns its value, that value can then be freed by the guest using the standard free() function.

Rust guests can register a function that allocates a native std::vec::Vec vector, and let Rust's RAII take care of the deallocation. This approach works no matter how the guest language internally manages the lifetime of its own objects.

Bringing two worlds together

Remember that in native code, functions are memory addresses. But in WebAssembly, functions are numeric identifiers. The arguments of the hostcall_init_mm() function, as seen by the host, are identifiers. But the host cannot directly call anything given these identifiers. Actual function addresses are required.

After the Terrarium toolchain has compiled WebAssembly code to native code, we end up with a native shared library. Like any shared library, it contains function code and data, as well as tables mapping symbols to internal locations.

When the sandbox runtime needs to call WebAssembly code compiled that way, it loads the relevant shared library, uses the dlsym() function to find the run symbol, and calls the returned address. Any exported function can be called that way, so it used to be the only interface implemented by our runtime.

However, for a host to call a guest function, such as the ones registered by hostcall_init_mm(), we also need the ability to call a function given only its identifier, i.e., its index in the WebAssembly function table.

In order to implement this, we had to make a couple of changes. First, we made the function table and its size available as public symbols in the shared libraries produced by our compilation toolchain. This allows our sandbox runtime to load that table and return a function address given an index.

From here, implementing host-to-guest calls was straightforward. We just needed to retrieve the addresses corresponding to the registered function identifiers, and perform standard indirect calls to these addresses.
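Here is a rough sketch of that last step. In the real runtime, the table and its size would be located in the loaded shared library (for example with dlsym()); to stay self-contained, this example defines them directly. All symbol and function names below are invented for the illustration, and the sketch trusts the caller to pair each index with the right signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Generic function pointer type used to store mixed signatures. */
typedef void (*generic_fn)(void);

/* Two "guest" functions, standing in for compiled WebAssembly code. */
static void *guest_malloc(size_t n) { return malloc(n); }
static void guest_free(void *p) { free(p); }

/* Stand-ins for the public symbols exported by the compiled module. */
const generic_fn sketch_function_table[] = {
    (generic_fn)guest_malloc, /* index 0 */
    (generic_fn)guest_free,   /* index 1 */
};
const size_t sketch_function_table_len = 2;

/* Turn a WebAssembly function index into a native address. */
generic_fn resolve_index(uint32_t index)
{
    if (index >= sketch_function_table_len)
        return NULL;
    return sketch_function_table[index];
}

/* Host-to-guest call: allocate `n` bytes using the guest function
 * registered under `alloc_index`. */
void *host_alloc_in_guest(uint32_t alloc_index, size_t n)
{
    generic_fn fn = resolve_index(alloc_index);
    if (fn == NULL)
        return NULL;
    return ((void *(*)(size_t))fn)(n); /* cast back to the real type */
}

/* Host-to-guest call: release a region using the guest function
 * registered under `free_index`. */
void host_free_in_guest(uint32_t free_index, void *region)
{
    generic_fn fn = resolve_index(free_index);
    if (fn != NULL)
        ((void (*)(void *))fn)(region);
}
```

The host stores only the two indices it received at registration time, and resolves them to addresses at the moment of the call.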

Done!

The Terrarium hostcalls can now call functions provided by the guest in order to allocate memory for returned values.

And guests can now choose how they want values returned by the host to be allocated. Note that hostcall_init_mm() can be called multiple times, in order to set a specific allocation strategy for specific hostcalls.

What we just described here is available in Terrarium today. Check out the provided examples or build your own code, and don’t fear memory allocations!