Skip to main content

2 posts tagged with "code"

View All Tags

· 9 min read

In our last blog post, we showed you a WebAssembly compiler that fits in a tweet. In just 192 bytes of JavaScript, it takes arithmetic expression written in postfix notation — like 20 2 * 2 + — and produces a Wasm module with a single function which evaluates the expression.

In this post, we’re going to look at WebAssembly from another angle: we’re going to write an interpreter for Wasm from scratch. So, rather than executing the Wasm module with Node.js, we’re going to examine the instructions and evaluate them ourselves.

At the end of this post we will have a bytecode interpreter (or virtual machine) that can evaluate arithmetic and comparison expressions like 2 * 3 + 4 == 10:


const vm = new VM([
i32.const(3),
i32.const(2),
i32.mul,
i32.const(4),
i32.add,
i32.const(10),
i32.eq,
]);

Does this sound fun? If so, read on!

The WebAssembly execution model

If you’re not intimately familiar with WebAssembly and its execution model, a bit of background might be useful before we jump in.

At its heart, WebAssembly (or Wasm for short) is a low-level bytecode format. It’s mainly intended as a compilation target, allowing languages like C++, Go, and Rust to be executed in the browser at near-native speed.

Normally, to execute a Wasm module, a WebAssembly engine (like your browser) would translate the module’s bytecode to native machine code and execute that. But many engines include an interpreter — either in addition to a compiler, or instead of one (see wasm3).

Stack machines

Like many other bytecode formats, WebAssembly is based on a stack machine model. This means that many instructions implicitly operate on a stack of values: popping their arguments, and pushing the result back to the stack.

Here’s a diagram showing the state of an interpreter, before and after it executes an i32.mul instruction:

Note that a fully-featured Wasm interpreter contains other state, but we won’t be dealing with any of that in this post.

Alright, are you ready to start implementing the interpreter?

If you’d like to follow along by running the code yourself, then you should start by creating a file named wasm-vm.js. All of the snippets we show will be added to that file. Let’s start with a few imports:

wasm-vm.js

import assert from 'node:assert';
import test from 'node:test';

The first step is to decide how we’ll represent the instructions. Usually an interpreter would operate on a binary stream of data, and then parse (or decode) the data into a more structured form.

In this post, we’ll skip the decoding and just work directly with the structured representation. Let’s start by defining a class to represent instructions:

wasm-vm.js

class Instruction {
constructor(opcode, name, evalFn) {
this.opcode = opcode;
this.name = name;
this.evalFn = evalFn;
}
eval(vm) {
return this.evalFn(vm);
}
}

And here’s a function to create instances in a compact way:

wasm-vm.js

const instr = (opcode, name, evalFn) => new Instruction(opcode, name, evalFn);

The Instruction fields are:

  • opcode: a numeric value that identifies the instruction in the binary encoding.
  • name: a human-readable instruction identifier.
  • evalFn: a function that takes a vm argument and operates on it according to the instruction’s specification.

Following the spirit of our book, let’s start with the minimum viable instruction — the one that does nothing, nop:

wasm-vm.js

const nop = instr(0x01, 'nop', (_vm) => {});

How do we test something that “does nothing”? We can evaluate it passing null as the vm argument to make sure it doesn’t use it:

wasm-vm.js

test('nop does nothing', () => {
const vm = null;
nop.eval(vm);
});

Other than doing nothing, WebAssembly deals mostly with numbers and operations on them. As we mentioned at earlier, these operations manipulate values on a stack.

The next thing we’ll need is a way to push values to the stack. For that, we’ll need to support instructions that take a value as a parameter, which the spec refers to as an immediate argument (or just immediate, for short).

Let’s define a new subtype of instruction that takes an immediate value:

wasm-vm.js

class InstructionImm extends Instruction {
constructor(opcode, name, immediate, evalFn) {
super(opcode, name, evalFn);
this.immediate = immediate;
}
eval(vm) {
return this.evalFn(vm, this.immediate);
}
}

And a function to create them:

wasm-vm.js

const instrImm = (opcode, name, immediate, evalFn) =>
new InstructionImm(opcode, name, immediate, evalFn);

Notice that for this type of instructions evalFn takes an extra argument, the immediate value.

The last piece we need is a way to represent numbers. An interesting thing about WebAssembly is that it’s a typed assembly language. WebAssembly 1.0 has four primitive value types:

  • i32
  • i64
  • f32
  • f64

We’ll start by defining the i32 type and add support for the rest later:

wasm-vm.js

class I32 {
constructor(value) {
this.value = value;
}
}
const i32 = (value) => new I32(value);

We are now ready to define our second instruction, i32.const, which takes an immediate (the value to be pushed to the stack).

wasm-vm.js

i32.const = (value) =>
instrImm(0x41, 'i32.const', value, (vm, imm) => vm.push(i32(imm)));

Since nop didn’t use the vm argument, we never defined a type for it. But i32.const pushes a value to the stack, so let’s define a Stack:

wasm-vm.js

class Stack {
constructor() {
this.items = [];
}
push(value) {
this.items.push(value);
}
// ...
}

Now we can test that our new instruction does the right thing:

wasm-vm.js

test('i32.const pushes an I32 to the stack', () => {
const vm = new Stack();
const instr = i32.const(42);
instr.eval(vm);
assert.deepStrictEqual(vm.items, [i32(42)]);
});

Notice that we pass an instance of Stack as the vm argument — it’s sufficient for now.

With values in the stack it makes sense to want to pop them too. The drop instruction does that:

wasm-vm.js

const drop = instr(0x1a, 'drop', (vm) => vm.pop());

To complete the implementation, we need to add the pop method to Stack:

wasm-vm.js

class Stack {
// ...
pop() {
return this.items.pop();
}
// ...
}

Let’s test that we can push and pop values:

wasm-vm.js

test('drop pops from the stack', () => {
const vm = new Stack();
run(vm, [i32.const(42), drop]);
assert.deepStrictEqual(vm.items, []);
});

Here’s the definition of the run function to evaluate a list of instructions:

wasm-vm.js

function run(vm, instructions) {
for (const instr of instructions) {
instr.eval(vm);
}
}

Ok, the test should now pass!

Pushing and popping values is a decent start, but what else can we do with them? We can add them together, with the i32.add instruction:

wasm-vm.js

i32.add = instr(0x6a, 'i32.add', (vm) => {
const c2 = vm.popI32();
const c1 = vm.popI32();
const c = i32(c1.value + c2.value);
vm.push(c);
});

To evaluate i32.add:

  • Pop two values from the stack, asserting that they are of type i32
  • Add them
  • Push the result back to the stack.

This instruction uses a new method: popI32. Let’s extend Stack with methods to operate on the top value and check its type:

wasm-vm.js

class Stack {
// ...
peek() {
return this.items.at(-1);
}
topIsOfType(Class) {
return this.peek() instanceof Class;
}
assertTopIsOfType(Class) {
if (!this.topIsOfType(Class)) {
throw new Error(
`Expected ${Class.name} on top of stack, got ${this.peek()}`,
);
}
}
popType(T) {
this.assertTopIsOfType(T);
return this.pop();
}
popI32() {
return this.popType(I32);
}
// ...
}

Now we have everything we need to test i32.add:

wasm-vm.js

test('i32.add pops two I32s and pushes their sum', () => {
const vm = new Stack();
run(vm, [i32.const(42), i32.const(23), i32.add]);
assert.deepStrictEqual(vm.items, [i32(65)]);
});

The run function runs the instructions to completion, but it would be nice to step and inspect the intermediate states too.

Let’s create a proper VM class with a stack, a list of instructions, and a program counter (pc) to keep track of the current instruction. We’ll add methods to manipulate the stack and step through instructions:

wasm-vm.js

class VM {
constructor(instructions) {
this.stack = new Stack();
this.instructions = instructions;
this.pc = 0;
}
push(value) {
this.stack.push(value);
}
pop() {
return this.stack.pop();
}
peek() {
return this.stack.peek();
}
popI32() {
return this.stack.popI32();
}
popType(T) {
return this.stack.popType(T);
}
step() {
const instruction = this.instructions[this.pc];
instruction.eval(this);
this.pc += 1;
}
steps(count) {
for (let i = 0; i < count; i++) {
this.step();
}
}
}

Notice that most of the methods delegate to the stack. An interesting one is the step method: it fetches the current instruction, evaluates it, and increments the program counter.

Let’s add two numbers again, but this time, we’ll check the stack after each step:

wasm-vm.js

test('VM executes two i32.const and an i32.add', () => {
const vm = new VM([i32.const(42), i32.const(23), i32.add]);
vm.step(); // Eval `i32.const 42`
assert.deepStrictEqual(vm.stack.items, [i32(42)]);
vm.step(); // Eval `i32.const 23`
assert.deepStrictEqual(vm.stack.items, [i32(42), i32(23)]);
vm.step(); // Eval `i32.add`
assert.deepStrictEqual(vm.stack.items, [i32(42 + 23)]);
});

This test should pass.

A natural next step is to implement the rest of the arithmetic operations that, like i32.add:

  1. Take two values from the stack
  2. Perform an operation on them
  3. Push the result back.

In the spec, these operations are collectively called t.binop, where t is the type — in our case, i32.

Let’s create a function to define them:

wasm-vm.js

function binop(opcode, name, fn) {
return instr(opcode, name, (vm) => {
// 2. Pop the value t.const c2 from the stack.
const c2 = vm.popI32();
// 3. Pop the value t.const c1 from the stack.
const c1 = vm.popI32();
// 4.1 Let c be a possible result of computing binop<t>(c1, c2).
const c = i32(fn(c1.value, c2.value));
// 4.2 Push the value t.const c to the stack.
vm.push(c);
});
}

Let’s try it by defining i32.sub:

wasm-vm.js

i32.sub = binop(0x6b, 'i32.sub', (c1, c2) => c1 - c2);

Testing binops is as regular as defining them. Let’s create a function for testing binary operations:

wasm-vm.js

function checkBinop(c1, c2, instruction, expected) {
const vm = new VM([i32.const(c1), i32.const(c2), instruction]);
vm.steps(3);
assert.deepStrictEqual(vm.stack.items, [i32(expected)]);
}

And use it to test i32.sub:

wasm-vm.js

test('i32.sub', () => {
checkBinop(42, 23, i32.sub, 42 - 23);
});

Now let’s define i32.mul and i32.div_s, and a small test for each:

wasm-vm.js

i32.mul = binop(0x6c, 'i32.mul', (c1, c2) => c1 * c2);
i32.div_s = binop(0x6d, 'i32.div_s', (c1, c2) => Math.trunc(c1 / c2));
test('i32.mul', () => {
checkBinop(42, 23, i32.mul, 42 * 23);
});
test('i32.div_s', () => {
checkBinop(42, 23, i32.div_s, Math.trunc(42 / 23));
checkBinop(2, -3, i32.div_s, Math.trunc(2 / -3));
});

After arithmetic operations a good next step is to implement comparison operations, known in the spec as relops.

A relop is like a binop in that it takes two values from the stack and pushes one back. The difference is that the value it pushes back is either i32(1) to represent true or i32(0) to represent false.

Let’s create a function to define them:

wasm-vm.js

function relop(opcode, name, fn) {
return binop(opcode, name, (c1, c2) => (fn(c1, c2) ? 1 : 0));
}

We are reusing binop and wrapping the provided fn in a function that returns 1 if the call to fn(c1, c2) is truthy and 0 if not.

Let’s define our first relop, i32.eq which checks if two i32s are equal:

wasm-vm.js

i32.eq = relop(0x46, 'i32.eq', (c1, c2) => c1 === c2);

And test it using checkBinop:

wasm-vm.js

test('i32.eq', () => {
checkBinop(42, 23, i32.eq, 0);
checkBinop(23, 23, i32.eq, 1);
});

We can now implement the remaining relops:

wasm-vm.js

i32.ne = relop(0x47, 'i32.ne', (c1, c2) => c1 !== c2);
i32.lt_s = relop(0x48, 'i32.lt_s', (c1, c2) => c1 < c2);
i32.gt_s = relop(0x4a, 'i32.gt_s', (c1, c2) => c1 > c2);
i32.le_s = relop(0x4c, 'i32.le_s', (c1, c2) => c1 <= c2);
i32.ge_s = relop(0x4e, 'i32.ge_s', (c1, c2) => c1 >= c2);

And that’s it for now! All the tests should be passing. We now have an interpreter that does arithmetic and comparison operations using a stack — let’s try it with a complex expression:

wasm-vm.js

test('complex expression works', () => {
// 2 * 3 + 4 == 10;
const vm = new VM([
i32.const(3),
i32.const(2),
i32.mul,
i32.const(4),
i32.add,
i32.const(10),
i32.eq,
]);
vm.steps(3); // Eval `i32.const 3; i32.const 2; i32.mul`
assert.strictEqual(vm.stack.peek().value, 6);
vm.steps(2); // Eval `i32.const 4; i32.add`
assert.strictEqual(vm.stack.peek().value, 10);
vm.steps(2); // Eval `i32.const 10; i32.eq`
assert.deepStrictEqual(vm.stack.items, [i32(1)]);
});

🏁 wasm-vm-01/wasm-vm.js
Open in StackBlitz

That’s a pretty good start! We’ve got the core of a simple interpreter for WebAssembly 1.0 instructions. In the next post, we’ll extend this interpreter, and add support for instructions that operate on local variables.

Let’s finish things off by exporting all the things that we’ll use in the next post:

wasm-vm.js

export {I32, drop, i32, instr, nop, Stack, VM};

Catch you next time!


Thanks to Job van der Zwan and Karl Weber for providing helpful feedback on this post.

If you enjoyed this post, you should check out WebAssembly from the Ground Up — an online book to learn Wasm by building a simple compiler in JavaScript.

If you’d like to be notified when we publish the next post, you can sign up for our mailing list, where we share periodic updates on the book and interesting WebAssembly tidbits.

· 18 min read

Introduction

One of the initial explorations that started this book was how small and simple a compile-to-WebAssembly language implemented in JavaScript could be. Our first “WebAssembly compiler in a tweet” was 269 bytes; since then, we’ve managed to whittle it down to a measly 192 bytes.

The final result is a compiler that takes an arithmetic expression — written in reverse polish notation — and compiles it down to a valid WebAssembly module. That module exports a single function which returns the result of the original arithmetic expression. Here it is:


let c=(b,l)=>WebAssembly.instantiate(new Int8Array(
[,97,115,109,1,,,,1,5,1,96,,1,127,3,2,1,,7,4,1,,,,10,
l=(b=b.split` `.flatMap(t=>t>-1?[65,t]:107+'-*/'.indexOf(t)))
.length+4,1,l-2,,...b,11]))

And here’s an example of how you can use it:


(await c('11 11 1 - + 4 * 2 /')).instance.exports['']()

But this is not just a clever trick — if you take the time to understand what this code does, you’ll learn a surprising amount about WebAssembly! In the rest of the post, we’ll explain how it all works by de-obfuscating the code one step at a time.

You can play with the code in this post here: stackblitz.com/edit/rpn-to-wasm-js-compiler.

Format

The first thing we can do to make it more readable is to format it:


let c1 = (b, l) =>
WebAssembly.instantiate(
new Int8Array([
, 97, 115, 109, 1, , , , 1, 5, 1, 96, , 1, 127, 3, 2, 1, , 7, 4, 1, , , , 10,
(l = (b = b.split` `.flatMap(
(t) => t > -1 ? [65, t] : 107 + "-*/".indexOf(t)
)).length + 4),
1, l - 2, , ...b, 11
])
);

While it’s still pretty unreadable, now we can at least identify different parts of the code.

At a high level, what we’re doing is ‘parsing’ the expression in a very simple way, turning it into the appropriate Wasm bytecode, and then hand-crafting the bytes for a single-function module.

In a more complex compiler you would probably use a library to generate the WebAssembly module and compile the expressions but our main metric here is code size so we write the bytes directly in an array.

Remove Assignment Expression

The first trick to undo is the assignment expression.

In JavaScript the assignment operator is an expression. This means that it generates a result after evaluating, as you can see in the following examples:


let a, b;
console.log('a', a = 42);
a = b = 43;
console.log('b', b);

The code above will output:


a 42
b 43

This is because a = 42 assigns 42 to a and the whole assignment expression evaluates to the value being assigned.

In a = b = 43, we assign the result of evaluating b = 43 to a. This equivalent expression may be easier to understand: a = (b = 43).

In our code, we use this trick to reuse variables and update their value in places where we can also use the value being assigned. It also allows us to have our compiler in a single expression, avoiding the need for curly braces, semicolons and return statements.

To undo it, we turn the body of our function into a block and do each assignment on its own line:


let c2 = (b, l) => {
b = b.split` `.flatMap(
(t) => (t > -1 ? [65, t] : 107 + "-*/".indexOf(t))
);
l = b.length + 4;
return WebAssembly.instantiate(
new Int8Array([
, 97, 115, 109, 1, , , , 1, 5, 1, 96, , 1, 127, 3, 2, 1, , 7, 4, 1, , , ,
10, l, 1, l - 2, , ...b, 11
]),
);
};

Undo Variable Tricks

Now the assignments are easier to identify but the meaning of variables and function arguments are still hard to understand. Let’s fix that by undoing a couple of variable tricks.

The first step is to stop using single letter variables, and to use more descriptive names instead. The next step is to stop reusing variables: for example, b initially holds the code to compile, but once we don’t need that any more we reuse it to hold the bytecode instructions.

To undo this we are going to introduce a new instrs variable and rename b to code. We’ll also rename l to len. This variable contains a value that is close to the number of bytecodes.

By declaring l in the body we can remove it from the function argument’s list. We did this as a trick to avoid the need to declare it with let or const, saving some bytes and the need for a function body.

The trick works by adding unused arguments at the end of the function argument list and using them as local variables. Our compiler function expects a single argument with the code; l is there for us to use since we don’t expect the caller to provide any value for it.

Here’s the code without this trick:


let c3 = (code) => {
const instrs = code.split` `.flatMap(
(t) => (t > -1 ? [65, t] : 107 + "-*/".indexOf(t))
);
const len = instrs.length + 4;
return WebAssembly.instantiate(
new Int8Array([
, 97, 115, 109, 1, , , , 1, 5, 1, 96, , 1, 127, 3, 2, 1, , 7, 4, 1, , , ,
10, len, 1, len - 2, , ...instrs, 11
]),
);
};

Add Missing Zeros

If you look at the array in our code, you may notice that there are many commas followed by another comma instead of a value. This syntax defines “sparse arrays”. Here’s an example:


const a1 = [,,];
console.log(a1.length); // Output: 2
console.log(a1); // Output: [ <2 empty items> ]

Which is equivalent to:


const a2 = new Array(2);
console.log(a2.length); // Output: 2
console.log(a2); // Output: [ <2 empty items> ]

We use this syntactic trick to save one byte each time we need a 0 to appear in the array. This works because Typed Arrays coerce all array items to numbers, and an “empty item” will be converted to 0:


new Int8Array([0, null, undefined,,0])

which produces:


Int8Array(5) [ 0, 0, 0, 0, 0 ]

Let’s undo this trick by adding all the zeroes back:


let c4 = (code) => {
const instrs = code.split` `.flatMap(
(t) => (t > -1 ? [65, t] : 107 + "-*/".indexOf(t))
);
const len = instrs.length + 4;
return WebAssembly.instantiate(
new Int8Array([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 127, 3, 2, 1, 0, 7, 4, 1, 0, 0, 0,
10, len, 1, len - 2, 0, ...instrs, 11
]),
);
};

Remove Extra 4 bytes on Length Definition

In our code, we have a variable len that contains a number that is close to the number of bytecodes in the compiled expression, but not exactly the same:


const len = instrs.length + 4;

In the WebAssembly module we need to use the number of bytes in the function body (the expression to evaluate) in two places:

  • To define the code section’s length
  • To define the function body’s length

Since there’s only one function in the code section both values are similar:

  • The section takes two extra bytes (section identifier and number of code entries)
  • The function body takes another two bytes (number of locals and end instruction)

To avoid writing b.length twice we assign to l the value of b.length + 4 in the place where we need the code section byte count and then calculate l - 2 (b.length + 2) where we need the function body byte count.


[
...
l=(b=b.split` `.flatMap(t=>t>-1?[65,t]:107+'-*/'.indexOf(t))).length+4,1,l-2
...
]

This is all a trick to avoid having to write b.length twice.

let’s assign the length to len and calculate the right value in each place:


let c5 = (code) => {
const instrs = code.split` `.flatMap(
(t) => (t > -1 ? [65, t] : 107 + "-*/".indexOf(t))
);
const len = instrs.length;
return WebAssembly.instantiate(
new Int8Array([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 127, 3, 2, 1, 0, 7, 4, 1, 0, 0, 0,
10, 4 + len, 1, 2 + len, 0, ...instrs, 11
]),
);
};

Remove String Template Literal Instead of Function Call

The next trick to undo is code.split` `. In this case, we use the Tagged Template feature of String Template Literals.

Let’s see how it works by creating a simple tagged template that turns the string to uppercase:


function upper(s) {
return s[0].toUpperCase();
}

And use it:


upper`Hello, World!`
> "HELLO, WORLD!"

As you can see, the first argument to the tagged template function is an array. Luckily for us, the first argument of String.prototype.split is handled in the following way:

All values that are not undefined or objects with a [Symbol.split]() method are coerced to strings.

And coercing an array with one string in it is the same as the string itself:


["hello"].toString()
> "hello"


[" "].toString()
> " "

Since the function we want to call takes a single string argument, we can use it as a tagged template and save the parentheses in the function call.

Let’s write it as a function call instead:


let c6 = (code) => {
const instrs = code.split(' ').flatMap(
(t) => (t > -1 ? [65, t] : 107 + "-*/".indexOf(t))
);
const len = instrs.length;
return WebAssembly.instantiate(
new Int8Array([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 127, 3, 2, 1, 0, 7, 4, 1, 0, 0, 0,
10, 4 + len, 1, 2 + len, 0, ...instrs, 11
]),
);
};

Remove the Ternary Operator

Next, let’s undo the Ternary Operator and turn it into an if statement.

The ternary operator has expressions on each branch saving us the return statements. Here’s what the code looks like when we use an if statement instead:


let c7 = (code) => {
const instrs = code.split(" ").flatMap((t) => {
if (t > -1) {
return [65, t];
} else {
return 107 + "-*/".indexOf(t);
}
});
const len = instrs.length;
return WebAssembly.instantiate(
new Int8Array([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 127, 3, 2, 1, 0, 7, 4, 1, 0, 0, 0,
10, 4 + len, 1, 2 + len, 0, ...instrs, 11
]),
);
};

Remove Number Check With Coercion

The next trick to undo is the one present twice in the following code:


if (t > -1) {
return [65, t];
}

First we use coercion in t > -1 to check if the token t is a string representing a positive number. Then we use coercion again in [65, t] to let JavaScript turn t into a Number in the Int8Array:


new Int8Array([65, '42'])

The code above evaluates to:


Int8Array(2) [ 65, 42 ]

Let’s write the parsing and checking explicitly:


let c8 = (code) => {
const instrs = code.split(" ").flatMap((t) => {
const num = parseInt(t, 10);
if (Number.isFinite(num)) {
return [65, num];
} else {
return 107 + "-*/".indexOf(t);
}
});
const len = instrs.length;
return WebAssembly.instantiate(
new Int8Array([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 127, 3, 2, 1, 0, 7, 4, 1, 0, 0, 0,
10, 4 + len, 1, 2 + len, 0, ...instrs, 11
]),
);
};

The semantics of our compiler change a little bit here. The original version will only accept positive integers as input; if you want a negative number you have to subtract from zero: 0 - 1 to get -1. The new version allows negative numbers since it checks with Number.isFinite(num) instead of t > -1.

Remove indexOf -1 Trick

The next trick is in the else branch:


return 107 + "-*/".indexOf(t);

Our calculator compiler only accepts four arithmetic operations: +, -, *, and /. But in the code above you can only see three: -*/ and a magical number: 107. Here’s how it works — these are the bytecode numbers for arithmetic operations in WebAssembly:

  • +: 106
  • -: 107
  • *: 108
  • /: 109

We only enter this branch if the token t is not a number, which means it can only be one of the arithmetic operators above. So, given a single character which is one of those four operators, we want to produce the appropriate opcode.

We could have written 106 + "+-*/".indexOf(t). That is, we find the symbol’s index in the string:

  • +: 0
  • -: 1
  • *: 2
  • /: 3

…and add 106 to it to get the bytecode number. But when t is not in the string, "+-*/" indexOf returns -1. We can use that to our advantage, and treat -1 to mean “plus or any other token”:

  • +: -1 (any other token will be -1 too)
  • -: 0
  • *: 1
  • /: 2

And that’s why we add 107 instead of 106. Let’s undo the -1 trick:


let c9 = (code) => {
const instrs = code.split(" ").flatMap((t) => {
const num = parseInt(t, 10);
if (Number.isFinite(num)) {
return [65, num];
} else {
return 106 + "+-*/".indexOf(t);
}
});
const len = instrs.length;
return WebAssembly.instantiate(
new Int8Array([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 127, 3, 2, 1, 0, 7, 4, 1, 0, 0, 0,
10, 4 + len, 1, 2 + len, 0, ...instrs, 11
]),
);
};

Here again the semantics change a little bit. Before, if the token t wasn’t found, the expression would evaluate to 107 + -1 which would map to an addition. Now it will evaluate to 106 + -1 which will map to bytecode 105 which is the popcnt instruction.

But don’t worry, we’ll fix it in the next step.

Remove indexOf Trick

After explaining how the indexOf trick works and removing the -1 part, let’s go ahead and remove the trick completely. To do it we are going to create an object that maps from an arithmetic operation token to its bytecode:


const OP_TO_BYTECODE = {
"+": 106,
"-": 107,
"*": 108,
"/": 109,
};
let c10 = (code) => {
const instrs = code.split(" ").flatMap((t) => {
const num = parseInt(t, 10);
if (Number.isFinite(num)) {
return [65, num];
} else {
return OP_TO_BYTECODE[t] ?? 106;
}
});
const len = instrs.length;
return WebAssembly.instantiate(
new Int8Array([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 127, 3, 2, 1, 0, 7, 4, 1, 0, 0, 0,
10, 4 + len, 1, 2 + len, 0, ...instrs, 11
]),
);
};

To keep the initial semantics, if the token is not a valid operation we return the bytecode for +: in OP_TO_BYTECODE[t] ?? 106.

Remove the Empty Export Name

From the usage example at the beginning of the post, you may have noticed that the exported function’s name is the empty string:


(await c('11 11 1 - + 4 * 2 /')).instance.exports['']()

We did this to save us the bytes needed to specify the export name, but also to save an extra byte/character in the code because with the length of the export name being 0 we can use the sparse array syntax to leave an empty spot in the WebAssembly module array.

To revert this trick we are going to name the exported function as a, which in UTF-8 is the byte 97:


> new TextEncoder().encode('a')[0]
97


const OP_TO_BYTECODE = {
"+": 106,
"-": 107,
"*": 108,
"/": 109,
};
let c11 = (code) => {
const instrs = code.split(" ").flatMap((t) => {
const num = parseInt(t, 10);
if (Number.isFinite(num)) {
return [65, num];
} else {
return OP_TO_BYTECODE[t] ?? 106;
}
});
const len = instrs.length;
return WebAssembly.instantiate(
new Int8Array([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 127, 3, 2, 1, 0, 7, 5, 1, 1, 97, 0, 0,
10, 4 + len, 1, 2 + len, 0, ...instrs, 11
]),
);
};

We can now call it with a nicer name:


(await c11('11 11 1 - + 4 * 2 /')).instance.exports.a()

Implicit Design Decisions

Our initial implementation only supported positive numbers, but that’s not the only number restriction in our compiler.

To keep WebAssembly modules as small as possible, numbers are encoded using a variable-length encoding algorithm called LEB128. You can tell we are not implementing the whole algorithm by looking at the part of the code that encodes numbers: [65,t]. We’re assuming the number being encoded fits in 7 bits, the shortest possible LEB128 representation.

Let’s try the limits of our implementation:


(await c('63')).instance.exports['']();
> 63


(await c('64')).instance.exports['']();
> -64

This means the only numbers that will be parsed correctly are from 0 to 63.


(await c('127')).instance.exports['']();
> -1


(await c('128')).instance.exports['']();

Fails with:

Uncaught CompileError: WebAssembly.instantiate(): Compiling function #0 failed: function body must end with “end” opcode @+33

In the last one we went over the 7 bits and the module was rejected during validation.

Explaining and implementing LEB128 takes a lot of text and code. If you want to read more about it we have a whole deep dive on LEB128 in our book.

A Trick that Almost Worked

During the code golfing phase I had a literal shower thought but sadly it didn’t work.

The idea was to simplify 106 + "+-*/".indexOf(t) by using the UTF-8 character code plus an offset like this: 63 + t.charCodeAt() and saving 3 bytes in the process. The reason it didn’t work is that the characters +-*/ don’t appear in the same order in UTF-8 and WebAssembly bytecode.

Explaining the Numbers in the Array

The last part to expand/explain is the array of numbers used to build the WebAssembly module.

It takes a big part of a specification to explain every byte in the array, but here’s a commented version that should give you a high level idea of what each part does:


[
// Wasm module magic number '\0asm'
[0, 97, 115, 109],
// Wasm version 1.0
[1, 0, 0, 0],
// ----- type section -----
1, // Section identifier
5, // Section size in bytes
1, // Number of entries that follow
// type section - entry 0
96, // Type `function`
0, // Number of parameters
1, // Number of return values
127, // return type i32
// ----- function section -----
3, // Section identifier
2, // Section size in bytes
1, // Number of entries that follow
// function section - entry 0
0, // Index of the type section entry
// ----- export section -----
7, // Section identifier
5, // Section size in bytes
1, // Number of entries that follow
// export section - entry 0
1, // Name size in bytes
97, // String as utf-8 bytes for 'a'
0, // Export type `function`
0, // Function Index
// ----- code section -----
10, // Section identifier
4 + len, // Section size in bytes
1, // Number of entries that follow
// code section - entry 0
2 + len, // Entry size in bytes
0, // Number of local variables
...instrs,
11, // `end` instruction
]

Conclusion

There you go! We’ve turned a rather opaque 192-byte snippet into something that’s almost readable. And in the process, you hopefully learned a little bit about WebAssembly.

If we dropped the size restrictions, there are lots of things we might want to improve in this compiler: handle numbers greater than 127, add nicer syntax, add support for conditionals, loops, etc. If you’re interested in what that might look like, I encourage you to check out our book WebAssembly from the Ground Up. You’ll learn the ins and outs of WebAssembly by writing a real compiler for a simple programming language. It’s a lot of fun!

Special thanks to lexi for contributing some of the tricks used above.