X-Date: 2023-07-28T21:00:00Z
X-Note-Id: 67c55b15-462a-4bfd-8b6c-277535615938
Subject: Simple VM for dynamic languages
X-Slug: simple_dynamic_language_vm

A few weeks ago, I started working on a lisp interpreter. I had already done a few implementations of lisp in different languages, but those were mostly just recursive evaluators. This time, it's a bit more serious.

Instead of writing the implementation top-to-bottom, I started with a virtual machine. Virtual machines are used to execute most scripting languages, since bytecode is more compact and faster to evaluate. Compared to a tree-walking interpreter, bytecode VMs are better for branch prediction and friendlier to the CPU cache.

Because the VM is very barebones at the moment, and no language runs on top of it yet, I created a very simple assembly language. Here's an example that computes a factorial in it:

```
li r1, 10
li r2, 1
li r0, 1

factorial:
mul r0, r0, r2
addi r2, r2, 1
jle r2, r1, factorial
```

This code computes `10!` and returns it in the `r0` register.
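
For comparison, here is the same loop written out in C, with each line of the assembly above mapped onto a statement (the register names are kept as variable names; this is just an illustration, not part of the VM):

```c
#include <stdint.h>

/* Equivalent of the assembly above: r1 = limit, r2 = counter, r0 = accumulator. */
uint64_t factorial_loop(uint64_t n) {
    uint64_t r1 = n;    /* li r1, 10 */
    uint64_t r2 = 1;    /* li r2, 1  */
    uint64_t r0 = 1;    /* li r0, 1  */
    do {                /* factorial:            */
        r0 = r0 * r2;   /* mul r0, r0, r2        */
        r2 = r2 + 1;    /* addi r2, r2, 1        */
    } while (r2 <= r1); /* jle r2, r1, factorial */
    return r0;
}
```

With `n = 10`, the loop multiplies the accumulator by 1 through 10, giving `3628800`.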

The architecture of the VM is "load/store", meaning that computation (addition, multiplication, comparisons, etc.) can only be performed on registers; data is loaded from memory into registers with separate instructions. This contrasts a bit with how some VMs are implemented: many of them don't use registers at all, and instead rely only on the stack. For example, this is what a factorial function looks like in Python bytecode:

```
  2     >>    0 LOAD_FAST                0 (N)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               2 (==)
              9 POP_JUMP_IF_FALSE       16

  3          12 LOAD_FAST                1 (result)
             15 RETURN_VALUE

  4     >>   16 LOAD_FAST                0 (N)
             19 LOAD_CONST               1 (1)
             22 BINARY_SUBTRACT
             23 LOAD_FAST                0 (N)
             26 LOAD_FAST                1 (result)
             29 BINARY_MULTIPLY
             30 STORE_FAST               1 (result)
             33 STORE_FAST               0 (N)
             36 JUMP_ABSOLUTE            0
```

If you look carefully, you'll notice that there are no registers here. That's because every operation that writes something does so to the top of the stack.

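To make the difference concrete, here is a minimal stack-machine dispatch loop (the opcodes are hypothetical, not Python's): every instruction pops its operands from the top of the stack and pushes its result back, so no instruction ever names a register.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine. */
enum op { PUSH, MUL, SUB, HALT };

int64_t run(const int64_t *code) {
    int64_t stack[64];
    size_t sp = 0;                    /* points one past the top of the stack */
    for (size_t ip = 0;; ip++) {
        switch (code[ip]) {
        case PUSH:
            stack[sp++] = code[++ip]; /* immediate operand follows the opcode */
            break;
        case MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case SUB:
            sp--;
            stack[sp - 1] -= stack[sp];
            break;
        case HALT:
            return stack[sp - 1];     /* result is whatever is on top */
        }
    }
}
```

Note that even in this sketch, `PUSH` carries an inline operand while `MUL` doesn't, so instructions already have variable widths, which is exactly one of the problems discussed below.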
There are a few problems I find with stack machines:

- It is unnatural to read the disassembly (you have to keep track of changing stack offsets instead of register names)
- Either instructions waste space, or we deal with variable-width instructions (as is the case for Python)
- Some potential for optimization is wasted, since values are repeatedly pushed and popped instead of staying in registers

In my personal opinion, a good virtual machine for a dynamic language should also be a suitable target for compiling regular expression state machines.

So instead, I opted for a more traditional approach, similar to a RISC CPU:

- 32-bit constant-width instructions
- a flexible stack
- 32 registers, most of which are general-purpose, except for the frame pointer/instruction pointer/etc.
- 64-bit width for both register and stack entries

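To show how 32 constant-width bits can comfortably fit a three-register instruction like `mul r0, r0, r2`, here is a sketch of an encoder/decoder pair. The field layout is hypothetical (the post doesn't specify one): an 8-bit opcode plus three 5-bit register indices, which is enough to address all 32 registers.

```c
#include <stdint.h>

/* Hypothetical 32-bit instruction layout:
   bits 0..7  opcode, bits 8..12 rd, bits 13..17 rs1, bits 18..22 rs2. */
typedef struct {
    uint8_t opcode;
    uint8_t rd, rs1, rs2; /* destination and two source registers, 0..31 */
} insn_t;

uint32_t encode(uint8_t opcode, uint8_t rd, uint8_t rs1, uint8_t rs2) {
    return (uint32_t)opcode
         | ((uint32_t)(rd  & 0x1f) << 8)
         | ((uint32_t)(rs1 & 0x1f) << 13)
         | ((uint32_t)(rs2 & 0x1f) << 18);
}

insn_t decode(uint32_t word) {
    insn_t i;
    i.opcode = word & 0xff;
    i.rd  = (word >> 8)  & 0x1f;
    i.rs1 = (word >> 13) & 0x1f;
    i.rs2 = (word >> 18) & 0x1f;
    return i;
}
```

Because every instruction is exactly one 32-bit word, the dispatch loop can fetch and decode without ever looking at the opcode first, unlike in the variable-width case.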
The only twist I've added compared to "normal" CPUs is register and stack tagging. With physical processors, it is often the case that software is written in strongly-typed languages, where data types are known at compile time, and thus the compiler can generate specific instructions for handling, say, `int32` vs `int64`.

Consider the following code:

```
mul r3, r2, r1
```

It is essentially equivalent to `r3 = r2 * r1`. But what types do `r1` and `r2` have? Well, in the case of my virtual machine, registers and stack entries "know" their type. So if you attempt to multiply an `int32` by a `uint64`, you get standard type promotion, and the result is tagged as `uint64`. Because of this, you don't have to perform type checks at the bytecode level.

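A tagged `mul` could be sketched like this (the names and the two-tag universe are illustrative, not the actual implementation): each register value carries a type tag, and the operation promotes narrower operands before multiplying.

```c
#include <stdint.h>

/* Illustrative tags; the real VM would have more of them. */
typedef enum { TAG_INT32, TAG_UINT64 } tag_t;

typedef struct {
    tag_t tag;
    union { int32_t i32; uint64_t u64; } v;
} reg_t;

static uint64_t widen(reg_t r) {
    return r.tag == TAG_INT32 ? (uint64_t)r.v.i32 : r.v.u64;
}

/* r3 = r2 * r1 with standard promotion: if either operand is
   tagged uint64, the result is tagged uint64. */
reg_t mul_tagged(reg_t a, reg_t b) {
    reg_t out;
    if (a.tag == TAG_UINT64 || b.tag == TAG_UINT64) {
        out.tag = TAG_UINT64;
        out.v.u64 = widen(a) * widen(b);
    } else {
        out.tag = TAG_INT32;
        out.v.i32 = a.v.i32 * b.v.i32;
    }
    return out;
}
```

The promotion decision lives inside the instruction handler, which is why a single `mul` opcode suffices and the bytecode itself stays type-agnostic.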
So, how fast is this approach? In my preliminary tests, a simple loop from 1 to 100 million, with a multiplication inside, takes 0.7 seconds to complete. That is plenty fast, considering that the VM implementation is naive and has never been seriously optimized.

The code for the experimental version can be found [here](https://git.sr.ht/~knazarov/lisp.experimental).