Publis a post on assembly and stack-based registers

This commit is contained in:
Konstantin Nazarov 2024-01-21 21:00:33 +00:00
parent a169ee4c61
commit e3dca542e9
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22

View file

@ -0,0 +1,175 @@
X-Date: 2024-01-21T21:00:00Z
X-Note-Id: 5605fe4e-a614-4477-9540-62923aea61e8
Subject: Better assembler and stack-based register file
X-Slug: better_assembler_and_stack_based_register_file
During the last few weeks, I've made a lot of changes to the programming language I'm working on.
It looks like I'm close to being able to write actually useful programs in it (at least on the level of assembly).
So let's go over the interesting parts.
## S-expression-based assembler
In the previous post, I described the "reader" and "writer": two modules that allow to parse and serialize
data structures in the form of extended S-expressions. Originally, S-expressions are just a way to represent
nested lists containing "atomic" values. My extension allows to represent dictionaries and arrays as well.
Here's an example such data structure represented as an S-expression:
```
;; a list
(
1 2 3 ;; numbers
foobar ;; symbol
{"foo" "bar"} ;; dict with str key
)
```
Since my language would be based on Lisp, all programs will be written as sequences of S-expressions.
In Lisp, function calls look like this:
```
(function_name arg1 arg2 arg3)
```
This would call `function_name` with 3 arguments. Because of how universal this notation is, any program
can be represented this way. For example, this is what factorial implementation looks like:
```
(defun factorial (x)
(if (zerop x)
1
(* x (factorial (- x 1)))))
```
My "reader" can already successfully parse the syntax tree of this program and turn it into a data structure.
I don't yet have a compiler for the high level language that would turn it into bytecode, but I already have
an assembler.
The only problem with the assembler was that I had to implement a custom parser for it. For example, this is
a program from one of the older posts:
```
; Load value '2' into register r0
li r0, 2
; Multiply register r0 by 2 and
; put result back to r0
mul r0, r0, 2
ret
```
As you can see, it has a custom syntax where a mnemonic needs to be writen first, and then followed by
arguments separated by a comma. Because of the subtleties of this format, it has taken around 700 lines of C
to implement and was very hacky. So I thought to myself: if assembly programs are just a list of opcodes with
arguments, then maybe I can write them as S-expressions as well and save on writing a custom parser?
And this is what I ended up doing. Now the same program would look like this:
```
; Load value '2' into register r0
(li r0 2)
; Multiply register r0 by 2 and
; put result back to r0
(mul r0 r0 2)
(ret)
```
And here's an example of evaluating a factorial of 15 and printing it to the screen:
```
(const c1 "! is")
(const max 15)
(const one 1)
(sr 10)
(mov r0 max)
(mov r3 one)
(mov r2 one)
(label factorial)
(mul r2 r2 r3)
(add r3 r3 one)
(jle r3 r0 factorial)
(label end)
(lfi r5 print)
(mov r6 r0)
(mov r7 c1)
(mov r8 r2)
(call r5 3)
(retnil)
```
## Stack-based register file
Initially, I designed the virual machine loosely after a RISC-V architecture. There were few opcodes, a stack,
and 16 general-purpose registers. Of course, since the virtual machine is not real hardware and is targeting
a dynamic language - it means that the registers and stack were tagged with a type identifier of an object
they point to. But otherwise the design was quite close.
Eventually, when I started thinking how I'm going to compile actual high-level code to the virtual machine
bytecode, I've found a problem called "register allocation". Usually, when you compile a C or C++ program,
you have local variables in your functions. Those variables live on a stack, and are loaded into registers
when your program performs computation on them (say, adding two variables). Because operating with registers
is very fast, and loading data from a stack is a lot slower, compilers have an intricate optimizer that would
try to use registers to store and access the intermediate results as much as possible.
On one hand, I don't want to write a complex register allocation optimizer. On the other hand, I don't want
to write the virtual machine purely as a stack machine. Mostly because stack machines shuffle around values
too much, and it's very hard to read their disassembly because of this. Try reading Python disassembly, and
you'll see what I mean.
After thinking about this for a few evenings, I had a thought: if in my virtual machine there's no difference
in performance between accessing a register and a value on the stack - maybe I can put registers on a separate
stack? In this case, I can have essentially unlimited number of registers available for a function to work with,
and it will have its own set of registers that are independent from the registers of the function that called it.
Excited, I did a bit of googling to see if there are any other runtimes that did this, and sure enough some did!
This was the case for CLR (.NET runtime) and Lua virtual machine. I didn't want to dive into CLR, because it
was too big, but [this Lua 5 paper](https://www.lua.org/doc/jucs05.pdf) was short and lovely, and described
everything I needed.
Changing the approach to accessing registers wasn't easy, and took almost a whole day of refactoring. But after
finishing that, I can write things like this:
```
(const c1 42)
(const c2 5)
(const c3 47)
(sr 3)
;; Call a function that
;; adds two numbers and
;; puts result back to r0
(setjump r0 add2)
(mov r1 c1)
(mov r2 c2)
(call r0 2)
;; Check that r0 == 47
(aeq r0 c3)
(retnil)
;; Function that adds
;; two numbers
(label add2)
(sr 3)
;; These registers are
;; not the same as the
;; caller's
(add r2 r0 r1)
(ret r2)
```
In this example you see only a few registers used, like `r0`, `r1` and `r2`. But the current implementation
allows you to go all the way up to `r65536`, which should be plenty for addressing local variables of pretty
much any program.
Now, with this freedom of using large range of registers, writing a compiler would be a lot simpler.