diff --git a/content/posts/better_assembler_and_stack_based_register_file/note.md b/content/posts/better_assembler_and_stack_based_register_file/note.md new file mode 100644 index 0000000..4459bad --- /dev/null +++ b/content/posts/better_assembler_and_stack_based_register_file/note.md @@ -0,0 +1,175 @@ +X-Date: 2024-01-21T21:00:00Z +X-Note-Id: 5605fe4e-a614-4477-9540-62923aea61e8 +Subject: Better assembler and stack-based register file +X-Slug: better_assembler_and_stack_based_register_file + +During the last few weeks, I've made a lot of changes to the programming language I'm working on. +It looks like I'm close to being able to write actually useful programs in it (at least on the level of assembly). +So let's go over the interesting parts. + + +## S-expression-based assembler + +In the previous post, I described the "reader" and "writer": two modules that allow to parse and serialize +data structures in the form of extended S-expressions. Originally, S-expressions are just a way to represent +nested lists containing "atomic" values. My extension allows to represent dictionaries and arrays as well. + +Here's an example such data structure represented as an S-expression: + +``` +;; a list +( +1 2 3 ;; numbers +foobar ;; symbol +{"foo" "bar"} ;; dict with str key +) +``` + +Since my language would be based on Lisp, all programs will be written as sequences of S-expressions. +In Lisp, function calls look like this: + +``` +(function_name arg1 arg2 arg3) +``` + +This would call `function_name` with 3 arguments. Because of how universal this notation is, any program +can be represented this way. For example, this is what factorial implementation looks like: + +``` +(defun factorial (x) + (if (zerop x) + 1 + (* x (factorial (- x 1))))) +``` + +My "reader" can already successfully parse the syntax tree of this program and turn it into a data structure. +I don't yet have a compiler for the high level language that would turn it into bytecode, but I already have +an assembler. + +The only problem with the assembler was that I had to implement a custom parser for it. For example, this is +a program from one of the older posts: + +``` +; Load value '2' into register r0 +li r0, 2 +; Multiply register r0 by 2 and +; put result back to r0 +mul r0, r0, 2 +ret +``` + + +As you can see, it has a custom syntax where a mnemonic needs to be writen first, and then followed by +arguments separated by a comma. Because of the subtleties of this format, it has taken around 700 lines of C +to implement and was very hacky. So I thought to myself: if assembly programs are just a list of opcodes with +arguments, then maybe I can write them as S-expressions as well and save on writing a custom parser? + +And this is what I ended up doing. Now the same program would look like this: + + +``` +; Load value '2' into register r0 +(li r0 2) +; Multiply register r0 by 2 and +; put result back to r0 +(mul r0 r0 2) +(ret) +``` + +And here's an example of evaluating a factorial of 15 and printing it to the screen: + +``` +(const c1 "! is") +(const max 15) +(const one 1) + +(sr 10) + +(mov r0 max) +(mov r3 one) +(mov r2 one) + +(label factorial) +(mul r2 r2 r3) +(add r3 r3 one) +(jle r3 r0 factorial) + +(label end) + +(lfi r5 print) +(mov r6 r0) +(mov r7 c1) +(mov r8 r2) + +(call r5 3) +(retnil) +``` + + +## Stack-based register file + +Initially, I designed the virual machine loosely after a RISC-V architecture. There were few opcodes, a stack, +and 16 general-purpose registers. Of course, since the virtual machine is not real hardware and is targeting +a dynamic language - it means that the registers and stack were tagged with a type identifier of an object +they point to. But otherwise the design was quite close. + +Eventually, when I started thinking how I'm going to compile actual high-level code to the virtual machine +bytecode, I've found a problem called "register allocation". Usually, when you compile a C or C++ program, +you have local variables in your functions. Those variables live on a stack, and are loaded into registers +when your program performs computation on them (say, adding two variables). Because operating with registers +is very fast, and loading data from a stack is a lot slower, compilers have an intricate optimizer that would +try to use registers to store and access the intermediate results as much as possible. + +On one hand, I don't want to write a complex register allocation optimizer. On the other hand, I don't want +to write the virtual machine purely as a stack machine. Mostly because stack machines shuffle around values +too much, and it's very hard to read their disassembly because of this. Try reading Python disassembly, and +you'll see what I mean. + +After thinking about this for a few evenings, I had a thought: if in my virtual machine there's no difference +in performance between accessing a register and a value on the stack - maybe I can put registers on a separate +stack? In this case, I can have essentially unlimited number of registers available for a function to work with, +and it will have its own set of registers that are independent from the registers of the function that called it. + +Excited, I did a bit of googling to see if there are any other runtimes that did this, and sure enough some did! +This was the case for CLR (.NET runtime) and Lua virtual machine. I didn't want to dive into CLR, because it +was too big, but [this Lua 5 paper](https://www.lua.org/doc/jucs05.pdf) was short and lovely, and described +everything I needed. + +Changing the approach to accessing registers wasn't easy, and took almost a whole day of refactoring. But after +finishing that, I can write things like this: + +``` +(const c1 42) +(const c2 5) +(const c3 47) +(sr 3) + +;; Call a function that +;; adds two numbers and +;; puts result back to r0 +(setjump r0 add2) +(mov r1 c1) +(mov r2 c2) +(call r0 2) + +;; Check that r0 == 47 +(aeq r0 c3) + +(retnil) + +;; Function that adds +;; two numbers +(label add2) +(sr 3) +;; These registers are +;; not the same as the +;; caller's +(add r2 r0 r1) +(ret r2) +``` + +In this example you see only a few registers used, like `r0`, `r1` and `r2`. But the current implementation +allows you to go all the way up to `r65536`, which should be plenty for addressing local variables of pretty +much any program. + +Now, with this freedom of using large range of registers, writing a compiler would be a lot simpler.