diff --git a/content/posts/better_assembler_and_stack_based_register_file/note.md b/content/posts/better_assembler_and_stack_based_register_file/note.md
new file mode 100644
index 0000000..4459bad
--- /dev/null
+++ b/content/posts/better_assembler_and_stack_based_register_file/note.md
@@ -0,0 +1,175 @@
+X-Date: 2024-01-21T21:00:00Z
+X-Note-Id: 5605fe4e-a614-4477-9540-62923aea61e8
+Subject: Better assembler and stack-based register file
+X-Slug: better_assembler_and_stack_based_register_file
+
+During the last few weeks, I've made a lot of changes to the programming language I'm working on.
+It looks like I'm close to being able to write actually useful programs in it (at least at the level of assembly).
+So let's go over the interesting parts.
+
+
+## S-expression-based assembler
+
+In the previous post, I described the "reader" and "writer": two modules that parse and serialize
+data structures in the form of extended S-expressions. Originally, S-expressions are just a way to represent
+nested lists containing "atomic" values. My extension can represent dictionaries and arrays as well.
+
+Here's an example of such a data structure represented as an S-expression:
+
+```
+;; a list
+(
+1 2 3 ;; numbers
+foobar ;; symbol
+{"foo" "bar"} ;; dict with str key
+)
+```
+
+Since my language will be based on Lisp, all programs will be written as sequences of S-expressions.
+In Lisp, function calls look like this:
+
+```
+(function_name arg1 arg2 arg3)
+```
+
+This would call `function_name` with 3 arguments. Because of how universal this notation is, any program
+can be represented this way. For example, this is what a factorial implementation looks like:
+
+```
+(defun factorial (x)
+   (if (zerop x)
+       1
+       (* x (factorial (- x 1)))))
+```
+
+My "reader" can already successfully parse the syntax tree of this program and turn it into a data structure.
+I don't yet have a compiler for the high-level language that would turn it into bytecode, but I already have
+an assembler.
+
+The only problem with the assembler was that I had to implement a custom parser for it. For example, this is
+a program from one of the older posts:
+
+```
+; Load value '2' into register r0
+li r0, 2
+; Multiply register r0 by 2 and
+; put result back to r0
+mul r0, r0, 2
+ret
+```
+
+
+As you can see, it has a custom syntax where a mnemonic is written first, followed by arguments
+separated by commas. Because of the subtleties of this format, the parser took around 700 lines of C
+and was very hacky. So I thought to myself: if assembly programs are just a list of opcodes with
+arguments, then maybe I can write them as S-expressions as well and save on writing a custom parser?
+
+And this is what I ended up doing. Now the same program looks like this:
+
+
+```
+; Load value '2' into register r0
+(li r0 2)
+; Multiply register r0 by 2 and
+; put result back to r0
+(mul r0 r0 2)
+(ret)
+```
+
+And here's an example of computing the factorial of 15 and printing it to the screen:
+
+```
+(const c1 "! is")
+(const max 15)
+(const one 1)
+
+(sr 10)
+
+(mov r0 max)
+(mov r3 one)
+(mov r2 one)
+
+(label factorial)
+(mul r2 r2 r3)
+(add r3 r3 one)
+(jle r3 r0 factorial)
+
+(label end)
+
+(lfi r5 print)
+(mov r6 r0)
+(mov r7 c1)
+(mov r8 r2)
+
+(call r5 3)
+(retnil)
+```
+
+
+## Stack-based register file
+
+Initially, I designed the virtual machine loosely after the RISC-V architecture. There were few opcodes, a stack,
+and 16 general-purpose registers. Of course, since the virtual machine is not real hardware and targets
+a dynamic language, the registers and stack slots were tagged with the type identifier of the object
+they point to. But otherwise the design was quite close.
+
+Eventually, when I started thinking about how to compile actual high-level code to the virtual machine
+bytecode, I ran into a problem called "register allocation". Usually, when you compile a C or C++ program,
+you have local variables in your functions. Those variables live on the stack and are loaded into registers
+when your program performs computation on them (say, adding two variables). Because operating on registers
+is very fast and loading data from the stack is a lot slower, compilers have intricate optimizers that
+try to keep intermediate results in registers as much as possible.
+
+On one hand, I don't want to write a complex register allocation optimizer. On the other hand, I don't want
+to write the virtual machine purely as a stack machine, mostly because stack machines shuffle values around
+too much, which makes their disassembly very hard to read. Try reading Python disassembly, and
+you'll see what I mean.
+
+After thinking about this for a few evenings, I had a thought: if in my virtual machine there's no difference
+in performance between accessing a register and a value on the stack, maybe I can put registers on a separate
+stack? In this case, each function can have an essentially unlimited number of registers to work with,
+and its set of registers is independent from the registers of the function that called it.
+
+Excited, I did a bit of googling to see if there are any other runtimes that did this, and sure enough some did!
+This was the case for the CLR (.NET runtime) and the Lua virtual machine. I didn't want to dive into the CLR, because it
+is too big, but [this Lua 5 paper](https://www.lua.org/doc/jucs05.pdf) was short and lovely, and described
+everything I needed.
+
+Changing the approach to accessing registers wasn't easy and took almost a whole day of refactoring. But now that
+it's done, I can write things like this:
+
+```
+(const c1 42)
+(const c2 5)
+(const c3 47)
+(sr 3)
+
+;; Call a function that
+;; adds two numbers and
+;; puts result back to r0
+(setjump r0 add2)
+(mov r1 c1)
+(mov r2 c2)
+(call r0 2)
+
+;; Check that r0 == 47
+(aeq r0 c3)
+
+(retnil)
+
+;; Function that adds
+;; two numbers
+(label add2)
+(sr 3)
+;; These registers are
+;; not the same as the
+;; caller's
+(add r2 r0 r1)
+(ret r2)
+```
+
+This example uses only a few registers: `r0`, `r1` and `r2`. But the current implementation
+allows you to go all the way up to `r65536`, which should be plenty for addressing the local variables of pretty
+much any program.
+
+Now, with the freedom to use such a large range of registers, writing a compiler should be a lot simpler.