From d008ba3c23d88cf88b8693c5d2eb4e8a508b5c5e Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov <mail@knazarov.com>
Date: Sun, 24 Sep 2023 21:46:14 +0100
Subject: [PATCH] Publish a post about the vm data segment

---
 .../note.md                                   | 64 +++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 content/posts/vm_progress_update_data_segment_and_constants/note.md

diff --git a/content/posts/vm_progress_update_data_segment_and_constants/note.md b/content/posts/vm_progress_update_data_segment_and_constants/note.md
new file mode 100644
index 0000000..36b3cb1
--- /dev/null
+++ b/content/posts/vm_progress_update_data_segment_and_constants/note.md
@@ -0,0 +1,64 @@
+X-Date: 2023-09-24T21:50:00Z
+X-Note-Id: 3a07c153-0d26-402e-b0b4-626dfdcf38fc
+Subject: VM progress update: data segment and constants
+X-Slug: vm_progress_update_data_segment_and_constants
+
+Today I'd like to introduce another addition to the [virtual machine](https://git.sr.ht/~knazarov/lisp.experimental)
+I'm working on: the data segment.
+
+Let's first take a look at why this is necessary. Here's a piece of assembly code that multiplies 2 by 2:
+
+```
+; Load value '2' into register r0
+li r0, 2
+; Multiply register r0 by 2 and
+; put result back to r0
+mul r0, r0, 2
+ret
+```
+
+The instruction that loads a constant into a register here is `li`, which stands for "load immediate".
+The value it loads is encoded as part of the instruction itself. Since an instruction is 32 bits wide,
+the instruction code takes 7 bits, and the register number takes 4 bits, that
+leaves 21 bits for the value. As a signed integer, 21 bits can only encode values between -1048576 and 1048575.
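+
+To make the arithmetic concrete, here is a rough Python sketch of packing such an instruction. The field order
+(instruction code in the low bits, then the register, then the value) and the helper names are assumptions made
+for illustration; the actual VM may lay the bits out differently.
+
+```python
+# Hypothetical layout: [value:21][register:4][instruction code:7]
+OPCODE_BITS, REG_BITS, IMM_BITS = 7, 4, 21
+IMM_MIN = -(1 << (IMM_BITS - 1))
+IMM_MAX = (1 << (IMM_BITS - 1)) - 1
+
+def encode_li(opcode, reg, imm):
+    """Pack an li-style instruction into a single 32-bit word."""
+    if not IMM_MIN <= imm <= IMM_MAX:
+        raise ValueError("immediate does not fit in 21 bits")
+    imm_field = imm & ((1 << IMM_BITS) - 1)  # two's complement, 21 bits
+    return (imm_field << (OPCODE_BITS + REG_BITS)) | (reg << OPCODE_BITS) | opcode
+
+def decode_imm(word):
+    """Recover the signed value from an encoded word."""
+    imm_field = word >> (OPCODE_BITS + REG_BITS)
+    if imm_field & (1 << (IMM_BITS - 1)):    # sign bit set
+        imm_field -= 1 << IMM_BITS
+    return imm_field
+```
+
+With this layout, `encode_li` rejects anything outside the range above, which is exactly the limitation discussed next.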
+
+Such a limited range is fine in practice for the many pieces of code that deal with offsets. Larger values
+do occur, but most of the values in your code are likely to be small (think loop counters).
+
+But if you have a larger value, you need to be able to represent it somehow. One way to do that is through bitwise
+operations: load 21 bits at a time, then bit-shift and bit-or until you get the desired value. This would work
+for integers, and wouldn't require any additional operations. It would, however, take a lot of instructions to do
+the work that a single instruction should accomplish.
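+
+As a rough illustration of the cost, the Python sketch below splits a 64-bit value into 21-bit chunks and rebuilds
+it with shifts and ors. The function names and the step counting (one load for the first chunk, then a shift, a load
+and an or for each further chunk) are assumptions, and the sketch treats chunks as unsigned, ignoring the sign
+handling a real `li` sequence would need.
+
+```python
+CHUNK_BITS = 21
+
+def split_chunks(value, total_bits=64):
+    """Split an unsigned value into 21-bit chunks, most significant first."""
+    count = -(-total_bits // CHUNK_BITS)  # ceiling division
+    mask = (1 << CHUNK_BITS) - 1
+    return [(value >> (i * CHUNK_BITS)) & mask for i in reversed(range(count))]
+
+def rebuild(chunks):
+    """Mimic the load / shift / or sequence and count the steps it takes."""
+    result, steps = chunks[0], 1  # first "load immediate"
+    for chunk in chunks[1:]:
+        result = (result << CHUNK_BITS) | chunk
+        steps += 3                # shift, load next chunk, or
+    return result, steps
+```
+
+Under these assumptions a single 64-bit constant needs four chunks and ten steps.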
+
+Another problem with just using bit-shifts is that you can't easily encode strings and other more complex data
+structures this way. Any more complex constant will require executing lots of sequential instructions to reconstruct
+a particular data type.
+
+For this reason, regular executables contain both a code section and a data section. Whenever an executable is loaded
+into memory by the kernel, the sections get mapped into memory at predictable addresses, and the code can just load
+a constant either from an absolute address or from an offset relative to the instruction pointer.
+
+For my virtual machine, I don't want to mix the code and data as part of one logical memory block. This is because
+I would like to minimize the chance of memory corruption, so arbitrary memory access is not possible in the VM.
+What I have instead is a representation of code as a pair of values: an array of instructions, and an array of
+constants. This pair can be predictably serialized to a file, and then loaded into memory later.
+
+To make it possible to load constants from the constant array, I've added a new instruction called `loadc`, which
+takes a register and an index into the array. When executed, the instruction loads the value of that constant into
+the specified register.
+
+Here's what its usage looks like (note the u8 suffix is just a way to tell the compiler that this is an 8-bit unsigned
+integer):
+
+```
+loadc r0, 255u8
+loadc r1, 2u8
+add r0, r0, r1 ; r0 would contain "1" (255 + 2 wraps around in 8 bits)
+ret
+```
+
+Also note that when writing the assembly code, you don't need to fill in the constant array yourself. The compiler does
+that for you. You just specify the constant as the second argument, and during compilation the compiler moves
+the constant to the array and inserts the opcode with the correct index in place of the `loadc` instruction.
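+
+The idea can be sketched in a few lines of Python. This is only an illustration: the tuple-based instruction format,
+the `compile_program` and `run` helpers, the reuse of duplicate constants and the 8-bit wraparound in `add` are all
+assumptions, not a description of how the actual compiler or VM is written.
+
+```python
+def compile_program(source):
+    """Replace each loadc literal with an index into a constants array."""
+    code, constants = [], []
+    for op, *args in source:
+        if op == "loadc":
+            reg, literal = args
+            if literal not in constants:  # reuse duplicates (an assumption)
+                constants.append(literal)
+            args = [reg, constants.index(literal)]
+        code.append((op, *args))
+    return code, constants
+
+def run(code, constants):
+    """Tiny interpreter for loadc/add/ret with assumed 8-bit wraparound."""
+    regs = {}
+    for op, *args in code:
+        if op == "loadc":
+            regs[args[0]] = constants[args[1]]
+        elif op == "add":
+            dst, a, b = args
+            regs[dst] = (regs[a] + regs[b]) & 0xFF
+        elif op == "ret":
+            break
+    return regs
+```
+
+Feeding it the program above as tuples, for example `("loadc", "r0", 255)`, yields a code array that refers to
+constants only by index, alongside a separate constants array.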
+
+Right now there is only support for integers, but adding other data types should be relatively easy.