Publish a post about the vm data segment

2023-09-24 21:46:14 +01:00 · 2023-09-24 21:46:14 +01:00 · d008ba3c23
commit d008ba3c23
parent 0043a54b30
1 changed files with 64 additions and 0 deletions
--- a/content/posts/vm_progress_update_data_segment_and_constants/note.md
+++ b/content/posts/vm_progress_update_data_segment_and_constants/note.md
@@ -0,0 +1,64 @@
 X-Date: 2023-09-24T21:50:00Z
 X-Note-Id: 3a07c153-0d26-402e-b0b4-626dfdcf38fc
 Subject: VM progress update: data segment and constants
 X-Slug: vm_progress_update_data_segment_and_constants
 Today I'd like to introduce another addition to the [virtual machine](https://git.sr.ht/~knazarov/lisp.experimental)
 I'm working on: the data segment.
 Let's first take a look at why this is necessary. Here's a piece of assembly code that multiplies 2 by 2:
 ```
 ; Load value '2' into register r0
 li r0, 2
 ; Multiply register r0 by 2 and
 ; put result back to r0
 mul r0, r0, 2
 ret
 ```
 The instruction that loads a constant into a register here is `li`, which stands for "load immediate".
 The value that it loads into the register is encoded as part of the instruction itself. Since an instruction is
 32 bits wide, the opcode takes 7 bits, and the register number takes 4 bits, that
 leaves 21 bits for the value. As a signed integer, 21 bits can only encode values between -1048576 and 1048575.
 Such a short range is fine in practice for much of the code that deals with offsets. Larger values do occur,
 but most of the values in your code are likely to be small (think loop counters).
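
To make the bit budget concrete, here is a minimal Python sketch of packing and unpacking an `li` instruction. The field order (opcode in the low bits, then the register, then the immediate) and the opcode number `OP_LI` are assumptions made for illustration; the VM's real layout may differ. Only the field widths come from the text above.

```python
# Hypothetical layout (an assumption, not taken from the VM's source):
# bits 0-6 opcode, bits 7-10 register, bits 11-31 signed immediate.
OPCODE_BITS = 7
REG_BITS = 4
IMM_BITS = 32 - OPCODE_BITS - REG_BITS  # 21 bits left for the value

IMM_MIN = -(1 << (IMM_BITS - 1))     # -1048576
IMM_MAX = (1 << (IMM_BITS - 1)) - 1  # 1048575

OP_LI = 0x01  # made-up opcode number for `li`


def encode_li(reg: int, imm: int) -> int:
    """Pack `li reg, imm` into a single 32-bit word."""
    if not 0 <= reg < (1 << REG_BITS):
        raise ValueError("register number does not fit in 4 bits")
    if not IMM_MIN <= imm <= IMM_MAX:
        raise ValueError("immediate does not fit in 21 signed bits")
    imm_field = imm & ((1 << IMM_BITS) - 1)  # two's complement, 21 bits
    return OP_LI | (reg << OPCODE_BITS) | (imm_field << (OPCODE_BITS + REG_BITS))


def decode_li(word: int):
    """Unpack a 32-bit word into (opcode, register, signed immediate)."""
    opcode = word & ((1 << OPCODE_BITS) - 1)
    reg = (word >> OPCODE_BITS) & ((1 << REG_BITS) - 1)
    imm = word >> (OPCODE_BITS + REG_BITS)
    if imm & (1 << (IMM_BITS - 1)):  # sign bit set: extend to a negative int
        imm -= 1 << IMM_BITS
    return opcode, reg, imm
```

Trying to encode 1048576 raises an error in this sketch, which is exactly the limitation the rest of the post deals with.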
 But if you have a larger value, you need to be able to represent it somehow. One way to do that is through bitwise
 operations: load 21 bits at a time, and then bit-shift and bit-or until you get the desired value. This works
 for integers and doesn't require adding any new instructions to the VM. However, it takes a whole sequence of
 instructions to do the work that a single instruction should have accomplished.
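
As a rough illustration of that cost, the Python sketch below splits a 64-bit unsigned value into 21-bit chunks and rebuilds it with shifts and ors, counting the steps. It is a simplification under stated assumptions: it ignores sign extension of immediates, and it assumes the shift and or steps can each take a full 21-bit operand, which the real instruction set may not allow.

```python
CHUNK_BITS = 21
CHUNK_MASK = (1 << CHUNK_BITS) - 1


def split_into_chunks(value: int, width: int = 64):
    """Split an unsigned value into 21-bit chunks, most significant first."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    count = -(-width // CHUNK_BITS)  # ceiling division: 4 chunks for 64 bits
    return [(value >> (CHUNK_BITS * i)) & CHUNK_MASK
            for i in reversed(range(count))]


def rebuild(chunks):
    """Mimic the load / shift / or sequence; return (value, step count)."""
    acc = chunks[0]  # one "load immediate"
    steps = 1
    for chunk in chunks[1:]:
        acc = (acc << CHUNK_BITS) | chunk  # one shift plus one or
        steps += 2
    return acc, steps
```

Under these assumptions a single 64-bit constant costs seven steps, compared to one load from a constant array.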
 Another problem with just using bit-shifts is that you can't easily encode strings and other more complex data
 structures this way. Any more complex constant will require executing lots of sequential instructions to reconstruct
 a particular data type.
 For this reason, regular executables contain both a code and data section. Whenever an executable is loaded into
 memory by a kernel, the sections get mapped into memory at predictable addresses, and the code can just load
 a constant either from the absolute address or from an offset relative to the instruction pointer.
 For my virtual machine, I don't want to mix the code and data as part of one logical memory block. This is because
 I would like to minimize the chance of memory corruption, so the VM does not allow arbitrary memory access.
 What I have instead is a representation of code as a pair of values: an array of instructions, and an array of
 constants. This pair can be predictably serialized to a file, and then loaded into memory later.
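
The pair-of-arrays idea can be sketched in Python as follows. The class name `CodeObject` and the on-disk layout (two little-endian counts, then 32-bit instruction words, then 64-bit signed constants) are hypothetical choices for illustration, not the VM's actual file format.

```python
import struct
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodeObject:
    """Hypothetical container: code and constants kept as separate arrays."""
    instructions: List[int] = field(default_factory=list)  # 32-bit words
    constants: List[int] = field(default_factory=list)     # integers only

    def serialize(self) -> bytes:
        # Assumed layout: two counts, then the code, then the constants.
        header = struct.pack("<II", len(self.instructions), len(self.constants))
        code = struct.pack(f"<{len(self.instructions)}I", *self.instructions)
        data = struct.pack(f"<{len(self.constants)}q", *self.constants)
        return header + code + data

    @classmethod
    def deserialize(cls, blob: bytes) -> "CodeObject":
        n_code, n_const = struct.unpack_from("<II", blob, 0)
        offset = struct.calcsize("<II")
        code = list(struct.unpack_from(f"<{n_code}I", blob, offset))
        offset += 4 * n_code
        consts = list(struct.unpack_from(f"<{n_const}q", blob, offset))
        return cls(code, consts)
```

Because the instructions only ever refer to constants by index, the loader never has to patch addresses: reading the two arrays back is enough.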
 To make it possible to load constants from the constant array, I've added a new instruction called `loadc`, to which you
 pass a register and an index into the array. When executed, the instruction loads the value of the constant into the
 specified register.
 Here's what its usage looks like (note that the u8 suffix is just a way to tell the compiler that this is an 8-bit unsigned
 integer):
 ```
 loadc r0, 255u8
 loadc r1, 2u8
 add r0, r0, r1 ; r0 would contain "1" (255 + 2 wraps around in 8 bits)
 ret
 ```
 Also note that when writing assembly code, you don't need to fill in the constant array yourself: the compiler does
 that for you. You just specify the constant as the second argument, and during compilation the compiler moves
 the constant into the array and emits a `loadc` instruction with the correct index in its place.
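
The compilation step described above might look roughly like the following Python sketch. The tuple-based program format, the function names, and the deduplication of repeated constants are all assumptions made for illustration; the 8-bit wraparound in `run` models the `u8` values from the example.

```python
def assemble(program):
    """Move `loadc` literals into a constant array and emit indices instead.

    `program` is a list of tuples such as ("loadc", "r0", 255).
    Reusing the slot of a repeated constant is an assumption of this sketch.
    """
    constants = []
    code = []
    for op, *args in program:
        if op == "loadc":
            reg, value = args
            if value not in constants:
                constants.append(value)
            code.append(("loadc", reg, constants.index(value)))
        else:
            code.append((op, *args))
    return code, constants


def run(code, constants, bits=8):
    """Tiny interpreter for the three instructions used in the example."""
    regs = {}
    mask = (1 << bits) - 1
    for op, *args in code:
        if op == "loadc":
            reg, index = args
            regs[reg] = constants[index]  # the only way to reach the data
        elif op == "add":
            dst, lhs, rhs = args
            regs[dst] = (regs[lhs] + regs[rhs]) & mask  # wrap to `bits` bits
        elif op == "ret":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


code, constants = assemble([
    ("loadc", "r0", 255),
    ("loadc", "r1", 2),
    ("add", "r0", "r0", "r1"),
    ("ret",),
])
# constants is [255, 2]; the loadc instructions now carry indices 0 and 1.
```

Running `run(code, constants)` leaves 1 in `r0`, matching the comment in the assembly listing.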
 Right now there is only support for integers, but adding other data types should be relatively easy.