From d008ba3c23d88cf88b8693c5d2eb4e8a508b5c5e Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov
Date: Sun, 24 Sep 2023 21:46:14 +0100
Subject: [PATCH] Publish a post about the vm data segment

---
 .../note.md | 64 +++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 content/posts/vm_progress_update_data_segment_and_constants/note.md

diff --git a/content/posts/vm_progress_update_data_segment_and_constants/note.md b/content/posts/vm_progress_update_data_segment_and_constants/note.md
new file mode 100644
index 0000000..36b3cb1
--- /dev/null
+++ b/content/posts/vm_progress_update_data_segment_and_constants/note.md
@@ -0,0 +1,64 @@
+X-Date: 2023-09-24T21:50:00Z
+X-Note-Id: 3a07c153-0d26-402e-b0b4-626dfdcf38fc
+Subject: VM progress update: data segment and constants
+X-Slug: vm_progress_update_data_segment_and_constants
+
+Today I'd like to introduce another addition to the [virtual machine](https://git.sr.ht/~knazarov/lisp.experimental)
+I'm working on: the data segment.
+
+Let's first take a look at why this is necessary. Here's a piece of assembly code that multiplies 2 by 2:
+
+```
+; Load value '2' into register r0
+li r0, 2
+; Multiply register r0 by 2 and
+; put result back to r0
+mul r0, r0, 2
+ret
+```
+
+The instruction here that loads a constant into a register is `li`, which stands for "load immediate".
+The value that it loads into the register is encoded as part of the instruction itself. Since the size of the
+instruction is 32 bits, the size of the instruction code is 7 bits, and the size of the register number is 4 bits, it
+leaves 21 bits for the value. As a signed integer, 21 bits can only encode values between -1048576 and 1048575.
+
+Such a narrow range is fine in practice for a lot of code that deals with offsets. Larger values do occur,
+but most of the values in your code are likely to be small (think loop counters).
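
To make the arithmetic concrete, here is a small Python sketch of packing a register number and a signed 21-bit value into a 32-bit word. The post only gives the field widths (7 + 4 + 21 bits), so the field order, the opcode number, and the names `encode_li` and `decode_li` are assumptions made for illustration and may not match the VM's real encoding.

```python
from typing import Tuple

# Field widths come from the post; the bit positions are an assumption:
# bits 0-6 opcode, bits 7-10 register, bits 11-31 signed immediate.
OPCODE_BITS = 7
REG_BITS = 4
IMM_BITS = 32 - OPCODE_BITS - REG_BITS  # 21 bits left for the value

IMM_MIN = -(1 << (IMM_BITS - 1))     # -1048576
IMM_MAX = (1 << (IMM_BITS - 1)) - 1  # 1048575

OP_LI = 0x01  # made-up opcode number for "li"


def encode_li(reg: int, value: int) -> int:
    """Pack `li reg, value` into one 32-bit word."""
    if not 0 <= reg < (1 << REG_BITS):
        raise ValueError(f"register {reg} does not fit in {REG_BITS} bits")
    if not IMM_MIN <= value <= IMM_MAX:
        raise ValueError(f"{value} does not fit in {IMM_BITS} signed bits")
    imm = value & ((1 << IMM_BITS) - 1)  # two's complement, 21 bits wide
    return OP_LI | (reg << OPCODE_BITS) | (imm << (OPCODE_BITS + REG_BITS))


def decode_li(word: int) -> Tuple[int, int]:
    """Recover (register, value) from a word produced by encode_li."""
    reg = (word >> OPCODE_BITS) & ((1 << REG_BITS) - 1)
    imm = (word >> (OPCODE_BITS + REG_BITS)) & ((1 << IMM_BITS) - 1)
    if imm & (1 << (IMM_BITS - 1)):  # sign bit set: value is negative
        imm -= 1 << IMM_BITS
    return reg, imm
```

With this layout, `encode_li(0, 1048576)` raises a `ValueError`, which is exactly the limit that motivates the rest of the post.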
+
+But if you have a larger value, you need to be able to represent it somehow. One way to do that is through bitwise
+operations: load 21 bits at a time, and then bit-shift and bit-or until you get the desired value. This works
+for integers and won't require any new operations. However, it takes a lot of instructions to do the
+same work that one instruction should have accomplished.
+
+Another problem with just using bit-shifts is that you can't easily encode strings and other more complex data
+structures this way. Any such constant would require executing a long sequence of instructions to reconstruct
+the value.
+
+For this reason, regular executables contain both a code section and a data section. Whenever an executable is loaded into
+memory by a kernel, the sections get mapped into memory at predictable addresses, and the code can just load
+a constant either from an absolute address or from an offset relative to the instruction pointer.
+
+For my virtual machine, I don't want to mix the code and data as part of one logical memory block. This is because
+I would like to minimize the chance of memory corruption, so arbitrary memory access is not possible in the VM.
+What I have instead is a representation of code as a pair of values: an array of instructions, and an array of
+constants. This pair can be predictably serialized to a file, and then loaded into memory later.
+
+To make it possible to load constants from the constant array, I've added a new instruction called `loadc`, which
+takes a register and an index into the array. When executed, the instruction loads the value of the constant into the
+specified register.
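
The "pair of arrays" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Code` class, the tuple form of instructions, and the `run` function are hypothetical names, and the register count of 16 simply follows from the 4-bit register number mentioned earlier.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical in-memory form of a compiled unit: instructions and constants
# live in two separate arrays instead of one shared memory block.
@dataclass
class Code:
    instructions: List[Tuple]  # e.g. ("loadc", 0, 1) means r0 = constants[1]
    constants: List[int]


def run(code: Code, num_registers: int = 16) -> List[int]:
    """Execute the instructions and return the final register values."""
    regs = [0] * num_registers
    for instr in code.instructions:
        op = instr[0]
        if op == "loadc":
            _, reg, index = instr
            # A constant is reachable only through a bounds-checked index,
            # so there is no way to read arbitrary memory.
            if not 0 <= index < len(code.constants):
                raise IndexError(f"constant index {index} out of range")
            regs[reg] = code.constants[index]
        elif op == "ret":
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return regs
```

Because a constant is just an array element, a value far outside the 21-bit immediate range, such as 5000000000, can be loaded with a single `loadc`.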
+
+Here's what using `loadc` looks like (the `u8` suffix is just a way to tell the compiler that this is an 8-bit unsigned
+integer):
+
+```
+loadc r0, 255u8
+loadc r1, 2u8
+add r0, r0, r1 ; r0 would contain "1": 255 + 2 wraps around in 8 bits
+ret
+```
+
+Also note that when writing the assembly code, you don't need to fill in the constant array yourself. The compiler does
+that for you. You just specify the constant as the second argument, and during compilation the compiler moves
+the constant into the array and emits the opcode with the correct index in place of the `loadc` instruction.
+
+Right now there is only support for integers, but adding other data types should be relatively easy.
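
The compiler step described above, which moves literals into the constant array and leaves an index behind, might look roughly like this in Python. It is a sketch, not the real compiler: the function name `lower_loadc`, the `#N` notation for an index, and the choice to append every literal without deduplication are all assumptions, and only the `u8` suffix from the post is handled.

```python
import re
from typing import List, Tuple

# Matches a literal such as "255u8". Only the u8 suffix appears in the post;
# supporting nothing else is a simplification of this sketch.
LITERAL = re.compile(r"^(\d+)u8$")


def lower_loadc(lines: List[str]) -> Tuple[List[str], List[int]]:
    """Move loadc literals into a constant array and rewrite them as indices."""
    constants: List[int] = []
    out: List[str] = []
    for line in lines:
        text = line.split(";", 1)[0].strip()  # drop a trailing comment
        if not text:
            continue
        mnemonic, _, rest = text.partition(" ")
        if mnemonic != "loadc":
            out.append(text)  # other instructions pass through unchanged
            continue
        reg, literal = [part.strip() for part in rest.split(",")]
        match = LITERAL.match(literal)
        if match is None:
            raise ValueError(f"unsupported constant: {literal!r}")
        value = int(match.group(1))
        if value > 0xFF:
            raise ValueError(f"{value} does not fit in 8 bits")
        constants.append(value)
        # "#N" is a made-up way to show "index N into the constant array".
        out.append(f"loadc {reg}, #{len(constants) - 1}")
    return out, constants
```

Fed the four-line example from the post, this produces the constant array `[255, 2]` and rewrites the two `loadc` lines to refer to indices 0 and 1.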