Publish a post about the vm data segment
parent 0043a54b30
commit d008ba3c23
1 changed file with 64 additions and 0 deletions
@ -0,0 +1,64 @
X-Date: 2023-09-24T21:50:00Z
X-Note-Id: 3a07c153-0d26-402e-b0b4-626dfdcf38fc
Subject: VM progress update: data segment and constants
X-Slug: vm_progress_update_data_segment_and_constants

Today I'd like to introduce another addition to the [virtual machine](https://git.sr.ht/~knazarov/lisp.experimental)
I'm working on: the data segment.

Let's first take a look at why this is necessary. Here's a piece of assembly code that multiplies 2 by 2:

```
; Load value '2' into register r0
li r0, 2
; Multiply register r0 by 2 and
; put result back to r0
mul r0, r0, 2
ret
```

The instruction here that loads a constant into a register is `li`, which stands for "load immediate".
The value that it loads into the register is encoded as part of the instruction itself. Since the
instruction is 32 bits wide, the instruction code takes 7 bits, and the register number takes 4 bits, that
leaves 21 bits for the value. As a signed number, 21 bits can only encode values between -1048576 and 1048575.
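
To make the bit budget concrete, here is a minimal Python sketch of packing and unpacking such an instruction. The field order (instruction code in the low bits, then the register, then the value) and the opcode number `OP_LI` are assumptions made for illustration; the actual VM may lay the fields out differently.

```python
OPCODE_BITS = 7
REG_BITS = 4
IMM_BITS = 32 - OPCODE_BITS - REG_BITS  # 21 bits left for the value

IMM_MIN = -(1 << (IMM_BITS - 1))     # smallest signed 21-bit value
IMM_MAX = (1 << (IMM_BITS - 1)) - 1  # largest signed 21-bit value

OP_LI = 0x01  # hypothetical opcode number for `li`


def encode_li(reg, value):
    """Pack `li reg, value` into one 32-bit word (assumed field order)."""
    if not 0 <= reg < (1 << REG_BITS):
        raise ValueError("register number does not fit in 4 bits")
    if not IMM_MIN <= value <= IMM_MAX:
        raise ValueError("value does not fit in a 21-bit signed immediate")
    imm = value & ((1 << IMM_BITS) - 1)  # two's complement, cut to 21 bits
    return OP_LI | (reg << OPCODE_BITS) | (imm << (OPCODE_BITS + REG_BITS))


def decode_li(word):
    """Unpack a word produced by encode_li back into (reg, value)."""
    reg = (word >> OPCODE_BITS) & ((1 << REG_BITS) - 1)
    imm = (word >> (OPCODE_BITS + REG_BITS)) & ((1 << IMM_BITS) - 1)
    if imm & (1 << (IMM_BITS - 1)):  # sign bit set, so the value is negative
        imm -= 1 << IMM_BITS
    return reg, imm
```

With this layout, `decode_li(encode_li(0, 2))` gives back `(0, 2)`, while `encode_li(0, 1048576)` raises an error because the value is one past the top of the range.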

Such a short range is OK in practice for the many pieces of code that deal with offsets. You can have
larger values, but most of the values in your code are likely to be quite small (think loop counters).

But if you have a larger value, you need to be able to represent it somehow. One way to do that is through bitwise
operations: load 21 bits at a time, and then bit-shift and bit-or until you get the desired value. This would work
for integers and wouldn't require any new kinds of operations. However, it takes several instructions to do the
same work that one instruction should have accomplished.
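
As a rough illustration of the cost, the following Python sketch rebuilds a 64-bit value from 21-bit pieces and counts the steps. It treats every piece as unsigned for simplicity, and the `shl` and `or` mnemonics in the comments are hypothetical stand-ins for whatever shift and bitwise-or instructions the VM provides.

```python
CHUNK_BITS = 21
CHUNK_MASK = (1 << CHUNK_BITS) - 1


def split_into_chunks(value, width=64):
    """Split an unsigned `width`-bit value into 21-bit chunks, most significant first."""
    chunks = []
    for shift in range(0, width, CHUNK_BITS):
        chunks.append((value >> shift) & CHUNK_MASK)
    return list(reversed(chunks))


def rebuild(chunks):
    """Mimic the instruction sequence and count how many instructions it takes."""
    acc = chunks[0]  # li r0, <first chunk>
    steps = 1
    for chunk in chunks[1:]:
        acc = (acc << CHUNK_BITS) | chunk  # shl r0, r0, 21 then or r0, r0, <chunk>
        steps += 2
    return acc, steps


value = 0x0123456789ABCDEF
rebuilt, steps = rebuild(split_into_chunks(value))
```

Under these assumptions a 64-bit constant splits into four chunks, so `steps` comes out as 7: one load plus three shift-and-or pairs.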

Another problem with just using bit-shifts is that you can't easily encode strings and other more complex data
structures this way. Any more complex constant would require executing lots of sequential instructions to reconstruct
a particular data type.

For this reason, regular executables contain both a code and a data section. Whenever an executable is loaded into
memory by a kernel, the sections get mapped into memory at predictable addresses, and the code can just load
a constant either from an absolute address or from an offset relative to the instruction pointer.

For my virtual machine, I don't want to mix the code and data as part of one logical memory block. This is because
I would like to minimize the chance of memory corruption, and thus arbitrary memory access in the VM is not possible.
What I have instead is a representation of code as a pair of values: an array of instructions, and an array of
constants. This pair can be predictably serialized to a file, and then loaded into memory later.
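
Here is a minimal Python sketch of that pair and one way it could be written to bytes and read back. The `CodeObject` name and the file layout (two little-endian 32-bit counts, then the instruction words, then the constants as 64-bit signed integers) are assumptions for illustration, not the VM's actual format.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class CodeObject:
    instructions: list = field(default_factory=list)  # 32-bit instruction words
    constants: list = field(default_factory=list)     # integer constants


def serialize(code):
    """Hypothetical layout: counts, then u32 instructions, then i64 constants."""
    out = struct.pack("<II", len(code.instructions), len(code.constants))
    out += struct.pack("<%dI" % len(code.instructions), *code.instructions)
    out += struct.pack("<%dq" % len(code.constants), *code.constants)
    return out


def deserialize(data):
    """Read back a CodeObject written by serialize."""
    n_instr, n_const = struct.unpack_from("<II", data, 0)
    offset = 8  # size of the two counts
    instructions = list(struct.unpack_from("<%dI" % n_instr, data, offset))
    offset += 4 * n_instr
    constants = list(struct.unpack_from("<%dq" % n_const, data, offset))
    return CodeObject(instructions, constants)
```

Because the two arrays are stored separately, an instruction can only refer to a constant by its index, never by a raw memory address.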

To make it possible to load constants from the constant array, I've added a new instruction called `loadc`, to which you
pass a register and an index into the array. When executed, the instruction loads the value of the constant into the
specified register.

Here's what its usage looks like (note that the u8 suffix is just a way to tell the compiler that this is an 8-bit unsigned
integer):

```
loadc r0, 255u8
loadc r1, 2u8
add r0, r0, r1 ; r0 would contain "1" (255 + 2 wraps around in 8 bits)
ret
```

Also note that when writing the assembly code, you don't need to fill in the constant array yourself. The compiler does
that for you. You just specify the constant as the second argument, and during compilation the compiler moves
the constant to the array and emits the `loadc` opcode with the correct index in its place.
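
The compilation step described above can be sketched in Python roughly as follows. The tuple-based instruction format, the sharing of equal constants in one slot, the 16 registers, and `ret` returning `r0` are all assumptions made to keep the example short; the values are treated as 8-bit unsigned integers to match the `u8` example.

```python
def assemble(source):
    """Turn a list of (mnemonic, operands...) tuples into (instructions, constants).

    For `loadc`, the literal operand is moved into the constant array and
    replaced by its index. Equal constants share one slot (an assumption).
    """
    instructions, constants = [], []
    for op, *args in source:
        if op == "loadc":
            reg, literal = args
            if literal not in constants:
                constants.append(literal)
            args = [reg, constants.index(literal)]
        instructions.append((op, *args))
    return instructions, constants


def run(instructions, constants):
    """Tiny interpreter for loadc/add/ret over 8-bit unsigned values."""
    regs = [0] * 16  # 4-bit register numbers give 16 registers
    for op, *args in instructions:
        if op == "loadc":
            reg, index = args
            regs[reg] = constants[index]
        elif op == "add":
            dst, a, b = args
            regs[dst] = (regs[a] + regs[b]) & 0xFF  # u8 wrap-around
        elif op == "ret":
            return regs[0]
    return regs[0]


program = [("loadc", 0, 255), ("loadc", 1, 2), ("add", 0, 0, 1), ("ret",)]
instructions, constants = assemble(program)
result = run(instructions, constants)
```

Here `constants` ends up as `[255, 2]`, the two `loadc` instructions carry the indices 0 and 1, and `result` is 1, matching the wrap-around in the assembly example.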

Right now there is only support for integers, but adding other data types should be relatively easy.