Add a post about hashes and function calls

This commit is contained in:
Konstantin Nazarov 2023-10-16 00:41:52 +01:00
parent d43af27eb2
commit 61d149acd9
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22

View file

@ -0,0 +1,83 @@
X-Date: 2023-10-16T00:50:00Z
X-Note-Id: 9558177b-58e2-442f-ad96-a0c6b5a70407
Subject: VM progress update: hashes and function calls
X-Slug: vm_progress_update_hashes_and_function_calls
Today I'd like to tell you about 2 new opcodes I've added to the virtual machine.
The first is for calculating hashes, and the second is for calling variadic functions.
Just as a reminder, I'm designing my virtual machine to be very high-level. Even though
it's a register VM, it contains opcodes for many typical data structures such as lists,
arrays, strings, slices and others. Ideally, generating bytecode for such machine should
be trivial in any language.
Having all data structures defined on the level of the virtual machine and assembly is
important, because the VM will make heavy use of frozen data structures. Frozen data
structures can be loaded from disk or sent through network, and be accessed without
deserialization. To support frozen data structures, every piece of the VM needs to understand
their memory layout - starting from the bytecode evaluator and up to the garbage collector
and FFI. And since there is a frozen couterpart for every standard data type, it means that
all such basic data types would be "fixed" in the implementation.
Now I'm working on an implementation of a hash table and a hash set, which should conclude
the list of basic data structures. They turned out to be a little bit trickier than the rest.
One of the properties I would like to have in addition to freezing is total order. Which means
that every frozen data structure including a hash table should be comparable for equality/inequality
and greater/less. This means that hash tables must have predictable layout. This is not true for
many implementations of hash tables in existing programming languages.
The reason for that is how they deal with handling collisions. If one uses open addressing, or
just use lists to store elements that have the same key, the order in which you insert items
would matter: keys will be in different order when you compare two hash tables.
One of the solutions to this is how C++ handles the `map` class: it's implemented as a red-black
tree which has predictable rebalancing rules. Independently of which order you use to insert
elements, you'd always end up with the same layout and order of keys.
I'm gradually leaning towards either a hash-trie or a list-based hash table where the colliding
leafs would be sorted. This would be a little suboptimal speed-wise, but should be fine for now.
Anyhow, while I'm working on the hash table which I can't yet demonstrate, here are two new opcodes.
## `hr` - opcode for calculating a hash value
This opcode has two parameters that are passed as registers. The first is where the result will
be stored, and the second contains an object for which hash value would be computed. It can be
an arbitrarily complex nested data structure without loops.
## `jafi` - calls a variadic procedure with specific number of parameters
The first register passed to this opcode is the address of the procedure/closure, and the immediate
parameter is the count of arguments. Currently you can pass up to 12 arguments in the registers
as per convention. But as soon as I extend this, you'd be able to pass the rest through the stack.
Usually, the functions won't have that many arguments, so mostly you'd just have the register parameters.
## Example usage
This is a simple example that demonstrates the new opcodes:
```
;; Loads a constant string to register r1
loadc r1, "foobar"
;; Calculates a hash of the string and
;; stores it in r0
hr r0, r1
;; Loads the address of the "print"
;; function in register r5
lfi r5, print
;; Calls function that is stored in r5
;; with 1 argument (r0)
jafi r5, 1
;; End the execution by returning
;; from the top level
ret
```
As usual, all assembler instructions are turned into 32-bit opcodes and serialized to
a frozen uint32 array. This array would then get loaded by the virtual machine and
executed.