Add a post about hashes and function calls
This commit is contained in:
parent
d43af27eb2
commit
61d149acd9
1 changed files with 83 additions and 0 deletions
|
@ -0,0 +1,83 @@
|
||||||
|
X-Date: 2023-10-16T00:50:00Z
|
||||||
|
X-Note-Id: 9558177b-58e2-442f-ad96-a0c6b5a70407
|
||||||
|
Subject: VM progress update: hashes and function calls
|
||||||
|
X-Slug: vm_progress_update_hashes_and_function_calls
|
||||||
|
|
||||||
|
Today I'd like to tell you about 2 new opcodes I've added to the virtual machine.
|
||||||
|
The first is for calculating hashes, and the second is for calling variadic functions.
|
||||||
|
|
||||||
|
Just as a reminder, I'm designing my virtual machine to be very high-level. Even though
|
||||||
|
it's a register VM, it contains opcodes for many typical data structures such as lists,
|
||||||
|
arrays, strings, slices and others. Ideally, generating bytecode for such machine should
|
||||||
|
be trivial in any language.
|
||||||
|
|
||||||
|
Having all data structures defined on the level of the virtual machine and assembly is
|
||||||
|
important, because the VM will make heavy use of frozen data structures. Frozen data
|
||||||
|
structures can be loaded from disk or sent through network, and be accessed without
|
||||||
|
deserialization. To support frozen data structures, every piece of the VM needs to understand
|
||||||
|
their memory layout - starting from the bytecode evaluator and up to the garbage collector
|
||||||
|
and FFI. And since there is a frozen couterpart for every standard data type, it means that
|
||||||
|
all such basic data types would be "fixed" in the implementation.
|
||||||
|
|
||||||
|
Now I'm working on an implementation of a hash table and a hash set, which should conclude
|
||||||
|
the list of basic data structures. They turned out to be a little bit trickier than the rest.
|
||||||
|
|
||||||
|
One of the properties I would like to have in addition to freezing is total order. Which means
|
||||||
|
that every frozen data structure including a hash table should be comparable for equality/inequality
|
||||||
|
and greater/less. This means that hash tables must have predictable layout. This is not true for
|
||||||
|
many implementations of hash tables in existing programming languages.
|
||||||
|
|
||||||
|
The reason for that is how they deal with handling collisions. If one uses open addressing, or
|
||||||
|
just use lists to store elements that have the same key, the order in which you insert items
|
||||||
|
would matter: keys will be in different order when you compare two hash tables.
|
||||||
|
|
||||||
|
One of the solutions to this is how C++ handles the `map` class: it's implemented as a red-black
|
||||||
|
tree which has predictable rebalancing rules. Independently of which order you use to insert
|
||||||
|
elements, you'd always end up with the same layout and order of keys.
|
||||||
|
|
||||||
|
I'm gradually leaning towards either a hash-trie or a list-based hash table where the colliding
|
||||||
|
leafs would be sorted. This would be a little suboptimal speed-wise, but should be fine for now.
|
||||||
|
|
||||||
|
Anyhow, while I'm working on the hash table which I can't yet demonstrate, here are two new opcodes.
|
||||||
|
|
||||||
|
## `hr` - opcode for calculating a hash value
|
||||||
|
|
||||||
|
This opcode has two parameters that are passed as registers. The first is where the result will
|
||||||
|
be stored, and the second contains an object for which hash value would be computed. It can be
|
||||||
|
an arbitrarily complex nested data structure without loops.
|
||||||
|
|
||||||
|
## `jafi` - calls a variadic procedure with specific number of parameters
|
||||||
|
|
||||||
|
The first register passed to this opcode is the address of the procedure/closure, and the immediate
|
||||||
|
parameter is the count of arguments. Currently you can pass up to 12 arguments in the registers
|
||||||
|
as per convention. But as soon as I extend this, you'd be able to pass the rest through the stack.
|
||||||
|
Usually, the functions won't have that many arguments, so mostly you'd just have the register parameters.
|
||||||
|
|
||||||
|
## Example usage
|
||||||
|
|
||||||
|
This is a simple example that demonstrates the new opcodes:
|
||||||
|
|
||||||
|
```
|
||||||
|
;; Loads a constant string to register r1
|
||||||
|
loadc r1, "foobar"
|
||||||
|
|
||||||
|
;; Calculates a hash of the string and
|
||||||
|
;; stores it in r0
|
||||||
|
hr r0, r1
|
||||||
|
|
||||||
|
;; Loads the address of the "print"
|
||||||
|
;; function in register r5
|
||||||
|
lfi r5, print
|
||||||
|
|
||||||
|
;; Calls function that is stored in r5
|
||||||
|
;; with 1 argument (r0)
|
||||||
|
jafi r5, 1
|
||||||
|
|
||||||
|
;; End the execution by returning
|
||||||
|
;; from the top level
|
||||||
|
ret
|
||||||
|
```
|
||||||
|
|
||||||
|
As usual, all assembler instructions are turned into 32-bit opcodes and serialized to
|
||||||
|
a frozen uint32 array. This array would then get loaded by the virtual machine and
|
||||||
|
executed.
|
Loading…
Reference in a new issue