Add a post about hashes and function calls

2023-10-16 00:41:52 +01:00 · 2023-10-16 00:41:52 +01:00 · 61d149acd9
commit 61d149acd9
parent d43af27eb2
1 changed files with 83 additions and 0 deletions
--- a/content/posts/vm_progress_update_hashes_and_function_calls/note.md
+++ b/content/posts/vm_progress_update_hashes_and_function_calls/note.md
@ -0,0 +1,83 @@
 X-Date: 2023-10-16T00:50:00Z
 X-Note-Id: 9558177b-58e2-442f-ad96-a0c6b5a70407
 Subject: VM progress update: hashes and function calls
 X-Slug: vm_progress_update_hashes_and_function_calls
 Today I'd like to tell you about 2 new opcodes I've added to the virtual machine.
 The first is for calculating hashes, and the second is for calling variadic functions.
 Just as a reminder, I'm designing my virtual machine to be very high-level. Even though
 it's a register VM, it contains opcodes for many typical data structures such as lists,
 arrays, strings, slices and others. Ideally, generating bytecode for such machine should
 be trivial in any language.
 Having all data structures defined on the level of the virtual machine and assembly is
 important, because the VM will make heavy use of frozen data structures. Frozen data
 structures can be loaded from disk or sent through network, and be accessed without
 deserialization. To support frozen data structures, every piece of the VM needs to understand
 their memory layout - starting from the bytecode evaluator and up to the garbage collector
 and FFI. And since there is a frozen couterpart for every standard data type, it means that
 all such basic data types would be "fixed" in the implementation.
 Now I'm working on an implementation of a hash table and a hash set, which should conclude
 the list of basic data structures. They turned out to be a little bit trickier than the rest.
 One of the properties I would like to have in addition to freezing is total order. Which means
 that every frozen data structure including a hash table should be comparable for equality/inequality
 and greater/less. This means that hash tables must have predictable layout. This is not true for
 many implementations of hash tables in existing programming languages.
 The reason for that is how they deal with handling collisions. If one uses open addressing, or
 just use lists to store elements that have the same key, the order in which you insert items
 would matter: keys will be in different order when you compare two hash tables.
 One of the solutions to this is how C++ handles the `map` class: it's implemented as a red-black
 tree which has predictable rebalancing rules. Independently of which order you use to insert
 elements, you'd always end up with the same layout and order of keys.
 I'm gradually leaning towards either a hash-trie or a list-based hash table where the colliding
 leafs would be sorted. This would be a little suboptimal speed-wise, but should be fine for now.
 Anyhow, while I'm working on the hash table which I can't yet demonstrate, here are two new opcodes.
 ## `hr` - opcode for calculating a hash value
 This opcode has two parameters that are passed as registers. The first is where the result will
 be stored, and the second contains an object for which hash value would be computed. It can be
 an arbitrarily complex nested data structure without loops.
 ## `jafi` - calls a variadic procedure with specific number of parameters
 The first register passed to this opcode is the address of the procedure/closure, and the immediate
 parameter is the count of arguments. Currently you can pass up to 12 arguments in the registers
 as per convention. But as soon as I extend this, you'd be able to pass the rest through the stack.
 Usually, the functions won't have that many arguments, so mostly you'd just have the register parameters.
 ## Example usage
 This is a simple example that demonstrates the new opcodes:
 ```
 ;; Loads a constant string to register r1
 loadc r1, "foobar"
 ;; Calculates a hash of the string and
 ;; stores it in r0
 hr r0, r1
 ;; Loads the address of the "print"
 ;; function in register r5
 lfi r5, print
 ;; Calls function that is stored in r5
 ;; with 1 argument (r0)
 jafi r5, 1
 ;; End the execution by returning
 ;; from the top level
 ret
 ```
 As usual, all assembler instructions are turned into 32-bit opcodes and serialized to
 a frozen uint32 array. This array would then get loaded by the virtual machine and
 executed.