From 61d149acd9450c849591f87bfebf69f971f7e992 Mon Sep 17 00:00:00 2001 From: Konstantin Nazarov Date: Mon, 16 Oct 2023 00:41:52 +0100 Subject: [PATCH] Add a post about hashes and function calls --- .../note.md | 83 +++++++++++++++++++ 1 file changed, 83 insertions(+) create mode 100644 content/posts/vm_progress_update_hashes_and_function_calls/note.md diff --git a/content/posts/vm_progress_update_hashes_and_function_calls/note.md b/content/posts/vm_progress_update_hashes_and_function_calls/note.md new file mode 100644 index 0000000..645fcd4 --- /dev/null +++ b/content/posts/vm_progress_update_hashes_and_function_calls/note.md @@ -0,0 +1,83 @@ +X-Date: 2023-10-16T00:50:00Z +X-Note-Id: 9558177b-58e2-442f-ad96-a0c6b5a70407 +Subject: VM progress update: hashes and function calls +X-Slug: vm_progress_update_hashes_and_function_calls + +Today I'd like to tell you about 2 new opcodes I've added to the virtual machine. +The first is for calculating hashes, and the second is for calling variadic functions. + +Just as a reminder, I'm designing my virtual machine to be very high-level. Even though +it's a register VM, it contains opcodes for many typical data structures such as lists, +arrays, strings, slices and others. Ideally, generating bytecode for such machine should +be trivial in any language. + +Having all data structures defined on the level of the virtual machine and assembly is +important, because the VM will make heavy use of frozen data structures. Frozen data +structures can be loaded from disk or sent through network, and be accessed without +deserialization. To support frozen data structures, every piece of the VM needs to understand +their memory layout - starting from the bytecode evaluator and up to the garbage collector +and FFI. And since there is a frozen couterpart for every standard data type, it means that +all such basic data types would be "fixed" in the implementation. + +Now I'm working on an implementation of a hash table and a hash set, which should conclude +the list of basic data structures. They turned out to be a little bit trickier than the rest. + +One of the properties I would like to have in addition to freezing is total order. Which means +that every frozen data structure including a hash table should be comparable for equality/inequality +and greater/less. This means that hash tables must have predictable layout. This is not true for +many implementations of hash tables in existing programming languages. + +The reason for that is how they deal with handling collisions. If one uses open addressing, or +just use lists to store elements that have the same key, the order in which you insert items +would matter: keys will be in different order when you compare two hash tables. + +One of the solutions to this is how C++ handles the `map` class: it's implemented as a red-black +tree which has predictable rebalancing rules. Independently of which order you use to insert +elements, you'd always end up with the same layout and order of keys. + +I'm gradually leaning towards either a hash-trie or a list-based hash table where the colliding +leafs would be sorted. This would be a little suboptimal speed-wise, but should be fine for now. + +Anyhow, while I'm working on the hash table which I can't yet demonstrate, here are two new opcodes. + +## `hr` - opcode for calculating a hash value + +This opcode has two parameters that are passed as registers. The first is where the result will +be stored, and the second contains an object for which hash value would be computed. It can be +an arbitrarily complex nested data structure without loops. + +## `jafi` - calls a variadic procedure with specific number of parameters + +The first register passed to this opcode is the address of the procedure/closure, and the immediate +parameter is the count of arguments. Currently you can pass up to 12 arguments in the registers +as per convention. But as soon as I extend this, you'd be able to pass the rest through the stack. +Usually, the functions won't have that many arguments, so mostly you'd just have the register parameters. + +## Example usage + +This is a simple example that demonstrates the new opcodes: + +``` +;; Loads a constant string to register r1 +loadc r1, "foobar" + +;; Calculates a hash of the string and +;; stores it in r0 +hr r0, r1 + +;; Loads the address of the "print" +;; function in register r5 +lfi r5, print + +;; Calls function that is stored in r5 +;; with 1 argument (r0) +jafi r5, 1 + +;; End the execution by returning +;; from the top level +ret +``` + +As usual, all assembler instructions are turned into 32-bit opcodes and serialized to +a frozen uint32 array. This array would then get loaded by the virtual machine and +executed.