From 2756c48cfab0f3453035f740038b9b04d140c0cb Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov
Date: Sat, 2 Sep 2023 23:57:43 +0100
Subject: [PATCH] Add a post about strings and slices

---
 .../note.md | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 content/posts/vm_progress_update_strings_slices_calls/note.md

diff --git a/content/posts/vm_progress_update_strings_slices_calls/note.md b/content/posts/vm_progress_update_strings_slices_calls/note.md
new file mode 100644
index 0000000..12e84d8
--- /dev/null
+++ b/content/posts/vm_progress_update_strings_slices_calls/note.md
@@ -0,0 +1,40 @@
+X-Date: 2023-09-02T23:50:00Z
+X-Note-Id: b8bf5799-4608-4762-925e-8de0b759a970
+Subject: VM progress update: strings, slices and function bindings
+X-Slug: vm_progress_update_strings_slices_calls
+
+For the last few days I've been working on landing initial string support in the VM implementation.
+As soon as I finish this, it will be possible to write programs in assembly that can show something
+to the user.
+
+The string implementation actually consists of two things:
+
+- strings themselves (objects on the heap that contain a character array and its size)
+- string slices (which can reference part of a string without creating a copy)
+
+String slices are convenient because in theory they can be small enough to be put into registers
+or stored on the stack. This means that code that walks strings (parsing, splitting, etc.) won't
+put a lot of pressure on the garbage collector. And since strings are immutable, it should always
+be safe to keep a slice around.
+
+From the garbage collector's point of view, a slice points to the beginning of the string and contains
+a range. This keeps the original string alive in memory for as long as a slice points to it.
+
+In theory, string and array slices should work the same way. I don't have array slice support yet, but
+in the end it will likely be the same VM opcode for both.
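
To make the string/slice split concrete, here is a minimal sketch in C. All names (`vm_string`, `vm_slice`, `slice_sub` and so on) are hypothetical and not taken from the VM source, and the string object holds a pointer to its characters purely to keep the example short; a real heap object would more likely store them inline. The point it illustrates is that a slice always keeps a pointer to the start of the string object plus a range, so narrowing a slice never copies characters and a tracing GC can find the owning string from the slice alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical heap object: an immutable character array plus its size. */
typedef struct vm_string {
    size_t size;
    const char *chars; /* assumption: stored out of line for brevity */
} vm_string;

/* Hypothetical slice, small enough to pass by value. `base` always points
   to the start of the string object, never into the middle of it. */
typedef struct vm_slice {
    const vm_string *base;
    size_t start; /* offset of the first character covered */
    size_t len;   /* number of characters covered */
} vm_slice;

/* Slice covering the whole string. */
vm_slice slice_all(const vm_string *s) {
    vm_slice out = { s, 0, s->size };
    return out;
}

/* Narrow a slice without copying; the requested range is clamped
   to the bounds of the slice it is taken from. */
vm_slice slice_sub(vm_slice sl, size_t start, size_t len) {
    if (start > sl.len) start = sl.len;
    if (len > sl.len - start) len = sl.len - start;
    vm_slice out = { sl.base, sl.start + start, len };
    return out;
}

/* Pointer to the first character covered by the slice. */
const char *slice_data(vm_slice sl) {
    return sl.base->chars + sl.start;
}

/* Compare the slice contents with a NUL-terminated C string. */
int slice_equals(vm_slice sl, const char *text) {
    size_t n = strlen(text);
    return n == sl.len && memcmp(slice_data(sl), text, n) == 0;
}
```

For a string holding `hello world`, `slice_sub(slice_all(&s), 6, 5)` covers `world` while its `base` still points at `s`, which is what lets the collector keep the whole string alive.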
+
+The latest patches also removed the custom slice implementation from the assembler code, which is now
+based on the same functionality that the VM uses. This complicated the assembler code a little, as
+I have to carry around VM data structures in order to do memory allocations. This is because memory
+arenas aren't flexible enough to allocate arbitrary amounts of memory. The arena has a fixed limit, and as
+soon as you hit it, a garbage collection is triggered. In VM bytecode that doesn't pose a problem,
+since the registers and the stack serve as GC roots. In the assembler, which is written in C, the GC
+roots are spread around the code and are not easy to wrap.
+
+Fixing the GC issues is a separate piece of work, where I will borrow a few ideas
+from how memory arenas work in Zig. It would allow me to work with VM memory safely from C code,
+and only run the GC once execution fully leaves the C procedure. This mostly means implementing a linked list
+of memory pages (pretty much what malloc does).
+
+All in all, a few more steps and I will be able to implement a `print` function and
+write a "game of life" in VM assembly. Can't wait for that to start working.
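
The linked list of memory pages described above could look roughly like the following C sketch. It is not the VM's allocator: `arena`, `arena_alloc`, `arena_leave_c` and the `gc_pending` flag are hypothetical names, and the collection itself is left out. It assumes that one page is the arena's normal budget. When an allocation does not fit, the arena links in a fresh page instead of collecting immediately and only records that a GC is due, so C code can keep using raw pointers until it returns to the VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Page header; the union pads it so the data that follows is aligned
   for any object type. */
typedef union arena_page {
    struct {
        union arena_page *next; /* older page, or NULL */
        size_t capacity;        /* usable bytes after the header */
        size_t used;            /* bytes already handed out */
    } h;
    max_align_t align_;
} arena_page;

typedef struct arena {
    arena_page *pages; /* newest page first */
    size_t page_size;  /* assumed default page capacity */
    int gc_pending;    /* set when the arena grew past its first page */
} arena;

/* Bump-allocate from the newest page, adding a page when it is full. */
void *arena_alloc(arena *a, size_t size) {
    size_t align = _Alignof(max_align_t);
    if (size > SIZE_MAX / 2) return NULL; /* avoid overflow below */
    size = (size + align - 1) / align * align;

    arena_page *p = a->pages;
    if (p == NULL || p->h.capacity - p->h.used < size) {
        size_t cap = size > a->page_size ? size : a->page_size;
        arena_page *fresh = malloc(sizeof *fresh + cap);
        if (fresh == NULL) return NULL;
        /* Growing past the first page defers a GC instead of running it. */
        if (a->pages != NULL) a->gc_pending = 1;
        fresh->h.next = a->pages;
        fresh->h.capacity = cap;
        fresh->h.used = 0;
        a->pages = fresh;
        p = fresh;
    }
    void *out = (unsigned char *)(p + 1) + p->h.used;
    p->h.used += size;
    return out;
}

/* Call when control returns from the C procedure to the VM.
   Returns 1 if a collection should run now, and clears the flag. */
int arena_leave_c(arena *a) {
    int run = a->gc_pending;
    a->gc_pending = 0;
    return run;
}

/* Number of pages currently linked into the arena. */
size_t arena_page_count(const arena *a) {
    size_t n = 0;
    for (const arena_page *p = a->pages; p != NULL; p = p->h.next) n++;
    return n;
}

/* Release every page. */
void arena_free(arena *a) {
    while (a->pages != NULL) {
        arena_page *next = a->pages->h.next;
        free(a->pages);
        a->pages = next;
    }
}
```

This sketch only looks for free space in the newest page, which keeps allocation constant-time at the cost of some wasted space in older pages.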