Add a post about strings and slices

Konstantin Nazarov 2023-09-02 23:57:43 +01:00
parent c4d2281704
commit 2756c48cfa
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22

@ -0,0 +1,40 @@
X-Date: 2023-09-02T23:50:00Z
X-Note-Id: b8bf5799-4608-4762-925e-8de0b759a970
Subject: VM progress update: strings, slices and function bindings
X-Slug: vm_progress_update_strings_slices_calls
For the last few days I've been working on landing initial string support in the VM implementation.
As soon as I finish this, it will be possible to write programs in assembly that can show something
to the user.
The string implementation actually consists of two things:
- strings themselves (objects on the heap that contain a character array inside and its size)
- string slices (that can reference parts of the string without creating a copy)
String slices are convenient because in theory they can be small enough to be put into registers
or stored on the stack. This means that code that walks the strings (parsing, splitting, etc) won't
put a lot of pressure on the garbage collector. And since strings are immutable, it should always
be safe to keep the slice around.
From the garbage collector's point of view, a slice points to the beginning of the string and contains
a range. This keeps the original string alive in memory for as long as a slice points to it.
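To make the layout concrete, here is a minimal sketch in C of a heap string and a slice that refers back to it. All names (`vm_string`, `vm_slice`, `vm_slice_of`, `vm_slice_sub`, `vm_slice_ptr`) are hypothetical, and `malloc` stands in for the VM heap allocator; the actual structures in the VM may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap string: the size followed by the characters. */
typedef struct {
    size_t size;
    char data[]; /* flexible array member, not NUL-terminated */
} vm_string;

/* Hypothetical slice: a pointer to the START of the string object plus a
 * range, so a collector that only understands object headers can still
 * find the string that has to stay alive. */
typedef struct {
    const vm_string *base;
    size_t start;
    size_t len;
} vm_slice;

/* malloc is an assumption here; a real VM would allocate on its own heap. */
static vm_string *vm_string_new(const char *src, size_t size) {
    vm_string *s = malloc(sizeof(vm_string) + size);
    if (s == NULL) return NULL;
    s->size = size;
    memcpy(s->data, src, size);
    return s;
}

/* Slice of a whole string: no characters are copied. */
static vm_slice vm_slice_of(const vm_string *s, size_t start, size_t len) {
    assert(start <= s->size && len <= s->size - start);
    vm_slice sl = { s, start, len };
    return sl;
}

/* Slice of a slice: still points at the original string object. */
static vm_slice vm_slice_sub(vm_slice sl, size_t start, size_t len) {
    assert(start <= sl.len && len <= sl.len - start);
    vm_slice sub = { sl.base, sl.start + start, len };
    return sub;
}

/* First character covered by the slice. */
static const char *vm_slice_ptr(vm_slice sl) {
    return sl.base->data + sl.start;
}
```

In this sketch a slice is three machine words and is passed by value, which is the property that would let it live in registers or on the stack. Taking a sub-slice only adjusts the range and never touches the heap.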
In theory, string and array slices should work the same way. I don't have array slice support yet, but
in the end both will likely be handled by the same VM opcode.
The latest patches also removed the custom slice implementation from the assembler code, which is now
based on the same functionality that the VM uses. This complicated the assembler code a little, as
I have to carry VM data structures around in order to allocate memory. The reason is that memory
arenas aren't flexible enough to allocate arbitrary sizes: the arena has a fixed limit, and as
soon as you hit it, a garbage collection is triggered. In VM bytecode that doesn't pose a problem,
since the registers and the stack serve as GC roots. In the assembler, which is written in C, the GC
roots are spread around the code and are not easy to wrap.
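The hazard can be illustrated with a small sketch in C. It assumes a bump arena with a fixed capacity where hitting the limit triggers a collection. The names (`fixed_arena`, `arena_alloc`, `ARENA_WORDS`) are hypothetical, and the "collection" here simply resets the arena, which is a stand-in for a real collector that would trace from roots and move live objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_WORDS 8 /* tiny on purpose: 8 words of 8 bytes each */

/* Hypothetical fixed-size bump arena. */
typedef struct {
    uint64_t words[ARENA_WORDS]; /* storage, 8-byte aligned */
    size_t used_words;
    unsigned gc_runs;            /* how many times the limit was hit */
} fixed_arena;

static void *arena_alloc(fixed_arena *a, size_t size) {
    size_t need = (size + 7) / 8;           /* round up to whole words */
    if (need > ARENA_WORDS) return NULL;    /* can never fit */
    if (need > ARENA_WORDS - a->used_words) {
        /* Limit hit. Stand-in for a GC cycle: everything that is not
         * reachable from a root is gone. This sketch has no roots at all,
         * so every pointer handed out earlier becomes invalid. */
        a->gc_runs++;
        a->used_words = 0;
    }
    void *p = &a->words[a->used_words];
    a->used_words += need;
    return p;
}
```

A C function that keeps the result of an earlier `arena_alloc` in a local variable and then allocates again may find that the local now refers to reused memory, because the collector never learned about it. VM bytecode avoids this, since its values sit in registers and on the stack, which the collector scans.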
Fixing the GC issues is a separate piece of work, for which I would borrow a few ideas
from how memory arenas work in Zig. It would allow me to safely work with the VM memory from C code,
and only do GC once execution has fully left the C procedure. This mostly means implementing a linked list
of memory pages (pretty much what malloc does).
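As a rough illustration of the linked-list-of-pages idea, here is a minimal sketch in C. It assumes pages come from `malloc` and that a request larger than the default page gets a page of its own. The names (`page`, `page_arena`, `page_arena_alloc`, `page_arena_release`, `PAGE_PAYLOAD`) are hypothetical, and this is not meant to mirror Zig's allocator API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_PAYLOAD 4096 /* default usable bytes per page (assumption) */

/* One page in the list; the newest page is at the head. */
typedef struct page {
    struct page *next;
    size_t used;        /* bytes handed out from this page */
    size_t cap;         /* usable bytes in this page */
    max_align_t data[]; /* payload, aligned for any object type */
} page;

typedef struct {
    page *head;
    size_t page_count;
} page_arena;

/* Allocation never collects: when the current page is full, a new page
 * is linked in, so pointers returned earlier stay valid. */
static void *page_arena_alloc(page_arena *a, size_t size) {
    size_t align = _Alignof(max_align_t);
    size_t need = (size + align - 1) / align * align;
    if (a->head == NULL || need > a->head->cap - a->head->used) {
        size_t cap = need > PAGE_PAYLOAD ? need : PAGE_PAYLOAD;
        page *p = malloc(sizeof(page) + cap);
        if (p == NULL) return NULL;
        p->next = a->head;
        p->used = 0;
        p->cap = cap;
        a->head = p;
        a->page_count++;
    }
    void *out = (unsigned char *)a->head->data + a->head->used;
    a->head->used += need;
    return out;
}

/* Called once the C procedure is done. Here the pages are simply freed;
 * in the VM this would be the point where a collection may run. */
static void page_arena_release(page_arena *a) {
    while (a->head != NULL) {
        page *next = a->head->next;
        free(a->head);
        a->head = next;
    }
    a->page_count = 0;
}
```

The design choice this sketch shows is that growth replaces collection while C code is running: the arena has no fixed limit, so nothing moves or disappears until `page_arena_release` is called at a point where no C locals refer to arena memory any more.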
All in all, a few more steps and I will be able to implement a `print` function and
write a "game of life" in the VM assembly. Can't wait for that to start working.