Add a post about garbage collection

This commit is contained in:
Konstantin Nazarov 2023-08-15 21:41:39 +01:00
parent 15644f98ac
commit 29fb6aa1eb
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22

View file

@ -0,0 +1,69 @@
X-Date: 2023-08-15T21:30:00Z
X-Note-Id: 81666e7b-ed56-400a-b08d-5edad9f966f5
Subject: VM progress update: simple garbage collection
X-Slug: vm_progress_update_simple_garbage_collection
Today I've reached another milestone with the virtual machine.
It can now allocate memory and perform garbage collection.
It is interesting that I don't yet have a high-level language implemented,
but the garbage collection is still useful. In bare-metal assembly, you have
to track memory allocations and deallocations yourself, since the memory is unstructured,
and a potential garbage collector won't be able to differentiate between pointers and
integers. In my virtual machine (and its assembly language), all memory is tagged and
structured, and thus if you have a pointer to the virtual machine data structure, the
garbage collector is able to walk it. So even the assembly/bytecode will soon be able to
work as if memory is not a concern.
For the implementation of garbage collection, I've chosen a modified
[Cheney's algorithm](https://en.wikipedia.org/wiki/Cheney%27s_algorithm).
This algorithm contains two distinct "heaps", one of which is currently in use for allocation.
If it runs out, the algorithm would walk all objects that are reachable from the registers and
the stack of the virtual machine, and move them to the other heap. The previous heap would then
contain only dead objects, and can be grown/shrunk for the next cycle if needed.
For any practical purposes, Cheney's algorithm would be quite slow, and many real programming
languages contain multiple heaps (python has 3) and vary the size of the heaps to reclaim
short-lived objects faster. However, I predict that the naive algorithm would get me pretty far along
the path to the implementation of the actual language, so I don't care that much about it now.
What follows below is a piece of code that demonstrates the C interface for interacting with
VM memory allocation and GC capabilities. It is taken almost verbatim from the unittest and
annotated.
```
# Allocate all structures required by the virtual machine
vm_t* vm = new_vm();
# Allocate an array of 10 elements
tagged_value_t arr = vm_mk_array(vm, 10);
array_set(arr, 0, mk_i64(42));
# Save pointer to the array in a register so it survives garbage collection
vm->registers[R1] = arr;
assert(arena_generation(&vm->arena, arr) == 0,
"Array should be in first generation\n");
# Perform garbage collection
vm_gc(vm);
# The array has moved, so retreive a pointer to it from the register
arr = vm->registers[R1];
# Check that the array contains correct value at 0-th position
tagged_value_t val = array_get(arr, 0);
assert(op_eq(val, mk_i64(42)), "0-th element of an array == %ld\n",
val.value.i64);
# Make sure that the array has really moved between generations
assert(arena_generation(&vm->arena, arr) == 1,
"Array should be in second generation\n");
```
The API is in place, so the next step would be to add a few instructions to the bytecode
that can call memory allocations, and write an inefficient merge sort (the one that would
copy sub-ranges).
As usual, the code can be found [here](https://git.sr.ht/~knazarov/lisp.experimental)
(note that it's experimental, so don't expect any clarity).