2023-08-15 20:41:39 +00:00
|
|
|
X-Date: 2023-08-15T21:30:00Z
|
|
|
|
X-Note-Id: 81666e7b-ed56-400a-b08d-5edad9f966f5
|
|
|
|
Subject: VM progress update: simple garbage collection
|
|
|
|
X-Slug: vm_progress_update_simple_garbage_collection
|
|
|
|
|
|
|
|
Today I've reached another milestone with the virtual machine.
|
|
|
|
It can now allocate memory and perform garbage collection.
|
|
|
|
|
|
|
|
It is interesting that I don't yet have a high-level language implemented,
|
|
|
|
but the garbage collection is still useful. In bare-metal assembly, you have
|
|
|
|
to track memory allocations and deallocations yourself, since the memory is unstructured,
|
|
|
|
and a potential garbage collector won't be able to differentiate between pointers and
|
|
|
|
integers. In my virtual machine (and its assembly language), all memory is tagged and
|
|
|
|
structured, and thus if you have a pointer to the virtual machine data structure, the
|
|
|
|
garbage collector is able to walk it. So even the assembly/bytecode will soon be able to
|
|
|
|
work as if memory is not a concern.
|
|
|
|
|
|
|
|
For the implementation of garbage collection, I've chosen a modified
|
|
|
|
[Cheney's algorithm](https://en.wikipedia.org/wiki/Cheney%27s_algorithm).
|
|
|
|
This algorithm contains two distinct "heaps", one of which is currently in use for allocation.
|
|
|
|
If it runs out, the algorithm would walk all objects that are reachable from the registers and
|
|
|
|
the stack of the virtual machine, and move them to the other heap. The previous heap would then
|
|
|
|
contain only dead objects, and can be grown/shrunk for the next cycle if needed.
|
|
|
|
|
|
|
|
For any practical purposes, Cheney's algorithm would be quite slow, and many real programming
|
|
|
|
languages contain multiple heaps (python has 3) and vary the size of the heaps to reclaim
|
|
|
|
short-lived objects faster. However, I predict that the naive algorithm would get me pretty far along
|
|
|
|
the path to the implementation of the actual language, so I don't care that much about it now.
|
|
|
|
|
|
|
|
What follows below is a piece of code that demonstrates the C interface for interacting with
|
|
|
|
VM memory allocation and GC capabilities. It is taken almost verbatim from the unittest and
|
|
|
|
annotated.
|
|
|
|
|
|
|
|
```
|
|
|
|
# Allocate all structures required by the virtual machine
|
|
|
|
vm_t* vm = new_vm();
|
|
|
|
|
|
|
|
# Allocate an array of 10 elements
|
|
|
|
tagged_value_t arr = vm_mk_array(vm, 10);
|
|
|
|
array_set(arr, 0, mk_i64(42));
|
|
|
|
|
|
|
|
# Save pointer to the array in a register so it survives garbage collection
|
|
|
|
vm->registers[R1] = arr;
|
|
|
|
|
|
|
|
assert(arena_generation(&vm->arena, arr) == 0,
|
|
|
|
"Array should be in first generation\n");
|
|
|
|
|
|
|
|
# Perform garbage collection
|
|
|
|
vm_gc(vm);
|
|
|
|
|
|
|
|
# The array has moved, so retreive a pointer to it from the register
|
|
|
|
arr = vm->registers[R1];
|
|
|
|
|
|
|
|
# Check that the array contains correct value at 0-th position
|
|
|
|
tagged_value_t val = array_get(arr, 0);
|
|
|
|
assert(op_eq(val, mk_i64(42)), "0-th element of an array == %ld\n",
|
|
|
|
val.value.i64);
|
|
|
|
|
|
|
|
# Make sure that the array has really moved between generations
|
|
|
|
assert(arena_generation(&vm->arena, arr) == 1,
|
|
|
|
"Array should be in second generation\n");
|
|
|
|
```
|
|
|
|
|
|
|
|
The API is in place, so the next step would be to add a few instructions to the bytecode
|
|
|
|
that can call memory allocations, and write an inefficient merge sort (the one that would
|
|
|
|
copy sub-ranges).
|
|
|
|
|
2024-09-11 23:12:15 +00:00
|
|
|
As usual, the code can be found [here](https://git.knazarov.com/knazarov/valeri)
|
2023-08-15 20:41:39 +00:00
|
|
|
(note that it's experimental, so don't expect any clarity).
|