From 29fb6aa1eb0f9551d1cbadd6cc8a9ae50b4480d3 Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov
Date: Tue, 15 Aug 2023 21:41:39 +0100
Subject: [PATCH] Add a post about garbage collection

---
 .../note.md | 69 +++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 content/posts/vm_progress_update_simple_garbage_collection/note.md

diff --git a/content/posts/vm_progress_update_simple_garbage_collection/note.md b/content/posts/vm_progress_update_simple_garbage_collection/note.md
new file mode 100644
index 0000000..62d034e
--- /dev/null
+++ b/content/posts/vm_progress_update_simple_garbage_collection/note.md
@@ -0,0 +1,69 @@
+X-Date: 2023-08-15T21:30:00Z
+X-Note-Id: 81666e7b-ed56-400a-b08d-5edad9f966f5
+Subject: VM progress update: simple garbage collection
+X-Slug: vm_progress_update_simple_garbage_collection
+
+Today I've reached another milestone with the virtual machine.
+It can now allocate memory and perform garbage collection.
+
+It is interesting that I don't yet have a high-level language implemented,
+but the garbage collection is still useful. In bare-metal assembly, you have
+to track memory allocations and deallocations yourself, since the memory is unstructured,
+and a potential garbage collector won't be able to differentiate between pointers and
+integers. In my virtual machine (and its assembly language), all memory is tagged and
+structured, and thus if you have a pointer to the virtual machine data structure, the
+garbage collector is able to walk it. So even the assembly/bytecode will soon be able to
+work as if memory is not a concern.
+
+For the implementation of garbage collection, I've chosen a modified
+[Cheney's algorithm](https://en.wikipedia.org/wiki/Cheney%27s_algorithm).
+This algorithm contains two distinct "heaps", one of which is currently in use for allocation.
+If it runs out, the algorithm would walk all objects that are reachable from the registers and
+the stack of the virtual machine, and move them to the other heap. The previous heap would then
+contain only dead objects, and can be grown/shrunk for the next cycle if needed.
+
+For practical purposes, plain Cheney's algorithm would be quite slow, and many real programming
+languages use multiple heaps, or generations (Python has 3), and vary their sizes to reclaim
+short-lived objects faster. However, I predict that the naive algorithm would get me pretty far along
+the path to the implementation of the actual language, so I don't care that much about it now.
+
+What follows below is a piece of code that demonstrates the C interface for interacting with
+VM memory allocation and GC capabilities. It is taken almost verbatim from the unit test and
+annotated.
+
+```
+// Allocate all structures required by the virtual machine
+vm_t* vm = new_vm();
+
+// Allocate an array of 10 elements
+tagged_value_t arr = vm_mk_array(vm, 10);
+array_set(arr, 0, mk_i64(42));
+
+// Save pointer to the array in a register so it survives garbage collection
+vm->registers[R1] = arr;
+
+assert(arena_generation(&vm->arena, arr) == 0,
+       "Array should be in first generation\n");
+
+// Perform garbage collection
+vm_gc(vm);
+
+// The array has moved, so retrieve a pointer to it from the register
+arr = vm->registers[R1];
+
+// Check that the array contains the correct value at the 0-th position
+tagged_value_t val = array_get(arr, 0);
+assert(op_eq(val, mk_i64(42)), "0-th element of an array == %ld\n",
+       val.value.i64);
+
+// Make sure that the array has really moved between generations
+assert(arena_generation(&vm->arena, arr) == 1,
+       "Array should be in second generation\n");
+```
+
+The API is in place, so the next step would be to add a few instructions to the bytecode
+that can call memory allocations, and write an inefficient merge sort (the one that would
+copy sub-ranges).
+
+As usual, the code can be found [here](https://git.sr.ht/~knazarov/lisp.experimental)
+(note that it's experimental, so don't expect any clarity).
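
To make the two-heap copying scheme from the post concrete, here is a minimal sketch of a Cheney-style collection in C. It is not the VM's actual code: `cell_t`, `heap_t`, `alloc_cell`, `forward`, `collect` and `gc_demo` are hypothetical names, the object layout is invented for illustration, and both heaps have a fixed size instead of being grown or shrunk between cycles. The `roots` array stands in for the VM's registers and stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged cell; the real VM's layout is not shown in the post. */
typedef enum { TAG_I64, TAG_PAIR, TAG_FORWARD } tag_t;

typedef struct cell {
    tag_t tag;
    int64_t i64;         /* payload for TAG_I64 */
    struct cell *a, *b;  /* children for TAG_PAIR; `a` is the new address for TAG_FORWARD */
} cell_t;

#define HEAP_CELLS 64

typedef struct {
    cell_t space[2][HEAP_CELLS]; /* the two heaps */
    int active;                  /* index of the heap currently used for allocation */
    size_t top;                  /* bump pointer into the active heap */
} heap_t;

/* Bump-allocate one cell in the active heap. */
static cell_t *alloc_cell(heap_t *h, tag_t tag) {
    assert(h->top < HEAP_CELLS);
    cell_t *c = &h->space[h->active][h->top++];
    c->tag = tag;
    c->i64 = 0;
    c->a = c->b = NULL;
    return c;
}

/* Copy obj into the other heap unless it was already moved; return its new address. */
static cell_t *forward(cell_t *obj, cell_t *to, size_t *free_idx) {
    if (obj == NULL) return NULL;
    if (obj->tag == TAG_FORWARD) return obj->a;
    cell_t *copy = &to[(*free_idx)++];
    *copy = *obj;
    obj->tag = TAG_FORWARD; /* leave a forwarding pointer behind */
    obj->a = copy;
    return copy;
}

/* Move everything reachable from the roots to the other heap, then swap heaps. */
static void collect(heap_t *h, cell_t **roots, size_t nroots) {
    cell_t *to = h->space[1 - h->active];
    size_t free_idx = 0, scan = 0;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = forward(roots[i], to, &free_idx);
    /* Cells between scan and free_idx are copied but not yet scanned. */
    while (scan < free_idx) {
        cell_t *c = &to[scan++];
        if (c->tag == TAG_PAIR) {
            c->a = forward(c->a, to, &free_idx);
            c->b = forward(c->b, to, &free_idx);
        }
    }
    h->active = 1 - h->active;
    h->top = free_idx; /* the old heap now holds only dead objects */
}

/* Builds pair(7, 35) plus one unreachable cell, collects, and sums the pair. */
int64_t gc_demo(void) {
    static heap_t h;
    h.active = 0;
    h.top = 0;
    cell_t *x = alloc_cell(&h, TAG_I64); x->i64 = 7;
    cell_t *y = alloc_cell(&h, TAG_I64); y->i64 = 35;
    alloc_cell(&h, TAG_I64); /* garbage: nothing points to it */
    cell_t *p = alloc_cell(&h, TAG_PAIR); p->a = x; p->b = y;
    cell_t *roots[1] = { p };
    collect(&h, roots, 1);
    assert(h.top == 3); /* only the pair and its two children survived */
    return roots[0]->a->i64 + roots[0]->b->i64;
}
```

As in the annotated test above, the caller has to re-read the pointer from its root after collection, because the object has moved. The sketch relies on every cell carrying a tag, which is the same property that lets the collector in the post tell pointers from integers.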