Publish a post on new garbage collector

This commit is contained in:
Konstantin Nazarov 2024-08-03 03:21:23 +01:00
parent 8e08e2ec07
commit 6a77c697f1
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22

View file

@ -0,0 +1,60 @@
X-Date: 2024-08-03T01:59:20Z
X-Note-Id: 926d77c4-2332-4b0a-a9de-d0215facb90c
Subject: New spin on a C++ garbage collector
X-Slug: new_spin_on_cpp_garbage_collector
While doing another round of implementing my programming language in C++, I've reached
a stage where garbage collection works and is integrated with the "host language".
An example is worth a thousand words, so here you go:
```
StaticArena<64 * 1024 * 1024> arena;
TEST_CASE(dict_insert) {
auto d = DIEX(Dict::create(arena));
d = DIEX(d.insert(arena, 1, 2));
d = DIEX(d.insert(arena, 1, 3));
d = DIEX(d.insert(arena, 3, 3));
d = DIEX(d.insert(arena, 0, 4));
d = DIEX(d.insert(arena, 0, 5));
d = DIEX(d.insert(arena, 2, 6));
DIEX(arena.gc());
auto s = DIEX(write_one(arena, d));
DIEX(arena.gc());
ASSERT_EQUALS(s, "{0 5 1 3 2 6 3 3}");
}
```
Here, `Dict::create()` creates a dictionary and places its data into a garbage-collected arena.
The variable `d` that is placed on the C++ stack is just a garbage collector "root". When the
object is moved by the garbage collector between different arena heaps, the roots get re-pointed
to the new locations, so it is safe to operate with such variables and place them to other C++
objects.
I then call `d.insert()` to insert key-value pairs to the dictionary. Note that this function
returns a new object that I assign back to `d`. This is because in my language objects are actually
immutable, and `d.insert()` creates a new dictionary instead of changing the old one.
The call to `arena.gc()` performs a full garbage collection cycle and switches the primary and
secondary arena heaps, achieving data compaction and removal of objects that are no longer
accessible (here - previous versions of dictionary `d`).
Then I convert dictionary into a string with `write_one()`, which is a way to get textual
representation of any hierarchy of objects.
Note that `arena.gc()` is not the only way to trigger garbage-collection. Any memory allocation that
overflows the active heap will trigger a GC cycle, and it means that you can't normally hold any
direct pointers to such heap and need to always use gc roots to refer to objects.
Most importantly, the example doesn't use any additional dynamically allocated memory apart from
a contiguous region occupied by the `arena`. This is a property that I would like to keep, as
it would make embedding into other programs easier: you'd always keep data of the host program
and the extension language separate.
The end result looks pretty satisfactory to me to a degree where I think I can start prototyping
a bytecode compiler.