From 6a77c697f19c679f5b43bd0b9fc084eb981c1244 Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov
Date: Sat, 3 Aug 2024 03:21:23 +0100
Subject: [PATCH] Publish a post on new garbage collector

---
 .../new_spin_on_cpp_garbage_collector/note.md | 60 +++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 content/posts/new_spin_on_cpp_garbage_collector/note.md

diff --git a/content/posts/new_spin_on_cpp_garbage_collector/note.md b/content/posts/new_spin_on_cpp_garbage_collector/note.md
new file mode 100644
index 0000000..a5b023a
--- /dev/null
+++ b/content/posts/new_spin_on_cpp_garbage_collector/note.md
@@ -0,0 +1,60 @@
+X-Date: 2024-08-03T01:59:20Z
+X-Note-Id: 926d77c4-2332-4b0a-a9de-d0215facb90c
+Subject: New spin on a C++ garbage collector
+X-Slug: new_spin_on_cpp_garbage_collector
+
+While doing another round of implementing my programming language in C++, I've reached
+a stage where garbage collection works and is integrated with the "host language".
+An example is worth a thousand words, so here you go:
+
+```
+StaticArena<64 * 1024 * 1024> arena;
+
+TEST_CASE(dict_insert) {
+    auto d = DIEX(Dict::create(arena));
+
+    d = DIEX(d.insert(arena, 1, 2));
+    d = DIEX(d.insert(arena, 1, 3));
+    d = DIEX(d.insert(arena, 3, 3));
+    d = DIEX(d.insert(arena, 0, 4));
+    d = DIEX(d.insert(arena, 0, 5));
+    d = DIEX(d.insert(arena, 2, 6));
+
+    DIEX(arena.gc());
+
+    auto s = DIEX(write_one(arena, d));
+
+    DIEX(arena.gc());
+
+    ASSERT_EQUALS(s, "{0 5 1 3 2 6 3 3}");
+}
+```
+
+Here, `Dict::create()` creates a dictionary and places its data into a garbage-collected arena.
+The variable `d`, which lives on the C++ stack, is just a garbage collector "root". When the
+object is moved by the garbage collector between arena heaps, the roots get re-pointed
+to the new location, so it is safe to operate on such variables and to place them in other C++
+objects.
+
+I then call `d.insert()` to insert key-value pairs into the dictionary. Note that this function
+returns a new object that I assign back to `d`. This is because in my language objects are actually
+immutable, and `d.insert()` creates a new dictionary instead of changing the old one.
+
+The call to `arena.gc()` performs a full garbage collection cycle and switches the primary and
+secondary arena heaps, achieving data compaction and removal of objects that are no longer
+accessible (here, the previous versions of dictionary `d`).
+
+Then I convert the dictionary into a string with `write_one()`, which is a way to get a textual
+representation of any hierarchy of objects.
+
+Note that `arena.gc()` is not the only way to trigger garbage collection. Any memory allocation that
+overflows the active heap will trigger a GC cycle, which means that you normally can't hold any
+direct pointers into the heap and always need to use GC roots to refer to objects.
+
+Most importantly, the example doesn't use any additional dynamically allocated memory apart from
+the contiguous region occupied by the `arena`. This is a property that I would like to keep, as
+it would make embedding into other programs easier: you'd always keep the data of the host program
+and the extension language separate.
+
+The end result looks satisfactory enough to me that I think I can start prototyping
+a bytecode compiler.
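
To make the root mechanism described in the post concrete, here is a minimal sketch of a two-heap copying collector. It is not the code from the post: `ToyArena`, `Root`, `Cell`, and `demo()` are hypothetical names, the heap holds only fixed-size list cells, and the Cheney-style copy loop is an assumption about how switching between the primary and secondary heaps might work. Roots link themselves into an intrusive list, so the sketch needs no dynamic allocation beyond the arena itself.

```cpp
#include <cassert>
#include <cstddef>

// All names here are hypothetical; this is an illustration, not the post's API.
struct Cell {
    int value;
    Cell* next;     // reference to another cell in the same heap, or nullptr
    Cell* forward;  // forwarding address, set once the collector copied the cell
};

class ToyArena;

// A root registers itself with the arena on construction and unregisters on
// destruction, so the collector can find and re-point it.
class Root {
public:
    Root(ToyArena& arena, Cell* cell);
    Root(const Root& other);
    Root& operator=(const Root& other) { cell_ = other.cell_; return *this; }
    ~Root();
    Cell* get() const { return cell_; }
private:
    friend class ToyArena;
    ToyArena* arena_;
    Cell* cell_;
    Root* prev_;
    Root* next_;
};

class ToyArena {
public:
    static constexpr std::size_t kCells = 4;  // per heap; tiny to force GC cycles

    // `next` is passed as a root because this allocation may itself trigger
    // a collection that moves the cell it refers to.
    // Returns nullptr if the heap is still full after collecting.
    Cell* alloc(int value, const Root* next = nullptr) {
        if (used_ == kCells) gc();
        if (used_ == kCells) return nullptr;
        Cell* c = &active_[used_++];
        *c = Cell{value, next ? next->get() : nullptr, nullptr};
        return c;
    }

    // Copies everything reachable from the roots into the idle heap,
    // re-points the roots, then makes that heap the active one.
    void gc() {
        Cell* to = (active_ == heap_a_) ? heap_b_ : heap_a_;
        std::size_t copied = 0;
        for (Root* r = roots_; r; r = r->next_)
            r->cell_ = copy(r->cell_, to, copied);
        for (std::size_t scan = 0; scan < copied; ++scan)
            to[scan].next = copy(to[scan].next, to, copied);
        active_ = to;
        used_ = copied;
    }

    std::size_t used() const { return used_; }

private:
    friend class Root;
    static Cell* copy(Cell* c, Cell* to, std::size_t& copied) {
        if (!c) return nullptr;
        if (!c->forward) {  // first visit: copy and leave a forwarding address
            to[copied] = Cell{c->value, c->next, nullptr};
            c->forward = &to[copied++];
        }
        return c->forward;
    }
    Cell heap_a_[kCells] = {};
    Cell heap_b_[kCells] = {};
    Cell* active_ = heap_a_;
    std::size_t used_ = 0;
    Root* roots_ = nullptr;
};

Root::Root(ToyArena& arena, Cell* cell)
    : arena_(&arena), cell_(cell), prev_(nullptr), next_(arena.roots_) {
    if (next_) next_->prev_ = this;
    arena.roots_ = this;
}
Root::Root(const Root& other) : Root(*other.arena_, other.cell_) {}
Root::~Root() {
    if (prev_) prev_->next_ = next_; else arena_->roots_ = next_;
    if (next_) next_->prev_ = prev_;
}

// Builds the list 3 -> 2 -> 1, creating one garbage cell per step, and
// returns the sum of the list after the collector has moved it.
int demo() {
    ToyArena arena;
    Root list(arena, nullptr);
    for (int i = 1; i <= 3; ++i) {
        arena.alloc(100 + i);                       // never rooted: garbage
        list = Root(arena, arena.alloc(i, &list));  // new head, old list as tail
    }
    arena.gc();
    int sum = 0;
    for (Cell* c = list.get(); c; c = c->next) sum += c->value;
    return sum;
}
```

With four cells per heap, the third loop iteration in `demo()` overflows the active heap, so a collection happens in the middle of building the list; the list survives only because `list` is a root and the tail is handed to `alloc()` as a root rather than as a raw pointer.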