From 0043a54b30e4696445ea2587c19bf36441edf81a Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov
Date: Sun, 17 Sep 2023 20:02:25 +0100
Subject: [PATCH] Add a post about arena allocator

---
 .../note.md | 71 +++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 content/posts/vm_progress_update_arena_asan_gcov/note.md

diff --git a/content/posts/vm_progress_update_arena_asan_gcov/note.md b/content/posts/vm_progress_update_arena_asan_gcov/note.md
new file mode 100644
index 0000000..2d4ac18
--- /dev/null
+++ b/content/posts/vm_progress_update_arena_asan_gcov/note.md
@@ -0,0 +1,71 @@
X-Date: 2023-09-17T21:00:00Z
X-Note-Id: 4c9287dc-53f7-4a1d-8bd8-9edbff143200
Subject: VM progress update: flexible arena, ASAN and gcov
X-Slug: vm_progress_update_arena_asan_gcov

This is another post in the series where I talk about what's new in the [virtual machine](https://git.sr.ht/~knazarov/lisp.experimental)
I'm working on.

Today let's discuss memory allocation. In a typical garbage-collected language, memory for language objects is taken from a "pool".
Usually, the pool is a contiguous region pre-allocated with `malloc()`, and while there is still space in it, specialized allocator
logic places objects there. As soon as the pool is completely filled, or a certain allocation threshold is reached, the garbage
collector is invoked. The collector traverses the known "roots", and everything not reachable from those roots is discarded.

High-performance garbage-collected languages of course add multiple layers of optimization on top of this. In my case, performance
is not an explicit goal (at least not yet), so I initially went with two pools, where one is used for allocations and the other is
always vacant. As soon as the active pool fills up, the garbage collector copies all reachable objects into the second pool and
makes it the primary one. This approach worked well until I hit problems with C interop.

And this is where things become interesting. When I have the virtual machine context around (essentially in the VM implementation
code), I can pass it to the memory allocation functions like this:

```
vm_t* vm = ...;

// allocate an array of 10 elements
tagged_value_t obj = vm_mk_array(vm, 10);

// do something with obj
```

`vm_mk_array()` calls `vm_alloc(vm, size)` internally to allocate a raw chunk of memory. If the pool has insufficient memory during
that call, `vm_alloc()` can trigger garbage collection itself and then resume the allocation, so the caller never sees any of these
complications.

The flip side is that you need to be very careful when handling the results of functions that allocate memory. If you don't save the
pointer to the allocated region in a VM register or on the stack, or otherwise mark it as a GC root, the next garbage collection
won't treat the object as alive and will simply "erase" it.

When I started rewriting the assembly compiler to use "native objects" and allocate them in the VM memory pool, I immediately hit
this problem. Writing code with the expectation that every dynamically allocated object can be pulled out from under your feet if
you're not careful makes the code complicated and hard to read.
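To make the failure mode concrete, here's roughly what it looks like, using the same functions as in the snippet above (an
illustrative sketch, not actual code from the compiler):

```
vm_t* vm = ...;

// allocate the first array
tagged_value_t a = vm_mk_array(vm, 10);

// this second allocation may trigger garbage collection; if `a` was not
// saved in a VM register or on the stack (i.e. made reachable from a GC
// root) before this call, the collector considers it dead and `a` ends
// up pointing at reclaimed memory
tagged_value_t b = vm_mk_array(vm, 20);

// using `a` here is no longer safe
```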
And then I remembered how [Zig](https://ziglang.org/) solves these problems. In Zig, there's a thing called an "arena allocator"
that lets you not worry about freeing individual objects: you free all allocated memory at once when you're done with the
computation. It is implemented as a linked list of buffers that the allocator maintains internally. When the buffers run out of
space, a new one is allocated and added to the list. This lets the arena grow dynamically while all allocated objects "stay put".

So to solve the problem with the C code, I borrowed the arena allocator idea from Zig. Instead of memory pools of a fixed size, I
turned them internally into a linked list of pages (there's a minimal sketch of the idea at the end of this post). This means that
individual memory allocations never call the garbage collector, because memory is always available (provided the physical machine
has enough of it). I moved all garbage collection to the upper level, so it is only triggered while the virtual machine evaluates
bytecode. Any C code that is called from the VM (or that calls back into it) can safely allocate from the VM pool, expect that
memory to be freed by the VM later, and still rely on garbage collection not being triggered while the C function is executing.

This made the implementation of memory allocation and garbage collection a little more complex, and I started getting segfaults
and leaks (which is to be expected in low-level code like this). To make debugging easier, I enabled the `-fsanitize=address`
compiler flag, which uses [ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer) to wrap all memory allocations and
instrument the code to detect incorrect memory accesses. It allowed me to iron out most of the trivial allocation bugs very quickly.

In addition to enabling the address sanitizer, I started gathering test coverage with
[gcov](https://gcc.gnu.org/onlinedocs/gcc/Gcov.html), which is part of the GCC toolchain. It lets me see which parts of the
critical functionality are not covered by tests and therefore need more work. I even added a
[plugin](https://github.com/AdamNiederer/cov) to my editor that annotates the opened .c or .h files with colored markers on lines
that don't have test coverage.

I find that if you have clangd, ASAN, gcov, gdb and some test coverage, working on low-level C code can actually be pretty enjoyable!
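As promised, here's a minimal sketch of the page-list idea. This is not the VM's actual allocator (the names are made up, and
alignment handling is omitted for brevity); it just illustrates how a pool can grow without moving existing objects:

```
#include <stddef.h>
#include <stdlib.h>

// One page of the arena; pages are chained into a linked list.
typedef struct page {
    struct page* next;
    size_t used;
    size_t size;
    unsigned char data[];
} page_t;

// The arena itself: the head of the page list plus a default page size.
typedef struct {
    page_t* head;
    size_t page_size;
} arena_t;

static page_t* page_new(size_t size) {
    page_t* p = malloc(sizeof(page_t) + size);
    if (!p) return NULL;
    p->next = NULL;
    p->used = 0;
    p->size = size;
    return p;
}

// Allocate `size` bytes from the arena. When the current page is full,
// a new page is added to the list, so already allocated objects never move.
void* arena_alloc(arena_t* a, size_t size) {
    if (!a->head || a->head->used + size > a->head->size) {
        size_t page_size = size > a->page_size ? size : a->page_size;
        page_t* p = page_new(page_size);
        if (!p) return NULL;
        p->next = a->head;
        a->head = p;
    }
    void* ptr = a->head->data + a->head->used;
    a->head->used += size;
    return ptr;
}

// Free the whole arena at once.
void arena_reset(arena_t* a) {
    page_t* p = a->head;
    while (p) {
        page_t* next = p->next;
        free(p);
        p = next;
    }
    a->head = NULL;
}
```

Because pages are never resized or moved, pointers handed out by `arena_alloc()` stay valid until the whole arena is freed, which
is exactly the property that makes the C interop story simpler.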