From 0043a54b30e4696445ea2587c19bf36441edf81a Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov
Date: Sun, 17 Sep 2023 20:02:25 +0100
Subject: [PATCH] Add a post about arena allocator

---
 .../note.md | 71 +++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 content/posts/vm_progress_update_arena_asan_gcov/note.md

diff --git a/content/posts/vm_progress_update_arena_asan_gcov/note.md b/content/posts/vm_progress_update_arena_asan_gcov/note.md
new file mode 100644
index 0000000..2d4ac18
--- /dev/null
+++ b/content/posts/vm_progress_update_arena_asan_gcov/note.md
@@ -0,0 +1,71 @@
X-Date: 2023-09-17T21:00:00Z
X-Note-Id: 4c9287dc-53f7-4a1d-8bd8-9edbff143200
Subject: VM progress update: flexible arena, ASAN and gcov
X-Slug: vm_progress_update_arena_asan_gcov

This is another post in the series where I talk about what's new in the [virtual machine](https://git.sr.ht/~knazarov/lisp.experimental)
I'm working on.

Today let's discuss memory allocation. In a typical garbage-collected language, memory for language objects is taken from a "pool".
Usually, the pool is a contiguous region pre-allocated with `malloc()`, and while there is still space in it, specialized allocator
logic places objects there. As soon as the pool is completely filled, or a certain allocation threshold is reached, the garbage
collector is invoked. The collector traverses the known "roots", and everything not reachable from those roots is discarded.

High-performance garbage-collected languages of course add multiple layers of optimization on top of this. In my case, performance
is not an explicit goal (at least not yet), so I initially went with two pools, where one is used for allocations and the other is
always vacant. As soon as the active pool fills up, the garbage collector copies all reachable objects into the second pool and
makes it the primary one. This approach worked well until I hit problems with C interop.

And this is where things become interesting. When I have the virtual machine context around (essentially in the VM implementation
code), I can pass it to the memory allocation functions like this:

```
vm_t* vm = ...;

// allocate an array of 10 elements
tagged_value_t obj = vm_mk_array(vm, 10);

// do something with obj
```

`vm_mk_array()` calls `vm_alloc(vm, size)` internally to allocate a raw chunk of memory. If the pool has insufficient memory during
that call, `vm_alloc()` can trigger garbage collection itself and then resume the allocation, so the caller never sees any of these
complications.

The flip side is that you need to be very careful when handling the results of functions that allocate memory. If you don't save the
pointer to the allocated region in a VM register or on the stack, or otherwise mark it as a GC root, the next garbage collection
won't treat the object as alive and will simply "erase" it.

When I started rewriting the assembly compiler to use "native objects" and allocate them in the VM memory pool, I immediately hit
this problem. Writing code with the expectation that every dynamically allocated object can be pulled out from under your feet if
you're not careful makes the code complicated and hard to read.
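To make the failure mode concrete, here's roughly what it looks like, using the same functions as in the snippet above (an
illustrative sketch, not actual code from the compiler):

```
vm_t* vm = ...;

// allocate the first array
tagged_value_t a = vm_mk_array(vm, 10);

// this second allocation may trigger garbage collection; if `a` was not
// saved in a VM register or on the stack (i.e. made reachable from a GC
// root) before this call, the collector considers it dead and `a` ends
// up pointing at reclaimed memory
tagged_value_t b = vm_mk_array(vm, 20);

// using `a` here is no longer safe
```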
And then I remembered how [Zig](https://ziglang.org/) solves these problems. In Zig, there's a thing called an "arena allocator"
that lets you not worry about freeing individual objects: you free all allocated memory at once when you're done with the
computation. It is implemented as a linked list of buffers that the allocator maintains internally. When the buffers run out of
space, a new one is allocated and added to the list. This lets the arena grow dynamically while all allocated objects "stay put".

So to solve the problem with the C code, I borrowed the arena allocator idea from Zig. Instead of memory pools of a fixed size, I
turned them internally into a linked list of pages (there's a minimal sketch of the idea at the end of this post). This means that
individual memory allocations never call the garbage collector, because memory is always available (provided the physical machine
has enough of it). I moved all garbage collection to the upper level, so it is only triggered while the virtual machine evaluates
bytecode. Any C code that is called from the VM (or that calls back into it) can safely allocate from the VM pool, expect that
memory to be freed by the VM later, and still rely on garbage collection not being triggered while the C function is executing.

This made the implementation of memory allocation and garbage collection a little more complex, and I started getting segfaults
and leaks (which is to be expected in low-level code like this). To make debugging easier, I enabled the `-fsanitize=address`
compiler flag, which uses [ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer) to wrap all memory allocations and
instrument the code to detect incorrect memory accesses. It allowed me to iron out most of the trivial allocation bugs very quickly.

In addition to enabling the address sanitizer, I started gathering test coverage with
[gcov](https://gcc.gnu.org/onlinedocs/gcc/Gcov.html), which is part of the GCC toolchain. It lets me see which parts of the
critical functionality are not covered by tests and therefore need more work. I even added a
[plugin](https://github.com/AdamNiederer/cov) to my editor that annotates the opened .c or .h files with colored markers on lines
that don't have test coverage.

I find that if you have clangd, ASAN, gcov, gdb and some test coverage, working on low-level C code can actually be pretty enjoyable!
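As promised, here's a minimal sketch of the page-list idea. This is not the VM's actual allocator (the names are made up, and
alignment handling is omitted for brevity); it just illustrates how a pool can grow without moving existing objects:

```
#include <stddef.h>
#include <stdlib.h>

// One page of the arena; pages are chained into a linked list.
typedef struct page {
    struct page* next;
    size_t used;
    size_t size;
    unsigned char data[];
} page_t;

// The arena itself: the head of the page list plus a default page size.
typedef struct {
    page_t* head;
    size_t page_size;
} arena_t;

static page_t* page_new(size_t size) {
    page_t* p = malloc(sizeof(page_t) + size);
    if (!p) return NULL;
    p->next = NULL;
    p->used = 0;
    p->size = size;
    return p;
}

// Allocate `size` bytes from the arena. When the current page is full,
// a new page is added to the list, so already allocated objects never move.
void* arena_alloc(arena_t* a, size_t size) {
    if (!a->head || a->head->used + size > a->head->size) {
        size_t page_size = size > a->page_size ? size : a->page_size;
        page_t* p = page_new(page_size);
        if (!p) return NULL;
        p->next = a->head;
        a->head = p;
    }
    void* ptr = a->head->data + a->head->used;
    a->head->used += size;
    return ptr;
}

// Free the whole arena at once.
void arena_reset(arena_t* a) {
    page_t* p = a->head;
    while (p) {
        page_t* next = p->next;
        free(p);
        p = next;
    }
    a->head = NULL;
}
```

Because pages are never resized or moved, pointers handed out by `arena_alloc()` stay valid until the whole arena is freed, which
is exactly the property that makes the C interop story simpler.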