Publish a note about tree structured stack

This commit is contained in:
Konstantin Nazarov 2024-09-13 03:38:56 +01:00
parent 83b7cc0385
commit ecbbdcede2
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22

View file

@ -0,0 +1,41 @@
X-Date: 2024-09-13T01:44:20Z
X-Note-Id: 54bf0663-7e0c-462f-831b-e27fabb8f943
Subject: Immutable tree-structured stack
X-Slug: immutable_tree_structured_stack
I'm writing [valeri](https://git.knazarov.com/knazarov/valeri) to be a pure language. As a pure
language, it needs a way to somehow do IO in a way that doesn't violate purity and doesn't lead
to mutation of state or non-reproducible behavior.
One of the ways to achieve this is called "algebraic effects". You can read theory on
algebraic effects in the [Eff paper](https://www.eff-lang.org/handlers-tutorial.pdf). In short,
IO can be represented as a "continuation", where an attempt to call a function doing IO leads
to saving the full state of computation and "throwing" it up the stack until it reaches the
top level. The VM then performs the requested IO and calls the continuation object with the
results. It looks like sort of a resumable try-catch.
The reason why systems with "algebraic effects" can be considered pure is because in a sense
you invert the control. As if the program is divided into 2 pieces - the one before the IO and the
one after the IO. Both of those parts are themselves pure function and you just call them
sequentially.
There are multiple clever ways to implement continuations, but most of them reduce program
debuggability in the same way that automatic tail-call optimizations do. Mostly because
they don't preserve the call stack except for what is necessary for continuing the execution
past a certain point. An attempt to print a stacktrace on exception will likely lead to
nonsensical results. (or even exploring values of certain local variables in the debugger can be
impossible)
So instead, I went for the most straightforward approach to saving program state. Initially I
started out with having a single mutable stack backed by an array. As many regular languages
do, it used a base pointer to delimit the current "frame". And now I've converted one single
stack to a linked list of frames: whenever a function is called, a new frame is allocated
on the heap, and it points back to the previous frame. So in theory you can return back to
the previous frame and start a new "branch" from there.
The only trick here is to prevent modification of the shared parts of the stack when we throw
the continuation and a function somewhere up the stack performs some computation. To achieve
this, my stack data structure is immutable: all modifications of the frame actually create
a new copy of the same frame, and the VM switches to that. It's not efficient, but it's a
way to get a working solution. And there are ways to remove unneeded copies where we can
perform direct mutation of the frame if there's nobody else who refers to the same frame.