diff --git a/content/posts/capturing_and_serializing_continuations/note.md b/content/posts/capturing_and_serializing_continuations/note.md
new file mode 100644
index 0000000..679a092
--- /dev/null
+++ b/content/posts/capturing_and_serializing_continuations/note.md
@@ -0,0 +1,120 @@
+X-Date: 2024-10-05T22:14:26Z
+X-Note-Id: 235642a1-b67f-4836-a6ae-1b9272e5275b
+Subject: Capturing and serializing continuations
+X-Slug: capturing_and_serializing_continuations
+
+With the latest patches to [Valeri](https://git.knazarov.com/knazarov/valeri) it became possible to
+do some magic that you just can't do in Python. At least not without creating lots of tricky
+edge cases. Because of Valeri's immutability, objects in memory always form a unidirectional
+graph. The same is true for the virtual machine runtime. This means that it is possible to take
+the current state of a computation, serialize it to a contiguous byte array, send it over the network,
+and then resume the computation on the other end.
+
+Because of [referential transparency](https://en.wikipedia.org/wiki/Referential_transparency),
+objects in memory don't have any "identity" other than their contents. So if they are equal,
+you can substitute one with the other and nothing will change. Let's look at an example, and
+then I'll break it down:
+
+```
+(def c (guard (fn body () (+ 1 (raise :unused)))
+              (fn handler (err cont) cont)))
+
+;; "serialized" will hold the detached representation
+;; of the continuation "c"
+(def serialized (serialize c))
+(println "serialized: " serialized "\n")
+
+(def deserialized (deserialize serialized))
+(println "deserialized: " deserialized "\n")
+
+;; Call the deserialized continuation
+(println "result: " (deserialized 2))
+```
+
+First, let's start with `guard`. It does pretty much the same thing as exception handlers do
+in other languages. Its prototype is `(guard )`. It calls the
+`` and if an exception occurs during the execution, it calls the ``.
+The exception handler takes two arguments: the first is the exception object itself, and the second is
+the "continuation" object. The only difference from typical programming languages is that by
+calling the "continuation" object, you will jump back to where the exception occurred, and
+the `(raise)` function will return the value passed to the continuation as a parameter.
+
+In this example, we suspend execution right at the time the second argument to addition is
+computed. If the continuation is called with an integer parameter, it will be used in this
+addition. So effectively, you can think that the continuation contains a function roughly
+equivalent to `(fn (x) (+ 1 x))`. For simplicity, we have a very simple function here. But
+it can in fact be arbitrarily nested and use recursion. In such a case the whole call stack
+starting from the "guard" will be preserved.
+
+Because the handler function immediately returns the continuation object, it becomes
+the result of the whole guard expression and is then assigned to the variable `c`. We can then repeatedly
+call `c` with different arguments like `(c 5)` to get `6`. The number of times we call it doesn't
+matter: every time, execution will be "forked off" from the point where the original computation
+was suspended.
+
+Next in the example we have this:
+
+```
+(def serialized (serialize c))
+(println "serialized: " serialized "\n")
+```
+
+Here, `(serialize c)` takes the computation state graph in memory and flattens it into a byte array
+(possible due to the graph being unidirectional). When executed, it will give you something like this:
+
+```
+serialized: #
+;; Some of the output was trimmed to save screen space
+```
+
+This flattened representation is computed by getting hold of `c` as a "root" and then traversing the
+graph, copying and compacting every object traversed into a contiguous area in memory. Because
+memory addressing in Valeri is offset-based, there is no separate format for the serialized data.
+Data structures in memory are exactly the same as in the serialized form.
+
+Then we deserialize the byte array back:
+
+```
+(def deserialized (deserialize serialized))
+(println "deserialized: " deserialized "\n")
+```
+
+When executed, this code gives us:
+
+```
+deserialized: #
+```
+
+It's important to reiterate that there is no separate format for serialized data. Because of this,
+"deserialization" just involves copying the content of the byte array into the heap, and reinterpreting
+the beginning of that data as the beginning of the resulting data structure. Of course I'll do some
+validation in the future so that you can't corrupt the runtime by handcrafting the byte array, but the
+general approach will stay the same.
+
+And finally, we execute the deserialized continuation:
+
+```
+(println "result: " (deserialized 2))
+```
+
+Which gives us:
+
+```
+result: 3
+```
+
+And of course we can easily run it again:
+
+```
+(println "result: " (deserialized 42))
+result: 43
+```
+
+This execution model is very flexible. By using it, you can save the full "image" of the running program,
+or only part of it. Or send computation over the network to be performed on the "other side", where that
+"other side" can be reasonably certain that the computation won't be able to do anything it's not supposed
+to do.
+
+Another interesting thing you can do with this is "execution snapshotting", where you can save the state of
+your program at a certain point in time, and then revert back to it if you are debugging or otherwise
+reproducing unintended behavior.