diff --git a/content/posts/purity_delimited_continuations_and_io/note.md b/content/posts/purity_delimited_continuations_and_io/note.md new file mode 100644 index 0000000..f7a3804 --- /dev/null +++ b/content/posts/purity_delimited_continuations_and_io/note.md @@ -0,0 +1,160 @@ +X-Date: 2024-09-28T19:33:25Z +X-Note-Id: 74a3ed04-391c-40a3-b9e8-12386790ba8c +Subject: Purity, delimited continuations and I/O +X-Slug: purity_delimited_continuations_and_io + +In this post I'll explain how I deal with I/O and side effects in [Valeri](https://git.knazarov.com/knazarov/valeri), +which is a [pure](https://en.wikipedia.org/wiki/Purely_functional_programming) language. + +To explain it in simple terms, let's write a program that counts from 1 to 10: + +``` +(fn count (x n) + (when (<= x n) + (println x) + (count (+ x 1) n) + ) + ) + +(count 1 10) +``` + +This program recursively calls the `count` function until it hits the depth limit of 10. On every call, it will +execute `println` which outputs the current depth to the console. This is a simple and obvious program, that looks +quite natural. But how can it output anything to the screen if Valeri is pure, and I/O is a side effect? + +The answer is [delimited continuations](https://en.wikipedia.org/wiki/Delimited_continuation). Unfortunately, the +linked Wikipedia article is a bit hard to understand because it uses the shift/reset approach, so I'll try to explain +it in simpler terms. + +## Python implementation + +Since you likely can at least read Python, let's rewrite the same program in Python and gradually modify it to be pure. + +``` +def count(x, n): + if x <= n: + print(x) + count(x+1, n) +``` + +This still uses `print`, but it's fixable if we use generators: + +``` +def count(x, n): + if (x <= n): + yield x + yield from count(x+1, n) + +computation = count(1, 10) + +for io in computation: + print(io) +``` + +As you can see, `print` now is only present at the top level. It means that with this simple trick, we've +turned `count` into a pure function that suspends its execution every time it needs to do IO and the +control is passed to the place where the top-level `count` was called. The loop will then return the control +back to where it was suspended. What is stored in the `computation` variable is called a "continuation", +because you can resume its execution. + +However, it's not so fun to be able to only print integers. Let's modify the code to also read the number +we want to count to. For this to work, we need to distinguish between the types of IO operations: + +``` +# Creates a specific IO operation and yields it, to +# be catched by the outside IO loop +def IO(io_type, *params): + res = yield {"io_type": io_type, "params": params} + return res + +def count(x, n=None): + if n is None: + # If "n" is not specified, read if from stdin + n = yield from IO("read") + n = int(n) + + if (x <= n): + yield from IO("print", x) + yield from count(x+1, n) + +computation = count(1) + +while True: + try: + io = next(computation) + except StopIteration: + break + + # Pick a specific IO action to execute + if io["io_type"] == "print": + print(io["params"][0]) + elif io["io_type"] == "read": + computation.send(input()) +``` + +This example is a bit more complex, but hopefully still understandable. We use a less-well-known `send` +operation that allows to return a value back to the generator. When you do so, `yield` will have a return +value which you can assign to a variable. + +If you run the program, it will ask you for a value up to which it needs to count, and then print numbers +between 1 and that value. If we imagine that the IO loop was part of the language runtime, and not part +of our code, you'd have what is called an "effect" system. This effect system is quite powerful: you can +for example record all IO that is happening during the execution and then replay it back to reproduce bugs. +Or you can decide to batch certain IO operations (like "print"). + +Of course, with Python the pinch of salt would be its non-determinism and state mutability. There are many +things that in practice can go differently between separate executions of the same algorithm if you're +not careful. + +## Valeri implementation + +Valeri now provides roughly the same IO service out of the box. It works in a similar manner to Python: + +- Whenever an IO operations like `println` is called, a special "task" object is created +- Current execution is captured to a continuation, together with the task +- The continuation object is thrown up the stack until it meets the first handler +- If the handler is the top-level IO service, it takes the task and executes it +- It then resumes the continuation by calling the continuation object with the IO result + +So, if we try the following in the REPL: + +``` +valeri> (println "foobar") +foobar +``` + +It is roughly equivalent to: + +``` +valeri> (raise (task 'print "foobar\n")) +foobar +``` + +Here, the `raise` function is a universal way to "capture" the delimited continuation and unwind the stack. +In fact, raising errors works the same way, just the parameter to `raise` has a different type: + +``` +valeri> (fn myfun () (raise (error "some-error"))) +# + +valeri> (myfun) +Uncaught error: # + Function error + Function myfun +``` + +The difference with Python is mostly in the convenience of not having to type `yield from` for intermediate +functions. This makes the code completely transparent and at the first glance work as if it has side effects. + +By using delimited continuations, you can implement many standard primitives like: + +- Coroutines +- Exceptions +- Global state +- Green threads +- Lazy sequences +- ... and much more + +The current implementation in Valeri is not yet capable of doing this, but mainly because there's currently +no way to "catch" the continuation in the user's code. Stay tuned for the updates.