Publish a post about delimited continuations and IO

This commit is contained in:
Konstantin Nazarov 2024-09-28 22:00:43 +01:00
parent 43b0f83d7a
commit 45d49de037
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22

View file

@ -0,0 +1,160 @@
X-Date: 2024-09-28T19:33:25Z
X-Note-Id: 74a3ed04-391c-40a3-b9e8-12386790ba8c
Subject: Purity, delimited continuations and I/O
X-Slug: purity_delimited_continuations_and_io
In this post I'll explain how I deal with I/O and side effects in [Valeri](https://git.knazarov.com/knazarov/valeri),
which is a [pure](https://en.wikipedia.org/wiki/Purely_functional_programming) language.
To explain it in simple terms, let's write a program that counts from 1 to 10:
```
(fn count (x n)
(when (<= x n)
(println x)
(count (+ x 1) n)
)
)
(count 1 10)
```
This program recursively calls the `count` function until it hits the depth limit of 10. On every call, it will
execute `println` which outputs the current depth to the console. This is a simple and obvious program, that looks
quite natural. But how can it output anything to the screen if Valeri is pure, and I/O is a side effect?
The answer is [delimited continuations](https://en.wikipedia.org/wiki/Delimited_continuation). Unfortunately, the
linked Wikipedia article is a bit hard to understand because it uses the shift/reset approach, so I'll try to explain
it in simpler terms.
## Python implementation
Since you likely can at least read Python, let's rewrite the same program in Python and gradually modify it to be pure.
```
def count(x, n):
if x <= n:
print(x)
count(x+1, n)
```
This still uses `print`, but it's fixable if we use generators:
```
def count(x, n):
if (x <= n):
yield x
yield from count(x+1, n)
computation = count(1, 10)
for io in computation:
print(io)
```
As you can see, `print` now is only present at the top level. It means that with this simple trick, we've
turned `count` into a pure function that suspends its execution every time it needs to do IO and the
control is passed to the place where the top-level `count` was called. The loop will then return the control
back to where it was suspended. What is stored in the `computation` variable is called a "continuation",
because you can resume its execution.
However, it's not so fun to be able to only print integers. Let's modify the code to also read the number
we want to count to. For this to work, we need to distinguish between the types of IO operations:
```
# Creates a specific IO operation and yields it, to
# be catched by the outside IO loop
def IO(io_type, *params):
res = yield {"io_type": io_type, "params": params}
return res
def count(x, n=None):
if n is None:
# If "n" is not specified, read if from stdin
n = yield from IO("read")
n = int(n)
if (x <= n):
yield from IO("print", x)
yield from count(x+1, n)
computation = count(1)
while True:
try:
io = next(computation)
except StopIteration:
break
# Pick a specific IO action to execute
if io["io_type"] == "print":
print(io["params"][0])
elif io["io_type"] == "read":
computation.send(input())
```
This example is a bit more complex, but hopefully still understandable. We use a less-well-known `send`
operation that allows to return a value back to the generator. When you do so, `yield` will have a return
value which you can assign to a variable.
If you run the program, it will ask you for a value up to which it needs to count, and then print numbers
between 1 and that value. If we imagine that the IO loop was part of the language runtime, and not part
of our code, you'd have what is called an "effect" system. This effect system is quite powerful: you can
for example record all IO that is happening during the execution and then replay it back to reproduce bugs.
Or you can decide to batch certain IO operations (like "print").
Of course, with Python the pinch of salt would be its non-determinism and state mutability. There are many
things that in practice can go differently between separate executions of the same algorithm if you're
not careful.
## Valeri implementation
Valeri now provides roughly the same IO service out of the box. It works in a similar manner to Python:
- Whenever an IO operations like `println` is called, a special "task" object is created
- Current execution is captured to a continuation, together with the task
- The continuation object is thrown up the stack until it meets the first handler
- If the handler is the top-level IO service, it takes the task and executes it
- It then resumes the continuation by calling the continuation object with the IO result
So, if we try the following in the REPL:
```
valeri> (println "foobar")
foobar
```
It is roughly equivalent to:
```
valeri> (raise (task 'print "foobar\n"))
foobar
```
Here, the `raise` function is a universal way to "capture" the delimited continuation and unwind the stack.
In fact, raising errors works the same way, just the parameter to `raise` has a different type:
```
valeri> (fn myfun () (raise (error "some-error")))
#<function myfun>
valeri> (myfun)
Uncaught error: #<continuation>
Function error
Function myfun
```
The difference with Python is mostly in the convenience of not having to type `yield from` for intermediate
functions. This makes the code completely transparent and at the first glance work as if it has side effects.
By using delimited continuations, you can implement many standard primitives like:
- Coroutines
- Exceptions
- Global state
- Green threads
- Lazy sequences
- ... and much more
The current implementation in Valeri is not yet capable of doing this, but mainly because there's currently
no way to "catch" the continuation in the user's code. Stay tuned for the updates.