knazarov.com/content/posts/purity_delimited_continuations_and_io/note.md

5.5 KiB

X-Date: 2024-09-28T19:33:25Z X-Note-Id: 74a3ed04-391c-40a3-b9e8-12386790ba8c Subject: Purity, delimited continuations and I/O X-Slug: purity_delimited_continuations_and_io

In this post I'll explain how I deal with I/O and side effects in Valeri, which is a pure language.

To explain it in simple terms, let's write a program that counts from 1 to 10:

(fn count (x n)
    (when (<= x n)
      (println x)
      (count (+ x 1) n)
      )
    )

(count 1 10)

This program recursively calls the count function until it hits the depth limit of 10. On every call, it will execute println which outputs the current depth to the console. This is a simple and obvious program, that looks quite natural. But how can it output anything to the screen if Valeri is pure, and I/O is a side effect?

The answer is delimited continuations. Unfortunately, the linked Wikipedia article is a bit hard to understand because it uses the shift/reset approach, so I'll try to explain it in simpler terms.

Python implementation

Since you likely can at least read Python, let's rewrite the same program in Python and gradually modify it to be pure.

def count(x, n):
    if x <= n:
        print(x)
        count(x+1, n)

This still uses print, but it's fixable if we use generators:

def count(x, n):
    if (x <= n):
        yield x
        yield from count(x+1, n)

computation = count(1, 10)

for io in computation:
    print(io)

As you can see, print now is only present at the top level. It means that with this simple trick, we've turned count into a pure function that suspends its execution every time it needs to do IO and the control is passed to the place where the top-level count was called. The loop will then return the control back to where it was suspended. What is stored in the computation variable is called a "continuation", because you can resume its execution.

However, it's not so fun to be able to only print integers. Let's modify the code to also read the number we want to count to. For this to work, we need to distinguish between the types of IO operations:

# Creates a specific IO operation and yields it, to
# be catched by the outside IO loop
def IO(io_type, *params):
    res = yield {"io_type": io_type, "params": params}
    return res

def count(x, n=None):
    if n is None:
        # If "n" is not specified, read if from stdin
        n = yield from IO("read")
        n = int(n)

    if (x <= n):
        yield from IO("print", x)
        yield from count(x+1, n)

computation = count(1)

while True:
    try:
        io = next(computation)
    except StopIteration:
        break

    # Pick a specific IO action to execute
    if io["io_type"] == "print":
        print(io["params"][0])
    elif io["io_type"] == "read":
        computation.send(input())

This example is a bit more complex, but hopefully still understandable. We use a less-well-known send operation that allows to return a value back to the generator. When you do so, yield will have a return value which you can assign to a variable.

If you run the program, it will ask you for a value up to which it needs to count, and then print numbers between 1 and that value. If we imagine that the IO loop was part of the language runtime, and not part of our code, you'd have what is called an "effect" system. This effect system is quite powerful: you can for example record all IO that is happening during the execution and then replay it back to reproduce bugs. Or you can decide to batch certain IO operations (like "print").

Of course, with Python the pinch of salt would be its non-determinism and state mutability. There are many things that in practice can go differently between separate executions of the same algorithm if you're not careful.

Valeri implementation

Valeri now provides roughly the same IO service out of the box. It works in a similar manner to Python:

  • Whenever an IO operations like println is called, a special "task" object is created
  • Current execution is captured to a continuation, together with the task
  • The continuation object is thrown up the stack until it meets the first handler
  • If the handler is the top-level IO service, it takes the task and executes it
  • It then resumes the continuation by calling the continuation object with the IO result

So, if we try the following in the REPL:

valeri> (println "foobar")
foobar

It is roughly equivalent to:

valeri> (raise (task 'print "foobar\n"))
foobar

Here, the raise function is a universal way to "capture" the delimited continuation and unwind the stack. In fact, raising errors works the same way, just the parameter to raise has a different type:

valeri> (fn myfun () (raise (error "some-error")))
#<function myfun>

valeri> (myfun)
Uncaught error: #<continuation>
  Function error
  Function myfun

The difference with Python is mostly in the convenience of not having to type yield from for intermediate functions. This makes the code completely transparent and at the first glance work as if it has side effects.

By using delimited continuations, you can implement many standard primitives like:

  • Coroutines
  • Exceptions
  • Global state
  • Green threads
  • Lazy sequences
  • ... and much more

The current implementation in Valeri is not yet capable of doing this, but mainly because there's currently no way to "catch" the continuation in the user's code. Stay tuned for the updates.