Add an article about the C++ rewrite
parent c698499c51
commit a265fb6141
1 changed files with 125 additions and 0 deletions
125
content/posts/cpp_rewrite_of_my_language/note.md
Normal file
@@ -0,0 +1,125 @@
X-Date: 2024-04-01T22:00:00Z
X-Note-Id: 72f1a46c-d19a-413d-84a8-46be1cfa575d
Subject: I've ported my language from C to C++ (a story of error handling)
X-Slug: cpp_rewrite_of_my_language

I've been writing my programming language in pure C for quite some time, but recently
I decided to port it to C++. The key problem that made me do so is error handling.
While I was working on the bytecode virtual machine, it was all relatively simple. The
virtual machine is just a large switch over the opcodes with relatively trivial
functions for basic arithmetic operations, jumps and conditions.

As I started to work on the parser and runtime data structures, the code quickly became
hard to reason about. This is in part because I decided to gracefully handle memory allocation
errors. To understand the issue, let's consider a simple function, `assoc_get`, which takes
an indexable object and an index, and returns the value at that index:

```
Value obj = mk_array(10);
Value index = mk_i64(5);
Value val = mk_i64(42);

// Writes "42" at array index 5
assoc_set(obj, index, val);

Value res = assoc_get(obj, index);
```

Now, there are two possible error cases here:

- The index can be out of range
- We couldn't allocate memory for a temporary value on the garbage-collected heap

In both of these cases, what should be the value of `res`, and how would we know that an error
has occurred? One option is to set an `errno`-style error variable and return some sort of
"placeholder" that doesn't mean anything (e.g. `nil`). Another is to use "out parameters", like this:

```
Value res = mk_nil();
ErrorCode rc = assoc_get(&res, obj, index);
if (rc) {
    // clean up and return
}
```
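
For comparison, the `errno`-style option mentioned above could be sketched roughly as follows. Everything here is hypothetical: `Value`, `ErrorCode`, `last_error` and this array-based `assoc_get` are simplified stand-ins, not the interpreter's real definitions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-ins for the interpreter's types.
enum ErrorCode { ERR_OK = 0, ERR_OUT_OF_RANGE, ERR_OUT_OF_MEMORY };

struct Value {
    bool is_nil;
    int64_t i64;
};

// errno-style global holding the error of the most recent call.
static ErrorCode last_error = ERR_OK;

static Value mk_nil() { return Value{true, 0}; }
static Value mk_i64(int64_t v) { return Value{false, v}; }

// Returns the element at `index`, or a meaningless nil placeholder on
// failure. The caller has to inspect `last_error` to tell the two apart.
static Value assoc_get(const int64_t* items, size_t size, size_t index) {
    if (index >= size) {
        last_error = ERR_OUT_OF_RANGE;
        return mk_nil();
    }
    last_error = ERR_OK;
    return mk_i64(items[index]);
}
```

The weakness of this style is that nothing forces the caller to look at `last_error`, and a legitimately stored `nil` looks exactly like the failure placeholder.
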

There are also more obscure approaches that some interpreters use, like calling `setjmp()` somewhere at
the entry point of the virtual machine loop, and then `longjmp()` if there's an error down the line.
This works in some cases, but it easily leads to resource leaks, because the jump skips any cleanup
code in the frames between the two calls.
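
A minimal sketch of how such a leak can happen, assuming a hypothetical parser with a `tracked_alloc`/`tracked_free` pair that simply counts live buffers (none of these names come from the actual interpreter):

```cpp
#include <csetjmp>
#include <cstdlib>

// Hypothetical sketch: a counter standing in for a real leak detector.
static std::jmp_buf vm_entry;
static int live_buffers = 0;

static void* tracked_alloc(std::size_t n) {
    ++live_buffers;
    return std::malloc(n);
}

static void tracked_free(void* p) {
    --live_buffers;
    std::free(p);
}

// Fails deep inside the call chain by jumping straight back to the entry point.
static void parse_item(bool fail) {
    if (fail) std::longjmp(vm_entry, 1);
}

static void parse_list(bool fail) {
    void* scratch = tracked_alloc(64);
    parse_item(fail);       // on failure, control never comes back here...
    tracked_free(scratch);  // ...so this cleanup is skipped
}

// Returns true if parsing failed. After a failure `live_buffers` stays above
// zero, because the jump bypassed tracked_free() in parse_list().
static bool run(bool fail) {
    if (setjmp(vm_entry) != 0) return true;  // we land here after longjmp
    parse_list(fail);
    return false;
}
```

Every function between the `setjmp()` and the `longjmp()` would need its own bookkeeping to avoid this, which brings back the complexity the jump was supposed to remove.
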

What would be really awesome is if C had some sort of sum types, or the ability to return two values from
a function: a result and an error (much like Zig's error unions or Go's multiple return values).

Initially I tried to bolt on sum types by introducing separate structs like this:

```
struct ValueOrError {
    Value result;
    ErrorCode error;
};
```

Following this approach, I refactored the code so that every function that can fail
returns such a sum type, like this:

```
ValueOrError res = assoc_get(obj, index);
if (res.error) {
    // clean up and return
}

// do something with res.result
```

This worked, but it required too much ceremony and cluttered the code. For every distinct type returned
from a fallible function, I had to create a separate "wrapper" type that essentially implements the corresponding result type.
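
To illustrate the kind of duplication this leads to, here is a small sketch written in the C-compatible subset of C++. The `Array` type, the `*OrError` wrappers other than `ValueOrError`, and the functions are all hypothetical and only serve the illustration:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; the real definitions are not shown in the post.
enum ErrorCode { ERR_OK = 0, ERR_NULL_OBJECT, ERR_OUT_OF_RANGE };

struct Array {
    const int64_t* items;
    size_t size;
};

// One hand-written wrapper per payload type...
struct SizeOrError { size_t result; ErrorCode error; };
struct I64OrError { int64_t result; ErrorCode error; };

// ...and every fallible function has to pick the matching one.
static SizeOrError assoc_size(const Array* arr) {
    if (arr == nullptr) return SizeOrError{0, ERR_NULL_OBJECT};
    return SizeOrError{arr->size, ERR_OK};
}

static I64OrError assoc_get_i64(const Array* arr, size_t index) {
    if (index >= arr->size) return I64OrError{0, ERR_OUT_OF_RANGE};
    return I64OrError{arr->items[index], ERR_OK};
}

// Even a trivial caller repeats the same check-and-forward pattern, and has
// to re-wrap the error whenever the wrapper types differ.
static I64OrError first_plus_last(const Array* arr) {
    SizeOrError size = assoc_size(arr);
    if (size.error) return I64OrError{0, size.error};

    I64OrError first = assoc_get_i64(arr, 0);
    if (first.error) return first;

    I64OrError last = assoc_get_i64(arr, size.result - 1);
    if (last.error) return last;

    return I64OrError{first.result + last.result, ERR_OK};
}
```
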

Eventually it led to a state where working on the codebase was no longer fun. Instead of implementing the logic,
I had to be very verbose all the time. Worst of all, refactoring the codebase became too taxing. Since
error handling code needed to know the underlying structure of objects, every time I changed an interface, things
started to break in too many places at once (and often at runtime).

So finally, I gave up and decided to use C++, where you can implement a `Result` sum type. My reasoning was that I
could still go pretty minimal and disable exceptions, RTTI, and probably even at some point get rid of the standard
library. But what I would get in return is sane and clean error handling.

Imagine something like this:

```
Result<Value> sum(Value array) {
    size_t size = TRY(assoc_size(array));
    int64_t res = 0;  // 64-bit accumulator, matching get_i64() and mk_i64()
    for (size_t i = 0; i < size; ++i) {
        // Each TRY returns early from sum() if the wrapped call failed
        Value val = TRY(assoc_get(array, mk_i64(i)));
        res += TRY(val.get_i64());
    }

    return mk_i64(res);
}
```

The interesting part here is the `TRY` macro. It unpacks the `Result` object: if the object
contains an error, `TRY` returns that error from the current function; otherwise, the expression
evaluates to the unpacked content of the `Result`.

The implementation of the `TRY` macro is pretty straightforward:

```
#include <utility>  // std::move

#define TRY(m)                                          \
    (({                                                 \
        auto ___res = (m);                              \
        if (!___res.has_value()) return ___res.error(); \
        std::move(___res);                              \
    }).release_value())
```

The most interesting part here is `({ ... })`. This is a so-called "statement expression": a compound
statement used as an expression. It's a GCC and Clang extension (so it is not portable to compilers such
as MSVC) that lets a single expression contain multiple statements. The value of the last one becomes
the value of the whole expression. This is what makes it possible to `return` from within an expression,
which is otherwise impossible (since `return` is a statement).
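
To make this concrete, here is a self-contained sketch of a `Result` type that works with such a macro. It is only an assumption about what the real one might look like: the post does not show its `Result`, `Value` or `ErrorCode`, so the definitions below (including the separate `Array` type) are hypothetical. The macro's temporary is named `try_res_` here because identifiers containing a double underscore are reserved in C++. It needs GCC or Clang because of the statement expression.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

enum class ErrorCode { OutOfRange, TypeMismatch };

// Minimal Result: holds either a T or an ErrorCode. In this sketch T must be
// default-constructible; a real implementation would likely avoid that.
template <typename T>
class Result {
public:
    Result(T value) : value_(std::move(value)), has_value_(true) {}
    Result(ErrorCode error) : error_(error), has_value_(false) {}

    bool has_value() const { return has_value_; }
    ErrorCode error() const { return error_; }
    T release_value() { return std::move(value_); }

private:
    T value_{};
    ErrorCode error_{};
    bool has_value_;
};

// Same shape as the macro in the post, with a non-reserved temporary name.
#define TRY(m)                                              \
    (({                                                     \
        auto try_res_ = (m);                                \
        if (!try_res_.has_value()) return try_res_.error(); \
        std::move(try_res_);                                \
    }).release_value())

// Hypothetical value type: either nil or a 64-bit integer.
struct Value {
    bool is_i64 = false;
    int64_t i64 = 0;

    Result<int64_t> get_i64() const {
        if (!is_i64) return ErrorCode::TypeMismatch;
        return i64;
    }
};

static Value mk_i64(int64_t v) {
    Value r;
    r.is_i64 = true;
    r.i64 = v;
    return r;
}

// Hypothetical array type, kept separate from Value to keep the sketch short.
struct Array {
    const Value* items;
    size_t size;
};

static Result<size_t> assoc_size(const Array& a) { return a.size; }

static Result<Value> assoc_get(const Array& a, size_t index) {
    if (index >= a.size) return ErrorCode::OutOfRange;
    return a.items[index];
}

// Any failing TRY returns its ErrorCode, which converts to Result<Value>.
static Result<Value> sum(const Array& array) {
    size_t size = TRY(assoc_size(array));
    int64_t res = 0;
    for (size_t i = 0; i < size; ++i) {
        Value val = TRY(assoc_get(array, i));
        res += TRY(val.get_i64());
    }
    return mk_i64(res);
}
```

The early `return` works because `ErrorCode` converts implicitly to any `Result<T>`, so the same macro can be used in functions with different result types.
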

If you use this macro, the code becomes easy to read. You immediately see which functions can fail, and
errors bubble up concisely to the place that knows how to deal with them. It is almost as easy to use
as exceptions, with the added benefit of being explicit.

The main reason I want to avoid exceptions is that I would like to make my language embeddable, and
exceptions don't play well when you mix them with other language runtimes.