Add a post about syntax objects

Konstantin Nazarov 2024-09-08 03:51:07 +01:00
parent e10d4e0566
commit 346db7677f
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22


X-Date: 2024-09-08T01:43:33Z
X-Note-Id: cf3ed5fd-cbf3-41cc-8c75-3b8b1220fb34
Subject: Error reporting and syntax objects
X-Slug: error_reporting_and_syntax_objects

If you know a little bit about Lisp, you may have heard that it is "homoiconic": the code it
compiles is written the same way as regular data. For example:
```
valeri> (+ 1 2 3 (* 4 5))
26
```
This is of course a program, but you can also quote it to get back a list:
```
valeri> '(+ 1 2 3 (* 4 5))
(+ 1 2 3 (* 4 5))
```
Many people know, or quickly realize, that this opens up the possibility of source code
transformation, and in particular macros. I've even heard some say that in Lisp you write
code directly as an AST. But this is actually wrong!

Consider for a moment what happens if you get a runtime error during an arithmetic
operation. How would the runtime show you the source code location of the error? To do that,
the compiler must emit debug information with a source code mapping. And if the compiler
receives plain lists as input, the source mapping is lost: a list of numbers and symbols has
nowhere to record the line and column each element came from.

So practical Lisp implementations (at least Scheme ones) actually do have an AST, made of
so-called "syntax objects". See [Racket docs](https://docs.racket-lang.org/guide/stx-obj.html)
for an in-depth explanation.
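
To make the problem concrete, here is a minimal Python sketch (not Valeri's code) of an evaluator that works on syntax objects instead of plain lists. The `Loc`, `Syntax` and `EvalError` names are hypothetical, and the runtime message format is an assumption that borrows the `file:line:col` prefix Valeri's reader already prints. Because every subexpression carries its own location, the evaluator can point at the exact `(/ ...)` form that failed.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Loc:
    """Hypothetical source location: file name, 1-based line and column."""
    file: str
    line: int
    col: int

    def __str__(self) -> str:
        return f"{self.file}:{self.line}:{self.col}"


@dataclass
class Syntax:
    """A datum (int, symbol name as str, or list of Syntax) plus its location."""
    datum: Any
    loc: Loc


class EvalError(Exception):
    pass


def evaluate(stx: Syntax) -> Any:
    d = stx.datum
    if isinstance(d, int):
        return d
    if isinstance(d, list) and d:
        op = d[0].datum
        args = [evaluate(a) for a in d[1:]]
        if op == "+":
            return sum(args)
        if op == "*":
            result = 1
            for a in args:
                result *= a
            return result
        if op == "/":
            result = args[0]
            for a in args[1:]:
                if a == 0:
                    # The failing form's own location goes into the message.
                    raise EvalError(f"{stx.loc} Runtime error: division by zero")
                result = result / a
            return result
    raise EvalError(f"{stx.loc} Runtime error: cannot evaluate")


def at(col: int, datum: Any) -> Syntax:
    """Helper that builds a syntax object on line 1 of '<unknown>'."""
    return Syntax(datum, Loc("<unknown>", 1, col))


# (+ 1 (/ 4 0)) built by hand; a real reader would compute these columns.
expr = at(1, [at(2, "+"), at(4, 1),
              at(6, [at(7, "/"), at(9, 4), at(11, 0)])])

try:
    evaluate(expr)
except EvalError as err:
    print(err)  # <unknown>:1:6 Runtime error: division by zero
```

With plain nested lists such as `["+", 1, ["/", 4, 0]]`, the same evaluator could only say that *some* division failed.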

In Scheme, syntax objects can wrap any other object and attach additional context to it, such
as lexical scope, source code location, or any other custom metadata. You can "pack" and
"unpack" syntax objects if you really want to fiddle with the low-level representation. Scheme
also uses syntax objects for its hygienic macro system, but that's out of scope for me right now.
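
The "pack" and "unpack" idea can be sketched as wrapping a value together with a context record. Racket's counterparts are `datum->syntax` (pack, borrowing context from an existing syntax object) and `syntax-e` (unpack one layer). The Python below only mimics that idea: `Syntax`, `pack`, `unpack`, the `scopes` set and the `props` dictionary are hypothetical stand-ins, not Valeri's or Scheme's API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Tuple


@dataclass
class Syntax:
    """Hypothetical syntax object: a value plus context about where it came from."""
    value: Any
    srcloc: Optional[Tuple[str, int, int]] = None  # (file, line, column)
    scopes: frozenset = frozenset()                # stand-in for lexical scope
    props: dict = field(default_factory=dict)      # any custom metadata


def pack(value: Any, context: Optional[Syntax] = None,
         srcloc: Optional[Tuple[str, int, int]] = None) -> Syntax:
    """Wrap one layer, borrowing scope and metadata from an existing syntax object."""
    if context is None:
        return Syntax(value, srcloc)
    return Syntax(value, srcloc or context.srcloc, context.scopes, dict(context.props))


def unpack(stx: Syntax) -> Any:
    """Remove exactly one layer; nested syntax objects stay wrapped."""
    return stx.value


# Give the number 42 the same context as an identifier `x` read from line 3.
ctx = Syntax("x", ("<unknown>", 3, 5), frozenset({"let-body"}), {"origin": "user"})
new = pack(42, ctx)
print(unpack(new), new.srcloc)
```

Copying `props` in `pack` keeps metadata added to the new object from leaking back into the object it borrowed its context from.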

Since I want [Valeri](https://git.sr.ht/~knazarov/valeri) to be friendly, I've taken a stab
at implementing syntax objects. To play with them in the REPL, you can do the following:
```
valeri> (syntax 42)
#<syntax 42>
valeri> (syntax (1 2 3))
#<syntax (#<syntax 1> #<syntax 2> #<syntax 3>)>
valeri> (syntax {1 2 3 4})
#<syntax (dict #<syntax 1> #<syntax 2> #<syntax 3> #<syntax 4>)>
```
Here, `syntax` is a special form that preserves the syntax information of its argument.
Compare `(syntax (1 2 3))` in the example above with the following:
```
valeri> (quote (1 2 3))
(1 2 3)
```
`quote` does the reverse: it strips the syntax information from its argument, so the user
sees what they expect. Likewise, whenever "atoms" (numbers, strings, symbols, etc.) get
compiled into bytecode, their syntax information is stripped.
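
The difference between `syntax` and `quote` can be modeled with two small functions: a printer that shows every wrapper, matching the REPL output above, and a recursive `strip` that removes all of them, roughly what `quote` does to its argument. This is a hypothetical Python model rather than Valeri's implementation, the column numbers are assumed, and dicts are left out for brevity.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Syntax:
    value: Any    # an atom or a Python list of Syntax objects
    line: int = 0
    col: int = 0


def show(obj: Any) -> str:
    """Render a value the way the REPL prints it, wrappers included."""
    if isinstance(obj, Syntax):
        return f"#<syntax {show(obj.value)}>"
    if isinstance(obj, list):
        return "(" + " ".join(show(x) for x in obj) + ")"
    return str(obj)


def strip(obj: Any) -> Any:
    """Recursively drop all syntax wrappers, as quote does for its argument."""
    if isinstance(obj, Syntax):
        return strip(obj.value)
    if isinstance(obj, list):
        return [strip(x) for x in obj]
    return obj


# What a reader might produce for `(1 2 3)` on line 1.
stx = Syntax([Syntax(1, 1, 2), Syntax(2, 1, 4), Syntax(3, 1, 6)], 1, 1)
print(show(stx))         # #<syntax (#<syntax 1> #<syntax 2> #<syntax 3>)>
print(show(strip(stx)))  # (1 2 3)
```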

In the current implementation, the reader that parses source code into the object hierarchy
already embeds source code information. Neither the compiler nor the runtime uses this
information to enrich error messages yet, but that's coming soon.
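
A reader that embeds source locations only needs to track the line and column as it consumes characters and stamp each datum with the position of its first character. Below is a minimal Python sketch under a simplified grammar of integers, symbols and parenthesized lists (no strings or `{...}` dicts); `Reader`, `Syntax` and `ReadError` are hypothetical names, not Valeri's.

```python
from dataclasses import dataclass
from typing import Any, List


class ReadError(Exception):
    pass


@dataclass
class Syntax:
    value: Any  # int, symbol name (str), or list of Syntax
    file: str
    line: int
    col: int


class Reader:
    """Hypothetical minimal reader: integers, symbols and lists only."""

    def __init__(self, src: str, file: str = "<unknown>") -> None:
        self.src, self.file = src, file
        self.pos, self.line, self.col = 0, 1, 1

    def peek(self) -> str:
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def advance(self) -> str:
        ch = self.src[self.pos]
        self.pos += 1
        if ch == "\n":
            self.line, self.col = self.line + 1, 1
        else:
            self.col += 1
        return ch

    def skip_ws(self) -> None:
        while self.peek().isspace():
            self.advance()

    def error(self, line: int, col: int, msg: str) -> ReadError:
        return ReadError(f"{self.file}:{line}:{col} Syntax error: {msg}")

    def read(self) -> Syntax:
        self.skip_ws()
        line, col = self.line, self.col  # where this datum starts
        ch = self.peek()
        if ch == "":
            raise self.error(line, col, "unexpected end of input")
        if ch == ")":
            raise self.error(line, col, "unexpected ')'")
        if ch == "(":
            self.advance()
            items: List[Syntax] = []
            while True:
                self.skip_ws()
                if self.peek() == "":
                    raise self.error(line, col, "unterminated list")
                if self.peek() == ")":
                    self.advance()
                    return Syntax(items, self.file, line, col)
                items.append(self.read())
        token = ""
        while self.peek() and not self.peek().isspace() and self.peek() not in "()":
            token += self.advance()
        try:
            value: Any = int(token)
        except ValueError:
            value = token  # anything that is not an integer becomes a symbol
        return Syntax(value, self.file, line, col)


stx = Reader("(+ 1\n   (* 4 5))").read()
print(stx.value[2].line, stx.value[2].col)  # 2 4
```

Because each datum keeps the position of its first character, a later compiler pass could copy `line` and `col` into its debug information.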

Finally, because the reader now collects syntax context, it reports errors that happen during
the reading phase together with their location, like this:
```
valeri> (1 2 "foo)
#<error:syntax-error "<unknown>:1:6 Syntax error: unterminated string">
```
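
Column 6 in that message is where the opening quote of `"foo` sits, so the error points at the start of the string, not at the end of the input. Here is a minimal Python sketch of that behavior, assuming 1-based lines and columns: it scans only string literals (with backslash escapes) and remembers the opening quote's position. `scan_strings` and `ReadError` are hypothetical names; Valeri's reader may track positions differently.

```python
class ReadError(Exception):
    """Hypothetical reader error carrying a file:line:col prefix."""


def scan_strings(src: str, file: str = "<unknown>") -> list:
    """Return the string literals in `src`; raise ReadError on an unterminated one."""
    found = []
    pos, line, col = 0, 1, 1

    def advance() -> str:
        nonlocal pos, line, col
        ch = src[pos]
        pos += 1
        if ch == "\n":
            line, col = line + 1, 1
        else:
            col += 1
        return ch

    while pos < len(src):
        if src[pos] != '"':
            advance()  # everything outside strings is skipped in this sketch
            continue
        start_line, start_col = line, col  # location of the opening quote
        advance()
        chars = []
        while True:
            if pos >= len(src):
                raise ReadError(f"{file}:{start_line}:{start_col} "
                                "Syntax error: unterminated string")
            ch = advance()
            if ch == '"':
                break
            if ch == "\\" and pos < len(src):
                ch = advance()  # simplified: keep the escaped character verbatim
            chars.append(ch)
        found.append("".join(chars))
    return found


try:
    scan_strings('(1 2 "foo)')
except ReadError as err:
    print(err)  # <unknown>:1:6 Syntax error: unterminated string
```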