Add a post about syntax objects
parent e10d4e0566
commit 346db7677f
1 changed files with 75 additions and 0 deletions
75
content/posts/error_reporting_and_syntax_objects/note.md
Normal file
@@ -0,0 +1,75 @@
X-Date: 2024-09-08T01:43:33Z
X-Note-Id: cf3ed5fd-cbf3-41cc-8c75-3b8b1220fb34
Subject: Error reporting and syntax objects
X-Slug: error_reporting_and_syntax_objects

If you know a little bit about Lisp, you may have heard that it is "homoiconic": the code
that it compiles is written the same way as regular data. For example:

```
valeri> (+ 1 2 3 (* 4 5))
26
```

This is of course a program, but you can also quote it to get back a list:

```
valeri> '(+ 1 2 3 (* 4 5))
(+ 1 2 3 (* 4 5))
```

Many people either know or quickly realize that this opens up the possibility of source code
transformation, and macros in particular. I've even heard some say that in Lisp you write
code directly as an AST. But this is actually wrong!

Consider for a moment what happens if you get a runtime error during an arithmetic operation.
How would the runtime show you the source code location of the error? To do that, the
compiler must emit debug information with a source code mapping. And if the data structures
that the compiler receives as input are just regular lists, the source mapping is lost.

So practical Lisp implementations (at least of Scheme) actually do have an AST, built from what
are called "syntax objects". See the [Racket docs](https://docs.racket-lang.org/guide/stx-obj.html) for an
in-depth explanation.

In Scheme, syntax objects can wrap any other object and give it additional context such as
lexical scope, source code location, or any other custom metadata. You can "pack" and "unpack"
syntax objects if you really want to fiddle with the low-level representation. Scheme also uses
syntax objects for its hygienic macro system, but that's out of scope for me right now.
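To make the wrapping idea concrete, here is a minimal sketch in Python rather than Valeri or Scheme. The names `SrcLoc`, `Syntax`, `show`, `pack` and `unpack` are made up for illustration and are not taken from any real implementation; lexical scope and custom metadata are left out.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SrcLoc:
    """Where a datum came from (hypothetical fields, 1-based)."""
    file: str
    line: int
    column: int


@dataclass(frozen=True)
class Syntax:
    """A datum plus the context that a plain list cannot carry."""
    datum: Any
    loc: Optional[SrcLoc] = None

    def __repr__(self) -> str:
        return f"#<syntax {show(self.datum)}>"


def show(datum: Any) -> str:
    """Print lists Lisp-style, everything else via repr()."""
    if isinstance(datum, list):
        return "(" + " ".join(show(d) for d in datum) + ")"
    return repr(datum)


def pack(datum: Any, loc: Optional[SrcLoc] = None) -> Syntax:
    """Wrap a datum, and every element of a list, in Syntax objects."""
    if isinstance(datum, Syntax):
        return datum  # already wrapped, keep its own context
    if isinstance(datum, list):
        return Syntax([pack(d, loc) for d in datum], loc)
    return Syntax(datum, loc)


def unpack(stx: Syntax) -> Any:
    """Remove one layer of wrapping; children stay wrapped."""
    return stx.datum
```

With these definitions, `repr(pack([1, 2, 3]))` gives `#<syntax (#<syntax 1> #<syntax 2> #<syntax 3>)>`, and `unpack` returns a plain Python list whose elements are still `Syntax` objects.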

Since I want [Valeri](https://git.sr.ht/~knazarov/valeri) to be friendly, I've taken a stab
at implementing syntax objects. To play with them in the REPL, you can do the following:

```
valeri> (syntax 42)
#<syntax 42>

valeri> (syntax (1 2 3))
#<syntax (#<syntax 1> #<syntax 2> #<syntax 3>)>

valeri> (syntax {1 2 3 4})
#<syntax (dict #<syntax 1> #<syntax 2> #<syntax 3> #<syntax 4>)>
```

Here, `syntax` is a special form that keeps the syntax information of its argument.
Compare `(syntax (1 2 3))` in the example with the following:

```
valeri> (quote (1 2 3))
(1 2 3)
```

`quote` actually does the reverse: it strips the syntax information from its argument, so the user
sees what they expect. Whenever "atoms" (numbers, strings, symbols, etc.) get compiled into
the bytecode, their syntax information is stripped as well.
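The stripping step can be sketched as a recursive walk. This is an illustration in Python under assumed names (`Syntax`, `strip`), not Valeri's actual code, and the line and column values are invented for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Syntax:
    """Hypothetical wrapper: a datum plus where it was read from."""
    datum: Any
    line: int = 0
    column: int = 0


def strip(obj: Any) -> Any:
    """Recursively drop Syntax wrappers, leaving plain data."""
    if isinstance(obj, Syntax):
        return strip(obj.datum)
    if isinstance(obj, list):
        return [strip(item) for item in obj]
    return obj  # atoms that were never wrapped pass through unchanged


# Roughly what a reader might hand over for the (1 2 3) inside (quote (1 2 3)).
wrapped = Syntax([Syntax(1, 1, 9), Syntax(2, 1, 11), Syntax(3, 1, 13)], 1, 8)
plain = strip(wrapped)
```

Here `plain` is the ordinary list `[1, 2, 3]`: the positions are gone, which is fine for data the user asked for but is exactly the information a compiler would want to keep for error messages.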

In the current implementation, the reader that parses source code into the object hierarchy already
embeds source code information. The compiler and runtime don't use this information yet to
enrich error messages, but that's coming soon.

And finally, because the reader now collects syntax context, it can report the location of
errors that happen during the reader phase, like this:

```
valeri> (1 2 "foo)
#<error:syntax-error "<unknown>:1:6 Syntax error: unterminated string">
```
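A reader can produce this kind of message by remembering the line and column where each datum starts. The sketch below is in Python with hypothetical names (`Reader`, `ReadError`); it is not Valeri's reader, and it only handles lists, integers, symbols and double-quoted strings without escapes.

```python
from typing import Any, List, NoReturn


class ReadError(Exception):
    """Raised by the reader; the message starts with file:line:column."""


class Reader:
    def __init__(self, text: str, filename: str = "<unknown>") -> None:
        self.text = text
        self.filename = filename
        self.pos = 0
        self.line = 1    # 1-based, like the message in the post
        self.column = 1

    def fail(self, line: int, column: int, message: str) -> NoReturn:
        raise ReadError(f"{self.filename}:{line}:{column} Syntax error: {message}")

    def at_end(self) -> bool:
        return self.pos >= len(self.text)

    def advance(self) -> str:
        """Consume one character and keep line/column in sync."""
        ch = self.text[self.pos]
        self.pos += 1
        if ch == "\n":
            self.line, self.column = self.line + 1, 1
        else:
            self.column += 1
        return ch

    def skip_whitespace(self) -> None:
        while not self.at_end() and self.text[self.pos].isspace():
            self.advance()

    def read(self) -> Any:
        self.skip_whitespace()
        line, column = self.line, self.column  # where this datum starts
        if self.at_end():
            self.fail(line, column, "unexpected end of input")
        ch = self.advance()
        if ch == "(":
            items: List[Any] = []
            while True:
                self.skip_whitespace()
                if self.at_end():
                    self.fail(line, column, "unterminated list")
                if self.text[self.pos] == ")":
                    self.advance()
                    return items
                items.append(self.read())
        if ch == ")":
            self.fail(line, column, "unexpected )")
        if ch == '"':
            chars: List[str] = []
            while not self.at_end():
                ch = self.advance()
                if ch == '"':
                    return "".join(chars)
                chars.append(ch)
            # Report the opening quote, not the end of input.
            self.fail(line, column, "unterminated string")
        # Anything else is a number or a symbol; for brevity, symbols and
        # strings both come back as Python str in this sketch.
        token = ch
        while (not self.at_end() and not self.text[self.pos].isspace()
               and self.text[self.pos] not in '()"'):
            token += self.advance()
        return int(token) if token.lstrip("-").isdigit() else token
```

Calling `Reader('(1 2 "foo)').read()` raises `ReadError` with the message `<unknown>:1:6 Syntax error: unterminated string`, because the opening quote sits at line 1, column 6. A fuller reader would attach the same `line` and `column` to every datum it returns instead of using them only for errors.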