From 346db7677f2b9047d44fc62348eea7287a08e2ad Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov
Date: Sun, 8 Sep 2024 03:51:07 +0100
Subject: [PATCH] Add a post about syntax objects

---
 .../note.md | 75 +++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 content/posts/error_reporting_and_syntax_objects/note.md

diff --git a/content/posts/error_reporting_and_syntax_objects/note.md b/content/posts/error_reporting_and_syntax_objects/note.md
new file mode 100644
index 0000000..7e27222
--- /dev/null
+++ b/content/posts/error_reporting_and_syntax_objects/note.md
@@ -0,0 +1,75 @@
+X-Date: 2024-09-08T01:43:33Z
+X-Note-Id: cf3ed5fd-cbf3-41cc-8c75-3b8b1220fb34
+Subject: Error reporting and syntax objects
+X-Slug: error_reporting_and_syntax_objects
+
+If you know a little bit about Lisp, you may think that it is "homoiconic": the code it
+compiles is written the same way as regular data. For example:
+
+```
+valeri> (+ 1 2 3 (* 4 5))
+26
+```
+
+This is, of course, a program, but you can also quote it to get back a list:
+
+```
+valeri> '(+ 1 2 3 (* 4 5))
+(+ 1 2 3 (* 4 5))
+```
+
+Many people know, or quickly realize, that this opens up the possibility of source code
+transformation, and in particular of macros. I've even heard some say that in Lisp you write
+code directly in the AST. But this is actually wrong!
+
+Consider for a moment what happens if an arithmetic operation fails with a runtime error. How
+would the runtime show you the source code location of the error? To do that, the compiler must
+emit debug information that maps the compiled code back to the source. But if the data structures
+that the compiler receives are just regular lists, the source mapping is lost: there is nowhere to store it.
+
+So practical Lisp implementations (at least Scheme implementations) actually do have an AST, built
+from what are called "syntax objects". See the [Racket docs](https://docs.racket-lang.org/guide/stx-obj.html)
+for an in-depth explanation.
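+
+To make this concrete, here is a minimal sketch in Python. It is not Valeri's actual code, and
+`SrcLoc`, `Syntax`, `strip`, `check_number` and the `<stdin>` file name are all hypothetical. A
+syntax object is modelled as a value paired with its source position, so an error check can
+blame the exact column. Stripping the wrappers leaves plain lists with no position to report.
+
+```python
+from dataclasses import dataclass
+from typing import Any
+
+
+@dataclass(frozen=True)
+class SrcLoc:
+    """Hypothetical source position: file name plus 1-based line and column."""
+    file: str
+    line: int
+    col: int
+
+    def __str__(self) -> str:
+        return f"{self.file}:{self.line}:{self.col}"
+
+
+@dataclass(frozen=True)
+class Syntax:
+    """A syntax object: an arbitrary value plus the place it was read from."""
+    value: Any
+    loc: SrcLoc
+
+
+def strip(obj: Any) -> Any:
+    """Recursively unwrap syntax objects, leaving only plain data (like quote)."""
+    if isinstance(obj, Syntax):
+        return strip(obj.value)
+    if isinstance(obj, list):
+        return [strip(x) for x in obj]
+    return obj
+
+
+def check_number(stx: Syntax) -> int:
+    """Toy runtime check that can blame the exact source position."""
+    if not isinstance(stx.value, int):
+        raise TypeError(f"{stx.loc}: expected a number, got {strip(stx.value)!r}")
+    return stx.value
+
+
+def at(col: int, value: Any) -> Syntax:
+    """Wrap a value as if it had been read from line 1 of <stdin>."""
+    return Syntax(value, SrcLoc("<stdin>", 1, col))
+
+
+# (+ 1 "foo") wrapped by hand, the way a reader might wrap it.
+# For brevity, the symbol + and the string "foo" are both plain Python str here.
+expr = at(1, [at(2, "+"), at(4, 1), at(6, "foo")])
+
+try:
+    check_number(expr.value[2])
+except TypeError as e:
+    print(e)  # <stdin>:1:6: expected a number, got 'foo'
+
+print(strip(expr))  # ['+', 1, 'foo']: plain lists, no positions left to report
+```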
+
+In Scheme, syntax objects can wrap any other object and give it additional context such as
+lexical scope, source code location, or any other custom metadata. You can "pack" and "unpack"
+syntax objects if you really want to fiddle with the low-level representation. Scheme also uses
+syntax objects for its hygienic macro system, but that's out of scope for me right now.
+
+Since I want [Valeri](https://git.sr.ht/~knazarov/valeri) to be friendly, I've taken a stab
+at implementing syntax objects. To play with them in the REPL, try the following:
+
+```
+valeri> (syntax 42)
+#
+
+valeri> (syntax (1 2 3))
+# # #)>
+
+valeri> (syntax {1 2 3 4})
+# # # #)>
+```
+
+Here, `syntax` is a special form that preserves the syntax information of its argument.
+Compare `(syntax (1 2 3))` in the example with the following:
+
+```
+valeri> (quote (1 2 3))
+(1 2 3)
+```
+
+`quote` does the reverse: it strips the syntax information from its argument, so the user sees
+the plain data they expect. Likewise, whenever "atoms" (numbers, strings, symbols, etc.) are
+compiled into bytecode, their syntax information is stripped.
+
+In the current implementation, the reader that parses source code into the object hierarchy
+already embeds source code information. Neither the compiler nor the runtime uses this
+information to enrich error messages yet, but that's coming soon.
+
+And finally, because the reader now collects syntax context, it reports errors that happen
+during the reading phase together with their source location, like this:
+
+```
+valeri> (1 2 "foo)
+#:1:6 Syntax error: unterminated string">
+```
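+
+For illustration, here is a toy reader sketched in Python that collects this kind of context. It
+is not Valeri's reader. The grammar (integers, symbols, strings and lists only) is an assumption,
+as are the names `Reader`, `Syntax`, `Symbol`, `ReadError` and `<stdin>`. The reader records the
+1-based line and column where each datum starts. For an unclosed string it reports the position
+of the opening quote, so in this sketch `(1 2 "foo)` is reported at 1:6.
+
+```python
+from dataclasses import dataclass
+from typing import Any
+
+
+class Symbol(str):
+    """Marks identifiers so they are not confused with string literals."""
+
+
+class ReadError(Exception):
+    pass
+
+
+@dataclass(frozen=True)
+class Syntax:
+    value: Any  # int, Symbol, str, or a list of Syntax objects
+    line: int   # 1-based line where the datum starts
+    col: int    # 1-based column where the datum starts
+
+
+class Reader:
+    """Toy reader for integers, symbols, "strings" and (lists) only."""
+
+    def __init__(self, text: str, file: str = "<stdin>") -> None:
+        self.text, self.file = text, file
+        self.pos, self.line, self.col = 0, 1, 1
+
+    def peek(self) -> str:
+        return self.text[self.pos] if self.pos < len(self.text) else ""
+
+    def advance(self) -> str:
+        ch = self.text[self.pos]
+        self.pos += 1
+        if ch == "\n":
+            self.line, self.col = self.line + 1, 1
+        else:
+            self.col += 1
+        return ch
+
+    def error(self, line: int, col: int, msg: str) -> ReadError:
+        return ReadError(f"{self.file}:{line}:{col} Syntax error: {msg}")
+
+    def skip_ws(self) -> None:
+        while self.peek().isspace():  # "".isspace() is False, so EOF stops the loop
+            self.advance()
+
+    def read(self) -> Syntax:
+        self.skip_ws()
+        line, col = self.line, self.col  # where this datum starts
+        ch = self.peek()
+        if ch == "":
+            raise self.error(line, col, "unexpected end of input")
+        if ch == ")":
+            raise self.error(line, col, "unexpected ')'")
+        if ch == "(":
+            self.advance()
+            items = []
+            while True:
+                self.skip_ws()
+                if self.peek() == "":
+                    raise self.error(line, col, "unterminated list")
+                if self.peek() == ")":
+                    self.advance()
+                    return Syntax(items, line, col)
+                items.append(self.read())
+        if ch == '"':
+            self.advance()
+            chars = []
+            while self.peek() != '"':
+                if self.peek() == "":
+                    # Point at the opening quote, not at the end of input.
+                    raise self.error(line, col, "unterminated string")
+                chars.append(self.advance())
+            self.advance()  # consume the closing quote
+            return Syntax("".join(chars), line, col)
+        token = []
+        while self.peek() and not self.peek().isspace() and self.peek() not in '()"':
+            token.append(self.advance())
+        text = "".join(token)
+        try:
+            return Syntax(int(text), line, col)
+        except ValueError:
+            return Syntax(Symbol(text), line, col)
+
+
+try:
+    Reader('(1 2 "foo)').read()
+except ReadError as e:
+    print(e)  # <stdin>:1:6 Syntax error: unterminated string
+```
+
+Pinning the error to where the string starts, rather than to the end of input, is a design choice
+in this sketch. The end of input usually says little about where the mistake was actually made.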