Add a post about syntax objects

2024-09-08 03:51:07 +01:00 · 2024-09-08 03:51:07 +01:00 · 346db7677f
commit 346db7677f
parent e10d4e0566
1 changed files with 75 additions and 0 deletions
--- a/content/posts/error_reporting_and_syntax_objects/note.md
+++ b/content/posts/error_reporting_and_syntax_objects/note.md
@@ -0,0 +1,75 @@
+X-Date: 2024-09-08T01:43:33Z
+X-Note-Id: cf3ed5fd-cbf3-41cc-8c75-3b8b1220fb34
+Subject: Error reporting and syntax objects
+X-Slug: error_reporting_and_syntax_objects
+
+If you know a little bit about Lisp, you may have heard that it is "homoiconic": the code
+that it compiles is written in the same notation as regular data. For example:
+
+```
+valeri> (+ 1 2 3 (* 4 5))
+26
+```
+
+This is of course a program, but you can also quote it to get back a list:
+
+```
+valeri> '(+ 1 2 3 (* 4 5))
+(+ 1 2 3 (* 4 5))
+```
+
+Many people know, or soon realize, that this opens up the possibility of source code
+transformation, and macros in particular. I've even heard some say that in Lisp you write
+code directly as an AST. But this is actually wrong!
+
+Consider for a moment what happens if you get a runtime error during an arithmetic operation.
+How would the runtime show you the source code location of the error? To do that, the
+compiler must emit debug information with a source code mapping. And if the data structures
+that the compiler receives as input are just regular lists, the source mapping is lost.
+
+So practical Lisp implementations (at least Scheme ones) actually do have an AST, in the form of
+"syntax objects". See the [Racket docs](https://docs.racket-lang.org/guide/stx-obj.html) for an
+in-depth explanation.
+
+In Scheme, a syntax object can wrap any other object and give it additional context such as
+lexical scope, source code location, or other custom metadata. You can "pack" and "unpack"
+syntax objects if you really want to fiddle with the low-level representation. Scheme also uses
+syntax objects for its hygienic macro system, but that's out of scope for me right now.
+
+Since I want [Valeri](https://git.sr.ht/~knazarov/valeri) to be friendly, I've taken a stab
+at implementing syntax objects. To play with them in the REPL, you can do as follows:
+
+```
+valeri> (syntax 42)
+#<syntax 42>
+
+valeri> (syntax (1 2 3))
+#<syntax (#<syntax 1> #<syntax 2> #<syntax 3>)>
+
+valeri> (syntax {1 2 3 4})
+#<syntax (dict #<syntax 1> #<syntax 2> #<syntax 3> #<syntax 4>)>
+```
+
+Here, `syntax` is a special form that preserves the syntax information of its argument.
+Compare `(syntax (1 2 3))` in the example with the following:
+
+```
+valeri> (quote (1 2 3))
+(1 2 3)
+```
+
+`quote` does the reverse: it strips the syntax information from its argument, so the user
+sees what they expect. Whenever "atoms" (numbers, strings, symbols, etc.) get compiled into
+the bytecode, their syntax information is stripped.
+
+In the current implementation, the reader that parses source code into the object hierarchy already
+embeds source code information. Neither the compiler nor the runtime uses this information yet to
+enrich error messages, but that's coming up soon.
+
+And finally, because I've added the collection of syntax context to the reader, it now reports
+errors that happen during the reader phase, like this:
+
+```
+valeri> (1 2 "foo)
+#<error:syntax-error "<unknown>:1:6 Syntax error: unterminated string">
+```
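
To make the mechanics in the post concrete, here is a minimal Python sketch. It is not Valeri's actual implementation. It shows a reader that wraps every datum in a syntax object carrying a line and column, a `strip` helper that mimics what the post says `quote` does, and a reader-phase error formatted like the one in the post. The names `Syntax`, `Symbol`, `ReaderError`, `read`, and `strip` are all hypothetical, and the sketch only handles integers, strings, symbols, and lists.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Syntax:
    """A datum plus the source position where the reader found it."""
    datum: Any
    line: int
    col: int


class ReaderError(Exception):
    pass


def read(src: str, filename: str = "<unknown>") -> Syntax:
    """Parse one expression into nested Syntax objects (hypothetical reader)."""
    pos, line, col = 0, 1, 1

    def advance() -> None:
        # Consume one character, keeping the 1-based line/column up to date.
        nonlocal pos, line, col
        if src[pos] == "\n":
            line, col = line + 1, 1
        else:
            col += 1
        pos += 1

    def fail(at_line: int, at_col: int, msg: str) -> None:
        raise ReaderError(f"{filename}:{at_line}:{at_col} Syntax error: {msg}")

    def skip_space() -> None:
        while pos < len(src) and src[pos].isspace():
            advance()

    def expr() -> Syntax:
        skip_space()
        if pos >= len(src):
            fail(line, col, "unexpected end of input")
        start_line, start_col = line, col  # position of the datum's first char
        ch = src[pos]
        if ch == "(":
            advance()
            items = []
            while True:
                skip_space()
                if pos >= len(src):
                    fail(start_line, start_col, "unterminated list")
                if src[pos] == ")":
                    advance()
                    return Syntax(items, start_line, start_col)
                items.append(expr())
        if ch == ")":
            fail(line, col, "unexpected ')'")
        if ch == '"':
            advance()
            chars = []
            while pos < len(src) and src[pos] != '"':
                chars.append(src[pos])
                advance()
            if pos >= len(src):
                # Report the position of the opening quote.
                fail(start_line, start_col, "unterminated string")
            advance()
            return Syntax("".join(chars), start_line, start_col)
        begin = pos
        while pos < len(src) and not src[pos].isspace() and src[pos] not in '()"':
            advance()
        token = src[begin:pos]
        try:
            datum: Any = int(token)
        except ValueError:
            datum = Symbol(token)
        return Syntax(datum, start_line, start_col)

    return expr()


def strip(stx: Any) -> Any:
    """Recursively drop syntax information, leaving plain data."""
    datum = stx.datum if isinstance(stx, Syntax) else stx
    if isinstance(datum, list):
        return [strip(item) for item in datum]
    return datum
```

With this sketch, `read("(+ 1 2 (* 4 5))")` keeps a position on every node, while `strip` of that result gives back plain nested lists. Calling `read('(1 2 "foo)')` raises a `ReaderError` pointing at the opening quote of the unterminated string.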