X-Date: 2023-12-31T09:00:00Z
X-Note-Id: 9ef1ba1c-2fea-4bac-9126-db53b6c127cf
Subject: VM progress update: dicts and parsing
X-Slug: vm_progress_update_dicts_and_parsing

This will be a short update, because I don't have much time to write at the moment.
Still, the virtual machine is progressing at a steady pace. I feel that I'm pretty close
to having everything in place to write a compiler on top of it.

First, I've added support for proper dictionaries. As I wrote in the previous post, the dictionary
implementation is based on red-black trees, which means that I had to carefully refactor all
existing basic data structures so that they can all be compared under a "strict total order".
Like the other data structures, dictionaries can be "frozen", in which case they turn into a
binary search tree represented as an array (this is OK, as there's no way to resize a frozen data
structure).

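To make the "strict total order" and "frozen" ideas concrete, here is a minimal Python sketch, not the VM's actual code. It assumes a hypothetical type ranking (`TYPE_RANK`), so that values of different types compare by rank first, and it assumes the frozen form is a sorted array searched with binary search, which is one way to lay out a search tree in an array. The names `order_key`, `freeze` and `frozen_get` are made up for the example.

```python
from bisect import bisect_left

# Hypothetical ranking (an assumption): values of different types order by rank.
TYPE_RANK = {type(None): 0, bool: 1, int: 2, str: 3, tuple: 4}


def order_key(value):
    """Map a value to a key that is comparable with every other such key."""
    rank = TYPE_RANK[type(value)]
    if value is None:
        return (rank, 0)
    if isinstance(value, tuple):
        # Compound values compare element by element, recursively.
        return (rank, tuple(order_key(v) for v in value))
    return (rank, value)


def freeze(mapping):
    """Flatten a dict into two parallel arrays sorted by key order."""
    items = sorted(mapping.items(), key=lambda kv: order_key(kv[0]))
    keys = [order_key(k) for k, _ in items]
    values = [v for _, v in items]
    return keys, values


def frozen_get(frozen, key, default=None):
    """Look a key up in the frozen arrays with binary search."""
    keys, values = frozen
    wanted = order_key(key)
    i = bisect_left(keys, wanted)
    if i < len(keys) and keys[i] == wanted:
        return values[i]
    return default


frozen = freeze({"b": 2, 7: "seven", None: 0, (1, "a"): "pair"})
```

Because the frozen arrays never grow or shrink, no rebalancing is needed, and a lookup such as `frozen_get(frozen, 7)` stays logarithmic.
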
Then, arrays no longer have a special-case implementation for storing integer types. I thought this
would be a good idea, but its implementation details leaked into other pieces of the codebase,
causing bloat and bugs. I've since refactored the special cases away, and things became much simpler.
It means that an array of 64-bit integers is 2x as large as it could have been, but that's fine for now.

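The 2x figure is easy to reproduce with a bit of arithmetic. The sketch below assumes, purely for illustration, that a generic array cell is an 8-byte type tag followed by an 8-byte payload; the post does not describe the real cell layout.

```python
import struct

# Specialized storage: only the 8-byte integer payload.
RAW_INT64 = struct.Struct("<q")
# Hypothetical generic cell (an assumption): 8-byte type tag + 8-byte payload.
TAGGED_CELL = struct.Struct("<Qq")


def array_bytes(count, cell):
    """Bytes needed to store `count` elements of the given cell layout."""
    return count * cell.size


n = 1000
specialized = array_bytes(n, RAW_INT64)
uniform = array_bytes(n, TAGGED_CELL)
```

Under that assumption, `uniform` is exactly twice `specialized`, which matches the overhead mentioned above.
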
Arrays now support in-place resizing. You can take a pointer to an array and increase or
decrease its size. Resizing works in a way that doesn't invalidate existing pointers to the array.

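The post doesn't say how pointer stability is achieved. One common technique is an extra level of indirection: a pointer is a handle into a table, and only the table entry changes when the storage moves. The sketch below shows that idea in Python; `Heap`, `alloc_array` and `resize` are hypothetical names, and the real VM may do something different (forwarding pointers, for example).

```python
class Heap:
    """Toy heap where a 'pointer' is a stable handle into a slot table."""

    def __init__(self):
        self._slots = []  # handle -> current backing storage

    def alloc_array(self, size, fill=None):
        self._slots.append([fill] * size)
        return len(self._slots) - 1  # the handle never changes afterwards

    def resize(self, handle, new_size, fill=None):
        old = self._slots[handle]
        grown = [fill] * max(0, new_size - len(old))
        # The storage is replaced, but every copy of the handle stays valid.
        self._slots[handle] = old[:new_size] + grown

    def length(self, handle):
        return len(self._slots[handle])

    def get(self, handle, index):
        return self._slots[handle][index]

    def set(self, handle, index, value):
        self._slots[handle][index] = value


heap = Heap()
arr = heap.alloc_array(2, fill=0)
alias = arr                      # a second "pointer" to the same array
heap.set(arr, 0, 10)
heap.resize(arr, 4, fill=0)      # grow in place; `alias` still works
```
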
Strings are now represented as arrays of 32-bit Unicode code points. Initially I thought that for the
sake of compactness it would be better to just store a variable-length encoding internally, but the
lack of random access to characters broke a few of the things that depended on strings. So I've
rewritten strings to use a less compact but more user-friendly form.

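The trade-off can be seen in a small Python sketch that uses byte buffers as stand-ins for the VM's strings. The helper names `char_at_utf32` and `char_at_utf8` are invented for the example: the fixed-width form indexes in constant time, while the variable-length form has to scan from the start.

```python
def char_at_utf32(buf, index):
    """Fixed width: the character lives at a known offset, 4 bytes per code point."""
    return buf[4 * index:4 * index + 4].decode("utf-32-le")


def char_at_utf8(buf, index):
    """Variable width: walk over `index` characters before reaching the target."""
    pos = 0
    for _ in range(index):
        pos += 1
        # Skip continuation bytes, which always look like 0b10xxxxxx.
        while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
            pos += 1
    end = pos + 1
    while end < len(buf) and (buf[end] & 0xC0) == 0x80:
        end += 1
    return buf[pos:end].decode("utf-8")


text = "na\u00efve \u2713"        # 7 characters, two of them non-ASCII
utf8 = text.encode("utf-8")       # compact, but variable length
utf32 = text.encode("utf-32-le")  # 4 bytes per character
```

For this string the UTF-8 buffer takes 10 bytes and the 32-bit form takes 28, which is the compactness given up in exchange for direct indexing.
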
There is now support for proper boolean values. Initially I thought that having the "truthy" `'t`
symbol and `nil` would do. But then I realized that many real-world external apps do care about
proper boolean types (e.g. anything that consumes JSON). So I've added a boolean type as a
first-class citizen.

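The JSON case is easy to demonstrate. In the Python sketch below, `decode_without_booleans` is a hypothetical decoder for a VM that only has `'t` and `nil`: it maps `true` to `'t` and `false` to `nil`, and as a result `false` and `null` can no longer be told apart.

```python
import json

NIL = None  # stand-in for the language's `nil`
T = "t"     # stand-in for the `'t` symbol


def decode_without_booleans(text):
    """Decode JSON for a hypothetical VM with no boolean type."""
    def convert(value):
        if value is True:
            return T
        if value is False:
            return NIL
        if isinstance(value, list):
            return [convert(v) for v in value]
        if isinstance(value, dict):
            return {k: convert(v) for k, v in value.items()}
        return value

    return convert(json.loads(text))


doc = '{"enabled": false, "nickname": null}'
lossy = decode_without_booleans(doc)  # both fields collapse to nil
exact = json.loads(doc)               # a real boolean type keeps them distinct
```

Writing `lossy` back out as JSON cannot reproduce the original document, while `exact` round-trips unchanged.
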
And finally, there is now a "writer" and a "reader". The purpose of the "writer" is to serialize
data structures into text form. This can be used to print data structures to the screen, or to
debug raw frozen slices taken from memory. The "reader" does the reverse: it takes the
textual representation and turns it into the language's data structures.

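A reader/writer pair for a tiny S-expression subset might look like the following Python sketch. It only handles integers, symbols and nested lists, and it represents symbols as plain Python strings; the actual syntax and data types of the language are not shown in this post, so treat every detail here as an assumption.

```python
def tokenize(text):
    """Split the input into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def _read_form(tokens, pos):
    """Read one form starting at `pos`; return (value, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = _read_form(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token), pos + 1
    except ValueError:
        return token, pos + 1  # anything else is treated as a symbol


def read(text):
    """Reader: text -> nested Python lists, ints and symbol strings."""
    tokens = tokenize(text)
    value, end = _read_form(tokens, 0)
    if end != len(tokens):
        raise SyntaxError("trailing tokens after the first form")
    return value


def write(value):
    """Writer: nested data -> text, the inverse of `read` for this subset."""
    if isinstance(value, list):
        return "(" + " ".join(write(v) for v in value) + ")"
    return str(value)


source = "(define (square x) (* x x))"
data = read(source)
```

For this subset, `write(read(source))` gives back the original text, which is the round-trip property that makes the pair useful for debugging.
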
Because I'm basing the language on S-expressions, the textual representation that the
"reader" consumes is almost the AST (abstract syntax tree) that the compiler can use to produce
the lower-level bytecode.
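To show why the reader's output is "almost the AST", here is a small Python sketch that walks such nested lists and emits stack-machine instructions. The opcodes `push`, `load` and `call` are invented for the example and say nothing about the VM's real bytecode.

```python
def compile_expr(ast, code=None):
    """Compile a nested-list expression to a flat list of hypothetical opcodes."""
    code = [] if code is None else code
    if isinstance(ast, list):
        op, *args = ast
        for arg in args:            # operands first (post-order walk)
            compile_expr(arg, code)
        code.append(("call", op, len(args)))
    elif isinstance(ast, int):
        code.append(("push", ast))
    else:
        code.append(("load", ast))  # a symbol: look it up at run time
    return code


def run(code, env):
    """Tiny interpreter for the opcodes above, supporting binary + and *."""
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    stack = []
    for instruction in code:
        if instruction[0] == "push":
            stack.append(instruction[1])
        elif instruction[0] == "load":
            stack.append(env[instruction[1]])
        else:
            _, op, argc = instruction
            args = stack[len(stack) - argc:]
            del stack[len(stack) - argc:]
            stack.append(ops[op](*args))
    return stack.pop()


ast = ["+", 1, ["*", "x", 3]]   # what a reader would produce for (+ 1 (* x 3))
bytecode = compile_expr(ast)
result = run(bytecode, {"x": 4})
```

No separate parsing stage is needed here: the compiler consumes the same lists the reader produces.
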