Publish a post about serialization/deserialization

Konstantin Nazarov 2024-01-10 21:12:59 +00:00
parent 78c9c6a1ba
commit a169ee4c61
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22

X-Date: 2024-01-07T23:00:00Z
X-Note-Id: ed8dd390-20db-4eb0-95b6-96f495f1d5ed
Subject: Text to binary and back
X-Slug: text_to_binary_and_back
In the last couple of days I've finished an important part of the virtual machine: the code
that translates objects from their binary representation in memory to a text form, and
back.
For example, take this data structure:
```
;; this is an array
[
1 2 3 ;; numbers
foobar ;; symbol
{"foo" "bar"} ;; dict with str key
]
```
If you save it into `data.txt`, you can convert it into the binary format:
```
cat data.txt | ./sd > data.bin
```
The `data.bin` file would contain the encoded version, which we can examine with the standard `xxd` tool:
```
cat data.bin | xxd
00000000: 0000 0000 0000 0000 90c8 0000 0000 0000 ................
00000010: 0500 0000 0000 0000 0700 0000 0000 0000 ................
00000020: 0100 0000 0000 0000 0700 0000 0000 0000 ................
00000030: 0200 0000 0000 0000 0700 0000 0000 0000 ................
00000040: 0300 0000 0000 0000 9968 0000 0000 0000 .........h......
00000050: 4000 0000 1061 0000 9588 0000 0000 0000 @....a..........
00000060: 4000 0000 1061 0000 0600 0000 0000 0000 @....a..........
00000070: 6600 0000 6f00 0000 6f00 0000 6200 0000 f...o...o...b...
00000080: 6100 0000 7200 0000 0100 0000 0000 0000 a...r...........
00000090: 91b0 0000 0000 0000 4000 0000 1061 0000 ........@....a..
000000a0: 91c4 0000 0000 0000 4000 0000 1061 0000 ........@....a..
000000b0: 0300 0000 0000 0000 6600 0000 6f00 0000 ........f...o...
000000c0: 6f00 0000 0300 0000 0000 0000 6200 0000 o...........b...
000000d0: 6100 0000 7200 0000 a...r...
```
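The dump is easier to read when grouped into 8-byte words. The following is a minimal Python sketch, not part of the project: the `words` helper is a hypothetical name, and the 8-byte little-endian word size is an assumption inferred from the dump above, not a documented property of the format. The sample bytes are the first 0x50 bytes of the dump, copied by hand. In this view the values 1, 2 and 3 appear as whole words, each preceded by a word with the value 7, and the third word is 5, which coincides with the number of elements in the array. The dump alone does not say what the remaining words mean.

```python
import struct

def words(data: bytes) -> list[int]:
    """Split data into unsigned 64-bit little-endian words (trailing bytes are ignored)."""
    count = len(data) // 8
    return list(struct.unpack(f"<{count}Q", data[: count * 8]))

# First 0x50 bytes of the dump above, copied by hand.
sample = bytes.fromhex(
    "0000000000000000" "90c8000000000000"
    "0500000000000000" "0700000000000000"
    "0100000000000000" "0700000000000000"
    "0200000000000000" "0700000000000000"
    "0300000000000000" "9968000000000000"
)

# Print each word next to its byte offset in the file.
for index, word in enumerate(words(sample)):
    print(f"{index * 8:08x}: {word:#018x}")
```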
And you can also decode the binary data back to the text form:
```
cat data.bin | ./sd -d
[1 2 3 foobar {"foo" "bar"}]
```
This is exactly the same data structure we started with, just without the comments.
You can also do the same trick with bytecode produced by the assembler, since bytecode is also
serialized as a frozen data structure:
```
cat examples/factorial.asm | ./asm | ./sd -d
[[31372u32 3084u32 2956u32 246661u32 50178u32 4294913042u32 779u32 3342u32 7447u32 24u32] ["! is"]]
```
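The `u32` suffix in the output above marks each code word as an unsigned 32-bit integer. As a rough illustration of what reading such literals involves, here is a small Python sketch. It is not the project's code: `parse_u32` is a hypothetical helper, and the little-endian, 4-bytes-per-word packing is an assumption made only for this example.

```python
import struct

def parse_u32(token: str) -> int:
    """Parse a literal such as '31372u32' and check that it fits in 32 bits."""
    if not token.endswith("u32"):
        raise ValueError(f"not a u32 literal: {token!r}")
    value = int(token[:-3])
    if not 0 <= value < 2**32:
        raise ValueError(f"out of range for u32: {token!r}")
    return value

# Code words copied from the decoded output above.
text = "31372u32 3084u32 2956u32 246661u32 50178u32 4294913042u32 779u32 3342u32 7447u32 24u32"
code = [parse_u32(token) for token in text.split()]

# Assumption: little-endian packing, 4 bytes per word.
packed = struct.pack(f"<{len(code)}I", *code)
print(len(code), "words,", len(packed), "bytes")
print([hex(word) for word in code])
```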
And we can even convert the bytecode from the text representation back to binary, and run it:
```
cat examples/factorial.asm | ./asm | ./sd -d | ./sd | ./vm
15 ! is 1307674368000
```
This serialization/deserialization mechanism is important, since it also serves as a parser. My programming
language is based on S-Expressions, so a program is already represented as a hierarchical data structure
that can be loaded with the same mechanism, directly into the VM's data structures. Of course, I still need to
work on a compiler that is smarter than a plain assembler, but this is a good start.
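
To illustrate how a reader for the text form can double as a parser, here is a minimal Python sketch that reads the example from the beginning of this post into nested data structures and prints it back. It is an illustration only and not the VM's implementation: `Symbol`, `tokenize`, `read` and `show` are hypothetical names, string escapes and the `u32` suffix are not handled, and dict keys have to be hashable Python values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    name: str

def tokenize(text):
    """Split the text into brackets, string literals and atoms, dropping comments."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == ";":                      # a comment runs to the end of the line
            while i < len(text) and text[i] != "\n":
                i += 1
        elif ch in "[]{}":
            tokens.append(ch)
            i += 1
        elif ch == '"':                      # string literal, no escapes in this sketch
            end = text.index('"', i + 1)
            tokens.append(text[i:end + 1])
            i = end + 1
        else:                                # number or symbol
            start = i
            while i < len(text) and not text[i].isspace() and text[i] not in '[]{}";':
                i += 1
            tokens.append(text[start:i])
    return tokens

def read(tokens):
    """Consume tokens from the front of the list and build one value."""
    tok = tokens.pop(0)
    if tok == "[":
        items = []
        while tokens[0] != "]":
            items.append(read(tokens))
        tokens.pop(0)                        # drop the closing "]"
        return items
    if tok == "{":
        pairs = {}
        while tokens[0] != "}":
            key = read(tokens)
            pairs[key] = read(tokens)
        tokens.pop(0)                        # drop the closing "}"
        return pairs
    if tok.startswith('"'):
        return tok[1:-1]
    try:
        return int(tok)
    except ValueError:
        return Symbol(tok)

def show(value):
    """Print a value back in the text form, without comments."""
    if isinstance(value, list):
        return "[" + " ".join(show(v) for v in value) + "]"
    if isinstance(value, dict):
        return "{" + " ".join(f"{show(k)} {show(v)}" for k, v in value.items()) + "}"
    if isinstance(value, Symbol):
        return value.name
    if isinstance(value, str):
        return f'"{value}"'
    return str(value)

source = """
;; this is an array
[
1 2 3 ;; numbers
foobar ;; symbol
{"foo" "bar"} ;; dict with str key
]
"""

print(show(read(tokenize(source))))
```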