From a169ee4c61e2b340523e7cf3c64425d7f79e31a8 Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov
Date: Wed, 10 Jan 2024 21:12:59 +0000
Subject: [PATCH] Publish a post about serialization/deserialization

---
 content/posts/text_to_binary_and_back/note.md | 79 +++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 content/posts/text_to_binary_and_back/note.md

diff --git a/content/posts/text_to_binary_and_back/note.md b/content/posts/text_to_binary_and_back/note.md
new file mode 100644
index 0000000..d10764a
--- /dev/null
+++ b/content/posts/text_to_binary_and_back/note.md
@@ -0,0 +1,79 @@
+X-Date: 2024-01-07T23:00:00Z
+X-Note-Id: ed8dd390-20db-4eb0-95b6-96f495f1d5ed
+Subject: Text to binary and back
+X-Slug: text_to_binary_and_back
+
+In the last couple of days I've finished an important part of the virtual machine: the code
+that translates objects from their binary representation in memory to a text form, and
+back.
+
+For example, take this data structure:
+
+```
+;; this is an array
+[
+1 2 3 ;; numbers
+foobar ;; symbol
+{"foo" "bar"} ;; dict with str key
+]
+```
+
+If you save it to `data.txt`, you can convert it to the binary format:
+
+```
+cat data.txt | ./sd > data.bin
+```
+
+The `data.bin` file will contain the encoded version, which we can examine with the standard `xxd` tool:
+
+```
+cat data.bin | xxd
+
+00000000: 0000 0000 0000 0000 90c8 0000 0000 0000  ................
+00000010: 0500 0000 0000 0000 0700 0000 0000 0000  ................
+00000020: 0100 0000 0000 0000 0700 0000 0000 0000  ................
+00000030: 0200 0000 0000 0000 0700 0000 0000 0000  ................
+00000040: 0300 0000 0000 0000 9968 0000 0000 0000  .........h......
+00000050: 4000 0000 1061 0000 9588 0000 0000 0000  @....a..........
+00000060: 4000 0000 1061 0000 0600 0000 0000 0000  @....a..........
+00000070: 6600 0000 6f00 0000 6f00 0000 6200 0000  f...o...o...b...
+00000080: 6100 0000 7200 0000 0100 0000 0000 0000  a...r...........
+00000090: 91b0 0000 0000 0000 4000 0000 1061 0000  ........@....a..
+000000a0: 91c4 0000 0000 0000 4000 0000 1061 0000  ........@....a..
+000000b0: 0300 0000 0000 0000 6600 0000 6f00 0000  ........f...o...
+000000c0: 6f00 0000 0300 0000 0000 0000 6200 0000  o...........b...
+000000d0: 6100 0000 7200 0000                      a...r...
+```
+
+And you can also decode the binary data back to the text form:
+
+```
+cat data.bin | ./sd -d
+
+[1 2 3 foobar {"foo" "bar"}]
+```
+
+This is exactly the same data structure we started with, just without the comments.
+
+You can do the same trick with bytecode produced by the assembler, since bytecode is also
+serialized as a frozen data structure:
+
+```
+cat examples/factorial.asm | ./asm | ./sd -d
+
+[[31372u32 3084u32 2956u32 246661u32 50178u32 4294913042u32 779u32 3342u32 7447u32 24u32] ["! is"]]
+```
+
+And we can even convert the bytecode's text representation back to binary and run it:
+
+```
+cat examples/factorial.asm | ./asm | ./sd -d | ./sd | ./vm
+
+15 ! is 1307674368000
+```
+
+
+This serialization/deserialization mechanism is important because it also serves as a parser. My programming
+language is based on S-expressions, so a program is already a hierarchical data structure
+that can be loaded with the same mechanism directly into the VM's data structures. Of course, I still need to
+work on a compiler that is smarter than a plain assembler, but this is a good start.
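The post does not show how `./sd` is implemented or how the binary layout is defined. What follows is therefore only a hypothetical Python sketch of the text half of the round trip that the post's final paragraph relies on: reading the text form into nested values and printing them back. It assumes the syntax visible in the examples:

- arrays in `[...]`;
- dicts in `{...}`, assumed to be written as alternating keys and values;
- plain integers and `u32`-suffixed integers;
- bare symbols;
- double-quoted strings;
- `;;` line comments.

The names `tokenize`, `parse`, `dump`, `Symbol` and `U32` are invented for illustration and are not the VM's API. The sketch builds Python lists and dicts rather than VM objects, and it makes no attempt at the binary encoding.

```python
# Hypothetical sketch of a text reader/printer; not the actual ./sd implementation.
import re
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Symbol:
    name: str  # a bare identifier such as `foobar` (assumed distinct from strings)

@dataclass(frozen=True)
class U32:
    value: int  # an integer written with a `u32` suffix, such as `31372u32`

# Whitespace and `;;` comments are skipped; anything else is captured as `tok`.
TOKEN = re.compile(r'\s+|;;[^\n]*|(?P<tok>[\[\]{}]|"(?:[^"\\]|\\.)*"|[^\s\[\]{}";]+)')

def tokenize(text: str) -> list[str]:
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group("tok") is not None:
            tokens.append(m.group("tok"))
        pos = m.end()
    return tokens

def parse_atom(tok: str) -> Any:
    if tok.startswith('"'):
        return re.sub(r'\\(.)', r'\1', tok[1:-1])  # undo \" and \\ escapes
    if re.fullmatch(r'\d+u32', tok):
        n = int(tok[:-3])
        if n >= 2**32:
            raise ValueError(f"{tok} does not fit in 32 bits")
        return U32(n)
    if re.fullmatch(r'-?\d+', tok):
        return int(tok)
    return Symbol(tok)

def parse_at(tokens: list[str], pos: int) -> tuple[Any, int]:
    """Parse one value starting at tokens[pos]; return it and the next position."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in ("[", "{"):
        close = "]" if tok == "[" else "}"
        items, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != close:
            item, pos = parse_at(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError(f"missing {close}")
        if tok == "[":
            return items, pos + 1
        if len(items) % 2:
            raise SyntaxError("a dict needs an even number of entries")
        # Assumed dict layout: alternating key and value, as in {"foo" "bar"}.
        return dict(zip(items[0::2], items[1::2])), pos + 1
    if tok in ("]", "}"):
        raise SyntaxError(f"unexpected {tok}")
    return parse_atom(tok), pos + 1

def parse(text: str) -> Any:
    tokens = tokenize(text)
    value, pos = parse_at(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input after the first value")
    return value

def dump(value: Any) -> str:
    """Print a value back in text form; comments were dropped by the tokenizer."""
    if isinstance(value, list):
        return "[" + " ".join(dump(v) for v in value) + "]"
    if isinstance(value, dict):
        return "{" + " ".join(f"{dump(k)} {dump(v)}" for k, v in value.items()) + "}"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, Symbol):
        return value.name
    if isinstance(value, U32):
        return f"{value.value}u32"
    if isinstance(value, int):
        return str(value)
    raise TypeError(f"cannot print {value!r}")

if __name__ == "__main__":
    source = """
    ;; this is an array
    [
    1 2 3 ;; numbers
    foobar ;; symbol
    {"foo" "bar"} ;; dict with str key
    ]
    """
    print(dump(parse(source)))  # [1 2 3 foobar {"foo" "bar"}]
```

Run as a script, this prints the commented array from the post in the same comment-free form that `./sd -d` produces. The same `parse` call also accepts the `u32`-suffixed bytecode listing from the `./asm | ./sd -d` example and prints it back unchanged. That illustrates why a single reader for S-expression data can double as the language's parser.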