X-Date: 2024-01-07T23:00:00Z
X-Note-Id: ed8dd390-20db-4eb0-95b6-96f495f1d5ed
Subject: Text to binary and back
X-Slug: text_to_binary_and_back

In the last couple of days I've finished an important part of the virtual machine: the piece that
translates objects from their binary representation in memory to a text form, and
back.

For example, take this data structure:

```
;; this is an array
[
  1 2 3 ;; numbers
  foobar ;; symbol
  {"foo" "bar"} ;; dict with str key
]
```

If you save it into `data.txt`, you can convert it into the binary format:

```
cat data.txt | ./sd > data.bin
```

The `data.bin` file now contains the encoded version, which we can examine with the standard `xxd` tool:

```
cat data.bin | xxd

00000000: 0000 0000 0000 0000 90c8 0000 0000 0000  ................
00000010: 0500 0000 0000 0000 0700 0000 0000 0000  ................
00000020: 0100 0000 0000 0000 0700 0000 0000 0000  ................
00000030: 0200 0000 0000 0000 0700 0000 0000 0000  ................
00000040: 0300 0000 0000 0000 9968 0000 0000 0000  .........h......
00000050: 4000 0000 1061 0000 9588 0000 0000 0000  @....a..........
00000060: 4000 0000 1061 0000 0600 0000 0000 0000  @....a..........
00000070: 6600 0000 6f00 0000 6f00 0000 6200 0000  f...o...o...b...
00000080: 6100 0000 7200 0000 0100 0000 0000 0000  a...r...........
00000090: 91b0 0000 0000 0000 4000 0000 1061 0000  ........@....a..
000000a0: 91c4 0000 0000 0000 4000 0000 1061 0000  ........@....a..
000000b0: 0300 0000 0000 0000 6600 0000 6f00 0000  ........f...o...
000000c0: 6f00 0000 0300 0000 0000 0000 6200 0000  o...........b...
000000d0: 6100 0000 7200 0000                      a...r...
```

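If you want to poke at the dump programmatically, here is a minimal Python sketch. It rests on two assumptions read off the `xxd` output, not on the actual format definition: that the file can be viewed as little-endian 64-bit words, and that text is stored as four bytes per character (which is what UTF-32 little-endian looks like). The helper names `words64` and `utf32_text` are made up for this illustration, and the samples are bytes copied from the dump.

```python
import struct

# First 0x50 bytes of data.bin, copied from the xxd output.
SAMPLE = bytes.fromhex(
    "0000000000000000" "90c8000000000000"
    "0500000000000000" "0700000000000000"
    "0100000000000000" "0700000000000000"
    "0200000000000000" "0700000000000000"
    "0300000000000000" "9968000000000000"
)

# Bytes 0x70..0x87 of data.bin: the characters of the symbol.
TEXT_SAMPLE = bytes.fromhex(
    "66000000 6f000000 6f000000 62000000 61000000 72000000"
)


def words64(data: bytes) -> list[int]:
    """Split data into little-endian unsigned 64-bit words (assumed layout)."""
    usable = len(data) - len(data) % 8  # ignore a trailing partial word
    return [w for (w,) in struct.iter_unpack("<Q", data[:usable])]


def utf32_text(data: bytes) -> str:
    """Decode a run of 4-byte characters (assumed to be UTF-32 little-endian)."""
    return data.decode("utf-32-le")


if __name__ == "__main__":
    for index, word in enumerate(words64(SAMPLE)):
        print(f"{index * 8:08x}: {word:#018x} ({word})")
    print(utf32_text(TEXT_SAMPLE))
```

On the sample, the third word is 5, which happens to match the number of elements in the array, and the word just before the character run in the dump is 6, the length of `foobar`. Both readings are guesses from the bytes alone.
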
You can also decode the binary data back into the text form:

```
cat data.bin | ./sd -d

[1 2 3 foobar {"foo" "bar"}]
```

This is exactly the data structure we started with, just without the comments.

The same trick works with the bytecode produced by the assembler, since bytecode is also
serialized as a frozen data structure:

```
cat examples/factorial.asm | ./asm | ./sd -d

[[31372u32 3084u32 2956u32 246661u32 50178u32 4294913042u32 779u32 3342u32 7447u32 24u32] ["! is"]]
```

We can even turn the bytecode from its text representation back into binary, and run it:

```
cat examples/factorial.asm | ./asm | ./sd -d | ./sd | ./vm

15 ! is 1307674368000
```

This serialization/deserialization mechanism is important, since it also serves as a parser. My programming
language is based on S-expressions, so a program is already represented as a hierarchical data structure
that can be loaded with the same mechanism, directly into the VM data structures. Of course, I still need to
work on a compiler that is smarter than a plain assembler, but it is a good start.
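
To make the "reader doubles as a parser" idea concrete, here is a small Python sketch of a reader and printer for the text form used in the example. It is not the VM's implementation: the names `Symbol`, `tokenize`, `parse` and `show` are hypothetical, and it handles only arrays, dicts, integers, symbols, double-quoted strings without escapes, and `;;` comments. Typed literals such as `24u32` are not treated as numbers here, and a `;;` inside a string would be mistaken for a comment.

```python
import re


class Symbol(str):
    """A bare name such as foobar, kept distinct from quoted strings."""


# One token: a bracket, a double-quoted string (no escapes), or a bare atom.
TOKEN = re.compile(r'[\[\]{}]|"[^"]*"|[^\s\[\]{}"]+')


def tokenize(text):
    """Split the text form into tokens, dropping ;; comments line by line."""
    tokens = []
    for line in text.splitlines():
        code = line.split(";;", 1)[0]
        tokens.extend(TOKEN.findall(code))
    return tokens


def parse(tokens, pos=0):
    """Parse one value starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok in ("[", "{"):
        closer = "]" if tok == "[" else "}"
        items, pos = [], pos + 1
        while tokens[pos] != closer:
            item, pos = parse(tokens, pos)
            items.append(item)
        if tok == "[":
            return items, pos + 1
        # Keys and values alternate: {"foo" "bar"} maps "foo" to "bar".
        return dict(zip(items[0::2], items[1::2])), pos + 1
    if tok.startswith('"'):
        return tok[1:-1], pos + 1
    if re.fullmatch(r"-?\d+", tok):
        return int(tok), pos + 1
    return Symbol(tok), pos + 1


def show(value):
    """Print a value back in the compact one-line text form."""
    if isinstance(value, list):
        return "[" + " ".join(show(v) for v in value) + "]"
    if isinstance(value, dict):
        pairs = (f"{show(k)} {show(v)}" for k, v in value.items())
        return "{" + " ".join(pairs) + "}"
    if isinstance(value, Symbol):  # check before str: Symbol subclasses str
        return str(value)
    if isinstance(value, str):
        return f'"{value}"'
    return str(value)


SOURCE = """
;; this is an array
[
  1 2 3 ;; numbers
  foobar ;; symbol
  {"foo" "bar"} ;; dict with str key
]
"""

if __name__ == "__main__":
    value, _ = parse(tokenize(SOURCE))
    print(show(value))  # [1 2 3 foobar {"foo" "bar"}]
```

The parsed result is an ordinary nested structure of lists, dicts and atoms, which is the point: once source code is written in the same notation, loading a program and loading data are the same operation.
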