Add a post on calling convention

Konstantin Nazarov 2023-08-28 22:57:58 +01:00
parent 4b1817e1ae
commit c4d2281704
Signed by: knazarov
GPG key ID: 4CFE0A42FA409C22


@@ -0,0 +1,66 @@
X-Date: 2023-08-28T23:30:00Z
X-Note-Id: 4204588c-e5bd-4214-a08f-b2cd77029d4b
Subject: Calling convention dilemma
X-Slug: calling_convention_dilemma
Compilers for statically typed languages usually know how to pass arguments to a function, because
that function's signature is known in advance. If you look at the x86-64 C calling convention, for example, you'd
see that parameters are passed either in registers (for small values like integers or pointers) or
on the stack (for larger values, or once the argument registers run out).
Even if you don't know what exact function you're calling (in case of function pointers), the prototype
of that function tells the compiler everything it needs to know to produce the platform-specific machine code.
For dynamic languages, it is different. The prototype is not known in advance, and so you'd have to rely
on a higher-level construct. For example, you always pass one parameter, which is an array. Or two parameters,
one of which is an array and another is a hash table (for Python-like keyword arguments).
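As a rough illustration of such a uniform convention, here is a minimal Python sketch in which every callable receives exactly two parameters: a list of positional arguments and a dict of keyword arguments. The names `greet` and `call_dynamic` are made up for this example and don't come from any real VM.

```python
# Hypothetical uniform convention: every function takes exactly two
# parameters, a list of positional arguments and a dict of keyword arguments.
def greet(args, kwargs):
    sep = kwargs.get("sep", " ")
    return sep.join(str(a) for a in args)


def call_dynamic(fn, args=None, kwargs=None):
    # The caller needs no knowledge of the callee's prototype:
    # it always passes the same two containers.
    return fn(list(args or []), dict(kwargs or {}))


result = call_dynamic(greet, ["Hello", "World"], {"sep": ", "})
```

The price of this simplicity is that every call allocates containers, even when the callee takes a single integer.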
I've suddenly found myself somewhere in the middle. I'm designing my programming language VM specifically
for dynamic languages. But still, its architecture is derived from a RISC CPU.
It has a stack, but opcodes operate exclusively on registers, and you need explicit load/store instructions to move values between the two.
Having registers means that it would be nice to be able to pass parameters in them when the number
of arguments to the function is small enough. It would help with optimizing tail calls as well
(less shuffling of the stack).
The problem can be demonstrated with a "print" function. Imagine that it's just a built-in that accepts
an arbitrary number of arguments. In a made-up assembly, a call to it would look something like this:
```
;; Load two constant strings
;; into registers r0 and r1
loadc r0, hello
loadc r1, world
;; How to detect arg count?
call print
ret
.const
hello: "Hello"
world: "World"
```
In this example, the `print` function has no way to know that it should use registers r0 and r1, even if
the calling convention allows passing arguments through registers. It simply cannot tell that there
are two arguments (there could have been more). And even if we are not talking about a "variadic" function,
the caller may only have a pointer to the function and thus no way to inspect its signature.
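To make the ambiguity concrete, here is a small Python sketch that models the register file as a list. The register contents and the `naive_print` helper are assumptions made up for illustration; the point is only that a callee without an argument count has to guess.

```python
# Hypothetical register file after `loadc r0, hello` and `loadc r1, world`.
# r2 still holds a value left over from earlier, unrelated code.
regs = ["Hello", "World", "leftover", None]


def naive_print(regs):
    # Without an argument count the callee can only guess, for example
    # by taking every register that is not empty.
    return " ".join(r for r in regs if r is not None)


guessed = naive_print(regs)
```

Here the guess picks up the stale value in r2 as a third argument, which is exactly the failure mode described above.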
What I'm leaning towards in this case is to embed the argument count into the low-level "virtual CPU"
calling convention. So instead of `call print`, you'd have `call 2, print` which would have an "immediate"
value encoded into the opcode. When this opcode is executed, it would set up a new call frame and
put the information about the number of parameters into the frame itself (along with the return
address and a link to the previous call frame). `print` can then look at the frame and deduce the correct
number.
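A minimal sketch of how this could work, written in Python rather than in the VM's own implementation language. Everything here (`Frame`, `VM`, `op_call`, `op_ret`, `builtin_print`, the eight-register file) is hypothetical and only mirrors the description above: the `call` opcode carries an immediate argument count and stores it in the new frame next to the return address and the link to the previous frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    return_addr: int          # where to resume in the caller
    argc: int                 # immediate taken from the `call` opcode
    prev: Optional["Frame"]   # link to the caller's frame


class VM:
    def __init__(self):
        self.regs = [None] * 8   # assumed register count
        self.pc = 0
        self.frame = None
        self.output = []         # stands in for real I/O

    def op_call(self, argc, target):
        # `call argc, target`: set up a frame that records the count.
        self.frame = Frame(return_addr=self.pc + 1, argc=argc, prev=self.frame)
        target(self)
        self.op_ret()

    def op_ret(self):
        # Pop the frame and resume right after the call site.
        self.pc = self.frame.return_addr
        self.frame = self.frame.prev


def builtin_print(vm):
    # The callee learns the argument count from its own frame.
    args = vm.regs[:vm.frame.argc]
    vm.output.append(" ".join(str(a) for a in args))


vm = VM()
vm.regs[0] = "Hello"            # loadc r0, hello
vm.regs[1] = "World"            # loadc r1, world
vm.op_call(2, builtin_print)    # call 2, print
```

Because the count lives in the frame rather than in a register, the callee can still find it after it has overwritten its argument registers.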
The benefit of this approach is that the caller always knows the exact number of arguments at the call site,
so it can always supply the count. The callee may then take up to a certain pre-defined number of arguments
from the registers, and the rest from the stack.
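The register/stack split could be sketched like this in Python. The limit `MAX_REG_ARGS = 4` and the helpers `place_args` and `fetch_args` are assumptions for illustration; the post does not fix a number, and the stack is modeled as a plain list holding the overflow arguments in order.

```python
MAX_REG_ARGS = 4  # assumed limit on register-passed arguments


def place_args(args):
    """Caller side: the first few arguments go to registers, the rest to the stack."""
    regs = list(args[:MAX_REG_ARGS])
    stack = list(args[MAX_REG_ARGS:])
    return regs, stack


def fetch_args(argc, regs, stack):
    """Callee side: rebuild the argument list knowing only argc."""
    in_regs = min(argc, MAX_REG_ARGS)
    on_stack = argc - in_regs
    return regs[:in_regs] + stack[:on_stack]


regs, stack = place_args([1, 2, 3, 4, 5, 6])
recovered = fetch_args(6, regs, stack)
```

With the count available, the callee never has to look at registers or stack slots that the caller did not fill.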
You may be wondering -- why go to all these lengths when a purely stack-based virtual machine would
be much simpler and probably already solves these problems? Well, the most straightforward answer
to this would be that I want to make the VM a good target for code generation. Reading the code
generated for a register-based machine is a lot easier than for a stack-based one. Same for debugging.
But at the end of the day, it's just fun to do. So let's see how it goes.