
X-Date: 2023-08-28T23:30:00Z
X-Note-Id: 4204588c-e5bd-4214-a08f-b2cd77029d4b
Subject: Calling convention dilemma
X-Slug: calling_convention_dilemma

Statically typed languages usually know how they should pass arguments to a function, because
that function's signature is known in advance. If you look at the x86-64 C calling convention for example, you'd
see that the first few small values (like integers or pointers) are passed in registers, and the rest
go on the stack (as do larger values).

Even if you don't know what exact function you're calling (in case of function pointers), the prototype
of that function tells the compiler everything it needs to know to produce the platform-specific machine code.

For dynamic languages, it is different. The prototype is not known in advance, and so you'd have to rely
on a higher-level construct. For example, you always pass one parameter, which is an array. Or two parameters,
one of which is an array and another is a hash table (for Python-like keyword arguments).
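
As a minimal sketch of that higher-level convention (the names `call_dynamic` and `builtin_print` are hypothetical and only serve as illustration), every callee could accept the same two parameters, so the caller never needs to see a prototype:

```python
from typing import Any, Dict, List, Optional

def builtin_print(args: List[Any], kwargs: Dict[str, Any]) -> str:
    """Hypothetical built-in: joins its positional arguments into one string."""
    sep = kwargs.get("sep", " ")  # optional keyword argument
    return sep.join(str(a) for a in args)

def call_dynamic(fn, args: List[Any], kwargs: Optional[Dict[str, Any]] = None):
    """Call any function through the uniform (array, hash table) convention."""
    return fn(args, kwargs if kwargs is not None else {})
```

Here `call_dynamic(builtin_print, ["Hello", "World"])` would hand the callee a two-element array, and the callee finds the argument count simply by looking at the array's length.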

I've suddenly found myself somewhere in the middle. I'm designing my programming language VM specifically
for dynamic languages. But still, its architecture has been derived from a RISC CPU.
It has a stack, but opcodes deal exclusively with registers, and you need explicit load/store instructions to move values to and from the stack.

Having registers means that it would be nice to be able to pass parameters in them when the number
of arguments to the function is small enough. And it would help with optimizing tail calls as well
(less shuffling of the stack).

The problem here can be demonstrated with a "print" function. Imagine that it's just a built-in that accepts
an arbitrary number of arguments. In a made-up assembly, calling it would look something like this:

```
;; Load two constant strings
;; into registers r0 and r1
loadc r0, hello
loadc r1, world

;; How to detect arg count?
call print

ret

.const
    hello: "Hello"
    world: "World"
```

In this example, the `print` function has no way to know that it should use registers r0 and r1, even if
the calling convention allows passing arguments through registers. Nothing tells it that there
are two arguments (there could have been more). And even if we are not talking about a "variadic" function,
the caller may just have a pointer to it and thus no way to inspect its signature.

What I'm leaning towards in this case is to embed the argument count into the low-level "virtual CPU"
calling convention. So instead of `call print`, you'd have `call 2, print` which would have an "immediate"
value encoded into the opcode. When this opcode is executed, it would set up a new call frame and
put the information about the number of parameters into the frame itself (along with the return
address and a link to the previous call frame). `print` can then look at the frame and deduce the correct
number.
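
To make the idea concrete, here is a rough Python sketch of how such a `call` opcode might behave. Everything in it is an assumption for illustration: the `Frame` and `VM` classes, the eight registers, and the `op_call`/`op_ret` names are invented here and are not taken from the actual VM.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Frame:
    return_addr: int         # where the caller resumes after `ret`
    argc: int                # the immediate from `call argc, target`
    prev: Optional["Frame"]  # link to the previous call frame

class VM:
    def __init__(self):
        self.regs = [None] * 8  # assumed register count
        self.pc = 0
        self.frame = None

    def op_call(self, argc, target):
        # Set up a new frame that records the argument count, then jump.
        self.frame = Frame(return_addr=self.pc + 1, argc=argc, prev=self.frame)
        self.pc = target

    def op_ret(self):
        # Resume the caller and unlink the current frame.
        frame = self.frame
        self.pc = frame.return_addr
        self.frame = frame.prev

def builtin_print(vm):
    # The callee reads the count from its frame instead of guessing.
    argc = vm.frame.argc
    return " ".join(str(vm.regs[i]) for i in range(argc))
```

With "Hello" in `r0` and "World" in `r1`, executing `op_call(2, ...)` leaves `argc == 2` in the frame, so `builtin_print` knows to read exactly two registers.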

This works because the caller always knows exactly how many arguments it is passing. And the
callee may then take up to a certain pre-defined number of arguments from the registers, and the
rest from the stack.
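
A small sketch of how a callee might gather its arguments under that split. The limit of four argument registers, the `collect_args` helper, and the assumption that the caller pushes the overflow arguments left to right are all made up for this example:

```python
NUM_ARG_REGS = 4  # assumed pre-defined limit, not a decided value

def collect_args(argc, regs, stack):
    """Return the callee's arguments in order, given the count from the frame."""
    in_regs = min(argc, NUM_ARG_REGS)
    on_stack = argc - in_regs
    args = list(regs[:in_regs])
    if on_stack:
        # Assumption: overflow arguments were pushed left to right,
        # so they are the last `on_stack` entries on the stack.
        args.extend(stack[-on_stack:])
    return args
```

For a call with six arguments, the first four would come from the registers and the remaining two from the top of the stack; for a call with two, the stack is never touched.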

You may be wondering -- why go to all these lengths when a purely stack-based virtual machine would
be much simpler and probably already solves these problems? Well, the most straightforward answer
to this would be that I want to make the VM a good target for code generation. Reading the code
generated for a register-based machine is a lot easier than for a stack-based one. Same for debugging.

But at the end of the day, it's just fun to do. So let's see how it goes.