From a1b8b6cf1836f0eeb5ed4f5f30e2d1ab377c9279 Mon Sep 17 00:00:00 2001
From: Konstantin Nazarov
Date: Sat, 7 Oct 2023 23:55:10 +0100
Subject: [PATCH] Publish an article about static python interpreter

---
 .../note.md | 125 ++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 content/posts/statically_linked_python_interpreter/note.md

diff --git a/content/posts/statically_linked_python_interpreter/note.md b/content/posts/statically_linked_python_interpreter/note.md
new file mode 100644
index 0000000..61845ef
--- /dev/null
+++ b/content/posts/statically_linked_python_interpreter/note.md
@@ -0,0 +1,125 @@
+X-Date: 2023-10-07T23:50:00Z
+X-Note-Id: 4aa1fd3d-ff26-4662-ba20-4e8b5c9a3ad7
+Subject: Statically linked Python interpreter
+X-Slug: statically_linked_python_interpreter
+
+This would be helpful for you if you want to run a Python program on a variety of Linux systems without
+modifying/repackaging it. For example, if you wrote a system administration or orchestration tool and
+just want it to work everywhere.
+
+So far, the only feasible option to do this was to write your tool in Go or Rust, which can create
+real redistributable binaries, which has been one of their killer features. This is why many sysadmin
+tools and CLIs are written in Go.
+
+In some cases however, you may not want to introduce another language to your stack (especially if
+you're already familiar with Python).
+
+## Quickstart
+
+If you just want to play with it, run a script from my [static-python](https://git.sr.ht/~knazarov/static-python)
+repository. You'd get a Python binary that you can just drop on any Linux box and it will work.
+
+For an explanation, read along.
+
+## What about the official instructions?
+
+There's an article on the Python wiki called [Building Python Statically](https://wiki.python.org/moin/BuildStatically).
+It gives the right direction, but it is outdated. You'd need more than that to build the latest version of Python.
+
+## How does it work
+
+Python itself is written in C, and so the first step is to get yourself a toolchain (compiler and a set of libraries)
+suitable to be embedded into the binary. Default glibc (which is normally shipped on Linux) is not meant to be used
+this way. So, you'd need something like [Musl libc](https://musl.libc.org/). Thanks to Alpine Linux, which uses it
+by default, a lot of software has been fixed to work well with it. Alpine Linux itself links with Musl libc dynamically,
+but it doesn't make a difference for compatibility.
+
+Then, you'd need to compile all standard libraries that Python depends on statically as well. This includes sqlite,
+ncurses, bzip2, and others. Instead of .so (dynamic library) files, you'd get .a (static library) files, and a set
+of accompanying headers.
+
+Then comes the difficult part of persuading Python to link with them. Currently, there are certain targets where
+Python is built statically, like WASM where it would be loaded into the browser. This is often used to build a web
+experimentation environment like the interactive console on [python.org](https://www.python.org/).
+
+Unfortunately, some of the functionality that makes parts of Python link statically together is explicitly gated
+behind WASM flags. This is what we'd need to change.
+
+First, if you look into [Modules/Setup.stdlib.in](https://github.com/python/cpython/blob/main/Modules/Setup.stdlib.in),
+you'd see that there's a configuration option `*@MODULE_BUILDTYPE@*`, which controls whether the standard modules should
+be built as shared or static libraries. The wiki page on Static Linking recommends manually modifying the generated
+files, but this is not needed in the latest version. What you need is to do this:
+
+```sh
+MODULE_BUILDTYPE=static ./configure
+```
+
+Passing `MODULE_BUILDTYPE=static` will switch all modules to be built statically. If you do this, the build will fail
+further down the line. This is because some modules will still be built dynamically. `Setup.stdlib.in` even mentions
+this:
+
+```
+# Some testing modules MUST be built as shared libraries.
+*shared*
+@MODULE__TESTIMPORTMULTIPLE_TRUE@_testimportmultiple _testimportmultiple.c
+@MODULE__TESTMULTIPHASE_TRUE@_testmultiphase _testmultiphase.c
+@MODULE__TESTMULTIPHASE_TRUE@_testsinglephase _testsinglephase.c
+@MODULE__CTYPES_TEST_TRUE@_ctypes_test _ctypes/_ctypes_test.c
+
+# Limited API template modules; must be built as shared modules.
+@MODULE_XXLIMITED_TRUE@xxlimited xxlimited.c
+@MODULE_XXLIMITED_35_TRUE@xxlimited_35 xxlimited_35.c
+```
+
+
+This means that we have to disable those. Fortunately, they don't seem essential. You'd likely lose compatibility
+with some of the modules on PyPI (especially ones that use 2to3), but that should not be too many of them, as Python2
+has been deprecated.
+
+The testing modules can be disabled with a configuration option `--disable-test-modules`, which already exists.
+However, the last 2 cannot. In case of compilation to WASM, they are disabled automatically, as the configure
+script detects the lack of `dlopen()`. But for a static binary, `dlopen()` is still there, so it keeps them.
+And there is no dedicated switch.
+
+This is why I've created a small patch to the configure script: [staticbuild.patch](https://git.sr.ht/~knazarov/static-python/tree/master/item/staticbuild.patch).
+This patch introduces an additional option `--disable-xxlimited-modules` which acts as an explicit instruction
+to not build and link those 2 modules.
+
+In the end, this is the command line you'd use:
+
+```sh
+MODULE_BUILDTYPE=static ./configure --disable-test-modules --disable-xxlimited-modules
+```
+
+And after typing `make`, you'd get your statically linked Python. I'm not quite sure why this is not fixed upstream yet.
+Maybe I should contribute a patch.
+
+## NixOS static build environment for Python
+
+This brings us to another interesting topic. As I said previously, setting up a static toolchain is not a trivial task.
+Doing it manually is very time-consuming and not very reproducible.
+
+Recently I've learned that NixOS has support for cross-compilation, including cross-compiling to the same platform but
+with a static toolchain. You can read more about it [here](https://nix.dev/tutorials/cross-compilation#developer-environment-with-a-cross-compiler).
+Essentially, it gives you a compiler and a vast number of packages already prepared properly as dependencies for your
+statically linked project.
+
+This makes it very easy to maintain a static build environment, and is why I've implemented the python build script with
+Nix. As of now, the standard Nix recipe for compiling Python doesn't cross-compile to static musl libc, but I plan to
+contribute my patch there, so in the future you would just be able to grab the binary directly from Nix.
+
+
+## Further advice
+
+Just getting a static Python binary is probably not enough for you to comfortably run your software. You'd need to package
+all your code to one distributable archive and ship it with the static Python binary. It can be done with one of these tools:
+
+- [pex](https://pex.readthedocs.io), a packer for your entire virtualenv to a single executable archive (python binary will be external)
+- [pyinstaller](https://pyinstaller.org), which can pack your code to an executable archive, and include the python binary along with it
+
+Covering these tools is out of scope for this article, but I'm just mentioning them if you want to take this further.
+
+You may wonder why I didn't just use pyinstaller if it bundles the Python binary together with the code. This is because it will still
+depend on a couple of system libraries. In the pyinstaller docs they say it explicitly: you'd have to build a package for every
+major distribution this way. But if you use a static Python, you don't have to. So use my code in combination with pyinstaller;
+they are not mutually exclusive.
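
Before shipping the interpreter alongside pex or pyinstaller output, it may be worth confirming that the binary really does not ask for a dynamic loader. The following is a minimal sketch and not part of the article's build script: the function name `requests_dynamic_loader` is hypothetical, and it relies only on the standard ELF header layout. It reports whether a `PT_INTERP` program header is present. Its absence is a strong hint that the binary is static, but it is only a heuristic, since a static binary can still call `dlopen()` at runtime.

```python
import struct

PT_INTERP = 3  # ELF program header type that names the dynamic loader


def requests_dynamic_loader(elf: bytes) -> bool:
    """Return True if the ELF image contains a PT_INTERP program header.

    Hypothetical helper for illustration; it parses only the fields it needs.
    """
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = elf[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if elf[5] == 1 else ">"    # EI_DATA: 1 = little-endian
    if is_64bit:
        # e_phoff at offset 32; e_phentsize and e_phnum at offsets 54 and 56
        (phoff,) = struct.unpack_from(endian + "Q", elf, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 54)
    else:
        # e_phoff at offset 28; e_phentsize and e_phnum at offsets 42 and 44
        (phoff,) = struct.unpack_from(endian + "I", elf, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 42)
    for i in range(phnum):
        # p_type is the first 32-bit field of every program header
        (p_type,) = struct.unpack_from(endian + "I", elf, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

To use it, read the built interpreter as bytes (for example from a path such as `./python`, which is an assumption about where your build puts it) and pass the contents to the function; a result of `False` is what you would expect from a static build.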