The State of Python Packaging in 2021

Every year or so, I revisit the current best practices for Python packaging, i.e. the way you’re supposed to distribute your Python packages. The main source is packaging.python.org, where the official packaging guidelines live. It is worth noting that the way you’re supposed to package your Python applications is not defined by Python or its maintainers, but rather delegated to a separate entity, the Python Packaging Authority (PyPA).

PyPA

PyPA does an excellent job providing us with information, best practices and tutorials regarding Python packaging. However, there’s one thing that irritates me every single time I revisit those pages, and that is the misleading recommendation of their own tool, pipenv.

Quoting from the tool recommendations section of the packaging guidelines:

Use Pipenv to manage library dependencies when developing Python applications. See Managing Application Dependencies for more details on using pipenv.

PyPA has recommended pipenv as the standard tool for dependency management at least since 2018. A bold statement, given that pipenv was only started in 2017, so the Python community cannot possibly have had enough time to standardize on a workflow around that tool. There were no releases of pipenv between November 2018 and April 2020; that is 1.5 years without a release for the supposed standard tool. In the past, pipenv also hasn’t been shy about pushing breaking changes at a fast pace.

PyPA still advertises pipenv all over the place and only mentions poetry a couple of times, although poetry seems to be the more mature product. I understand that pipenv lives under the umbrella of PyPA, but I still expect objectivity when it comes to tool recommendations. Instead of making such claims, they should list the competing tools and offer a fair feature comparison.

Distributions

You would expect exactly one distribution channel for Python packages, but here in Python land we have several. The most popular are PyPI, the official one, and Anaconda, which is geared more towards data scientists. The main selling point for Anaconda back then was that it provided pre-compiled binaries. This was especially useful for data-science packages that depend on libatlas, -lapack, -openblas, etc. and need to be compiled for the target system. That problem has mostly been solved by the wide adoption of wheels, but you still encounter the occasional source-only upload to PyPI that requires you to build things locally on pip install.

Of course there are also the Python packages distributed by the operating system, Debian in my case. While I was once a firm believer in using only the packages provided by the OS, I have moved to the opposite end of the spectrum over the years and now use only the minimal set of Debian packages needed to bootstrap my virtual environments (i.e. pip, setuptools and wheel). The main reason is outdated or missing libraries, which is expected: Debian cannot hope to keep up with all the upstream changes in the ecosystem, and that is by design and fine. However, with the recent upgrade of manylinux, even the pip provided by Debian/unstable was too outdated, so for a while you basically had to pip install --upgrade pip first, otherwise you would end up compiling every package you tried to install via pip.
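For reference, the bootstrap I describe boils down to something like this on a Debian system (a sketch; the package names are Debian’s, and the target directory .venv is just an example):

    # Debian-provided basics for creating virtual environments
    sudo apt install python3-venv python3-pip

    # Create the environment, then immediately upgrade the packaging toolchain inside it
    python3 -m venv .venv
    .venv/bin/pip install --upgrade pip setuptools wheel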

So I’m sticking to the official PyPI distribution wherever possible. However, compared to the Debian distribution it feels immature. In my opinion, compiled wheels should be available for every package that needs them, built and provided by PyPI itself. Currently, the only wheels available are the ones uploaded by the upstream maintainers. This is not enough, as they usually build wheels for only one platform, and sometimes they don’t upload wheels at all, relying on the users to compile during install.

Then you have manylinux, an excellent idea to create some common ground for a portable Linux build distribution. However, sometimes when a new version of manylinux is released, some upstream maintainers immediately start supporting only that version, breaking a lot of systems.

A setup similar to Debian’s, where the authors only do a source upload and the wheels are compiled on PyPI infrastructure for all available platforms, is probably the way to go.

setup.py, setup.cfg, requirements.txt, Pipfile, pyproject.toml – oh my!

This is the part for which I revisit the documentation every year, to see what the current way to go is.

The main point of packaging your Python application is to define the package’s metadata and (build) dependencies.

setup.py + requirements.txt

For the longest time, setup.py and requirements.txt were (and, spoiler alert: still are) the backbone of your packaging efforts. In setup.py you define the metadata of your package, including its dependencies.
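A minimal setup.py might look roughly like this (a sketch; the project name, version and dependencies are made up for illustration):

    from setuptools import setup, find_packages

    setup(
        name="example-app",           # hypothetical project name
        version="1.0.0",
        description="An example application",
        packages=find_packages(),
        python_requires=">=3.6",
        install_requires=[
            # abstract, unpinned dependencies
            "requests>=2.20",
            "click",
        ],
    )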

If your project is a deployable application (as opposed to a library), you’ll very often provide an additional requirements.txt with pinned dependencies. Usually the list of requirements is the same as the one defined in setup.py, but with pinned versions. The reason you avoid version pinning in setup.py is that the pins would interfere with the version constraints of the other packages you install alongside it.
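The corresponding requirements.txt for the hypothetical application above would then pin exact versions; the versions below are purely illustrative:

    # typically generated via: pip freeze > requirements.txt
    requests==2.25.1
    click==7.1.2
    certifi==2020.12.5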

setup.cfg

setup.cfg is a configuration file that is used by many standard tools in the Python ecosystem. Its format is ini-style and each tool’s configuration lives in its own stanza. Since 2016, setuptools has supported configuring setup() via setup.cfg. This was exciting news back then; however, it does not completely replace the setup.py file. While you can move most of the setup.py configuration into setup.cfg, you still have to provide that file with an empty setup() in order to allow for editable pip installs. In my opinion, that makes this feature useless, and I’d rather stick to setup.py with a properly populated setup() until that file can be completely replaced with something else.
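As a sketch of that setup, reusing the hypothetical metadata from before, setup.cfg carries the declarative configuration:

    [metadata]
    name = example-app
    version = 1.0.0
    description = An example application

    [options]
    packages = find:
    python_requires = >=3.6
    install_requires =
        requests>=2.20
        click

while setup.py shrinks to the stub that is still needed for editable installs:

    from setuptools import setup

    setup()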

Pipfile + Pipfile.lock

Pipfile and Pipfile.lock are supposed to replace requirements.txt some day. So far they are not supported by pip or mentioned in any PEP. I think only pipenv supports them, so I’d ignore them for now.

pyproject.toml

PEP 518 introduces the pyproject.toml file as a way to specify build requirements for your project. PEP 621 defines how to store project metadata in it.

pip and setuptools support pyproject.toml to some extent, but not yet to the point where it completely replaces setup.py. Many of Python’s standard tools already allow for configuration in pyproject.toml, so it seems this file will slowly replace setup.cfg, and probably setup.py and requirements.txt as well. But we’re not there yet.
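To illustrate the direction, here is a sketch of a pyproject.toml covering those pieces; the project details and the tool section are again made up, and tool support for the PEP 621 [project] table was still limited in 2021:

    # Build requirements, per PEP 518
    [build-system]
    requires = ["setuptools>=42", "wheel"]
    build-backend = "setuptools.build_meta"

    # Project metadata, per PEP 621
    [project]
    name = "example-app"
    version = "1.0.0"
    description = "An example application"
    requires-python = ">=3.6"
    dependencies = [
        "requests>=2.20",
        "click",
    ]

    # Many tools already read their configuration from here, e.g. black
    [tool.black]
    line-length = 88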

poetry has an interesting approach: it will allow you to write everything into pyproject.toml and generate a setup.py for you at build-time, so it can be uploaded to PyPI.
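For comparison, a poetry-managed project keeps everything in pyproject.toml under poetry’s own tool section; roughly like this (a sketch, details vary between poetry versions, and the author is obviously made up):

    [tool.poetry]
    name = "example-app"
    version = "1.0.0"
    description = "An example application"
    authors = ["Jane Doe <jane@example.com>"]

    [tool.poetry.dependencies]
    python = "^3.6"
    requests = "^2.20"

    [build-system]
    requires = ["poetry-core>=1.0.0"]
    build-backend = "poetry.core.masonry.api"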

Ironically, Python settled on the TOML file format here, although there is currently no support for reading TOML files in Python’s standard library.
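In practice that means you need a third-party parser even to read your own pyproject.toml; a tiny sketch using the third-party toml package (one of several options):

    # requires: pip install toml  (not part of the standard library)
    import toml

    config = toml.load("pyproject.toml")
    print(config["build-system"]["requires"])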

Summary

While some alternatives exist, in 2021 I still stick to setup.py and requirements.txt to define the metadata and dependencies of my projects. Regarding the tooling, pip and twine are sufficient and do their job just fine. Alternatives like pipenv and poetry exist; the scope of poetry seems to be better aligned with my expectations, and it seems to be the more mature project compared to pipenv, but in any case I’ll ignore both of them until I revisit this issue in 2022.

Closing Thoughts

While packaging in Python has improved a lot over the last years, I’m still somewhat put off by how such a core aspect of a programming language is treated within Python. With some jealousy, I look over to the folks at Rust, who seem to have gotten this aspect right from the start.

What, in my opinion, would improve the situation?