The State of Python Packaging in 2021
Every year or so, I revisit the current best practices for Python packaging, i.e. the way you’re supposed to distribute your Python packages. The main source is packaging.python.org, where the official packaging guidelines live. It is worth noting that the way you’re supposed to package your Python applications is not defined by Python or its maintainers, but rather delegated to a separate entity, the Python Packaging Authority (PyPA).
PyPA
PyPA does an excellent job providing us with information, best practices and tutorials regarding Python packaging. However, there’s one thing that irritates me every single time I revisit the page, and that is the misleading recommendation of their own tool, pipenv.
Quoting from the tool recommendations section of the packaging guidelines:
Use Pipenv to manage library dependencies when developing Python applications. See Managing Application Dependencies for more details on using pipenv.
PyPA has recommended pipenv as the standard tool for dependency management at least since 2018. A bold statement, given that pipenv only started in 2017, so the Python community cannot have had enough time to standardize on a workflow around that tool. There were no releases of pipenv between 2018-11 and 2020-04; that’s 1.5 years of silence for the standard tool. In the past, pipenv also hasn’t been shy about pushing breaking changes in a fast-paced manner.
PyPA still advertises pipenv all over the place and only mentions poetry a couple of times, although poetry seems to be the more mature product. I understand that pipenv lives under the umbrella of PyPA, but I still expect objectivity when it comes to tool recommendations. Instead of making such claims, they should provide a list of competing tools and a fair feature comparison.
Distributions
You would expect exactly one distribution for Python packages, but here in Python land we have several. The most popular ones are PyPI – the official one – and Anaconda. Anaconda is more geared towards data scientists. The main selling point for Anaconda back then was that it provided pre-compiled binaries. This was especially useful for data-science related packages which depend on libatlas, -lapack, -openblas, etc. and need to be compiled for the target system. This problem has mostly been solved with the wide adoption of wheels, but you still encounter some source-only uploads to PyPI that require you to build stuff locally on pip install.
Of course there are also the Python packages distributed by the Operating System, Debian in my case. While I was once a firm believer in only using the packages provided by the OS, I moved to the opposite end of the spectrum over the years, and now only use the minimal packages provided by Debian to bootstrap my virtual environments (i.e. pip, setuptools and wheel). The main reason is outdated or missing libraries, which is expected – Debian cannot hope to keep up with all the upstream changes in the ecosystem, and that is by design and fine. However, with the recent upgrade of manylinux, even the pip provided by Debian/unstable was too outdated, so for a while you basically had to pip install --upgrade pip, otherwise you’d end up compiling every package you tried to install via pip.
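Concretely, that bootstrap amounts to little more than the following; the upgrade step is what worked around the outdated Debian pip:

```sh
# create a virtual environment from Debian's Python, then upgrade
# the bootstrap packages (pip, setuptools, wheel) from PyPI
python3 -m venv .venv
.venv/bin/pip install --upgrade pip setuptools wheel
```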
So I’m sticking to the official PyPI distribution wherever possible. However, compared to the Debian distribution it feels immature. In my opinion, compiled wheels should be available for all packages that need them, built and provided by PyPI. Currently, the wheels provided are the ones uploaded by the upstream maintainers. This is not enough, as they usually build wheels for only one platform. Sometimes they don’t upload wheels in the first place, relying on the users to compile during install.
Then you have manylinux, an excellent idea to create some common ground for portable Linux builds. However, sometimes when a new version of manylinux is released, some upstream maintainers immediately start supporting only that version, breaking a lot of systems.
A setup similar to Debian’s, where the authors only do a source upload and the wheels are built on PyPI infrastructure for all available platforms, is probably the way to go.
setup.py, setup.cfg, requirements.txt, Pipfile, pyproject.toml – oh my!
This is the part for which I revisit the documentation every year, to see what the current way to go is.
The main point of packaging your Python application is to define the package’s metadata and (build-) dependencies.
setup.py + requirements.txt
For the longest time, setup.py and requirements.txt were (and, spoiler alert: still are) the backbone of your packaging efforts. In setup.py you define the metadata of your package, including its dependencies. If your project is a deployable application (vs. a library) you’ll very often provide an additional requirements.txt with pinned dependencies. Usually the list of requirements is the same as the one defined in setup.py, but with pinned versions. The reason you avoid version pinning in setup.py is that it would interfere with the pinned dependencies of other packages you try to install.
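To make the split concrete, here is a minimal sketch of the two files; the package name and dependencies are made up for illustration:

```python
# setup.py - hypothetical example; name and dependencies are made up
from setuptools import setup, find_packages

setup(
    name="myapp",
    version="0.1.0",
    packages=find_packages(),
    # loose constraints here, so they don't clash with other packages' pins
    install_requires=["requests>=2.20", "click"],
)
```

```text
# requirements.txt - pinned versions for reproducible deployments
requests==2.25.1
click==7.1.2
```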
setup.cfg
setup.cfg is a configuration file that is used by many standard tools in the Python ecosystem. Its format is ini-style and each tool’s configuration lives in its own stanza. Since 2016 setuptools supports configuring setup() via setup.cfg files. This was exciting news back then; however, it does not completely replace the setup.py file. While you can move most of the setup.py configuration into setup.cfg, you still have to provide that file with an empty setup() in order to allow for editable pip installs. In my opinion, that makes this feature useless and I’d rather stick to setup.py with a properly populated setup() until that file can be completely replaced with something else.
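For reference, the declarative variant looks roughly like this; the metadata is made up, and the stub setup.py is still needed for editable installs:

```ini
# setup.cfg - hypothetical example of declarative setuptools metadata
[metadata]
name = myapp
version = 0.1.0

[options]
packages = find:
install_requires =
    requests>=2.20
    click
```

```python
# setup.py - stub kept around only so that `pip install -e .` works
from setuptools import setup

setup()
```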
Pipfile + Pipfile.lock
Pipfile and Pipfile.lock are supposed to replace requirements.txt some day. So far they are not supported by pip or mentioned in any PEP. I think only pipenv supports them, so I’d ignore them for now.
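For completeness, a Pipfile is a TOML-style file along these lines; the listed packages are made up:

```toml
# Pipfile - hypothetical example; today essentially only pipenv consumes this
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"

[dev-packages]
pytest = "*"
```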
pyproject.toml
PEP 518 introduces the pyproject.toml file as a way to specify build requirements for your project. PEP 621 defines how to store project metadata in it. pip and setuptools support pyproject.toml to some extent, but not to a point where it completely replaces setup.py yet. Many of Python’s standard tools already allow for configuration in pyproject.toml, so it seems this file will slowly replace setup.cfg and probably setup.py and requirements.txt as well. But we’re not there yet.
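The PEP 518 part is what works reliably today; a minimal declaration of build requirements for a setuptools-based project looks roughly like this, and other tools can keep their configuration in the same file:

```toml
# pyproject.toml - PEP 518 build requirements for a setuptools project
[build-system]
requires = ["setuptools>=40.8.0", "wheel"]
build-backend = "setuptools.build_meta"

# tool configuration can live alongside it, e.g. for black
[tool.black]
line-length = 88
```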
poetry has an interesting approach: it allows you to write everything into pyproject.toml and generates a setup.py for you at build time, so the package can be uploaded to PyPI.
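To give an idea of what that looks like, here is a hypothetical poetry-managed pyproject.toml; the name, author and version constraints are made up:

```toml
# pyproject.toml as managed by poetry - hypothetical example
[tool.poetry]
name = "myapp"
version = "0.1.0"
description = "Example application"
authors = ["Jane Doe <jane@example.com>"]

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.25"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```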
Ironically, Python settled on the TOML file format here, although there is currently no support for reading TOML files in Python’s standard library.
Summary
While some alternatives exist, in 2021 I still stick to setup.py and requirements.txt to define the metadata and dependencies of my projects. Regarding the tooling, pip and twine are sufficient and do their job just fine. Alternatives like pipenv and poetry exist. The scope of poetry seems to be better aligned with my expectations, and it seems the more mature project compared to pipenv, but in any case I’ll ignore both of them until I revisit this issue in 2022.
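For reference, the entire release flow with those two tools boils down to something like this for a setup.py-based project:

```sh
# build a source distribution plus wheel, then upload both with twine
python3 setup.py sdist bdist_wheel
python3 -m twine upload dist/*
```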
Closing Thoughts
While packaging in Python has improved a lot over the last years, I’m still somewhat put off by how such a core aspect of a programming language is treated within Python. With some jealousy, I look over to the folks at Rust and how they seem to have gotten this aspect right from the start.
What would in my opinion improve the situation?
- Source-only uploads and building of wheels for all platforms on PyPI infrastructure – this way we could have wheels everywhere and remove the need to compile anything on pip install
- Standard tooling: pip has been and still is the tool of choice for packaging in Python. For some time now, you also need twine in order to upload your packages. setup.py upload still exists, but hasn’t worked for months on my machines. It would be great to have something that improves the virtualenv handling and dependency management. We do have some tools with overlapping use-cases, like poetry and pipenv. pipenv is heavily advertised and an actual PyPA project, but it feels immature in terms of scope, release history (and emojis!) compared to poetry. poetry is gaining a lot of traction, but it is apparently not receiving much love from PyPA, which brings me to:
- Unbiased tool recommendations. I don’t understand why PyPA is trying so hard to make us believe pipenv would be the standard tool for Python packaging. Instead of making such claims, please provide a list of competitors and a fair feature comparison. PyPA provides great packaging tutorials and is a valuable source of information around this topic. But when it comes to tool recommendations, I do challenge PyPA’s objectivity.