How Python Package Formats Evolved: From tar.gz to .whl

In 2000, installing a third-party Python package meant downloading a tarball, extracting it, and running python setup.py install. There was no dependency resolution, no uninstall command, no record of what got installed where. Removing a package meant deleting files by hand and hoping none were missed.

Twenty-five years later, uv pip install numpy fetches a pre-built binary wheel and drops it into place in under a second. The journey between those two moments is a story of each generation solving one problem while inadvertently creating the next.

distutils and the tarball era

Python’s first packaging system, distutils, shipped with Python 1.6 in 2000. It gave package authors a standard way to describe their code and a standard command to install it. The distribution format was the sdist: a gzipped tarball of source code plus a setup.py script that told distutils how to build and install.

For pure-Python packages, this worked well enough. For packages with C extensions, it worked only if the user had a compatible compiler, the right headers, and enough patience. There was no metadata separate from the code itself, no way to declare dependencies, and no concept of uninstalling.
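To make the sdist layout concrete, here is a minimal sketch that builds a toy gzipped tarball in memory and lists its contents. The package name example_pkg, its version, and the helper functions are hypothetical and exist only for illustration. The embedded setup.py is stored as text and never executed, since distutils is no longer part of current Python releases.

```python
import io
import tarfile

# Text of a distutils-era setup.py. It is only stored in the archive,
# never run. "example_pkg" is a hypothetical package name.
SETUP_PY = '''\
from distutils.core import setup

setup(name="example_pkg", version="1.0", py_modules=["example_pkg"])
'''


def build_toy_sdist() -> bytes:
    """Return the bytes of a gzipped tarball laid out like a simple sdist."""
    files = {
        "example_pkg-1.0/setup.py": SETUP_PY,
        "example_pkg-1.0/example_pkg.py": "VALUE = 42\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_sdist(sdist_bytes: bytes) -> list:
    """List the member paths of an sdist-style tarball."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tar:
        return sorted(tar.getnames())
```

Nothing in the archive is pre-built: the installer has to run setup.py on the user's machine to produce anything installable, which is where compilers and headers come in.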

setuptools, easy_install, and the egg

In 2004, Phillip Eby created setuptools to fill the gaps distutils left open. Setuptools introduced dependency declarations, automatic downloading via easy_install, and a new distribution format: the egg.

Eggs were ZIP archives containing both code and metadata, dropped directly into site-packages. For the first time, a Python package carried machine-readable metadata about its dependencies. But eggs also introduced new problems. They couldn’t be cleanly uninstalled. Two competing formats (.egg files and .egg-info directories) created confusion. And easy_install had a habit of silently installing packages in ways that surprised users.

As Guido van Rossum put it in a 2012 panel discussion: “Nothing ever happens in the packaging world. And then someone comes out with a terrible hack and suddenly everybody adopts it and then progress is being stalled for seven years again.”

pip replaces easy_install

In 2008, Ian Bicking created pip as a replacement for easy_install. Pip introduced requirements.txt for pinning dependencies, proper uninstall support, and clearer error messages when things went wrong. By 2011, pip had become the de facto installer.

But pip still installed packages the same way setuptools did: by downloading an sdist and running setup.py. Every pip install was a build step. Installing NumPy meant compiling C code on the user’s machine, and SciPy needed a Fortran compiler as well. Installing lxml meant having libxml2 development headers available. A failed build produced pages of compiler output that most Python developers had no idea how to read.

The format itself was the bottleneck. As long as distribution meant “source code plus build instructions,” installation would remain slow, fragile, and platform-dependent.

Wheels arrive pre-built

PEP 427, drafted in 2012 and accepted in early 2013, introduced the wheel format. A wheel (.whl file) is a ZIP archive containing pre-built code and standardized metadata. Unlike sdists, wheels require no build step at install time. The installer extracts files into the right locations and records what it installed.

This solved the core problem: installation became fast, reproducible, and independent of the build environment. A wheel built on a CI server installed identically on a developer’s laptop.
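The "extract and record" model can be sketched with the standard library alone. The toy_pkg name and both helper functions below are hypothetical, and the archive is heavily simplified: a real wheel also carries a RECORD file with hashes inside its .dist-info directory, and a real installer does more than unzip (for example, generating entry-point scripts).

```python
import io
import tempfile
import zipfile
from pathlib import Path


def build_toy_wheel() -> bytes:
    """Return the bytes of a ZIP laid out like a (very simplified) wheel."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The importable code, already in its final form.
        zf.writestr("toy_pkg/__init__.py", "VALUE = 42\n")
        # Standardized metadata lives in a .dist-info directory.
        zf.writestr(
            "toy_pkg-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: toy_pkg\nVersion: 1.0\n",
        )
        zf.writestr(
            "toy_pkg-1.0.dist-info/WHEEL",
            "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n",
        )
    return buf.getvalue()


def install_toy_wheel(wheel_bytes: bytes, target: Path) -> list:
    """Extract every file into target and return the paths that were written."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        zf.extractall(target)
        return sorted(zf.namelist())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in install_toy_wheel(build_toy_wheel(), Path(tmp)):
            print(path)
```

No code from the package runs during installation, which is why the result does not depend on the compilers or headers present on the target machine.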

For pure-Python packages, adoption was easy. For packages with compiled extensions, adoption stalled. A wheel built on Ubuntu wouldn’t work on CentOS because the two distributions shipped different versions of system libraries. Building cross-platform wheels required infrastructure that most package authors didn’t have.

manylinux makes wheels portable

The breakthrough for compiled packages came in 2016 with PEP 513, which defined the manylinux1 platform tag. The idea was pragmatic: define a minimal set of system libraries that nearly every Linux distribution provides, build against those, and tag the resulting wheel so installers know it’s portable. Later PEPs raised the baseline as older distributions fell out of use: PEP 571 defined manylinux2010, PEP 599 defined manylinux2014, and PEP 600 replaced the year-based names with tags keyed to a glibc version, such as manylinux_2_17.
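The platform tag is part of the wheel's filename, which follows the pattern distribution-version(-build)-python-abi-platform.whl from PEP 427. A minimal sketch of reading it, assuming well-formed filenames; the example_pkg filenames and both function names are hypothetical, and real tools should use a maintained parser such as the one in the packaging library.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


def is_manylinux(platform_tag: str) -> bool:
    """A wheel may list several platform tags joined by dots."""
    return any(tag.startswith("manylinux") for tag in platform_tag.split("."))


if __name__ == "__main__":
    name = "example_pkg-1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    wheel = parse_wheel_filename(name)
    print(wheel.platform_tag, is_manylinux(wheel.platform_tag))
```

An installer compares these tags against the set its own interpreter and platform support, so the filename alone is enough to decide whether a wheel is a candidate.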

The cibuildwheel project automated the process of building wheels across Python versions, operating systems, and architectures in CI. Package authors who once needed custom build infrastructure could add a single GitHub Actions workflow and produce wheels for dozens of platform combinations.

In 2015, most packages on PyPI were sdist-only. By 2020, the most-downloaded packages nearly all shipped wheels. Install times for packages like NumPy, pandas, and cryptography dropped from minutes to seconds.

Where things stand

Today, uv and pip both prefer wheels when available and fall back to sdists when they’re not. The wheel format has become the standard way to distribute Python packages, and sdists serve as a universal fallback for platforms without pre-built binaries.
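The prefer-a-wheel, fall-back-to-sdist rule can be sketched as follows. This is a deliberate simplification and not how pip or uv are implemented: it looks only at the platform tag and ignores the Python and ABI tags, version ordering, and tag priority. The function name and the example_pkg filenames are hypothetical.

```python
from typing import Iterable, Optional, Set


def choose_distribution(
    filenames: Iterable[str], compatible_platforms: Set[str]
) -> Optional[str]:
    """Return the first compatible wheel, else the first sdist, else None."""
    sdist = None
    for name in filenames:
        if name.endswith(".whl"):
            # The platform tag is the last dash-separated field; it may
            # contain several dot-separated tags.
            platform_tag = name[: -len(".whl")].split("-")[-1]
            if any(tag in compatible_platforms for tag in platform_tag.split(".")):
                return name
        elif name.endswith(".tar.gz") and sdist is None:
            sdist = name
    return sdist


if __name__ == "__main__":
    available = [
        "example_pkg-1.0.tar.gz",
        "example_pkg-1.0-cp311-cp311-manylinux1_x86_64.whl",
    ]
    # On a Linux x86_64 machine the wheel is picked.
    print(choose_distribution(available, {"manylinux1_x86_64", "any"}))
    # On a platform with no matching wheel, the sdist is the fallback.
    print(choose_distribution(available, {"win_amd64", "any"}))
```

The fallback branch is the one that reintroduces a build step, so a missing wheel for a given platform brings back the compiler and header requirements of the sdist era.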

The 25-year arc from distutils to wheels follows a pattern: each generation of tooling solved a real problem (build standardization, dependency metadata, uninstall support, pre-built binaries, cross-platform portability) while leaving other problems for the next generation.

The format problem is largely settled. The remaining packaging friction lives in the tooling layer above it: project management, dependency resolution, environment handling, and the proliferation of tools that each handle a different slice of the workflow. That story is still being written.