What Are Wheel Variants?
A default NumPy wheel for x86_64 has to run on every x86-64 CPU that has ever shipped, with the baseline instruction set fixed when AMD64 was defined in 2003. Every SIMD instruction Intel and AMD have added since then is off-limits to such a build, because a wheel filename has no place to declare "needs AVX2" or "needs SSE4". On scientific workloads, Ralf Gommers of Quansight put the performance left on the table at 10x to 20x.
Wheel variants are the proposed fix. PEP 817 and PEP 825 extend the wheel format so a package can ship several hardware-specific builds under one name and let the installer pick one at install time.
Three tags cannot describe modern hardware
A wheel filename today carries three pieces of compatibility metadata: Python version, ABI, and platform (the py3-none-manylinux_2_17_x86_64 portion). Those tags tell the installer which interpreter and which OS family the wheel targets. They say nothing about which CPU microarchitecture the build was optimized for, whether it expects a GPU, or which CUDA version it links against.
When a wheel wants to run on “any x86-64 Linux,” it has to use the lowest-common-denominator CPU baseline. For manylinux, that baseline is roughly the original AMD64 instruction set from the early 2000s. Builds that want to use AVX2, AVX-512, or ARM SVE either ship a separate package with a mangled name (cudf-cu12, cudf-cu13) or force the user to configure a custom index URL (the PyTorch approach).
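The three tags can be pulled apart mechanically. A minimal sketch, assuming the simple five-segment filename form with no build tag (the helper name is this sketch's invention; the `packaging` library's `parse_wheel_filename` does this properly):

```python
def split_wheel_tags(filename: str) -> dict:
    """Split a wheel filename into its compatibility tags.

    Assumes the simple five-segment form
    {name}-{version}-{python}-{abi}-{platform}.whl
    (optional build tags are ignored for brevity).
    """
    stem = filename.removesuffix(".whl")
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,      # interpreter tag, e.g. cp313
        "abi": abi_tag,            # ABI tag, e.g. cp313 or none
        "platform": platform_tag,  # OS family plus architecture
    }

tags = split_wheel_tags("numpy-2.3.2-cp313-cp313-manylinux_2_17_x86_64.whl")
# Nothing in these tags says which SIMD level the binary requires.
```

Note that nothing in the returned dict can express "this build needs AVX2": the platform tag stops at OS family and architecture.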
NumPy ships multiple CPU builds and picks one at runtime
NumPy has engineered its way around the format. The project compiles its SIMD-heavy source multiple times, once per CPU family (Haswell, Skylake, and so on), merges the builds into a single extension module, and runs CPU-feature detection at import time to dispatch to the right variant. Sustained engineering contributions from Intel and ARM keep the dispatcher current for their respective architectures.
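In spirit, the dispatch looks like the sketch below. This is illustrative only: NumPy's real dispatcher works at the C level via CPUID, and the function names and feature strings here are assumptions of the sketch, not NumPy's API.

```python
def detect_features() -> set:
    """Return the CPU feature flags of the host, best-effort.

    Illustrative only: parses /proc/cpuinfo on Linux and returns an
    empty set elsewhere. NumPy's real detection happens in C via CPUID.
    """
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def pick_kernel(features: set) -> str:
    """Choose the most specialized build the host can run."""
    if {"avx512f"} <= features:
        return "avx512"
    if {"avx2", "fma"} <= features:
        return "avx2"
    return "baseline"  # the 2003-era AMD64 fallback
```

One detection pass at import time, then every hot loop runs the fastest compiled variant the machine supports.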
SciPy, scikit-learn, pandas, and Pillow have not adopted the same trick in their distributed wheels, even though the underlying SIMD code often exists. The engineering cost of maintaining a fat-binary dispatcher is too high for most projects to absorb, which is why NumPy’s approach has never spread. The fix has to live in the packaging format itself.
Variants move the selection into the installer
A variant wheel carries the same filename as today plus a label segment. An example from PEP 817: numpy-2.3.2-cp313-cp313t-musllinux_1_2_x86_64-x86_64_v3.whl. The x86_64_v3 label corresponds to the x86-64-v3 microarchitecture level, which roughly means “AVX2 and FMA are available.”
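Splitting the label back off is mechanical. A sketch under the assumption that there is no build tag, so a standard wheel has five dash-separated segments and a variant wheel six (the helper name is this sketch's invention):

```python
def split_variant_label(filename: str):
    """Separate the variant label from a wheel filename, if present.

    Assumes no build tag: five segments means a standard wheel,
    six means the last segment is a variant label.
    """
    parts = filename.removesuffix(".whl").split("-")
    if len(parts) == 6:
        return "-".join(parts[:5]) + ".whl", parts[5]
    return filename, None

base, label = split_variant_label(
    "numpy-2.3.2-cp313-cp313t-musllinux_1_2_x86_64-x86_64_v3.whl"
)
```

An installer that does not understand variants still sees a well-formed name up front; one that does can peel off the label and match it against the host.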
Inside the wheel, a *.dist-info/variant.json file spells out what the label demands as structured properties. Each property is a three-part tuple of namespace, feature, and value:
x86_64 :: level :: v3
nvidia :: cuda_version_lower_bound :: 12.8

On the package index side, a companion file lists every variant published for a version ({name}-{version}-variants.json), so an installer can see the full menu before downloading any wheel.
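Each property string decomposes into a plain tuple. A trivial parser (the exact on-disk JSON layout of variant.json is specified by PEP 825 and not reproduced here):

```python
def parse_property(prop: str) -> tuple:
    """'namespace :: feature :: value' -> ('namespace', 'feature', 'value')"""
    namespace, feature, value = (part.strip() for part in prop.split("::"))
    return namespace, feature, value

props = [parse_property(p) for p in (
    "x86_64 :: level :: v3",
    "nvidia :: cuda_version_lower_bound :: 12.8",
)]
```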
Provider plugins detect the host
The installer does not hard-code a table of CPU features or GPU detection logic. PEP 817 defines a plugin interface: each namespace maps to a variant provider plugin that reports what the local machine supports for that namespace. An NVIDIA plugin queries the driver and returns the maximum CUDA version. A CPU plugin inspects /proc/cpuinfo or the equivalent and returns the highest x86-64 level the machine can run. The resolver scores each candidate wheel against those reports and installs the best match.
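A toy version of that flow, with a hard-coded provider standing in for real hardware probing. The function names and the scoring rule below are this sketch's assumptions, not PEP 817's exact plugin API or selection algorithm:

```python
def cpu_provider() -> dict:
    """Stand-in for a variant provider plugin.

    A real plugin would probe the hardware (CPUID, /proc/cpuinfo);
    here the supported values are hard-coded, best-first.
    """
    return {"x86_64": {"level": ["v3", "v2", "v1"]}}

def score_variant(properties, supported):
    """Rank a candidate variant against provider reports.

    Returns None if any property is unsupported on this host;
    otherwise lower is better (each property contributes its
    position in the provider's preference-ordered list).
    """
    score = 0
    for namespace, feature, value in properties:
        values = supported.get(namespace, {}).get(feature, [])
        if value not in values:
            return None  # host cannot satisfy this variant
        score += values.index(value)
    return score

supported = cpu_provider()
candidates = {
    "baseline": [("x86_64", "level", "v1")],
    "v3": [("x86_64", "level", "v3")],
    "cuda": [("nvidia", "cuda_version_lower_bound", "12.8")],
}
eligible = {name: score_variant(props, supported)
            for name, props in candidates.items()
            if score_variant(props, supported) is not None}
best = min(eligible, key=eligible.get)
```

On this imaginary host the CUDA variant is filtered out (no nvidia provider report) and the v3 build outranks the baseline.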
The design keeps the installer generic: no “blessed list” of platform tags that needs maintenance every time a new GPU generation ships. It also trades one thing away. Plugins execute Python code at resolve time, which is in tension with reproducible installs driven by a lockfile. PEP 825 closes that gap by adding a [packages.variant_json] field to the pylock.toml format, so a lockfile records the selected variant alongside the version and hashes. Static opt-in configuration for running non-vendored plugins is deferred to a later specification.
What variants unblock
Two scenarios motivate the whole effort:
- CPU microarchitecture. A project can publish one baseline x86_64 wheel plus x86_64_v3 and x86_64_v4 variants built with AVX2 or AVX-512. Users on recent Intel or AMD hardware install the fast build automatically. Users on older machines still get a working wheel.
- GPU and CUDA. A PyTorch release can publish variants for CUDA 12.6, 12.8, 13.0, and CPU-only under one package name. pip install torch or uv add torch picks the right binary with no index URL flag and no package-name suffix.
Astral published a variant-enabled build of uv on August 13, 2025, separate from the mainline release, with a worked PyTorch example.
Where the proposal stands
Both PEPs are in Draft status as of April 2026. PEP 817 was created on December 10, 2025 and describes the full variant model, including how installers discover capabilities through provider plugins and score wheels against them. PEP 825 was created on February 17, 2026 and specifies the wheel-format extensions: the filename label, the *.dist-info/variant.json file, the index-level {name}-{version}-variants.json file, and the namespace :: feature :: value property structure. The same 11 authors are listed on both, with contributors associated with NVIDIA, Quansight, Astral, Meta, and PyPA projects.
Prototype work has proceeded in parallel. Forked branches of pip, uv, Warehouse (PyPI), setuptools, scikit-build-core, and the packaging library exist to demo the full flow end to end, with upstreaming as the goal once the specifications settle.
Note
PEP 817 and PEP 825 are in Draft status. Draft PEPs can change substantially or be withdrawn. Check the PEP index for current status.