
How to Install DeepSpeed

DeepSpeed publishes only a source distribution on PyPI, with no prebuilt wheels. The setup.py requires both PyTorch and a CUDA toolkit to generate metadata, so even a basic install needs CUDA_HOME set and nvcc on PATH. Once installed, individual ops (fused Adam, CPU offloading, transformer kernels) are compiled on first use through PyTorch’s JIT C++ extension system. See Why Installing GPU Python Packages Is So Complicated for background.

Requirements

  • Platform: Linux (x86_64). Windows has partial support through WSL2. No macOS GPU support.
  • Software: PyTorch already installed, a C++ compiler (g++ or clang++), the CUDA toolkit with nvcc on PATH, and CUDA_HOME set to the toolkit root (e.g. /usr/local/cuda).
  • System libraries: libaio-dev is required for the async I/O op used by ZeRO-Infinity and NVMe offloading. Install it with apt install libaio-dev on Debian/Ubuntu.
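These prerequisites can be checked up front. The short script below mirrors what DeepSpeed's setup.py looks for; it is a standalone sketch, not part of DeepSpeed itself:

```python
import os
import shutil

# Pre-flight sketch: verify the toolchain DeepSpeed's setup.py expects.
# The check names are illustrative; only the underlying tools matter.
checks = {
    "CUDA_HOME set": bool(os.environ.get("CUDA_HOME")),
    "nvcc on PATH": shutil.which("nvcc") is not None,
    "C++ compiler (g++)": shutil.which("g++") is not None,
}
for name, ok in checks.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Anything reported MISSING here will surface later as one of the install-time errors listed under Troubleshooting.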

Install from PyPI

DeepSpeed’s setup.py imports torch at the top level, so PyTorch must be present before installation. The --no-build-isolation flag tells the installer to use the current environment’s torch instead of creating a clean build environment:

uv pip install deepspeed --no-build-isolation

This installs the Python package without compiling any CUDA kernels. Ops are compiled at first use via JIT, which adds a one-time delay (seconds to minutes depending on the op) the first time DeepSpeed runs a training job.
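The JIT-built kernels are cached by PyTorch's C++ extension loader, so the delay is paid once per environment rather than on every run. A small sketch of where the artifacts land (this default path comes from PyTorch, not DeepSpeed):

```python
import os

# PyTorch's cpp_extension machinery caches JIT builds under
# TORCH_EXTENSIONS_DIR, falling back to ~/.cache/torch_extensions.
# Deleting this directory forces DeepSpeed to recompile ops on next use.
cache_root = os.environ.get(
    "TORCH_EXTENSIONS_DIR",
    os.path.expanduser("~/.cache/torch_extensions"),
)
print(cache_root)
```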

Pre-compile ops at install time

To avoid JIT compilation delays at runtime, set DS_BUILD_OPS=1 to compile all compatible ops during installation:

DS_BUILD_OPS=1 uv pip install deepspeed --no-build-isolation

This requires nvcc on PATH and a working C++ compiler. The build takes several minutes.

To compile only specific ops, use individual environment variables instead:

Variable                         Op
DS_BUILD_CPU_ADAM                CPU Adam optimizer
DS_BUILD_FUSED_ADAM              Fused Adam (CUDA)
DS_BUILD_AIO                     Async I/O for NVMe offload
DS_BUILD_TRANSFORMER_INFERENCE   Transformer inference kernels
DS_BUILD_SPARSE_ATTN             Sparse attention

Set any of these to 1 to pre-compile that op. For example, to compile only the fused Adam optimizer:

DS_BUILD_FUSED_ADAM=1 pip install deepspeed --no-build-isolation

Add to a uv project

For projects managed with uv using uv add and uv sync, use extra-build-dependencies to inject torch into the isolated build environment. The match-runtime = true option ensures the build uses the same torch version the project resolves at runtime:

[project]
dependencies = ["deepspeed", "torch"]

[tool.uv.extra-build-dependencies]
deepspeed = [{ requirement = "torch", match-runtime = true }]

Then run uv sync as normal. uv handles build isolation and torch injection automatically.

To pre-compile ops during the build, pass environment variables with extra-build-variables:

[tool.uv.extra-build-variables]
deepspeed = { DS_BUILD_OPS = "1" }
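The two settings compose. A minimal pyproject.toml combining them might look like this (the project name and version are placeholders):

```toml
[project]
name = "my-training-project"   # placeholder
version = "0.1.0"
dependencies = ["deepspeed", "torch"]

# Inject the project's own torch into deepspeed's isolated build environment.
[tool.uv.extra-build-dependencies]
deepspeed = [{ requirement = "torch", match-runtime = true }]

# Pre-compile all compatible ops during the build instead of at first use.
[tool.uv.extra-build-variables]
deepspeed = { DS_BUILD_OPS = "1" }
```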

Install with conda-forge or pixi

DeepSpeed is available on conda-forge, though the version may lag behind PyPI. The conda-forge build handles CUDA toolkit dependencies through the solver:

pixi add deepspeed
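In a pixi project, the same dependency can be declared in the manifest instead. A minimal fragment, assuming a recent pixi manifest schema (field names may differ across pixi versions):

```toml
# pixi.toml fragment; project name and platform list are illustrative.
[project]
name = "ds-env"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]
deepspeed = "*"
```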

For more on when conda-based tools are the better choice for GPU workloads, see uv vs pixi vs conda for Scientific Python.

Verify the installation

After installing, confirm DeepSpeed loads and can report on the build environment:

python -c "import deepspeed; print(deepspeed.__version__)"
ds_report

ds_report prints a table showing which ops are installed (pre-compiled) versus available for JIT compilation. If a required system library is missing, the report flags it.

Troubleshooting

CUDA_HOME does not exist, unable to compile CUDA op(s) during install. The setup.py checks for CUDA_HOME at metadata generation time, before any ops are compiled. Set the environment variable to point to your CUDA toolkit root: export CUDA_HOME=/usr/local/cuda. If using a Docker image, the NVIDIA CUDA devel images set this automatically, but slim Python images do not.

ModuleNotFoundError: No module named 'torch' during install. PyTorch must be installed before DeepSpeed. The setup.py imports torch at the top level. Install PyTorch first, then retry with --no-build-isolation.

RuntimeError: ninja is not available at runtime. DeepSpeed’s JIT compilation uses ninja as its build backend. Install it with pip install ninja or apt install ninja-build.

libaio.h: No such file or directory when building the async I/O op. Install the development headers: apt install libaio-dev on Debian/Ubuntu, or yum install libaio-devel on RHEL/CentOS.

CUDA version mismatch errors. The CUDA toolkit version used to compile ops must be compatible with the CUDA version PyTorch was built against. Check python -c "import torch; print(torch.version.cuda)" and ensure nvcc --version reports a compatible version.
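Compatibility here roughly means the same CUDA major version; minor-version skew between nvcc and PyTorch's CUDA is usually tolerated. A sketch of that heuristic (the function name and exact policy are illustrative, not DeepSpeed's):

```python
def cuda_major_matches(torch_cuda: str, nvcc_cuda: str) -> bool:
    # Illustrative policy: require matching major versions ("12.x" vs "12.y").
    # Strictness varies across DeepSpeed releases; treat this as a heuristic.
    return torch_cuda.split(".")[0] == nvcc_cuda.split(".")[0]

# torch.version.cuda might report "12.1" while nvcc --version reports 12.4:
print(cuda_major_matches("12.1", "12.4"))  # True: same major version
print(cuda_major_matches("11.8", "12.1"))  # False: 11 vs 12
```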

error: invalid command 'bdist_wheel' during install. The wheel package is missing from the environment. Run pip install wheel first, then retry. This happens on minimal base images that don’t ship wheel by default.

DS_BUILD_OPS=1 fails on a machine without a GPU. Pre-compilation requires CUDA headers and a GPU-compatible toolchain even if no physical GPU is present, and without a visible GPU the build cannot auto-detect which compute architectures to target. Either set TORCH_CUDA_ARCH_LIST explicitly (e.g. export TORCH_CUDA_ARCH_LIST="8.0;9.0" for A100/H100 targets) so nvcc knows what to build, or skip DS_BUILD_OPS and let ops JIT-compile on the GPU machine at runtime.
