How to Install bitsandbytes

bitsandbytes provides 4-bit and 8-bit quantization for large language models, cutting GPU memory usage enough to run models that would otherwise not fit. Unlike PyTorch (see Why Installing GPU Python Packages Is So Complicated), bitsandbytes publishes platform-specific wheels to PyPI that bundle precompiled CUDA libraries for multiple toolkit versions. A plain pip install works on Linux, Windows, and macOS without extra index URLs.

Requirements

  • Python >= 3.10
  • PyTorch >= 2.3 (see How to Install PyTorch with uv)
  • NVIDIA GPU with compute capability 6.0+ for GPU quantization (Pascal or newer)
  • NVIDIA driver that supports CUDA 11.8 or later

bitsandbytes also supports CPU-only, AMD ROCm (preview), Intel XPU, and Apple Silicon. GPU quantization requires an NVIDIA GPU.

Install with pip or uv

Install PyTorch first, then bitsandbytes:

uv pip install torch
uv pip install bitsandbytes

The PyPI wheel ships with precompiled binaries for CUDA 11.8 through 13.0. At runtime, bitsandbytes detects the CUDA version provided by the installed PyTorch and loads the matching binary. No --index-url flag or CUDA version suffix is needed.

Important

Install PyTorch before bitsandbytes. bitsandbytes uses PyTorch’s CUDA runtime to detect the correct backend at import time. If PyTorch is missing or CPU-only, bitsandbytes falls back to CPU mode and quantization functions will raise errors.
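To confirm the install order produced a usable stack before importing bitsandbytes, a quick sanity check might look like this (a minimal sketch using only the standard library and PyTorch's documented torch.version.cuda attribute):

```python
import importlib.util

def torch_cuda_status():
    """Report whether PyTorch is installed and built with CUDA, without raising."""
    if importlib.util.find_spec("torch") is None:
        return "missing"
    import torch
    # torch.version.cuda is None on CPU-only builds, a string like "12.8" otherwise.
    return "cpu-only" if torch.version.cuda is None else f"cuda {torch.version.cuda}"

print(torch_cuda_status())
```

A "cpu-only" result means bitsandbytes will fall back to CPU mode and GPU quantization calls will fail.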

Add to a uv project

For a uv project, add both packages as dependencies:

uv add torch bitsandbytes

If the project needs a specific CUDA build of PyTorch, configure the PyTorch index in pyproject.toml as described in How to Install PyTorch with uv. bitsandbytes itself resolves from PyPI with no extra configuration.
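As a sketch of that configuration (the index name and CUDA tag below are illustrative; pick the tag that matches your driver), the pyproject.toml might contain:

```toml
[project]
dependencies = ["torch>=2.3", "bitsandbytes"]

# Hypothetical index entry for a CUDA 12.8 PyTorch build; adjust the
# cu128 tag to the CUDA version your NVIDIA driver supports.
[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu128" }
```

With explicit = true, only torch resolves from the PyTorch index; bitsandbytes still comes from PyPI.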

Install with conda

bitsandbytes is available on conda-forge, though the latest version (0.38.0) is older than the current PyPI release (0.49.x). If the project requires recent features like NF4 quantization support for newer GPU architectures, use pip or uv instead.

For projects where conda-forge’s version is acceptable:

pixi add bitsandbytes

See uv vs pixi vs conda for scientific Python for guidance on when conda-based tooling makes sense.

Verify the installation

Run this check to confirm bitsandbytes loaded the CUDA backend:

import bitsandbytes as bnb
import torch

# Should print the active backend, e.g. "CUDA"
print(bnb.cextension.BNB_BACKEND)

# Quick functional test: create a quantized linear layer
linear = bnb.nn.Linear8bitLt(256, 128, has_fp16_weights=False)
x = torch.randn(1, 256, dtype=torch.float16, device="cuda")
output = linear.to("cuda")(x)
print(f"Output shape: {output.shape}")

If the backend prints CUDA, the GPU path is active. If it prints CPU, PyTorch either lacks CUDA support or no GPU was detected.

Troubleshooting

RuntimeError when calling quantization functions

bitsandbytes imports without error even when the native CUDA library fails to load. The error surfaces later, when quantization code runs. Check that:

  1. PyTorch was installed with CUDA support (torch.cuda.is_available() returns True)
  2. The NVIDIA driver is recent enough for the installed CUDA toolkit
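Both checks can be folded into a small diagnostic (a sketch using only documented torch attributes; it degrades gracefully when PyTorch is absent):

```python
def cuda_diagnosis():
    """Return a short string describing why GPU quantization might fail."""
    try:
        import torch
    except ImportError:
        return "no-torch: install PyTorch first"
    if not torch.cuda.is_available():
        # Either a CPU-only wheel, or the driver cannot see a GPU.
        build = torch.version.cuda or "cpu-only build"
        return f"no-cuda: PyTorch build = {build}"
    return f"ok: CUDA {torch.version.cuda} on {torch.cuda.get_device_name(0)}"

print(cuda_diagnosis())
```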

Wrong CUDA version detected

bitsandbytes reads the CUDA version from PyTorch, not from the system nvcc. If torch.version.cuda reports a different version than expected, reinstall PyTorch with the correct CUDA backend.

Override the detected version by setting the BNB_CUDA_VERSION environment variable to the major and minor digits of the desired toolkit (for example, 128 selects the CUDA 12.8 binary):

BNB_CUDA_VERSION=128 python -c "import bitsandbytes"

libcudart.so not found

This occurs when the CUDA runtime shared library is missing from the environment. PyTorch wheels bundle their own copy of libcudart, so this usually means PyTorch was installed as CPU-only. Reinstall PyTorch with CUDA support.
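To check whether the installed PyTorch wheel actually bundles libcudart, one can inspect its lib directory (a sketch; the lib/ layout is how current Linux CUDA wheels ship, but it is not a stable API):

```python
import importlib.util
import pathlib

def bundled_cudart():
    """Return the libcudart files bundled with the torch wheel, if any."""
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None  # PyTorch not installed
    lib_dir = pathlib.Path(spec.origin).parent / "lib"
    # CUDA wheels ship libcudart.so.* here; CPU-only wheels do not.
    return sorted(p.name for p in lib_dir.glob("libcudart*"))

print(bundled_cudart())
```

An empty list on Linux suggests a CPU-only wheel, which matches the reinstall advice above.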
