How to Install bitsandbytes

bitsandbytes provides 4-bit and 8-bit quantization for large language models, cutting GPU memory usage enough to run models that would otherwise not fit. Unlike PyTorch (see Why Installing GPU Python Packages Is So Complicated), bitsandbytes publishes platform-specific wheels to PyPI that bundle precompiled CUDA libraries for multiple toolkit versions. A plain pip install works on Linux, Windows, and macOS without extra index URLs.

Requirements

  • Python >= 3.10
  • PyTorch >= 2.3 (see How to Install PyTorch with uv)
  • NVIDIA GPU with compute capability 6.0+ for GPU quantization (Pascal or newer)
  • NVIDIA driver that supports CUDA 11.8 or later

bitsandbytes also supports CPU-only, AMD ROCm (preview), Intel XPU, and Apple Silicon. GPU quantization requires an NVIDIA GPU.

Install with pip or uv

Install PyTorch first, then bitsandbytes:

uv pip install torch
uv pip install bitsandbytes

The PyPI wheel ships with precompiled binaries for CUDA 11.8 through 13.0. At runtime, bitsandbytes detects the CUDA version provided by the installed PyTorch and loads the matching binary. No --index-url flag or CUDA version suffix is needed.

Important

Install PyTorch before bitsandbytes. bitsandbytes uses PyTorch’s CUDA runtime to detect the correct backend at import time. If PyTorch is missing or CPU-only, bitsandbytes falls back to CPU mode and quantization functions will raise errors.
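To confirm the install order produced a usable stack before importing bitsandbytes, a quick sanity check might look like this (a minimal sketch using only the standard library and PyTorch's documented torch.version.cuda attribute):

```python
import importlib.util

def torch_cuda_status():
    """Report whether PyTorch is installed and built with CUDA, without raising."""
    if importlib.util.find_spec("torch") is None:
        return "missing"
    import torch
    # torch.version.cuda is None on CPU-only builds, a string like "12.8" otherwise.
    return "cpu-only" if torch.version.cuda is None else f"cuda {torch.version.cuda}"

print(torch_cuda_status())
```

A "cpu-only" result means bitsandbytes will fall back to CPU mode and GPU quantization calls will fail.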

Add to a uv project

For a uv project, add both packages as dependencies:

uv add torch bitsandbytes

If the project needs a specific CUDA build of PyTorch, configure the PyTorch index in pyproject.toml as described in How to Install PyTorch with uv. bitsandbytes itself resolves from PyPI with no extra configuration.
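As a sketch of that configuration (the index name and CUDA tag below are illustrative; pick the tag that matches your driver), the pyproject.toml might contain:

```toml
[project]
dependencies = ["torch>=2.3", "bitsandbytes"]

# Hypothetical index entry for a CUDA 12.8 PyTorch build; adjust the
# cu128 tag to the CUDA version your NVIDIA driver supports.
[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu128" }
```

With explicit = true, only torch resolves from the PyTorch index; bitsandbytes still comes from PyPI.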

Install with conda

bitsandbytes is available on conda-forge, though the latest version (0.38.0) is older than the current PyPI release (0.49.x). If the project requires recent features like NF4 quantization support for newer GPU architectures, use pip or uv instead.

For projects where conda-forge’s version is acceptable:

pixi add bitsandbytes

See uv vs pixi vs conda for scientific Python for guidance on when conda-based tooling makes sense.

Verify the installation

Run this check to confirm bitsandbytes loaded the CUDA backend:

import bitsandbytes as bnb
import torch

# Should print the active backend, e.g. "CUDA"
print(bnb.cextension.BNB_BACKEND)

# Quick functional test: create a quantized linear layer
linear = bnb.nn.Linear8bitLt(256, 128, has_fp16_weights=False)
x = torch.randn(1, 256, dtype=torch.float16, device="cuda")
output = linear.to("cuda")(x)
print(f"Output shape: {output.shape}")

If the backend prints CUDA, the GPU path is active. If it prints CPU, PyTorch either lacks CUDA support or no GPU was detected.

Troubleshooting

RuntimeError when calling quantization functions

bitsandbytes imports without error even when the native CUDA library fails to load. The error surfaces later, when quantization code runs. Check that:

  1. PyTorch was installed with CUDA support (torch.cuda.is_available() returns True)
  2. The NVIDIA driver is recent enough for the installed CUDA toolkit
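Both checks can be folded into a small diagnostic (a sketch using only documented torch attributes; it degrades gracefully when PyTorch is absent):

```python
def cuda_diagnosis():
    """Return a short string describing why GPU quantization might fail."""
    try:
        import torch
    except ImportError:
        return "no-torch: install PyTorch first"
    if not torch.cuda.is_available():
        # Either a CPU-only wheel, or the driver cannot see a GPU.
        build = torch.version.cuda or "cpu-only build"
        return f"no-cuda: PyTorch build = {build}"
    return f"ok: CUDA {torch.version.cuda} on {torch.cuda.get_device_name(0)}"

print(cuda_diagnosis())
```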

Wrong CUDA version detected

bitsandbytes reads the CUDA version from PyTorch, not from the system nvcc. If torch.version.cuda reports a different version than expected, reinstall PyTorch with the correct CUDA backend.

Override the detected version by setting the BNB_CUDA_VERSION environment variable to the major and minor digits of the desired toolkit (for example, 128 selects the CUDA 12.8 binary):

BNB_CUDA_VERSION=128 python -c "import bitsandbytes"

libcudart.so not found

This occurs when the CUDA runtime shared library is missing from the environment. PyTorch wheels bundle their own copy of libcudart, so this usually means PyTorch was installed as CPU-only. Reinstall PyTorch with CUDA support.
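To check whether the installed PyTorch wheel actually bundles libcudart, one can inspect its lib directory (a sketch; the lib/ layout is how current Linux CUDA wheels ship, but it is not a stable API):

```python
import importlib.util
import pathlib

def bundled_cudart():
    """Return the libcudart files bundled with the torch wheel, if any."""
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None  # PyTorch not installed
    lib_dir = pathlib.Path(spec.origin).parent / "lib"
    # CUDA wheels ship libcudart.so.* here; CPU-only wheels do not.
    return sorted(p.name for p in lib_dir.glob("libcudart*"))

print(bundled_cudart())
```

An empty list on Linux suggests a CPU-only wheel, which matches the reinstall advice above.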
