# How to Install Hugging Face Transformers with uv


The [Transformers](https://github.com/huggingface/transformers) library from Hugging Face provides access to thousands of pretrained models for text, image, audio, and multimodal tasks. Installing it with [uv](https://pydevtools.com/handbook/reference/uv.md) is straightforward for CPU inference, but GPU acceleration requires pointing uv at a PyTorch CUDA index. This is the same routing pattern covered in [How to Install PyTorch with uv](https://pydevtools.com/handbook/how-to/how-to-install-pytorch-with-uv.md).

This guide covers three install paths: CPU-only inference, GPU training and inference with CUDA, and quantized model loading with `accelerate` and `bitsandbytes`.

## Install for CPU inference

For tasks that run on CPU (sentiment analysis, text generation with small models, embeddings), add Transformers to your project:

```bash
uv add transformers
```

This installs Transformers and its core dependencies (huggingface-hub, tokenizers, safetensors) but not PyTorch. Most inference and training features require a deep learning backend, so install PyTorch alongside it:

```bash
uv add "transformers[torch]"
```

The `[torch]` extra pulls in a CPU-compatible PyTorch build from PyPI. Verify the install:

```bash
uv run python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('uv is fast'))"
```

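The first run downloads a default sentiment model from the Hugging Face Hub. The output should then show a label and a confidence score similar to this (the exact score may differ):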

```
[{'label': 'POSITIVE', 'score': 0.9789}]
```
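
To confirm which packages actually landed in the environment, a short script can read installed versions from package metadata. This is a minimal sketch using only the standard library; the helper name `installed_versions` is illustrative and not part of uv or Transformers.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Core packages named in this section, plus the optional PyTorch backend.
    names = ["transformers", "huggingface-hub", "tokenizers", "safetensors", "torch"]
    for name, ver in installed_versions(names).items():
        print(f"{name:16} {ver or 'not installed'}")
```

Saved under a name of your choosing and run with `uv run python`, it prints one line per package, with `not installed` for anything missing from the project environment.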

## Install with GPU support

GPU acceleration requires PyTorch built against the right CUDA version. PyPI's default PyTorch wheels are CPU-only on Windows and macOS. On Linux, PyPI carries CUDA 12.8 wheels as of PyTorch 2.9.1, but your system may need a different CUDA version.

### Configure CUDA in pyproject.toml

Add a PyTorch CUDA index and route GPU packages to it. This example uses CUDA 12.8:

```toml {filename="pyproject.toml"}
[project]
name = "my-ml-project"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "transformers[torch]",
    # Direct dependencies, matching the [tool.uv.sources] entries below.
    "torch",
    "torchvision",
]

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = [
  { index = "pytorch-cu128", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
]
torchvision = [
  { index = "pytorch-cu128", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
]
```

Then lock and sync:

```bash
uv lock
uv sync
```

The platform markers restrict CUDA builds to Linux and Windows. macOS falls back to PyPI's CPU wheels because CUDA builds are not available for macOS. See [How to Install PyTorch with uv](https://pydevtools.com/handbook/how-to/how-to-install-pytorch-with-uv.md) for the full configuration reference, including multi-backend extras and ROCm support.

### Verify GPU access

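```bash
uv run python -c "import torch; ok = torch.cuda.is_available(); print(ok, torch.cuda.get_device_name(0) if ok else 'no CUDA device found')"
```

This should print `True` followed by your GPU name. The device name is only queried when CUDA is available, because `torch.cuda.get_device_name(0)` raises an error otherwise. If the command prints `False`, check that your NVIDIA driver is installed (`nvidia-smi`) and that the CUDA version of your PyTorch build does not exceed the CUDA version your driver supports, which `nvidia-smi` shows in its header.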
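
For a fuller picture than a one-liner gives, the same checks can be collected into a small report. This is a sketch under the assumption that only standard PyTorch attributes are needed (`torch.__version__`, `torch.version.cuda`, and the `torch.cuda` query functions); the function name `gpu_report` is hypothetical.

```python
def gpu_report():
    """Collect PyTorch/CUDA facts without raising when torch or a GPU is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None for builds without CUDA support; a version string such as "12.8" otherwise.
        "cuda_build": torch.version.cuda,
        "cuda_available": available,
        # Only query device names when CUDA is usable, since the call raises otherwise.
        "devices": [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ] if available else [],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

A `cuda_build` of `None` combined with `cuda_available: False` suggests a CPU-only wheel was installed, which usually points back to the index configuration rather than the driver.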

### Install GPU support without a project

For one-off experimentation without a `pyproject.toml`, use `uv pip` with `--torch-backend`:

```bash
uv venv --python 3.12 --seed --managed-python
source .venv/bin/activate
uv pip install "transformers[torch]" --torch-backend=auto
```

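The `--torch-backend=auto` flag inspects the machine (for example, the installed NVIDIA driver) and selects the matching PyTorch index. Other values name a backend explicitly, such as `cpu`, `cu118`, `cu126`, `cu128`, `cu130`, `xpu`, or a versioned ROCm target like `rocm6.3`. The uv documentation lists the values your uv version accepts.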

> [!IMPORTANT]
> `--torch-backend` only works with `uv pip` commands. It does not work with `uv lock`, `uv sync`, or `uv run`. For project-level workflows, configure the PyTorch index in `pyproject.toml` as shown above.

## Install extras for quantization and distributed training

Loading large models (7B+ parameters) on consumer GPUs typically requires quantization. The `[torch]` extra already installs `accelerate`, so adding `bitsandbytes` is the only extra step:

```bash
uv add bitsandbytes
```

With these installed, load a quantized model:

```python {filename="quantized_inference.py"}
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Gated model: accept the license on the Hugging Face Hub and log in before running.
model_id = "meta-llama/Llama-3.1-8B-Instruct"

# Store weights in 4-bit precision via bitsandbytes.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",  # accelerate places the layers on the available devices
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Send the inputs to the device that holds the model's first parameters.
inputs = tokenizer("Explain virtual environments in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

> [!NOTE]
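> The 4-bit loading shown here needs an NVIDIA GPU with CUDA. `bitsandbytes` publishes prebuilt wheels for Linux and, since version 0.43, for Windows. Support for macOS and non-NVIDIA backends is more limited, so check the bitsandbytes documentation for your platform.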

`accelerate` enables multi-GPU and distributed training with Transformers' `Trainer` class, even without quantization.
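
The case for quantization comes down to arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below is a rough estimate only. It ignores activations, the KV cache, and quantization overhead, and the `8e9` parameter count is an assumed round figure for an 8B-class model, not a measured value.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


if __name__ == "__main__":
    params = 8e9  # assumed round figure for an "8B" model
    precisions = [("float32", 32), ("float16/bfloat16", 16), ("int8", 8), ("4-bit", 4)]
    for label, bits in precisions:
        print(f"{label:18} ~{weight_memory_gib(params, bits):.1f} GiB")
```

Under these assumptions the weights alone take about 15 GiB at 16-bit precision and under 4 GiB at 4-bit, which is the difference between exceeding and fitting within the memory of many consumer GPUs.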

## Choose the right extras

Transformers ships several [optional dependency groups](https://pydevtools.com/handbook/explanation/what-are-optional-dependencies-and-dependency-groups.md) that pull in libraries for specific use cases:

| Extra | What it adds | When to use it |
|---|---|---|
| `transformers[torch]` | PyTorch, accelerate | Most NLP, vision, and generative tasks |
| `transformers[vision]` | Pillow | Image classification, object detection, image generation |
| `transformers[audio]` | librosa, soundfile | Speech recognition, audio classification |
| `transformers[sentencepiece]` | sentencepiece | Multilingual models (mBART, XLM-RoBERTa) |
| `transformers[video]` | av, decord | Video understanding models |

Extras can be combined: `uv add "transformers[torch,vision]"` installs both PyTorch and Pillow.
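
To see which of these optional libraries are already importable in an environment, a quick check with `importlib.util.find_spec` is enough. This is a sketch: the mapping from extras to import names follows the table above (Pillow imports as `PIL`), `accelerate` is included for the `torch` extra because the quantization section notes it ships with that extra, and the names `EXTRA_MODULES` and `extras_status` are hypothetical.

```python
from importlib.util import find_spec

# Import names for the libraries each extra pulls in (Pillow imports as PIL).
EXTRA_MODULES = {
    "torch": ["torch", "accelerate"],
    "vision": ["PIL"],
    "audio": ["librosa", "soundfile"],
    "sentencepiece": ["sentencepiece"],
    "video": ["av", "decord"],
}


def extras_status(extra_modules=EXTRA_MODULES):
    """Report, per extra, whether every listed module can be found."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ready in extras_status().items():
        print(f"transformers[{extra}]: {'available' if ready else 'missing'}")
```

The check only looks up top-level modules without importing them, so it stays fast even when heavy libraries such as PyTorch are installed.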

## Learn more

- [uv: A Complete Guide](https://pydevtools.com/handbook/explanation/uv-complete-guide.md) covers what uv does, how fast it is, the core workflows, and recent releases
- [How to Install PyTorch with uv](https://pydevtools.com/handbook/how-to/how-to-install-pytorch-with-uv.md) covers CUDA index routing, multi-backend extras, and `--torch-backend` in detail
- [Why Installing GPU Python Packages Is So Complicated](https://pydevtools.com/handbook/explanation/installing-cuda-python-packages.md) explains why CUDA packages need special index configuration
- [Transformers installation docs](https://huggingface.co/docs/transformers/en/installation) for the official guide
- [Transformers on PyPI](https://pypi.org/project/transformers/) for the full list of optional extras
