How to Install Hugging Face Transformers with uv
The Transformers library from Hugging Face provides access to thousands of pretrained models for text, image, audio, and multimodal tasks. Installing it with uv is straightforward for CPU inference but requires PyTorch index configuration for GPU acceleration, the same CUDA routing pattern covered in How to Install PyTorch with uv.
This guide covers three install paths: CPU-only inference, GPU training and inference with CUDA, and quantized model loading with accelerate and bitsandbytes.
Install for CPU inference
For tasks that run on CPU (sentiment analysis, text generation with small models, embeddings), add Transformers to your project:
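```bash
uv add transformers
```

This installs Transformers and its core dependencies (huggingface-hub, tokenizers, safetensors) but not PyTorch. Most inference and training features require a deep learning backend, so install PyTorch alongside it:

```bash
uv add "transformers[torch]"
```

The [torch] extra pulls in PyTorch from PyPI. These default wheels run on CPU on every platform; on Linux they also include CUDA support, which is covered below. Verify the install:

```bash
uv run python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('uv is fast'))"
```

The first run downloads a default sentiment model from the Hugging Face Hub, and Transformers logs a warning that no model was specified. The output should show a label and confidence score, similar to:

```
[{'label': 'POSITIVE', 'score': 0.9789}]
```

Install with GPU support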
GPU acceleration requires PyTorch built against the right CUDA version. PyPI's default PyTorch wheels are CPU-only on Windows and macOS. On Linux, PyPI carries CUDA 12.8 wheels as of PyTorch 2.9.1, but your NVIDIA driver may only support an older CUDA version, or you may need a different one.
Configure CUDA in pyproject.toml
Add a PyTorch CUDA index and route GPU packages to it. This example uses CUDA 12.8:
```toml
[project]
name = "my-ml-project"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "transformers[torch]",
]

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = [
    { index = "pytorch-cu128", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
]
torchvision = [
    { index = "pytorch-cu128", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
]
```

Then lock and sync:

```bash
uv lock
uv sync
```

The platform markers restrict CUDA builds to Linux and Windows. macOS falls back to PyPI's CPU wheels because CUDA builds are not available for macOS. See How to Install PyTorch with uv for the full configuration reference, including multi-backend extras and ROCm support.
Verify GPU access
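```bash
uv run python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no CUDA device')"
```

This should print True followed by your GPU name. If it prints False, check that your NVIDIA driver is installed (run nvidia-smi) and that the CUDA version your PyTorch build targets is not newer than the highest CUDA version your driver supports, which nvidia-smi shows in its header.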
Install GPU support without a project
For one-off experimentation without a pyproject.toml, use uv pip with --torch-backend:
```bash
uv venv --python 3.12 --seed --managed-python
source .venv/bin/activate
uv pip install "transformers[torch]" --torch-backend=auto
```

On Windows, activate the environment with `.venv\Scripts\activate` instead. The `--torch-backend=auto` flag detects your GPU hardware and selects the matching PyTorch index. Valid values include auto, cpu, CUDA versions such as cu118, cu126, cu128, and cu130, ROCm versions such as rocm6.3, and xpu.
Important
--torch-backend only works with uv pip commands. It does not work with uv lock, uv sync, or uv run. For project-level workflows, configure the PyTorch index in pyproject.toml as shown above.
Install extras for quantization and distributed training
Loading large models (7B+ parameters) on consumer GPUs usually requires quantization, because the 16-bit weights alone can exceed the card's memory. The accelerate library is already installed by transformers[torch], so adding bitsandbytes is the only extra step:
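```bash
uv add bitsandbytes
```

With these installed, load a quantized model. The Llama 3.1 repository is gated, so accept its license on the Hugging Face Hub and log in with your access token first:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"

# Quantize the weights to 4-bit as they are loaded.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# device_map="auto" lets accelerate place layers on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Move inputs to the device that holds the model's first layers.
inputs = tokenizer("Explain virtual environments in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note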
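bitsandbytes is built primarily for NVIDIA GPUs with CUDA. Official wheels cover Linux and, since version 0.43, Windows; macOS and non-NVIDIA hardware have limited or experimental support, so check the bitsandbytes installation guide for your platform.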
accelerate enables multi-GPU and distributed training with Transformers’ Trainer class, even without quantization.
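The 7B+ threshold for quantization follows from simple arithmetic: the weights alone take roughly (number of parameters × bits per parameter ÷ 8) bytes. The sketch below is a back-of-the-envelope estimate only; it ignores activations, the KV cache used during generation, and framework overhead, and bitsandbytes keeps some layers in higher precision, so real memory use is somewhat higher. The function name is hypothetical.

```python
# Back-of-the-envelope estimate of the memory needed just to hold model
# weights. Ignores activations, the KV cache, and framework overhead.


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 8e9  # roughly the size of an 8B model such as Llama-3.1-8B-Instruct
    for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>9}: {weight_memory_gb(params, bits):5.1f} GB")
```

For an 8B model this gives about 16 GB in 16-bit precision but about 4 GB in 4-bit, which is why 4-bit loading brings such models within reach of GPUs with 8 to 12 GB of memory.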
Choose the right extras
Transformers ships several optional dependency groups that pull in libraries for specific use cases:
| Extra | What it adds | When to use it |
|---|---|---|
| `transformers[torch]` | PyTorch, accelerate | Most NLP, vision, and generative tasks |
| `transformers[vision]` | Pillow | Image classification, object detection, image generation |
| `transformers[audio]` | librosa, soundfile | Speech recognition, audio classification |
| `transformers[sentencepiece]` | sentencepiece | Multilingual models (mBART, XLM-RoBERTa) |
| `transformers[video]` | av, decord | Video understanding models |
Extras can be combined: uv add "transformers[torch,vision]" installs both PyTorch and Pillow.
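The packages behind each extra can change between Transformers releases. As a minimal sketch using only the standard library's `importlib.metadata`, the hypothetical functions below group an installed distribution's declared requirements by the extra that pulls them in, so you can confirm, for example, what transformers[torch] adds in your environment. The parsing assumes extras appear as `extra == "name"` markers in the package metadata, which is the standard format.

```python
# Sketch: group a distribution's requirements by the extra that pulls them in,
# using only the standard library's importlib.metadata.
import re
from collections import defaultdict
from importlib import metadata

EXTRA_MARKER = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def requirements_by_extra(requirements: list[str]) -> dict[str, list[str]]:
    """Map each extra name (or 'core' for unconditional deps) to its requirements."""
    groups: dict[str, list[str]] = defaultdict(list)
    for req in requirements:
        spec, _, marker = req.partition(";")
        match = EXTRA_MARKER.search(marker)
        groups[match.group(1) if match else "core"].append(spec.strip())
    return dict(groups)


def installed_extras(dist: str = "transformers") -> dict[str, list[str]]:
    """Return the grouping for an installed distribution, or {} if it is absent."""
    try:
        return requirements_by_extra(metadata.requires(dist) or [])
    except metadata.PackageNotFoundError:
        return {}


if __name__ == "__main__":
    extras = installed_extras()
    for name in ("core", "torch", "vision", "audio", "sentencepiece", "video"):
        print(name, "->", extras.get(name, "(not found)"))
```

Run it with `uv run python` inside the project so it inspects the Transformers version uv actually installed.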
Learn more
- How to Install PyTorch with uv covers CUDA index routing, multi-backend extras, and `--torch-backend` in detail
- Why Installing GPU Python Packages Is So Complicated explains why CUDA packages need special index configuration
- Transformers installation docs for the official guide
- Transformers on PyPI for the full list of optional extras