How to Deploy a uv Project to Hugging Face Spaces
Hugging Face Spaces offers three SDKs: Gradio, Streamlit, and Docker. Only the Docker SDK can use a uv lockfile. The Gradio and Streamlit SDKs install from a requirements.txt and have no uv support, so the way to ship a reproducible, uv-locked Space is to build your own container with sdk: docker and run uv sync --frozen inside it.
This guide deploys a Gradio app built on a transformers pipeline, but the same Dockerfile works for FastAPI, Streamlit-in-Docker, or any framework that serves HTTP.
Lay out the Space repository
A Space is a Git repository. A Docker Space needs five files at the root:
my-space/
├── README.md # YAML config block selects the Docker SDK
├── Dockerfile # builds the image HF runs
├── pyproject.toml # dependency declarations
├── uv.lock # exact resolved versions
└── app.py # your application
The Space’s configuration lives in the YAML front matter of README.md:
---
title: Sentiment Demo
emoji: 🤗
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
---
sdk: docker tells Hugging Face to build the Dockerfile instead of installing a requirements.txt. app_port is the port Spaces routes external traffic to; 7860 is the default, so the app must listen there.
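A wrong sdk or app_port value only shows up after a push, so a local check can save a build cycle. The following is a minimal sketch, not part of Hugging Face's tooling: read_front_matter and check_docker_space are hypothetical helpers, and the parser assumes the flat key: value layout shown above rather than full YAML.

```python
from pathlib import Path


def read_front_matter(readme_text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' fences."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Skip nested list items, indented lines, and comments.
        if line.startswith((" ", "-", "#")):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing fence: treat the front matter as missing


def check_docker_space(readme_text: str, expected_port: int = 7860) -> list[str]:
    """Return a list of problems; an empty list means the config looks right."""
    fields = read_front_matter(readme_text)
    problems = []
    sdk = fields.get("sdk")
    if sdk != "docker":
        problems.append(f"sdk is {sdk!r}, expected 'docker'")
    # An omitted app_port falls back to the 7860 default described above.
    port = fields.get("app_port", "7860")
    if port != str(expected_port):
        problems.append(f"app_port is {port}, but the app listens on {expected_port}")
    return problems


if __name__ == "__main__":
    readme = Path("README.md")
    if readme.exists():
        print(check_docker_space(readme.read_text(encoding="utf-8")) or "README.md looks fine")
```

Running it from the Space repository root before a push prints either the list of mismatches or a short confirmation.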
Write the Gradio app
The app loads the model once at startup and binds to 0.0.0.0:7860:
import gradio as gr
from transformers import pipeline

# Load the model once at startup, not per request.
classifier = pipeline("sentiment-analysis")

def classify(text: str):
    result = classifier(text)[0]
    return {result["label"]: float(result["score"])}

demo = gr.Interface(fn=classify, inputs="text", outputs="label")

if __name__ == "__main__":
    # Spaces routes external traffic to app_port (7860 by default).
    demo.launch(server_name="0.0.0.0", server_port=7860)
Add the dependencies with uv so they land in pyproject.toml and uv.lock:
uv add gradio "transformers[torch]"
Write a uv Dockerfile for the UID 1000 user
Spaces runs every container as user ID 1000, not root. Create that user before installing anything, or uv fails with permission errors when it writes the virtual environment.
FROM python:3.13-slim
# Copy the uv binary from its official image. Pin the tag for reproducibility.
COPY --from=ghcr.io/astral-sh/uv:0.11.18 /uv /uvx /bin/
# Spaces runs the container as UID 1000. Create and switch to that user.
RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH \
    UV_COMPILE_BYTECODE=1 \
    UV_LINK_MODE=copy
WORKDIR $HOME/app
# Install dependencies from the lockfile first so the layer caches
# independently of application code changes.
COPY --chown=user pyproject.toml uv.lock ./
RUN uv sync --frozen --no-install-project --no-dev
# Copy the app and complete the sync.
COPY --chown=user . .
RUN uv sync --frozen --no-dev
EXPOSE 7860
CMD ["uv", "run", "--frozen", "python", "app.py"]
The two-step uv sync is the same caching pattern as any uv Dockerfile: editing app.py reuses the cached dependency layer. --frozen installs the exact versions in uv.lock and never re-resolves on Hugging Face’s build infrastructure, which is what makes the deployed environment match your local one.
--chown=user keeps the copied files and the .venv owned by UID 1000, which is the fix for the permission errors a root-owned COPY would cause.
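When a permission problem does slip through, it helps to know which user the process runs as and whether it can write to the app directory. A small diagnostic sketch, assuming a Linux container; is_writable and describe_runtime_user are hypothetical helpers you could call at the top of app.py, not something Spaces provides.

```python
import os
import tempfile
from pathlib import Path


def is_writable(directory: Path) -> bool:
    """Return True if the current user can create a file in `directory`."""
    try:
        # The temporary file is deleted as soon as the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def describe_runtime_user(app_dir: Path) -> str:
    """One log line with the effective UID and write access to `app_dir`."""
    uid = os.getuid() if hasattr(os, "getuid") else None  # getuid is Unix-only
    return f"uid={uid} app_dir={app_dir} writable={is_writable(app_dir)}"


if __name__ == "__main__":
    print(describe_runtime_user(Path.cwd()))
```

On a correctly built image the line should report uid=1000 and writable=True; writable=False usually points at a COPY without --chown.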
Push these files to the Space repo, and Hugging Face builds and runs the image.
Request GPU hardware
A new Space starts on free cpu-basic hardware. Upgrade to a GPU from the Space’s Settings tab under Hardware. Billing is per minute of Running or Starting time; the build itself is free.
To advertise a default for anyone who duplicates the Space, set suggested_hardware in README.md. It suggests a flavor but does not assign one:
suggested_hardware: t4-small
Valid GPU flavors include t4-small, t4-medium, l4x1, l40sx1, a10g-small, a10g-large, a100-large, and their multi-GPU variants (l4x4, a100x8, and so on). The Nvidia T4 (16 GB) is the cheapest at $0.40/hour and fits most inference demos.
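Per-minute billing makes a rough cost estimate simple arithmetic. The sketch below assumes the $0.40/hour T4 rate quoted above still applies (check current pricing before relying on it); estimate_cost is a hypothetical helper, not a Hugging Face API.

```python
def estimate_cost(billed_minutes: float, hourly_rate_usd: float) -> float:
    """Cost in USD for time spent in the Running or Starting state."""
    return round(billed_minutes * hourly_rate_usd / 60, 2)


if __name__ == "__main__":
    t4_rate = 0.40  # USD per hour, the rate quoted in this guide
    print(estimate_cost(90, t4_rate))            # a 90-minute demo session: 0.6
    print(estimate_cost(30 * 24 * 60, t4_rate))  # running 24/7 for 30 days: 288.0
```

The gap between the two figures is the case for pausing a GPU Space, or moving it back to cpu-basic, when nobody is using it.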
The build step has no GPU. Hugging Face builds images on CPU-only infrastructure, so during docker build torch.cuda.is_available() returns False and nvidia-smi is not available. Move every GPU call (loading a model onto cuda, allocating tensors) into the application code that runs after the container starts.
Install CUDA PyTorch in the lockfile
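PyPI’s default torch wheels are CPU-only on macOS and Windows, and the CUDA version bundled in the Linux wheels changes between PyTorch releases. To pin a known CUDA build for a GPU Space, route torch to a CUDA index in pyproject.toml so the locked wheels carry that CUDA runtime: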
[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true
[tool.uv.sources]
torch = [{ index = "pytorch-cu128", marker = "sys_platform == 'linux'" }]
Run uv lock to record the CUDA wheels, commit the updated uv.lock, and the Space build installs them. The build downloads the wheels on CPU without ever initializing CUDA, so it succeeds without a GPU. How to Install PyTorch with uv covers index routing, multi-backend extras, and ROCm in full. For the transformers stack specifically, see How to Install Hugging Face Transformers with uv.
A transformers pipeline moves the model to the GPU automatically when you pass device=0. With plain PyTorch, call model.to("cuda") in the startup code.
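Hard-coding device=0 breaks the app on cpu-basic hardware, so one option is to choose the device at startup. This is a sketch under two assumptions: pick_pipeline_device is a hypothetical helper, and the pipeline accepts -1 to mean CPU, which is the long-standing transformers convention.

```python
def pick_pipeline_device() -> int:
    """Return 0 for the first CUDA GPU, or -1 to stay on the CPU."""
    try:
        # Imported lazily so the check runs when the app starts,
        # never during the GPU-less image build.
        import torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


if __name__ == "__main__":
    device = pick_pipeline_device()
    print(f"pipeline device: {device}")
    # In app.py this would feed the pipeline call, for example:
    # classifier = pipeline("sentiment-analysis", device=device)
```

The same image then runs on both free CPU hardware and a GPU flavor without a code change.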
Cache models instead of re-downloading on every restart
A Docker Space’s disk resets to the built image on every restart, and persistent storage is no longer offered. A model downloaded at runtime is fetched again on the next cold start, which can add minutes. Bake the weights into the image so they ship in a layer that survives restarts.
The Hugging Face native way is the preload_from_hub field, which downloads named repositories during the build:
preload_from_hub:
  - distilbert/distilbert-base-uncased-finetuned-sst-2-english
preload_from_hub writes to the default cache at ~/.cache/huggingface/hub and ignores a custom HF_HOME, so avoid setting HF_HOME if you rely on it.
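To confirm the weights really are in the image, you can look for the repository's folder in that cache at startup. A minimal sketch using only the standard library: model_cache_dir and is_baked_in are hypothetical helpers, and the code assumes the default cache location with neither HF_HOME nor HF_HUB_CACHE overridden.

```python
from pathlib import Path


def default_hub_cache() -> Path:
    """Default Hub cache location (assumes no HF_HOME / HF_HUB_CACHE override)."""
    return Path.home() / ".cache" / "huggingface" / "hub"


def model_cache_dir(repo_id: str, cache_root: Path) -> Path:
    # The Hub cache stores each model repo under "models--<org>--<name>".
    return cache_root / ("models--" + repo_id.replace("/", "--"))


def is_baked_in(repo_id: str, cache_root: Path) -> bool:
    """True if at least one snapshot of the repo exists in the cache."""
    snapshots = model_cache_dir(repo_id, cache_root) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


if __name__ == "__main__":
    repo = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
    print(f"{repo} cached: {is_baked_in(repo, default_hub_cache())}")
```

A False result on a freshly started Space suggests the model is being downloaded at runtime rather than shipped in a layer.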
For finer control, download the model in a Dockerfile RUN step instead:
RUN uv run --frozen python -c \
    "from transformers import pipeline; pipeline('sentiment-analysis')"
This runs after uv sync, caches the weights under the UID 1000 user’s home, and bakes them into the image. Loading the same pipeline at runtime then finds the files on disk and skips the download.
Pass your Hugging Face token as a secret
Unauthenticated requests to the Hub are rate-limited, and gated or private models need a token. Add one under the Space’s Settings tab as a secret named HF_TOKEN.
At runtime the secret is exposed as an environment variable. transformers and huggingface_hub read HF_TOKEN automatically, so no code change is needed; to read it yourself, use os.environ.get("HF_TOKEN").
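If the app does read the token itself, it is worth treating an empty value as missing and logging only whether it is set. A short sketch; read_hf_token and describe_token are hypothetical helpers, not part of huggingface_hub.

```python
import os
from typing import Optional


def read_hf_token() -> Optional[str]:
    """Return the HF_TOKEN secret, or None if it is unset or blank."""
    token = os.environ.get("HF_TOKEN", "").strip()
    return token or None


def describe_token(token: Optional[str]) -> str:
    """A log-safe status line that never includes the secret itself."""
    if token:
        return "HF_TOKEN is set"
    return "HF_TOKEN is not set; Hub requests are unauthenticated"


if __name__ == "__main__":
    print(describe_token(read_hf_token()))
```

Logging the status rather than the value keeps the secret out of the Space's container logs.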
To preload a gated model during the build, mount the secret for that step only so it never lands in an image layer:
RUN --mount=type=secret,id=HF_TOKEN,mode=0444 \
    HF_TOKEN=$(cat /run/secrets/HF_TOKEN) \
    uv run --frozen python -c \
    "from transformers import pipeline; pipeline('text-generation', model='your/gated-model')"
preload_from_hub does not support private repositories, so build-time secret mounts are the path for gated weights.
Learn more
- uv: A Complete Guide covers what uv does, how fast it is, and the core workflows.
- Docker Spaces documents the Docker SDK, the UID 1000 user, and secret mounts.
- Using GPU Spaces lists hardware flavors, pricing, and per-framework CUDA notes.
- Spaces Configuration Reference is the full list of README.md YAML fields, including app_port, suggested_hardware, and preload_from_hub.
- How to deploy a uv project to AWS Lambda applies the same uv.lock discipline to a serverless target.