# How to Deploy a uv Project to Hugging Face Spaces


Hugging Face Spaces offers three SDKs: Gradio, Streamlit, and Docker. Only the Docker SDK can install directly from a [uv](https://pydevtools.com/handbook/reference/uv.md) [lockfile](https://pydevtools.com/handbook/explanation/what-is-a-lock-file.md). The Gradio and Streamlit SDKs install from a `requirements.txt` and have no uv support. To ship a reproducible, uv-locked Space, build your own container with `sdk: docker` and run `uv sync --frozen` inside it.

This guide deploys a Gradio app built on a transformers pipeline, but the same Dockerfile works for FastAPI, Streamlit-in-Docker, or any framework that serves HTTP.

## Lay out the Space repository

A Space is a Git repository. This guide's Docker Space uses five files at the root:

```text
my-space/
├── README.md          # YAML config block selects the Docker SDK
├── Dockerfile         # builds the image HF runs
├── pyproject.toml     # dependency declarations
├── uv.lock            # exact resolved versions
└── app.py             # your application
```
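Before pushing, a quick script can confirm the layout is complete. This is a minimal sketch; `SPACE_FILES` and `missing_space_files` are hypothetical names, and the file list simply mirrors the tree above rather than any rule enforced by Hugging Face.

```python
from pathlib import Path

# The files this guide's Space layout expects at the repository root.
SPACE_FILES = ("README.md", "Dockerfile", "pyproject.toml", "uv.lock", "app.py")


def missing_space_files(root: Path) -> list[str]:
    """Return the expected files that are absent from `root`."""
    return [name for name in SPACE_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the Space repository root before `git push`.
    missing = missing_space_files(Path("."))
    print("Missing:", ", ".join(missing) if missing else "none")
```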

The Space's configuration lives in the YAML front matter of `README.md`:

```yaml {filename="README.md"}
---
title: Sentiment Demo
emoji: 🤗
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
---
```

`sdk: docker` tells Hugging Face to build the `Dockerfile` instead of installing a `requirements.txt`. `app_port` is the port Spaces routes external traffic to. It defaults to 7860, and whatever value you set, the app must listen on that port.
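A wrong `sdk` value or a port mismatch usually only shows up once the Space fails to start, so a small pre-push check can catch it earlier. The sketch below is hypothetical (`read_front_matter` and `check_docker_space` are invented for illustration) and reads only flat `key: value` lines, not full YAML, which is enough for the scalar fields it checks.

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a README's leading `---` block.

    Deliberately minimal: nested YAML such as list items is skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, sep, value = line.partition(":")
        # Skip indented lines and list items such as "  - some/repo".
        if sep and not line.startswith((" ", "-")):
            fields[key.strip()] = value.strip()
    return fields


def check_docker_space(readme: str, app_port: int = 7860) -> list[str]:
    """Return problems that would stop Spaces from reaching the app."""
    fields = read_front_matter(readme)
    problems = []
    if fields.get("sdk") != "docker":
        problems.append(f"sdk is {fields.get('sdk')!r}, expected 'docker'")
    # Spaces uses 7860 when app_port is not declared.
    declared = int(fields.get("app_port", "7860"))
    if declared != app_port:
        problems.append(f"app_port {declared} != port the app listens on ({app_port})")
    return problems
```

Called as `check_docker_space(Path("README.md").read_text(), app_port=7860)`, an empty list means both fields agree with what `app.py` does.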

## Write the Gradio app

The app loads the model once at startup and binds to `0.0.0.0:7860`:

```python {filename="app.py"}
import gradio as gr
from transformers import pipeline

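# Load the model once at startup, not per request. Naming the model
# explicitly (it is the task's default checkpoint) avoids a warning and
# keeps it in step with any weights preloaded into the image.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)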


def classify(text: str):
    result = classifier(text)[0]
    return {result["label"]: float(result["score"])}


demo = gr.Interface(fn=classify, inputs="text", outputs="label")

if __name__ == "__main__":
    # Spaces routes external traffic to app_port (7860 by default).
    demo.launch(server_name="0.0.0.0", server_port=7860)
```
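The `outputs="label"` component expects a dict that maps label names to confidences, which is why `classify` reshapes the pipeline's output. To check that reshaping without downloading a model, one option is to wrap the same logic around a stub. `fake_classifier` and `make_classify` are hypothetical names; the stub only imitates the pipeline's `[{"label": ..., "score": ...}]` return shape.

```python
# Stand-in for the transformers pipeline: same list-of-dicts return shape.
def fake_classifier(text: str) -> list[dict]:
    label = "NEGATIVE" if "bad" in text.lower() else "POSITIVE"
    return [{"label": label, "score": 0.99}]


def make_classify(classifier):
    """Build the same classify() as app.py around any pipeline-like callable."""

    def classify(text: str) -> dict[str, float]:
        result = classifier(text)[0]
        return {result["label"]: float(result["score"])}

    return classify


classify = make_classify(fake_classifier)
print(classify("What a great demo"))  # {'POSITIVE': 0.99}
```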

Add the dependencies with uv so they land in `pyproject.toml` and `uv.lock`:

```bash
uv add gradio "transformers[torch]"
```

## Write a uv Dockerfile for the UID 1000 user

Spaces runs every container as user ID 1000, not root. Create that user before installing anything, or uv fails with permission errors when it writes the virtual environment.

```dockerfile {filename="Dockerfile"}
FROM python:3.13-slim

# Copy the uv binary from its official image. Pin the tag for reproducibility.
COPY --from=ghcr.io/astral-sh/uv:0.11.18 /uv /uvx /bin/

# Spaces runs the container as UID 1000. Create and switch to that user.
RUN useradd -m -u 1000 user
USER user

ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH \
    UV_COMPILE_BYTECODE=1 \
    UV_LINK_MODE=copy

WORKDIR $HOME/app

# Install dependencies from the lockfile first so the layer caches
# independently of application code changes.
COPY --chown=user pyproject.toml uv.lock ./
RUN uv sync --frozen --no-install-project --no-dev

# Copy the app and complete the sync.
COPY --chown=user . .
RUN uv sync --frozen --no-dev

EXPOSE 7860
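# --no-dev matches the build-time syncs, so uv run does not install
# dev dependencies when the container starts.
CMD ["uv", "run", "--frozen", "--no-dev", "python", "app.py"]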
```

The two-stage `uv sync` is the standard caching pattern for a [uv Dockerfile](https://pydevtools.com/handbook/how-to/how-to-use-uv-in-a-dockerfile.md): dependencies install before the application code is copied, so editing `app.py` rebuilds only the layers from `COPY --chown=user . .` onward and reuses the cached dependency layer. `--frozen` installs the exact versions in `uv.lock` and never re-resolves on Hugging Face's build infrastructure, which is what makes the deployed environment match your local one.

`USER user` makes uv create `.venv` as UID 1000, and `--chown=user` gives the copied files the same owner. Together they avoid the permission errors that root-owned files from a plain `COPY` would cause.
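If a permission error still appears, it helps to see what the running container actually reports. A small diagnostic that `app.py` could log at startup; `runtime_identity` is a hypothetical helper, not part of any Hugging Face tooling. On a Space, the expectation from the text above is a UID of 1000 and writable paths.

```python
import os
from pathlib import Path


def runtime_identity() -> dict[str, object]:
    """Report the process UID and whether HOME and the app dir are writable."""
    home = Path(os.environ.get("HOME", "~")).expanduser()
    app_dir = Path.cwd()
    return {
        # os.getuid() exists only on Unix; Spaces containers run Linux.
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "home": str(home),
        "home_writable": os.access(home, os.W_OK),
        "app_dir_writable": os.access(app_dir, os.W_OK),
    }


if __name__ == "__main__":
    print(runtime_identity())
```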

Push the five files to the Space repository; Hugging Face then builds the image and runs the container.

## Request GPU hardware

A new Space starts on free `cpu-basic` hardware. Upgrade to a GPU from the Space's **Settings** tab under **Hardware**. Billing is per minute of `Running` or `Starting` time; the build itself is free.

To advertise a default for anyone who duplicates the Space, set `suggested_hardware` in `README.md`. It suggests a flavor but does not assign one:

```yaml {filename="README.md"}
suggested_hardware: t4-small
```

Valid GPU flavors include `t4-small`, `t4-medium`, `l4x1`, `l40sx1`, `a10g-small`, `a10g-large`, `a100-large`, and their multi-GPU variants (`l4x4`, `a100x8`, and so on). `t4-small`, an Nvidia T4 with 16 GB of GPU memory, is the cheapest at $0.40/hour and is enough for most small-model inference demos.
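Because billing is per minute, a session's cost is the hourly rate divided by 60, times the minutes spent `Starting` or `Running`. A back-of-the-envelope sketch using the $0.40/hour T4 rate; `billed_cost` is a hypothetical helper, and actual invoices may round differently.

```python
from decimal import Decimal


def billed_cost(hourly_rate_usd: str, billed_minutes: int) -> Decimal:
    """Cost of Starting + Running minutes at a per-hour rate, billed per minute.

    Build time is not an input because it is not billed. Decimal keeps the
    cents exact instead of accumulating float error.
    """
    per_minute = Decimal(hourly_rate_usd) / 60
    return (per_minute * billed_minutes).quantize(Decimal("0.01"))


# A t4-small running for a 90-minute demo session:
print(billed_cost("0.40", 90))  # 0.60
```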

The build step has no GPU. Hugging Face builds images on CPU-only infrastructure, so during `docker build`, `torch.cuda.is_available()` returns `False` and `nvidia-smi` fails. Move every GPU call (loading a model onto `cuda`, allocating tensors) into the application code that runs after the container starts.

## Install CUDA PyTorch in the lockfile

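PyPI's default `torch` wheels are CPU-only on macOS and Windows, and on Linux they bundle whichever CUDA version that PyTorch release targets by default. To control which CUDA build a GPU Space runs, route `torch` to a CUDA index in `pyproject.toml` so the locked Linux wheels carry the CUDA runtime you chose: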

```toml {filename="pyproject.toml"}
[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = [{ index = "pytorch-cu128", marker = "sys_platform == 'linux'" }]
```

Run `uv lock` to record the CUDA wheels, commit the updated `uv.lock`, and the Space build installs them. The build downloads the wheels on CPU without ever initializing CUDA, so it succeeds without a GPU. [How to Install PyTorch with uv](https://pydevtools.com/handbook/how-to/how-to-install-pytorch-with-uv.md) covers index routing, multi-backend extras, and ROCm in full. For the transformers stack specifically, see [How to Install Hugging Face Transformers with uv](https://pydevtools.com/handbook/how-to/how-to-install-hugging-face-transformers-with-uv.md).

To run a transformers `pipeline` on the GPU, pass `device=0` (the first CUDA device) when you create it. With plain PyTorch, call `model.to("cuda")` in the startup code.
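So that one image runs on both `cpu-basic` and a GPU flavor, one option is to choose the device at startup instead of hard-coding it. A hedged sketch; `pipeline_device` is a hypothetical helper that relies on the pipeline convention that `-1` means CPU and `0` the first GPU, and it imports `torch` lazily so nothing touches CUDA during `docker build`.

```python
import importlib.util


def pipeline_device() -> int:
    """Return a transformers pipeline `device`: 0 for the first GPU, -1 for CPU."""
    if importlib.util.find_spec("torch") is None:
        return -1  # torch not installed: fall back to CPU
    import torch  # imported here, at startup, never at build time

    return 0 if torch.cuda.is_available() else -1


# In app.py (sketch):
# classifier = pipeline("sentiment-analysis", device=pipeline_device())
```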

## Cache models instead of re-downloading on every restart

A Docker Space's disk resets to the built image on every restart, and persistent storage is no longer offered. A model downloaded at runtime is fetched again on the next cold start, which can add minutes. Bake the weights into the image so they ship in a layer that survives restarts.

The Hugging Face-native option is the `preload_from_hub` field in `README.md`, which downloads the named repositories during the build:

```yaml {filename="README.md"}
preload_from_hub:
  - distilbert/distilbert-base-uncased-finetuned-sst-2-english
```

`preload_from_hub` writes to the default cache at `~/.cache/huggingface/hub` and ignores a custom `HF_HOME`, so avoid setting `HF_HOME` if you rely on it.
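To check at startup that baked-in weights sit where the loader will look, a sketch can recompute the cache folder for a repo. It assumes the Hub cache's `models--{org}--{name}` folder naming and the `HF_HUB_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME` lookup order as we understand `huggingface_hub` to use it; `hub_cache_dir` and `cached_model_dir` are hypothetical names. It also shows why a custom `HF_HOME` moves the lookup away from the default location.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Hub cache root: HF_HUB_CACHE, else $HF_HOME/hub, else ~/.cache/huggingface/hub."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    cache_root = os.environ.get("XDG_CACHE_HOME", "~/.cache")
    hf_home = os.environ.get("HF_HOME", os.path.join(cache_root, "huggingface"))
    return Path(hf_home).expanduser() / "hub"


def cached_model_dir(repo_id: str) -> Path:
    """Folder the Hub cache uses for a model repo: models--{org}--{name}."""
    return hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))


repo = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
print(cached_model_dir(repo), "present:", cached_model_dir(repo).is_dir())
```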

For finer control, download the model in a Dockerfile `RUN` step instead:

```dockerfile {filename="Dockerfile"}
RUN uv run --frozen --no-dev python -c \
    "from transformers import pipeline; pipeline('sentiment-analysis', model='distilbert/distilbert-base-uncased-finetuned-sst-2-english')"
```

Place this step after the final `uv sync`. It caches the weights under the UID 1000 user's home (`~/.cache/huggingface/hub`) and bakes them into an image layer. Loading the same pipeline at runtime then finds the files on disk and skips the download.

## Pass your Hugging Face token as a secret

Unauthenticated requests to the Hub are rate-limited, and gated or private models need a token. Add one under the Space's **Settings** tab as a secret named `HF_TOKEN`.

At runtime the secret is an environment variable. `transformers` and `huggingface_hub` read `HF_TOKEN` automatically, so no code change is needed; to read it yourself, use `os.environ.get("HF_TOKEN")`.

To preload a gated model during the build, mount the secret for that step only so it never lands in an image layer:

```dockerfile {filename="Dockerfile"}
RUN --mount=type=secret,id=HF_TOKEN,mode=0444 \
    HF_TOKEN=$(cat /run/secrets/HF_TOKEN) \
    uv run --frozen --no-dev python -c \
    "from transformers import pipeline; pipeline('text-generation', model='your/gated-model')"
```

`preload_from_hub` does not support private repositories, so build-time secret mounts are the path for gated weights.

## Learn more

- [uv: A Complete Guide](https://pydevtools.com/handbook/explanation/uv-complete-guide.md) covers what uv does, how fast it is, and the core workflows.
- [Docker Spaces](https://huggingface.co/docs/hub/spaces-sdks-docker) documents the Docker SDK, the UID 1000 user, and secret mounts.
- [Using GPU Spaces](https://huggingface.co/docs/hub/spaces-gpus) lists hardware flavors, pricing, and per-framework CUDA notes.
- [Spaces Configuration Reference](https://huggingface.co/docs/hub/spaces-config-reference) is the full list of `README.md` YAML fields, including `app_port`, `suggested_hardware`, and `preload_from_hub`.
- [How to deploy a uv project to AWS Lambda](https://pydevtools.com/handbook/how-to/how-to-deploy-a-uv-project-to-aws-lambda.md) applies the same `uv.lock` discipline to a serverless target.
