Running Python on Serverless
Seven handbook how-tos deploy a uv project to the cloud: AWS Lambda, AWS Fargate, Google Cloud Run, Azure Container Apps, Vercel, Modal, and Hugging Face Spaces. The platform choice feels like a cloud-vendor decision, but two things matter more: what uv brings to the serverless build (which most deployment guides overlook), and whether the workload shape matches the platform’s execution model.
Why uv changes the serverless build story
Most “deploy Python to serverless” guides on the web assume pip and requirements.txt. That assumption shapes the entire build: slow installs, no lockfile guarantee, and .pyc compilation deferred to cold start. uv replaces all three.
Install speed matters when caches miss. A Docker layer cache hit skips the install entirely, but a dependency change, a base-image bump, or a platform rebuild invalidates the cache and triggers a full reinstall. pip takes 30-90 seconds to install a typical web framework’s dependency tree. uv does it in 1-3 seconds. On platforms that build on every push (Vercel, Cloud Run, Modal), that gap compounds across dozens of deploys per day.
Frozen lockfiles prevent resolution drift. uv sync --frozen installs exactly what uv.lock pins, with no re-resolution at build time. To also fail the build when the lockfile is stale relative to pyproject.toml, use uv sync --locked instead. pip’s requirements.txt can use pinned versions, but nothing enforces that the pins match what was tested locally. A serverless deploy that silently resolves a newer transitive dependency is a production incident waiting for a breaking release. See How to use a uv lockfile.
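A deploy script can ask uv whether the lockfile is current before building anything. The following is a minimal sketch, assuming uv is on PATH when the check actually runs; lockfile_is_fresh and its injectable run parameter are hypothetical names chosen for illustration, not part of uv.

```python
import subprocess
from pathlib import Path
from typing import Callable

# `uv lock --check` exits non-zero when uv.lock no longer matches pyproject.toml.
LOCK_CHECK_CMD = ["uv", "lock", "--check"]

# Any callable with the same shape as subprocess.run (injectable for testing).
Runner = Callable[..., subprocess.CompletedProcess]


def lockfile_is_fresh(project_dir: Path, run: Runner = subprocess.run) -> bool:
    """Return True when uv reports that uv.lock is up to date."""
    result = run(LOCK_CHECK_CMD, cwd=project_dir, capture_output=True, text=True)
    return result.returncode == 0
```

Calling lockfile_is_fresh(Path(".")) from a pre-deploy hook and aborting on False keeps a stale lockfile from ever reaching the platform build. The run parameter exists only so the check can be exercised without uv installed.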
Bytecode compilation moves off the cold-start path. UV_COMPILE_BYTECODE=1 during the build writes .pyc files into the image. Without it, Python compiles every imported module on the first request after a cold start. For a FastAPI app importing numpy, pandas, or torch, that adds hundreds of milliseconds to seconds. How to use uv in a Dockerfile covers the setup.
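To see what "writing .pyc files into the image" means in practice, the sketch below uses the standard library's compileall module as a stand-in for what uv does when UV_COMPILE_BYTECODE=1 is set. It is an illustration, not uv's implementation; precompile, demo, and handler.py are hypothetical names.

```python
import compileall
import importlib.util
import tempfile
from pathlib import Path


def precompile(source_dir: Path) -> list[Path]:
    """Compile every .py file under source_dir and return the .pyc files found."""
    compileall.compile_dir(str(source_dir), quiet=1, force=True)
    pyc_paths = (
        Path(importlib.util.cache_from_source(str(py)))
        for py in source_dir.rglob("*.py")
    )
    return sorted(p for p in pyc_paths if p.exists())


def demo() -> list[str]:
    """Write a throwaway module, precompile it, and list the bytecode file names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "handler.py").write_text("def handle(event):\n    return event\n")
        return [p.name for p in precompile(root)]
```

After precompile runs, the interpreter finds a matching .pyc in __pycache__ on import and skips the compile step, which is the work that would otherwise land on the first request after a cold start.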
The build mechanism varies by platform. This is the biggest practical difference between platforms, and it determines how much of the uv story you control:
| Build mechanism | Platforms | What you manage |
|---|---|---|
| Dockerfile with uv sync | Lambda, Fargate, Spaces | Full control: base image, system deps, layer order |
| Cloud-native buildpack | Cloud Run, Azure Container Apps | No Dockerfile required; the buildpack reads uv.lock (building from a Dockerfile also works) |
| Native lockfile | Vercel | No container at all; Vercel reads uv.lock directly |
| Python-defined image | Modal | uv_sync() in Python; no Dockerfile, no YAML |
How the seven platforms compare
Beyond the build, the platforms differ on execution model, run-time ceiling, billing, and hardware:
| Platform | Execution model | Max run time | Scales to zero | GPU | Billing model |
|---|---|---|---|---|---|
| AWS Lambda | Event / request | 15 min | Yes | No | Per invocation + GB-seconds |
| AWS Fargate | Long-running task | Unbounded | No | No | Per vCPU-hour + GB-hour |
| Google Cloud Run | Request or always-on | 60 min | Yes | NVIDIA L4 | Per request + vCPU-seconds |
| Azure Container Apps | Request or always-on | Unbounded | Yes | NVIDIA T4, A100 | Per vCPU-second + GiB-second |
| Vercel | Request-scoped | 5 min (800 s on Pro) | Yes | No | Per invocation |
| Modal | Function call | 24 h | Yes | T4 to B200 | Per GPU-second or CPU-second |
| Hugging Face Spaces | Long-running app | Persistent | Sleeps when idle | Paid tiers | Free tier + paid GPU hours |
- AWS Lambda: scales to zero and bills per millisecond, but the 15-minute ceiling and no GPU rule out long or compute-heavy jobs.
- AWS Fargate: no time limit and full container control, but no scale-to-zero and no GPU. Fits always-on services.
- Google Cloud Run: scales to zero, runs a full container, accepts a uv buildpack with no Dockerfile, and offers an L4 GPU. The most flexible single option.
- Azure Container Apps: scales to zero, runs a full container, deploys in one az containerapp up, and offers serverless T4 and A100 GPUs. The Azure counterpart to Cloud Run.
- Vercel: reads uv.lock natively with no container step. The lightest deploy path for a request-scoped API.
- Modal: defines the image in Python, reaches the widest GPU range (T4 through B200), and scales to zero. Built for ML inference and batch compute.
- Hugging Face Spaces: hosts a persistent app with an optional GPU. The natural home for a model demo with a Gradio or Streamlit UI.
Match the platform to the workload
- Short request/response API or event handler: Vercel (if uv.lock and a function signature are enough) or Lambda (if already on AWS). Cloud Run when the work needs a full container or runs past Lambda’s 15 minutes.
- Always-on service or long-running job: Fargate (no time limit), or Cloud Run and Azure Container Apps (both run a full container and scale to zero between bursts, which Fargate does not).
- GPU work (PyTorch inference, model serving, batch compute): Modal for the widest GPU range and a Python-defined image, Cloud Run for a managed L4 that scales to zero, Azure Container Apps for serverless T4 and A100 GPUs, Spaces for a hosted demo.
- No Dockerfile: Vercel reads uv.lock directly, and the Cloud Run and Azure Container Apps buildpacks build Python source. Modal builds the image in Python. If you need full control of the base image and system libraries, Lambda, Fargate, and Spaces take a Dockerfile.
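The matching rules above can be roughed out as a filter over the comparison table. This is a sketch under simplifying assumptions, not a recommendation engine: Platform and candidates are hypothetical names, Vercel uses its 5-minute default rather than the Pro limit, and Spaces is marked as not scaling to zero because it sleeps when idle instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Platform:
    name: str
    max_minutes: Optional[float]  # None means no fixed run-time ceiling
    scales_to_zero: bool
    gpu: bool


# Values transcribed from the comparison table (simplified as noted above).
PLATFORMS = [
    Platform("AWS Lambda", 15, True, False),
    Platform("AWS Fargate", None, False, False),
    Platform("Google Cloud Run", 60, True, True),
    Platform("Azure Container Apps", None, True, True),
    Platform("Vercel", 5, True, False),
    Platform("Modal", 24 * 60, True, True),
    Platform("Hugging Face Spaces", None, False, True),
]


def candidates(
    run_minutes: float,
    needs_gpu: bool = False,
    needs_scale_to_zero: bool = False,
) -> list[str]:
    """Return the platforms whose limits do not rule the workload out."""
    return [
        p.name
        for p in PLATFORMS
        if (p.max_minutes is None or run_minutes <= p.max_minutes)
        and (p.gpu or not needs_gpu)
        and (p.scales_to_zero or not needs_scale_to_zero)
    ]
```

For example, candidates(30, needs_gpu=True, needs_scale_to_zero=True) narrows the field to Cloud Run, Azure Container Apps, and Modal. The filter only removes platforms that cannot run the job; choosing among the survivors still depends on build mechanism, billing, and existing cloud footprint.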
When serverless is the wrong fit
Serverless trades capacity management for constraints. Some workloads don’t survive those constraints:
- Persistent connections. WebSocket servers, gRPC streams, and long-polling endpoints need a process that stays alive between requests. Scale-to-zero platforms tear down idle containers.
- Latency-sensitive with no cold-start tolerance. A cold start adds hundreds of milliseconds to seconds. If every request must respond in under 100ms, a pre-warmed VM or a container on a managed cluster (ECS on EC2, GKE, Fly.io) gives stable latency.
- Large local state. Workloads that read or write large files, maintain an in-memory cache across requests, or depend on a local filesystem between invocations need persistent storage that serverless containers don’t provide.
When the workload fits one of these patterns, a VM or managed container service costs more operational effort but avoids fighting the platform.
Learn more
- How to deploy a uv project to AWS Lambda
- How to deploy a uv project to AWS Fargate
- How to deploy a uv project to Google Cloud Run
- How to deploy a uv project to Azure Container Apps
- How to deploy a uv project to Vercel
- How to run uv on Modal
- How to deploy a uv project to Hugging Face Spaces
- How to use uv in a Dockerfile
- How to use a uv lockfile for reproducible Python environments