What is the best serverless platform for a Python project?

It depends on the workload shape. Short request/response APIs and event handlers fit AWS Lambda, Vercel, or Cloud Run, which scale to zero and bill per request. Always-on services fit AWS Fargate, Cloud Run, or Azure Container Apps. GPU inference fits Modal, Cloud Run (NVIDIA L4), Azure Container Apps (T4 and A100), or Hugging Face Spaces. The recurring tooling decisions (installing from `uv.lock`, cold starts, the image build) matter more than the vendor.

Do I need a Dockerfile to run a Python project on serverless?

Not always. Vercel reads `uv.lock` natively, Cloud Run and Azure Container Apps can build Python source with a buildpack, and Modal defines the image in Python with `uv_sync()`. AWS Lambda, AWS Fargate, and Hugging Face Spaces want a container image you build, usually from a Dockerfile. The build mechanism is one of the main differences between platforms.

How does uv help with serverless cold starts?

Setting `UV_COMPILE_BYTECODE=1` at build time compiles `.pyc` files into the image instead of paying that cost on every cold start. Installing from a frozen `uv.lock` also keeps the dependency set identical across deploys, so a build that works once keeps working. uv's install speed (10-100x faster than pip) also cuts rebuild times when a Docker layer cache miss forces a full reinstall.

Which serverless platforms support GPUs for Python?

Modal (T4 through B200), Google Cloud Run (NVIDIA L4), Azure Container Apps (NVIDIA T4 and A100 serverless GPUs), and Hugging Face Spaces (paid GPU tiers) run GPU workloads. AWS Lambda, AWS Fargate, and Vercel do not offer GPUs, so PyTorch or other CUDA workloads need one of the GPU-capable platforms.

When is serverless the wrong choice for a Python project?

Serverless is a poor fit for workloads that need persistent WebSocket connections, sub-100ms response latency with no cold-start tolerance, large local state or filesystem access between requests, or long-running processes that exceed platform time limits. A VM or managed container service (ECS on EC2, GKE, Fly.io) gives stable latency and persistent state at the cost of managing capacity.

Running Python on Serverless

by Tim Hopper

Seven handbook how-tos deploy a uv project to the cloud: AWS Lambda, AWS Fargate, Google Cloud Run, Azure Container Apps, Vercel, Modal, and Hugging Face Spaces. The platform choice feels like a cloud-vendor decision, but two things matter more: what uv brings to the serverless build (which most deployment guides ignore), and whether the workload shape matches the platform’s execution model.

Why uv changes the serverless build story

Most “deploy Python to serverless” guides on the web assume pip and requirements.txt. That assumption shapes the entire build: slow installs, no lockfile guarantee, and .pyc compilation deferred to cold start. uv addresses all three.

Install speed matters when caches miss. A Docker layer cache hit skips the install entirely, but a dependency change, a base-image bump, or a platform rebuild invalidates the cache and triggers a full reinstall. pip takes 30-90 seconds to install a typical web framework’s dependency tree. uv does it in 1-3 seconds. On platforms that build on every push (Vercel, Cloud Run, Modal), that gap compounds across dozens of deploys per day.

Frozen lockfiles prevent resolution drift. `uv sync --frozen` installs exactly what `uv.lock` pins without re-resolving, and `uv sync --locked` additionally fails if the lockfile is out of date with `pyproject.toml`. pip’s requirements.txt can use pinned versions, but nothing enforces that the pins match what was tested locally. A serverless deploy that silently resolves a newer transitive dependency is a production incident waiting for a breaking release. See How to use a uv lockfile.

Bytecode compilation moves off the cold-start path. `UV_COMPILE_BYTECODE=1` during the build writes .pyc files into the image. Without it, Python compiles each module the first time it is imported in a fresh container, so the first request after a cold start pays that cost. For a FastAPI app importing numpy, pandas, or torch, that adds hundreds of milliseconds to seconds. How to use uv in a Dockerfile covers the setup.
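The effect of build-time compilation can be reproduced with the standard library alone. The sketch below uses `compileall` as a stand-in for what uv does after installation. It is not uv's implementation, and the `precompile` and `demo` helpers are hypothetical names for illustration.

```python
import compileall
import importlib.util
import tempfile
from pathlib import Path


def precompile(source_dir: Path) -> list[Path]:
    """Compile every .py file under source_dir and return the expected .pyc paths."""
    # quiet=1 prints only errors; force=True recompiles even if a .pyc is current.
    ok = compileall.compile_dir(str(source_dir), quiet=1, force=True)
    if not ok:
        raise RuntimeError(f"bytecode compilation failed under {source_dir}")
    return sorted(
        Path(importlib.util.cache_from_source(str(py)))
        for py in source_dir.rglob("*.py")
    )


def demo() -> list[str]:
    """Write a tiny module to a temp dir, precompile it, and list the .pyc files found."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "handler.py").write_text("def handle(event):\n    return {'ok': True}\n")
        return [pyc.name for pyc in precompile(root) if pyc.exists()]


if __name__ == "__main__":
    print(demo())
```

With the .pyc files already in the image, the interpreter can load them at import time instead of compiling the source during the first request.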

The build mechanism varies by platform. This is the biggest practical difference between platforms, and it determines how much of the uv story you control:

| Build mechanism | Platforms | What you manage |
| --- | --- | --- |
| Dockerfile with `uv sync` | Lambda, Fargate, Spaces | Full control: base image, system deps, layer order |
| Cloud-native buildpack | Cloud Run, Azure Container Apps | Zero Dockerfile; the buildpack reads `uv.lock` (or build from a Dockerfile) |
| Native lockfile | Vercel | No container at all; Vercel reads `uv.lock` directly |
| Python-defined image | Modal | `uv_sync()` in Python; no Dockerfile, no YAML |

How the seven platforms compare

Beyond the build, the platforms differ on execution model, run-time ceiling, billing, and hardware:

| Platform | Execution model | Max run time | Scales to zero | GPU | Billing model |
| --- | --- | --- | --- | --- | --- |
| AWS Lambda | Event / request | 15 min | Yes | No | Per invocation + GB-seconds |
| AWS Fargate | Long-running task | Unbounded | No | No | Per vCPU-hour + GB-hour |
| Google Cloud Run | Request or always-on | 60 min | Yes | NVIDIA L4 | Per request + vCPU-seconds |
| Azure Container Apps | Request or always-on | Unbounded | Yes | NVIDIA T4, A100 | Per vCPU-second + GiB-second |
| Vercel | Request-scoped | 5 min (800 s on Pro) | Yes | No | Per invocation |
| Modal | Function call | 24 h | Yes | T4 to B200 | Per GPU-second or CPU-second |
| Hugging Face Spaces | Long-running app | Persistent | Sleeps when idle | Paid tiers | Free tier + paid GPU hours |

AWS Lambda: scales to zero and bills per millisecond, but the 15-minute ceiling and the lack of GPU support rule out long or compute-heavy jobs.
AWS Fargate: no time limit and full container control, but no scale-to-zero and no GPU. Fits always-on services.
Google Cloud Run: scales to zero, runs a full container, accepts a uv buildpack with no Dockerfile, and offers an L4 GPU. The most flexible single option.
Azure Container Apps: scales to zero, runs a full container, deploys in one az containerapp up, and offers serverless T4 and A100 GPUs. The Azure counterpart to Cloud Run.
Vercel: reads uv.lock natively with no container step. The lightest deploy path for a request-scoped API.
Modal: defines the image in Python, reaches the widest GPU range (T4 through B200), and scales to zero. Built for ML inference and batch compute.
Hugging Face Spaces: hosts a persistent app with an optional GPU. The natural home for a model demo with a Gradio or Streamlit UI.
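The "per invocation + GB-seconds" billing shape is easier to reason about as arithmetic. The sketch below computes GB-seconds from memory size, duration, and invocation count, then combines them with rates you supply. The two price parameters are placeholders, not any vendor's actual pricing, and the function names are hypothetical.

```python
def gb_seconds(memory_mb: int, duration_ms: float, invocations: int) -> float:
    """Total GB-seconds consumed: memory in GB x duration in seconds x invocations."""
    return (memory_mb / 1024) * (duration_ms / 1000) * invocations


def monthly_cost(
    memory_mb: int,
    duration_ms: float,
    invocations: int,
    *,
    price_per_gb_second: float,         # placeholder: look up the current rate
    price_per_million_requests: float,  # placeholder: look up the current rate
) -> float:
    """Compute charge plus request charge for one month of traffic."""
    compute = gb_seconds(memory_mb, duration_ms, invocations) * price_per_gb_second
    requests = (invocations / 1_000_000) * price_per_million_requests
    return compute + requests


if __name__ == "__main__":
    # One million 200 ms invocations at 512 MB, with made-up rates.
    usage = gb_seconds(512, 200, 1_000_000)
    cost = monthly_cost(
        512, 200, 1_000_000,
        price_per_gb_second=0.00002,
        price_per_million_requests=0.25,
    )
    print(f"{usage:.0f} GB-seconds, {cost:.2f} in the chosen currency")
```

Because cost scales with memory multiplied by duration, halving either one halves the compute portion of the bill. For a scale-to-zero platform, idle time contributes nothing.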

Match the platform to the workload

Short request/response API or event handler: Vercel (if uv.lock and a function signature are enough) or Lambda (if already on AWS). Cloud Run when the work needs a full container or runs past Lambda’s 15 minutes.
Always-on service or long-running job: Fargate (no time limit), or Cloud Run and Azure Container Apps (both run a full container and scale to zero between bursts, which Fargate does not).
GPU work (PyTorch inference, model serving, batch compute): Modal for the widest GPU range and a Python-defined image, Cloud Run for a managed L4 that scales to zero, Azure Container Apps for serverless T4 and A100 GPUs, Spaces for a hosted demo.
No Dockerfile: Vercel reads `uv.lock` directly, and the Cloud Run and Azure Container Apps buildpacks build Python source. Modal builds the image in Python. If you need full control of the base image and system libraries, Lambda, Fargate, and Spaces take a Dockerfile.

When serverless is the wrong fit

Serverless trades capacity management for constraints. Some workloads don’t survive those constraints:

Persistent connections. WebSocket servers, gRPC streams, and long-polling endpoints need a process that stays alive between requests. Scale-to-zero platforms tear down idle containers.
Latency-sensitive with no cold-start tolerance. A cold start adds hundreds of milliseconds to seconds. If every request must respond in under 100ms, a pre-warmed VM or a container on a managed cluster (ECS on EC2, GKE, Fly.io) gives stable latency.
Large local state. Workloads that read or write large files, maintain an in-memory cache across requests, or depend on a local filesystem between invocations need persistent storage that serverless containers don’t provide.

When the workload fits one of these patterns, a VM or managed container service costs more operational effort but avoids fighting the platform.

Last updated on July 3, 2026