Set up a data science project with uv
This tutorial sets up a data analysis project with uv so that every dependency is pinned, notebooks run in the right environment, and a collaborator can reproduce your setup with a single command.
Prerequisites
Install uv following the installation guide. No separate Python install is required.
Create the project
uv init weather_analysis
cd weather_analysis
This creates a project directory with a pyproject.toml, main.py, and a README.md. The pyproject.toml stores all project metadata and dependencies:
[project]
name = "weather-analysis"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = []
Note
The requires-python value depends on which Python interpreter uv finds on your system. You may see a different version bound.
Add data science dependencies
uv add pandas matplotlib
uv resolves compatible versions, installs them into an isolated virtual environment, and writes a uv.lock file that pins every transitive dependency. The lockfile guarantees that anyone cloning this project gets the same package versions.
The pyproject.toml now lists the direct dependencies:
dependencies = [
    "matplotlib>=3.10.8",
    "pandas>=3.0.2",
]
The exact version bounds will reflect whichever releases are current when you run the command.
Tip
Commit both pyproject.toml and uv.lock to version control. The lockfile pins exact versions of every transitive dependency, so collaborators get identical environments with uv sync.
Tip
Some scientific packages, such as certain CUDA toolkits and domain-specific Fortran libraries, are easier to install through conda channels than PyPI. If your project depends on packages like these, consider pixi or conda instead.
Add Jupyter as a dev dependency
Jupyter is a tool for interactive exploration, not a runtime dependency of the analysis code. Keep it in a dev group:
uv add --dev jupyter
This adds Jupyter under the [dependency-groups] section in pyproject.toml:
[dependency-groups]
dev = [
    "jupyter>=1.1.1",
]
When deploying the analysis as a script or scheduled job, exclude dev dependencies with uv sync --no-dev to get a leaner environment.
Launch Jupyter Lab from the project directory:
uv run jupyter lab
uv ensures the notebook kernel uses the project’s virtual environment, so every import resolves against your pinned dependencies. See How to Run a Jupyter Notebook with uv for more options.
Set up the project layout
Create directories for data and notebooks:
mkdir -p data notebooks
Create sample data
Add a sample CSV at data/weather.csv:
date,city,temp_high,temp_low,precipitation_mm,humidity_pct
2025-01-01,Portland,8,2,12.5,82
2025-01-02,Portland,7,1,0.0,65
2025-01-03,Portland,9,3,8.3,78
2025-01-04,Portland,6,-1,15.2,88
2025-01-05,Portland,10,4,0.0,60
2025-01-01,Phoenix,18,5,0.0,25
2025-01-02,Phoenix,20,7,0.0,22
2025-01-03,Phoenix,22,8,0.0,20
2025-01-04,Phoenix,19,6,2.1,35
2025-01-05,Phoenix,21,7,0.0,23
Write the analysis script
Replace the contents of main.py with:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from pathlib import Path

matplotlib.use("Agg")


def load_weather_data(path):
    df = pd.read_csv(path, parse_dates=["date"])
    df["temp_range"] = df["temp_high"] - df["temp_low"]
    return df


def summarize_by_city(df):
    return df.groupby("city").agg(
        avg_high=("temp_high", "mean"),
        avg_low=("temp_low", "mean"),
        total_precip=("precipitation_mm", "sum"),
        avg_humidity=("humidity_pct", "mean"),
    ).round(1)


def plot_temperature_comparison(df, output_path):
    fig, ax = plt.subplots(figsize=(8, 4))
    for city, group in df.groupby("city"):
        ax.plot(group["date"], group["temp_high"], marker="o", label=f"{city} high")
        ax.plot(group["date"], group["temp_low"], marker="s", label=f"{city} low",
                linestyle="--", alpha=0.6)
    ax.set_ylabel("Temperature (°C)")
    ax.set_title("Daily Temperatures by City")
    ax.legend()
    fig.tight_layout()
    fig.savefig(output_path, dpi=150)
    print(f"Chart saved to {output_path}")
    plt.close(fig)


def main():
    data_dir = Path(__file__).parent / "data"
    df = load_weather_data(data_dir / "weather.csv")
    summary = summarize_by_city(df)
    print("Weather Summary by City:")
    print(summary)
    print()
    plot_temperature_comparison(df, data_dir / "temperatures.png")


if __name__ == "__main__":
    main()
Run the analysis
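uv run main.py
Expected output (the script builds the chart path from __file__, so on your machine it is printed as an absolute path; it is shortened here):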
Weather Summary by City:
          avg_high  avg_low  total_precip  avg_humidity
city
Phoenix       20.0      6.6           2.1          25.0
Portland       8.0      1.8          36.0          74.6

Chart saved to data/temperatures.png
Explore in a notebook
With Jupyter running (uv run jupyter lab), create a new notebook in the notebooks/ directory. Every cell can import from the same environment:
import pandas as pd
from pathlib import Path
data_dir = Path("..") / "data"
df = pd.read_csv(data_dir / "weather.csv", parse_dates=["date"])
df.describe()
No kernel configuration is needed. Because you launched Jupyter with uv run, the notebook kernel uses the project’s virtual environment automatically.
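If you want to confirm which interpreter a notebook cell is using, a quick check with the standard library is enough. The sketch below is a generic Python check, not a uv feature, and the helper name running_in_virtualenv is made up for illustration. In a uv project, the printed interpreter path would be expected to point into the project's .venv directory.

```python
import sys


def running_in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment.

    Hypothetical helper for illustration.
    """
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the Python installation it was built from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment:", running_in_virtualenv())
```

Running the same cell in a notebook started outside uv run is a simple way to see the difference between the two setups.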
Add testing
Data analysis code benefits from tests as much as any other software. Add pytest as a dev dependency:
uv add --dev pytest
Create a test file at test_main.py:
import pandas as pd
from main import load_weather_data, summarize_by_city
from pathlib import Path


def test_load_weather_data():
    path = Path(__file__).parent / "data" / "weather.csv"
    df = load_weather_data(path)
    assert "temp_range" in df.columns
    assert len(df) == 10


def test_summarize_by_city():
    df = pd.DataFrame({
        "city": ["A", "A", "B"],
        "temp_high": [20, 22, 10],
        "temp_low": [10, 12, 5],
        "precipitation_mm": [0.0, 5.0, 10.0],
        "humidity_pct": [50, 60, 70],
    })
    summary = summarize_by_city(df)
    assert summary.loc["A", "avg_high"] == 21.0
    assert summary.loc["B", "total_precip"] == 10.0
Run the tests:
uv run pytest
See the pytest tutorial for more on testing Python projects.
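Tests can also guard the data itself, not only the functions. The sketch below checks a few invariants on rows shaped like data/weather.csv using only the standard library. The invariants (high at or above low, non-negative precipitation, humidity between 0 and 100) and the helper name find_invalid_rows are assumptions made for this example, not rules from the tutorial.

```python
import csv
import io

# Two rows copied from the sample CSV above, inlined so the example runs anywhere.
SAMPLE_CSV = """date,city,temp_high,temp_low,precipitation_mm,humidity_pct
2025-01-04,Portland,6,-1,15.2,88
2025-01-04,Phoenix,19,6,2.1,35
"""


def find_invalid_rows(csv_text: str) -> list[str]:
    """Return a description of each row that breaks an assumed invariant.

    Hypothetical helper; the invariants are illustrative assumptions.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        if float(row["temp_high"]) < float(row["temp_low"]):
            problems.append(f"line {line_number}: temp_high below temp_low")
        if float(row["precipitation_mm"]) < 0:
            problems.append(f"line {line_number}: negative precipitation")
        if not 0 <= float(row["humidity_pct"]) <= 100:
            problems.append(f"line {line_number}: humidity outside 0-100")
    return problems


print(find_invalid_rows(SAMPLE_CSV))  # prints []
```

Inside a pytest test, the same helper could be called on the text of data/weather.csv with an assertion that the returned list is empty.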
Pin the Python version
Data science teams often need a consistent Python version across all contributors. Pin the version for this project:
uv python pin 3.13
This creates (or updates) the .python-version file in the project root. When anyone runs uv sync or uv run in this directory, uv installs and uses Python 3.13, even if their system has a different version. The pinned version must satisfy the requires-python constraint in pyproject.toml. See How to Change the Python Version of a uv Project for details.
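The relationship between the pinned version and requires-python can be illustrated with a small comparison. The sketch below is deliberately simplified: it only understands a single ">=X.Y" bound like the one in this tutorial, whereas real version specifiers are richer. The helper name satisfies_minimum is hypothetical.

```python
import sys


def satisfies_minimum(requires_python: str, version: tuple[int, ...]) -> bool:
    """Check a version tuple against a simple '>=X.Y' bound.

    Hypothetical helper; handles only the '>=' form used in this tutorial.
    """
    if not requires_python.startswith(">="):
        raise ValueError("this sketch only understands '>=' bounds")
    minimum = tuple(int(part) for part in requires_python[2:].strip().split("."))
    # Compare only as many components as the bound specifies.
    return tuple(version[: len(minimum)]) >= minimum


print(satisfies_minimum(">=3.13", (3, 13, 1)))  # prints True
print(satisfies_minimum(">=3.13", (3, 12, 0)))  # prints False
print(satisfies_minimum(">=3.13", sys.version_info[:3]))
```

The last line reports whether the interpreter running the snippet meets the bound, so its output depends on your environment.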
Final project structure
- weather_analysis/
  - .python-version
  - pyproject.toml
  - uv.lock
  - main.py
  - test_main.py
  - README.md
  - data/
    - weather.csv
  - notebooks/
Reproduce the environment
Anyone cloning this project can recreate the exact environment with one command:
uv sync
This reads the lockfile and installs the pinned versions of every dependency. No manual version matching, no stale requirements files.
Next steps
- How to Run a Jupyter Notebook with uv for more Jupyter workflows
- How to Install PyTorch with uv for GPU-accelerated machine learning
- How to Install RAPIDS with uv for GPU-accelerated data processing
- Set Up a GPU Data Science Project with pixi for projects with non-PyPI dependencies like CUDA toolkits
- uv vs pixi vs conda for Scientific Python for choosing between uv, pixi, and conda
- Set up GitHub Actions for a Python project with uv to run tests in CI