Set up a data science project with uv

This tutorial sets up a data analysis project with uv so that every dependency is pinned, notebooks run in the right environment, and a collaborator can reproduce your setup with a single command.

Prerequisites

Install uv following the installation guide. No separate Python install is required.

Create the project

uv init weather_analysis
cd weather_analysis

This creates a project directory with a pyproject.toml, main.py, and a README.md. The pyproject.toml stores all project metadata and dependencies:

[project]
name = "weather-analysis"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = []

Note

The requires-python value depends on which Python interpreter uv finds on your system. You may see a different version bound.

Add data science dependencies

uv add pandas matplotlib

uv resolves compatible versions, installs them into an isolated virtual environment, and writes a uv.lock file that pins every transitive dependency. The lockfile guarantees that anyone cloning this project gets the same package versions.

The pyproject.toml now lists the direct dependencies:

dependencies = [
    "matplotlib>=3.10.8",
    "pandas>=3.0.2",
]

The exact version bounds will reflect whichever releases are current when you run the command.

Tip

Commit both pyproject.toml and uv.lock to version control. The lockfile pins exact versions of every transitive dependency, so collaborators get identical environments with uv sync.

Tip

Some scientific packages, such as certain CUDA toolkits and domain-specific Fortran libraries, are easier to install through conda channels than PyPI. If your project depends on packages like these, consider pixi or conda instead.

Add Jupyter as a dev dependency

Jupyter is a tool for interactive exploration, not a runtime dependency of the analysis code. Keep it in a dev group:

uv add --dev jupyter

This adds Jupyter under the [dependency-groups] section in pyproject.toml:

[dependency-groups]
dev = [
    "jupyter>=1.1.1",
]

When deploying the analysis as a script or scheduled job, exclude dev dependencies with uv sync --no-dev to get a leaner environment.

Launch Jupyter Lab from the project directory:

uv run jupyter lab

uv ensures the notebook kernel uses the project’s virtual environment, so every import resolves against your pinned dependencies. See How to Run a Jupyter Notebook with uv for more options.
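If you want to double-check that a notebook cell (or any uv run command) is using the project environment, inspect the interpreter path; the path in the comment below is illustrative and will differ on your machine:

```python
import sys

# Under uv run, this points inside the project's .venv directory,
# e.g. .../weather_analysis/.venv/bin/python
print(sys.executable)
```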

Set up the project layout

Create directories for data and notebooks:

mkdir -p data notebooks

Create sample data

Add a sample CSV at data/weather.csv:

date,city,temp_high,temp_low,precipitation_mm,humidity_pct
2025-01-01,Portland,8,2,12.5,82
2025-01-02,Portland,7,1,0.0,65
2025-01-03,Portland,9,3,8.3,78
2025-01-04,Portland,6,-1,15.2,88
2025-01-05,Portland,10,4,0.0,60
2025-01-01,Phoenix,18,5,0.0,25
2025-01-02,Phoenix,20,7,0.0,22
2025-01-03,Phoenix,22,8,0.0,20
2025-01-04,Phoenix,19,6,2.1,35
2025-01-05,Phoenix,21,7,0.0,23

Write the analysis script

Replace the contents of main.py with:

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from pathlib import Path

# Use a non-interactive backend so the script runs headless (no display needed)
matplotlib.use("Agg")


def load_weather_data(path):
    df = pd.read_csv(path, parse_dates=["date"])
    df["temp_range"] = df["temp_high"] - df["temp_low"]
    return df


def summarize_by_city(df):
    return df.groupby("city").agg(
        avg_high=("temp_high", "mean"),
        avg_low=("temp_low", "mean"),
        total_precip=("precipitation_mm", "sum"),
        avg_humidity=("humidity_pct", "mean"),
    ).round(1)


def plot_temperature_comparison(df, output_path):
    fig, ax = plt.subplots(figsize=(8, 4))
    for city, group in df.groupby("city"):
        ax.plot(group["date"], group["temp_high"], marker="o", label=f"{city} high")
        ax.plot(group["date"], group["temp_low"], marker="s", label=f"{city} low",
                linestyle="--", alpha=0.6)
    ax.set_ylabel("Temperature (°C)")
    ax.set_title("Daily Temperatures by City")
    ax.legend()
    fig.tight_layout()
    fig.savefig(output_path, dpi=150)
    print(f"Chart saved to {output_path}")
    plt.close(fig)


def main():
    data_dir = Path(__file__).parent / "data"
    df = load_weather_data(data_dir / "weather.csv")

    summary = summarize_by_city(df)
    print("Weather Summary by City:")
    print(summary)
    print()

    plot_temperature_comparison(df, data_dir / "temperatures.png")


if __name__ == "__main__":
    main()

Run the analysis

uv run main.py

Expected output:

Weather Summary by City:
         avg_high  avg_low  total_precip  avg_humidity
city
Phoenix      20.0      6.6           2.1          25.0
Portland      8.0      1.8          36.0          74.6

Chart saved to data/temperatures.png

Explore in a notebook

With Jupyter running (uv run jupyter lab), create a new notebook in the notebooks/ directory. Every cell can import from the same environment:

import pandas as pd
from pathlib import Path

data_dir = Path("..") / "data"
df = pd.read_csv(data_dir / "weather.csv", parse_dates=["date"])
df.describe()
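From there, a pivot table gives a quick side-by-side city view. The sketch below builds a small inline frame with the same columns so it runs anywhere; in the notebook you would use the df loaded above:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-01-02", "2025-01-02"]),
    "city": ["Portland", "Phoenix", "Portland", "Phoenix"],
    "temp_high": [8, 18, 7, 20],
})

# One row per date, one column per city, cells hold temp_high
pivot = df.pivot_table(index="date", columns="city", values="temp_high")
print(pivot)
```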

No kernel configuration is needed. Because you launched Jupyter with uv run, the notebook kernel uses the project’s virtual environment automatically.

Add testing

Data analysis code benefits from tests as much as any other software. Add pytest as a dev dependency:

uv add --dev pytest

Create a test file at test_main.py:

import pandas as pd
from main import load_weather_data, summarize_by_city
from pathlib import Path


def test_load_weather_data():
    path = Path(__file__).parent / "data" / "weather.csv"
    df = load_weather_data(path)
    assert "temp_range" in df.columns
    assert len(df) == 10


def test_summarize_by_city():
    df = pd.DataFrame({
        "city": ["A", "A", "B"],
        "temp_high": [20, 22, 10],
        "temp_low": [10, 12, 5],
        "precipitation_mm": [0.0, 5.0, 10.0],
        "humidity_pct": [50, 60, 70],
    })
    summary = summarize_by_city(df)
    assert summary.loc["A", "avg_high"] == 21.0
    assert summary.loc["B", "total_precip"] == 10.0

Run the tests:

uv run pytest

See the pytest tutorial for more on testing Python projects.

Pin the Python version

Data science teams often need a consistent Python version across all contributors. Pin the version for this project:

uv python pin 3.13

This creates (or updates) the .python-version file in the project root. When anyone runs uv sync or uv run in this directory, uv installs and uses Python 3.13, even if their system has a different version. The pinned version must satisfy the requires-python constraint in pyproject.toml. See How to Change the Python Version of a uv Project for details.
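If a script should fail fast on the wrong interpreter, a small guard comparing the running version against a pin string (like the contents of .python-version) can help. The helper below is a hypothetical sketch, not part of uv:

```python
import sys

def matches_pin(pin: str, version_info=None) -> bool:
    """Return True if the interpreter's major.minor equals the pin's."""
    vi = version_info or sys.version_info
    major, minor = (int(part) for part in pin.split(".")[:2])
    return (vi[0], vi[1]) == (major, minor)

# Checked against explicit tuples for illustration
print(matches_pin("3.13", (3, 13)))  # matching interpreter
print(matches_pin("3.13", (3, 12)))  # mismatched interpreter
```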

Final project structure

    • .python-version
    • pyproject.toml
    • uv.lock
    • main.py
    • test_main.py
    • README.md
    • data/
      • weather.csv
      • temperatures.png
    • notebooks/

Reproduce the environment

Anyone cloning this project can recreate the exact environment with one command:

uv sync

This reads the lockfile and installs the pinned versions of every dependency. No manual version matching, no stale requirements files.
