# Set up a data science project with uv

This tutorial sets up a data analysis project with [uv](https://pydevtools.com/handbook/reference/uv.md) so that every dependency is pinned, notebooks run in the right environment, and a collaborator can reproduce your setup with a single command.

## Prerequisites

Install uv following the [installation guide](https://pydevtools.com/handbook/how-to/how-to-install-uv.md). No separate Python install is required.

## Create the project

```console
$ uv init weather_analysis
Initialized project `weather-analysis` at `/path/to/weather_analysis`
$ cd weather_analysis
```

This creates a project directory with a `pyproject.toml`, a `main.py`, and a `README.md`. The [pyproject.toml](https://pydevtools.com/handbook/reference/pyproject.toml.md) stores all project metadata and dependencies:

```toml
[project]
name = "weather-analysis"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = []
```

> [!NOTE]
> The `requires-python` value depends on which Python interpreter uv finds on your system. You may see a different version bound.

## Add data science dependencies

```console
$ uv add pandas matplotlib
Using CPython 3.13.5 interpreter
Creating virtual environment at: .venv
Resolved 14 packages in 160ms
Prepared 12 packages in 1.26s
Installed 12 packages in 139ms
 + contourpy==1.3.3
 + cycler==0.12.1
 + fonttools==4.62.1
 + kiwisolver==1.5.0
 + matplotlib==3.10.9
 + numpy==2.4.4
 + packaging==26.2
 + pandas==3.0.2
 + pillow==12.2.0
 + pyparsing==3.3.2
 + python-dateutil==2.9.0.post0
 + six==1.17.0
```

The exact versions and timings will differ on your machine. uv resolves compatible versions, installs them into an isolated [virtual environment](https://pydevtools.com/handbook/explanation/what-is-a-virtual-environment.md), and writes a `uv.lock` file that pins every transitive dependency. The lockfile guarantees that anyone cloning this project gets the same package versions.

Notice the new `.venv/` directory and the new `uv.lock` file. The venv is where pandas, matplotlib, and their dependencies live. You never need to run `source .venv/bin/activate`, because every command in the rest of this tutorial goes through `uv run`, which uses that venv automatically.

> [!NOTE]
> If `uv add` prints ``error: No `pyproject.toml` found in current directory or any parent directory``, you ran it outside the project. `cd weather_analysis` and try again.

The `pyproject.toml` now lists the direct dependencies:

```toml
dependencies = [
    "matplotlib>=3.10.9",
    "pandas>=3.0.2",
]
```

The exact version bounds will reflect whichever releases are current when you run the command.

> [!TIP]
> Commit both `pyproject.toml` and `uv.lock` to version control. The lockfile pins exact versions of every transitive dependency, so collaborators get identical environments with `uv sync`.

> [!TIP]
> Some scientific packages, such as certain CUDA toolkits and domain-specific Fortran libraries, are easier to install through conda channels than PyPI. If your project depends on packages like these, consider [pixi or conda](https://pydevtools.com/handbook/explanation/uv-vs-pixi-vs-conda-for-scientific-python.md) instead.

## Add Jupyter as a dev dependency

Jupyter is a tool for interactive exploration, not a runtime dependency of the analysis code. Keep it in a dev group:

```console
$ uv add --dev jupyter
Resolved 110 packages in 332ms
Prepared 94 packages in 2.58s
Installed 94 packages in 506ms
 + ...
 + jupyter==1.1.1
 + jupyterlab==4.5.7
 + notebook==7.5.6
 + ...
```

The full output lists every transitive dependency, around 94 packages total. Jupyter pulls in IPython, jupyter-server, jupyterlab, notebook, and a long list of supporting packages.

This adds Jupyter under the `[dependency-groups]` section in `pyproject.toml`:

```toml
[dependency-groups]
dev = [
    "jupyter>=1.1.1",
]
```

Notice the new `[dependency-groups]` table. Unlike `dependencies = [...]` in `[project]`, packages listed here are development-only and are not part of the project's runtime requirements. uv installs the `dev` group by default when you run `uv sync` or `uv run`, so when deploying the analysis as a script or scheduled job, exclude it with `uv sync --no-dev` to get a leaner environment.

Launch Jupyter Lab from the project directory:

```bash
uv run jupyter lab
```

Jupyter Lab starts a local server at `http://localhost:8888/lab` and tries to open a browser tab. Because you launched it with `uv run`, the notebook kernel uses the project's virtual environment, so every import resolves against your pinned dependencies. Press `Ctrl+C` twice in the terminal to stop the server. See [How to Run a Jupyter Notebook with uv](https://pydevtools.com/handbook/how-to/jupyter-notebook-with-uv.md) for more options.

## Set up the project layout

Create directories for data and notebooks:

```bash
mkdir -p data notebooks
```

In PowerShell on Windows:

```powershell
mkdir data, notebooks
```

## Create sample data

Add a sample CSV at `data/weather.csv`:

```csv
date,city,temp_high,temp_low,precipitation_mm,humidity_pct
2025-01-01,Portland,8,2,12.5,82
2025-01-02,Portland,7,1,0.0,65
2025-01-03,Portland,9,3,8.3,78
2025-01-04,Portland,6,-1,15.2,88
2025-01-05,Portland,10,4,0.0,60
2025-01-01,Phoenix,18,5,0.0,25
2025-01-02,Phoenix,20,7,0.0,22
2025-01-03,Phoenix,22,8,0.0,20
2025-01-04,Phoenix,19,6,2.1,35
2025-01-05,Phoenix,21,7,0.0,23
```
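Before analyzing a hand-written CSV, it can help to check its header and rows. The following is a minimal sketch using only the standard-library `csv` module; the `check_weather_csv` helper and the specific checks (exact header match, `temp_low` not above `temp_high`) are assumptions chosen for illustration, not part of the tutorial's analysis code. A few rows of the file are inlined so the block runs on its own.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "date", "city", "temp_high", "temp_low", "precipitation_mm", "humidity_pct",
]

# A few rows of data/weather.csv, inlined so the sketch is self-contained.
SAMPLE = """\
date,city,temp_high,temp_low,precipitation_mm,humidity_pct
2025-01-01,Portland,8,2,12.5,82
2025-01-02,Portland,7,1,0.0,65
2025-01-01,Phoenix,18,5,0.0,25
"""


def check_weather_csv(handle) -> dict[str, int]:
    """Return the number of rows per city, raising ValueError on bad input."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    counts: dict[str, int] = {}
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if float(row["temp_low"]) > float(row["temp_high"]):
            raise ValueError(f"line {line_no}: temp_low exceeds temp_high")
        counts[row["city"]] = counts.get(row["city"], 0) + 1
    return counts


counts = check_weather_csv(io.StringIO(SAMPLE))
print(counts)
```

To check the real file, open it with `open("data/weather.csv", newline="")` and pass the handle to `check_weather_csv`.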

## Write the analysis script

Replace the contents of `main.py` with:

```python
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from pathlib import Path

matplotlib.use("Agg")


def load_weather_data(path):
    df = pd.read_csv(path, parse_dates=["date"])
    df["temp_range"] = df["temp_high"] - df["temp_low"]
    return df


def summarize_by_city(df):
    return df.groupby("city").agg(
        avg_high=("temp_high", "mean"),
        avg_low=("temp_low", "mean"),
        total_precip=("precipitation_mm", "sum"),
        avg_humidity=("humidity_pct", "mean"),
    ).round(1)


def plot_temperature_comparison(df, output_path):
    fig, ax = plt.subplots(figsize=(8, 4))
    for city, group in df.groupby("city"):
        ax.plot(group["date"], group["temp_high"], marker="o", label=f"{city} high")
        ax.plot(group["date"], group["temp_low"], marker="s", label=f"{city} low",
                linestyle="--", alpha=0.6)
    ax.set_ylabel("Temperature (°C)")
    ax.set_title("Daily Temperatures by City")
    ax.legend()
    fig.tight_layout()
    fig.savefig(output_path, dpi=150)
    print(f"Chart saved to {output_path}")
    plt.close(fig)


def main():
    data_dir = Path(__file__).parent / "data"
    df = load_weather_data(data_dir / "weather.csv")

    summary = summarize_by_city(df)
    print("Weather Summary by City:")
    print(summary)
    print()

    plot_temperature_comparison(df, data_dir / "temperatures.png")


if __name__ == "__main__":
    main()
```

## Run the analysis

```bash
uv run main.py
```

Expected output:

```
Weather Summary by City:
         avg_high  avg_low  total_precip  avg_humidity
city
Phoenix      20.0      6.6           2.1          25.0
Portland      8.0      1.8          36.0          74.6

Chart saved to /path/to/weather_analysis/data/temperatures.png
```
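The Portland row of the summary can be checked by hand. The sketch below recomputes it from the five Portland rows of `data/weather.csv` using only the standard library instead of pandas, rounding to one decimal place as `summarize_by_city` does.

```python
from statistics import fmean

# Portland rows from data/weather.csv, one list per column.
temp_high = [8, 7, 9, 6, 10]
temp_low = [2, 1, 3, -1, 4]
precipitation_mm = [12.5, 0.0, 8.3, 15.2, 0.0]
humidity_pct = [82, 65, 78, 88, 60]

# Same aggregations as summarize_by_city: means for temperatures and
# humidity, a sum for precipitation, each rounded to one decimal place.
portland = {
    "avg_high": round(fmean(temp_high), 1),
    "avg_low": round(fmean(temp_low), 1),
    "total_precip": round(sum(precipitation_mm), 1),
    "avg_humidity": round(fmean(humidity_pct), 1),
}
print(portland)
```

The values match the `Portland` line in the expected output: 8.0, 1.8, 36.0, and 74.6.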

## Explore in a notebook

With Jupyter running (`uv run jupyter lab`), create a new notebook in the `notebooks/` directory. Every cell can import from the same environment:

```python
import pandas as pd
from pathlib import Path

data_dir = Path("..") / "data"
df = pd.read_csv(data_dir / "weather.csv", parse_dates=["date"])
df.describe()
```

No kernel configuration is needed. Because you launched Jupyter with `uv run`, the notebook kernel uses the project's virtual environment automatically.
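If you want to confirm which interpreter a notebook kernel is using, a cell like the following sketch reports it. It relies only on the standard `sys` module; `interpreter_info` is a hypothetical helper name. In a kernel started through `uv run`, the executable path would be expected to sit inside the project's `.venv` directory.

```python
import sys


def interpreter_info() -> dict:
    """Describe the Python interpreter running this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


info = interpreter_info()
for key, value in info.items():
    print(f"{key}: {value}")
```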

## Add testing

Data analysis code benefits from tests as much as any other software. Add [pytest](https://pydevtools.com/handbook/reference/pytest.md) as a dev dependency:

```console
$ uv add --dev pytest
Resolved 113 packages in 117ms
Prepared 3 packages in 33ms
Installed 3 packages in 16ms
 + iniconfig==2.3.0
 + pluggy==1.6.0
 + pytest==9.0.3
```

Only three new packages install: pytest reuses `packaging` and other shared dependencies already pulled in by Jupyter and matplotlib.

Create a test file at `test_main.py`:

```python
import pandas as pd
from main import load_weather_data, summarize_by_city
from pathlib import Path


def test_load_weather_data():
    path = Path(__file__).parent / "data" / "weather.csv"
    df = load_weather_data(path)
    assert "temp_range" in df.columns
    assert len(df) == 10


def test_summarize_by_city():
    df = pd.DataFrame({
        "city": ["A", "A", "B"],
        "temp_high": [20, 22, 10],
        "temp_low": [10, 12, 5],
        "precipitation_mm": [0.0, 5.0, 10.0],
        "humidity_pct": [50, 60, 70],
    })
    summary = summarize_by_city(df)
    assert summary.loc["A", "avg_high"] == 21.0
    assert summary.loc["B", "total_precip"] == 10.0
```

Run the tests:

```console
$ uv run pytest
======================== test session starts ========================
platform linux -- Python 3.13.5, pytest-9.0.3, pluggy-1.6.0
rootdir: /path/to/weather_analysis
collected 2 items

test_main.py ..                                               [100%]

========================= 2 passed in 0.42s =========================
```

If you see `collected 0 items`, pytest could not find the file: confirm `test_main.py` is in the project root next to `main.py`. See the [pytest tutorial](https://pydevtools.com/handbook/tutorial/setting-up-testing-with-pytest-and-uv.md) for more on testing Python projects.

## Pin the Python version

Data science teams often need a consistent Python version across all contributors. Pin the version for this project:

```console
$ uv python pin 3.13
Pinned `.python-version` to `3.13`
```

If 3.13 was not already installed, you will see a download line first:

```console
Downloading cpython-3.13.5-linux-x86_64-gnu (download) (33.8MiB)
 Downloading cpython-3.13.5-linux-x86_64-gnu (download)
Updated `.python-version` from `3.12` -> `3.13`
```

Notice the new `.python-version` file in the project root. When anyone runs `uv sync` or `uv run` in this directory, uv installs and uses Python 3.13, even if their system has a different version.
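As a rough illustration of what the pin means, the sketch below compares the text of a `.python-version` file with an interpreter version. It is a simplification: the hypothetical `matches_pin` helper handles only plain dotted numbers such as `3.13` or `3.13.5`, and raises `ValueError` on any other form of version request.

```python
import sys


def matches_pin(pin_text: str, version_info=None) -> bool:
    """True if the interpreter version starts with the pinned components."""
    if version_info is None:
        version_info = sys.version_info
    # "3.13\n" becomes (3, 13); only plain dotted numbers are supported.
    pinned = tuple(int(part) for part in pin_text.strip().split("."))
    return tuple(version_info[: len(pinned)]) == pinned


print(matches_pin("3.13\n", (3, 13, 5)))  # a 3.13.x interpreter satisfies the pin
print(matches_pin("3.13\n", (3, 12, 9)))  # a 3.12.x interpreter does not
```

Inside the project, `matches_pin(Path(".python-version").read_text())` checks the running interpreter against the pinned version.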

> [!NOTE]
> If `uv python pin` prints ``incompatible with the project's `requires-python` ``, the bound in `pyproject.toml` is newer than the version you tried to pin. Edit `pyproject.toml` to lower `requires-python` (for example, to `>=3.13`) first. See [How to Change the Python Version of a uv Project](https://pydevtools.com/handbook/how-to/how-to-change-the-python-version-of-a-uv-project.md) for the full workflow.

After pinning, `uv sync` rebuilds the venv against the new interpreter:

```console
$ uv sync
Using CPython 3.13.5
Removed virtual environment at: .venv
Creating virtual environment at: .venv
Resolved 113 packages in 1ms
Prepared 13 packages in 1.50s
Installed 109 packages in 735ms
```

The `Removed virtual environment at: .venv` line appears only if your existing venv was built on a different Python version: uv tears down the old venv and rebuilds it against 3.13. Either way, your dependencies and `uv.lock` are untouched; only the interpreter changes.

## Final project structure

- `weather_analysis/`
  - `.python-version`
  - `.venv/`
  - `README.md`
  - `data/`
    - `temperatures.png`
    - `weather.csv`
  - `main.py`
  - `notebooks/`
  - `pyproject.toml`
  - `test_main.py`
  - `uv.lock`

## Reproduce the environment

Anyone cloning this project can recreate the exact environment with one command:

```console
$ uv sync
Using CPython 3.13.5
Creating virtual environment at: .venv
Resolved 113 packages in 1ms
Installed 109 packages in 735ms
```

This reads the lockfile and installs the pinned versions of every dependency. No manual version matching, no stale requirements files.
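One way to spot-check a synced environment is to compare a few pinned versions against what is installed. The sketch below uses the standard-library `importlib.metadata`; `find_mismatches` and the stand-in `fake_lookup` are hypothetical names, and the stand-in data exists only so the example runs outside the project.

```python
from importlib import metadata


def find_mismatches(pinned: dict, lookup=metadata.version) -> dict:
    """Return {name: (pinned, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Stand-in for a real environment so the sketch runs anywhere.
fake_env = {"pandas": "3.0.2", "numpy": "2.4.3"}


def fake_lookup(name: str) -> str:
    try:
        return fake_env[name]
    except KeyError:
        raise metadata.PackageNotFoundError(name) from None


result = find_mismatches(
    {"pandas": "3.0.2", "numpy": "2.4.4", "six": "1.17.0"},
    lookup=fake_lookup,
)
print(result)
```

Run with `uv run` inside the project and the default `lookup`, an empty result means the checked packages match their pins.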

## Next steps

- [How to Run a Jupyter Notebook with uv](https://pydevtools.com/handbook/how-to/jupyter-notebook-with-uv.md) for more Jupyter workflows
- [How to Install PyTorch with uv](https://pydevtools.com/handbook/how-to/how-to-install-pytorch-with-uv.md) for GPU-accelerated machine learning
- [How to Install RAPIDS with uv](https://pydevtools.com/handbook/how-to/how-to-install-rapids-with-uv.md) for GPU-accelerated data processing
- [Set Up a GPU Data Science Project with pixi](https://pydevtools.com/handbook/tutorial/set-up-a-gpu-data-science-project-with-pixi.md) for projects with non-PyPI dependencies like CUDA toolkits
- [uv vs pixi vs conda for Scientific Python](https://pydevtools.com/handbook/explanation/uv-vs-pixi-vs-conda-for-scientific-python.md) for choosing between uv, pixi, and conda
- [Set up GitHub Actions for a Python project with uv](https://pydevtools.com/handbook/tutorial/setting-up-github-actions-with-uv.md) to run tests in CI
