# Create your first Python project with uv

This Python uv tutorial walks you through building a text analysis tool that counts words, measures sentence length, and reports word frequency. No prior Python experience required; [uv](https://pydevtools.com/handbook/reference/uv.md) handles the Python install, project scaffolding, and dependency management for you.

## Prerequisites

Before we begin, make sure you have uv installed on your system. You can install it by following [the directions from the uv documentation](https://docs.astral.sh/uv/getting-started/installation/).

[Git](https://git-scm.com/downloads) is optional. uv does not install Git for you, but if Git is already on your `PATH`, `uv init` initializes a Git repository in the new project. The tutorial works either way.

> [!TIP]
> You do not have to have Python installed on your computer to run this tutorial.

## Creating a New Project

Let's create a project called "text_analyzer" that will analyze text statistics such as word count, sentence length, and word frequency:

```console
$ uv init text_analyzer
Initialized project `text-analyzer` at `/path/to/text_analyzer`
$ cd text_analyzer
```

uv prints a single confirmation line. If uv instead reports that a project is already initialized, the target directory already contains a `pyproject.toml`. Pick a fresh name or remove the existing directory first.

Notice the new files uv created in the project: `pyproject.toml`, `main.py`, `README.md`, a `.python-version` file pinning the interpreter, and a `.gitignore`. If Git is installed on your system, the directory is also a Git repository ready for its first commit. Without Git, uv skips that step but still creates the rest of the project.
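
To see what the `.python-version` pin means in practice, here is a small optional sketch. It assumes a hypothetical script called `check_python_version.py`, which uv does not generate. The script compares the pin with the interpreter that is actually running. It assumes a plain numeric pin such as `3.14` or `3.14.4` and compares only the major and minor parts.

```python
import sys
from pathlib import Path


def pinned_version(path: Path) -> str:
    """Return the version string stored in a .python-version file, e.g. "3.14"."""
    return path.read_text(encoding="utf-8").strip()


def matches_running_python(pinned: str) -> bool:
    """Compare the pin's major.minor part with the interpreter running this script."""
    try:
        major_minor = tuple(int(part) for part in pinned.split(".")[:2])
    except ValueError:
        # Not a plain numeric pin (e.g. "pypy@3.10"); this sketch doesn't handle it
        return False
    return major_minor == sys.version_info[:2]


if __name__ == "__main__":
    pin_file = Path(".python-version")
    if pin_file.exists():
        pin = pinned_version(pin_file)
        print(f"Pinned: {pin}, running: {sys.version.split()[0]}")
        print(f"Match: {matches_running_python(pin)}")
    else:
        print("No .python-version file here; run this from the project root.")
```

Run it from the project root with `uv run check_python_version.py`. It should report a match, because uv selects the project's interpreter based on that pin.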

Look at the generated [pyproject.toml](https://pydevtools.com/handbook/reference/pyproject.toml.md), which stores the project configuration:

```toml
[project]
name = "text-analyzer"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.14"
dependencies = []
```

## Adding Dependencies

Our text analyzer needs a few packages for data processing and analysis. Run the following commands from inside `text_analyzer`. If you see ``error: No `pyproject.toml` found in current directory or any parent directory``, you are outside the project, so `cd` into `text_analyzer` first.

Add pandas first:

```console
$ uv add pandas
Using CPython 3.14.4
Creating virtual environment at: .venv
Resolved 6 packages in 221ms
Downloading numpy (5.0MiB)
Downloading pandas (9.5MiB)
 Downloaded numpy
 Downloaded pandas
Prepared 2 packages in 497ms
Installed 4 packages in 25ms
 + numpy==2.4.4
 + pandas==3.0.2
 + python-dateutil==2.9.0.post0
 + six==1.17.0
```

The exact Python version, package versions, and timings will differ on your machine. Notice the new `.venv/` directory and `uv.lock` file in the project. The [virtual environment](https://pydevtools.com/handbook/explanation/what-is-a-virtual-environment.md) holds the project's Python interpreter and installed packages. The [lockfile](https://pydevtools.com/handbook/explanation/what-is-a-lock-file.md) pins exact versions so anyone else can reproduce the environment with one command, `uv sync`.

Add nltk for natural language processing:

```console
$ uv add nltk
Resolved 12 packages in 249ms
Downloading nltk (1.5MiB)
 Downloaded nltk
Prepared 4 packages in 194ms
Installed 5 packages in 10ms
 + click==8.3.3
 + joblib==1.5.3
 + nltk==3.9.4
 + regex==2026.4.4
 + tqdm==4.67.3
```

Each `uv add` updates `pyproject.toml`, refreshes `uv.lock`, and installs the package into `.venv/`. The [pyproject.toml](https://pydevtools.com/handbook/reference/pyproject.toml.md) now includes these dependencies:

```toml
[project]
name = "text-analyzer"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.14"
dependencies = [
    "nltk>=3.9.4",
    "pandas>=3.0.2",
]
```

## Creating the Project Structure

Let's create a directory for our sample data:

```bash
mkdir data
```

Next, create a sample text file to analyze at `data/sample.txt` with this content:

```text
The quick brown fox jumps over the lazy dog. This pangram contains every letter of the English alphabet at least once. Pangrams are useful for testing fonts, keyboards, and printers. The five boxing wizards jump quickly! How vexingly quick daft zebras jump.
```
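
If you'd rather not create the directory and file by hand, a short script can handle both steps. This optional sketch assumes a hypothetical `make_sample.py`. It uses `pathlib` to create `data/` and write the same text. As a guard against writing files outside the project, it only writes when it finds `pyproject.toml` in the current directory.

```python
from pathlib import Path

SAMPLE_TEXT = (
    "The quick brown fox jumps over the lazy dog. "
    "This pangram contains every letter of the English alphabet at least once. "
    "Pangrams are useful for testing fonts, keyboards, and printers. "
    "The five boxing wizards jump quickly! "
    "How vexingly quick daft zebras jump.\n"
)


def write_sample(target: Path) -> Path:
    """Create the parent directory if needed and write the sample text."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(SAMPLE_TEXT, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Only write into the current directory if it looks like the project root
    if Path("pyproject.toml").exists():
        written = write_sample(Path("data") / "sample.txt")
        print(f"Wrote {written}")
    else:
        print("Run this from the text_analyzer project root.")
```

Run it with `uv run make_sample.py` from inside `text_analyzer`.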

Now let's replace the contents of `main.py` with our analysis code:

```python
import pandas as pd
import nltk
from collections import Counter
from pathlib import Path

class TextAnalyzer:
    """A class for analyzing text statistics."""

    def __init__(self):
        # Download NLTK's punkt_tab tokenizer data only if it isn't cached yet
        try:
            nltk.data.find('tokenizers/punkt_tab')
        except LookupError:
            nltk.download('punkt_tab')

    def read_text(self, file_path):
        """Read UTF-8 text from a file."""
        return Path(file_path).read_text(encoding='utf-8')

    def analyze_text(self, text):
        """Analyze text and return statistics."""
        # Tokenize text into sentences and words (punctuation becomes separate tokens)
        sentences = nltk.sent_tokenize(text)
        words = nltk.word_tokenize(text.lower())

        # Calculate basic statistics, guarding against empty input
        word_count = len(words)
        sentence_count = len(sentences)
        avg_sentence_length = word_count / sentence_count if sentence_count else 0

        # Calculate word frequencies
        word_freq = Counter(words)
        most_common = word_freq.most_common(5)

        # Create statistics dictionary
        stats = {
            "Total Words": word_count,
            "Total Sentences": sentence_count,
            "Average Sentence Length": round(avg_sentence_length, 2),
            "Unique Words": len(word_freq),
        }

        # Create word frequency DataFrame
        freq_df = pd.DataFrame(most_common, columns=['Word', 'Frequency'])

        return stats, freq_df

def main():
    # Initialize analyzer (downloads tokenizer data on the first run)
    analyzer = TextAnalyzer()

    # Read and analyze sample text
    file_path = Path(__file__).parent / "data" / "sample.txt"
    text = analyzer.read_text(file_path)

    # Get analysis results
    stats, word_freq = analyzer.analyze_text(text)

    # Print results
    print("\nText Statistics:")
    for metric, value in stats.items():
        print(f"{metric}: {value}")

    print("\nMost Common Words:")
    print(word_freq.to_string(index=False))

if __name__ == "__main__":
    main()
```

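The `TextAnalyzer` class reads text from a file and uses NLTK to split it into sentences and word tokens. It then computes the word count, sentence count, average sentence length, and number of unique words. It also uses `Counter` to find the five most common tokens, and it returns the results as both a dictionary and a pandas DataFrame. NLTK's `word_tokenize` treats punctuation marks as separate tokens, so marks like `.` and `,` count as words here.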
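
If you want the statistics to cover words only, one option is to drop tokens that contain no letters or digits before counting. This is an illustrative tweak, not part of the tutorial's code. The hypothetical `words_only` helper below works on any list of tokens, such as the list returned by `nltk.word_tokenize`. The demo uses a hand-written token list so it runs without NLTK.

```python
from collections import Counter


def words_only(tokens: list[str]) -> list[str]:
    """Drop tokens made up entirely of punctuation (no letters or digits)."""
    return [token for token in tokens if any(char.isalnum() for char in token)]


# Tokens shaped like word_tokenize output for the first sample sentence
tokens = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]
words = words_only(tokens)
print(len(tokens), len(words))  # 10 9
print(Counter(words).most_common(1))  # [('the', 2)]
```

To use it in `analyze_text`, you could wrap the tokenizer call as `words_only(nltk.word_tokenize(text.lower()))`. The word counts and frequency table will then differ from the expected output shown in this tutorial.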

## Running the Project

To run your script:

```bash
uv run main.py
```

`uv run` resolves any pending changes in `pyproject.toml`, makes sure `.venv/` is up to date, and then runs the command against the project's interpreter. If you bypass `uv run` and call `python main.py` directly, you'll likely see `ModuleNotFoundError: No module named 'pandas'` because your system Python isn't using the project's venv.
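
To see the difference for yourself, a tiny script can report which interpreter is running it. This sketch assumes a hypothetical file called `which_python.py`. Inside a virtual environment, `sys.prefix` points at the environment, while `sys.base_prefix` still points at the base Python installation. Comparing the two tells you whether a venv is active.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv: sys.prefix differs from the base install."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
```

`uv run which_python.py` should print an interpreter path inside `.venv/` and `True`. If you have a system Python, `python which_python.py` will typically print `False`.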

The first run of `main.py` also downloads the `punkt_tab` tokenizer data that NLTK needs. Expect output like this:

```text
[nltk_data] Downloading package punkt_tab to /path/to/home/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.

Text Statistics:
Total Words: 49
Total Sentences: 5
Average Sentence Length: 9.8
Unique Words: 40

Most Common Words:
 Word  Frequency
  the          4
    .          4
quick          2
    ,          2
 jump          2
```

NLTK caches the tokenizer data, so later runs skip the download. By default, the data lives in an `nltk_data/` directory: `~/nltk_data/` on macOS and Linux, and `%APPDATA%\nltk_data\` on Windows.

## Adding Development Dependencies

Let's add some development tools for testing and code quality. Add pytest first:

```console
$ uv add --dev pytest
Resolved 17 packages in 102ms
Installed 5 packages in 12ms
 + iniconfig==2.3.0
 + packaging==26.2
 + pluggy==1.6.0
 + pygments==2.20.0
 + pytest==9.0.3
```

Then add [Ruff](https://pydevtools.com/handbook/reference/ruff.md):

```console
$ uv add --dev ruff
Resolved 18 packages in 178ms
Installed 1 package in 3ms
 + ruff==0.15.12
```

Notice that `--dev` lands these in a separate `[dependency-groups]` table instead of the main `dependencies` list. They get installed in `.venv/` like any other package, but `uv sync --no-dev` will skip them, which matters when you build a slim Docker image or deploy to production.

Both tools now sit in the `dev` [dependency group](https://pydevtools.com/handbook/explanation/understanding-dependency-groups-in-uv.md) in [pyproject.toml](https://pydevtools.com/handbook/reference/pyproject.toml.md):

```toml
[dependency-groups]
dev = [
    "pytest>=9.0.3",
    "ruff>=0.15.12",
]
```

## Using Development Tools

Run the dev tools through `uv run` so they pick up the project's venv automatically. If you call `ruff` directly, without `uv run` and without activating `.venv/`, your shell either reports `command not found: ruff` or runs a different Ruff installed elsewhere on your machine.

Format the code with Ruff:

```console
$ uv run ruff format .
1 file reformatted
```

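Then run the linter with automatic fixes. Ruff's default rule set doesn't include import sorting, so opt in to the isort-style `I` rules with `--extend-select I`:

```console
$ uv run ruff check --extend-select I --fix .
Found 1 error (1 fixed, 0 remaining).
```

Ruff fixes the one error it found and exits cleanly. Open `main.py` and notice that the import block has been reordered: standard-library imports come first, third-party imports come next, and each group is sorted alphabetically. That's Ruff's `I001` rule auto-fixing the import order. To apply these rules on every `ruff check` run, add `extend-select = ["I"]` under a `[tool.ruff.lint]` table in `pyproject.toml`.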
