Create your first Python project

This tutorial walks you through creating your first Python project, even if you have no prior Python experience. You don’t even need Python installed on your computer. We’ll build a text analysis tool that processes sample text data.

Prerequisites

Before we begin, make sure you have uv installed on your system. You can install it following the directions from the uv documentation.

Tip

You do not need Python installed on your computer for this tutorial; uv will download and manage a suitable Python version for you.

Creating a New Project

Let’s create a project called “text_analyzer” that will analyze text statistics like word frequency, sentence length, and readability scores:

uv init text_analyzer
cd text_analyzer

This command creates a new directory with some initial files. Let’s look at the generated pyproject.toml, which will store our project configuration:

[project]
name = "text-analyzer"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.10"
dependencies = []

Adding Dependencies

Our text analyzer will need some packages for data processing and analysis. Let’s add them using uv:

# Add pandas for data analysis and statistics
uv add pandas

# Add nltk for natural language processing
uv add nltk

Each time we run uv add, it updates our project configuration, creates or updates the lockfile, and installs the package in our project’s virtual environment. Our pyproject.toml now includes these dependencies:

[project]
name = "text-analyzer"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "nltk>=3.9.1",
    "pandas>=2.2.3",
]
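Besides installing packages, uv can work with the lockfile directly. Two commands that may be useful as the project grows (both available in recent uv releases):

```shell
# Recreate the project's virtual environment exactly as pinned in uv.lock
uv sync

# Show the resolved dependency tree, including transitive dependencies
uv tree
```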

Creating the Project Structure

Let’s create a proper project structure with our source code and sample data:

# Remove the default hello.py
rm hello.py

# Create our project structure
mkdir src
mkdir src/text_analyzer
mkdir src/text_analyzer/data
touch src/text_analyzer/__init__.py
touch src/text_analyzer/main.py

Let’s create a sample text file to analyze. Create src/text_analyzer/data/sample.txt with this content:

The quick brown fox jumps over the lazy dog. This pangram contains every letter of the English alphabet at least once. Pangrams are useful for testing fonts, keyboards, and printers. The five boxing wizards jump quickly! How vexingly quick daft zebras jump.
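Incidentally, you can verify the pangram claim with a few lines of standard-library Python:

```python
import string

sentence = "The quick brown fox jumps over the lazy dog."
# Collect the distinct ASCII letters that appear in the sentence
letters = {ch for ch in sentence.lower() if ch in string.ascii_lowercase}
print(len(letters))  # 26 -- every letter of the English alphabet appears
```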

Now let’s create our analysis code in src/text_analyzer/main.py:

import pandas as pd
import nltk
from collections import Counter
from pathlib import Path

class TextAnalyzer:
    """A class for analyzing text statistics."""

    def __init__(self):
        # Download the required NLTK tokenizer data
        # (cached locally after the first run)
        nltk.download('punkt', quiet=True)
        nltk.download('punkt_tab', quiet=True)

    def read_text(self, file_path):
        """Read text from a file."""
        return Path(file_path).read_text()

    def analyze_text(self, text):
        """Analyze text and return statistics."""
        # Tokenize text into sentences and words
        sentences = nltk.sent_tokenize(text)
        words = nltk.word_tokenize(text.lower())

        # Calculate basic statistics
        word_count = len(words)
        sentence_count = len(sentences)
        # Average words per sentence (guard against empty input)
        avg_sentence_length = word_count / sentence_count if sentence_count else 0

        # Calculate word frequencies
        word_freq = Counter(words)
        most_common = word_freq.most_common(5)

        # Create statistics dictionary
        stats = {
            "Total Words": word_count,
            "Total Sentences": sentence_count,
            "Average Sentence Length": round(avg_sentence_length, 2),
            "Unique Words": len(word_freq),
        }

        # Create word frequency DataFrame
        freq_df = pd.DataFrame(most_common, columns=['Word', 'Frequency'])

        return stats, freq_df

def main():
    # Initialize analyzer
    analyzer = TextAnalyzer()

    # Read and analyze sample text
    file_path = Path(__file__).parent / "data" / "sample.txt"
    text = analyzer.read_text(file_path)

    # Get analysis results
    stats, word_freq = analyzer.analyze_text(text)

    # Print results
    print("\nText Statistics:")
    for metric, value in stats.items():
        print(f"{metric}: {value}")

    print("\nMost Common Words:")
    print(word_freq.to_string(index=False))

if __name__ == "__main__":
    main()

This code creates a TextAnalyzer class that:

  1. Reads text from a file
  2. Calculates basic statistics like word count and sentence length
  3. Finds the most common words
  4. Returns the results in both dictionary and DataFrame formats
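To see what those steps boil down to, here is a rough standard-library approximation of the same statistics, with a simple regex standing in for NLTK's tokenizers (NLTK handles abbreviations, contractions, and punctuation far more robustly):

```python
import re
from collections import Counter

text = "The quick brown fox jumps over the lazy dog. The fox is quick."

# Naive sentence split on terminal punctuation
sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
# Naive word tokenization: runs of letters/apostrophes, lowercased
words = re.findall(r"[a-z']+", text.lower())

stats = {
    "Total Words": len(words),
    "Total Sentences": len(sentences),
    "Average Sentence Length": round(len(words) / len(sentences), 2),
    "Unique Words": len(set(words)),
}
print(stats)
print(Counter(words).most_common(3))
```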

Running the Project

To run your script:

uv run src/text_analyzer/main.py

You should see output showing statistics about our sample text and the most common words used, for example:

Text Statistics:
Total Words: 49
Total Sentences: 5
Average Sentence Length: 9.8
Unique Words: 40

Most Common Words:
 Word  Frequency
  the          4
    .          4
quick          2
    ,          2
 jump          2
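The word-frequency table is nothing more than a two-column pandas DataFrame built from Counter.most_common, printed without its index:

```python
from collections import Counter

import pandas as pd

words = ["the", "quick", "the", "fox", "the", "quick"]
# Turn (word, count) pairs into a labeled two-column table
freq_df = pd.DataFrame(Counter(words).most_common(2), columns=["Word", "Frequency"])
print(freq_df.to_string(index=False))
```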

Adding Development Dependencies

Let’s add some development tools for testing and code quality:

# Add pytest for testing
uv add --dev pytest

# Add ruff for linting and formatting
uv add --dev ruff

These will be added to a development dependency group in pyproject.toml:

[dependency-groups]
dev = [
    "pytest>=8.3.4",
    "ruff>=0.8.4",
]

(Older uv versions recorded these under a [tool.uv] dev-dependencies table instead.)
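To put pytest to work, you could add a first test. A minimal sketch follows; the file name tests/test_stats.py and the extracted top_words helper are illustrative, not part of the generated project:

```python
# tests/test_stats.py -- run with: uv run pytest
from collections import Counter


def top_words(words, n):
    """Return the n most frequent words, like TextAnalyzer's frequency step."""
    return Counter(words).most_common(n)


def test_top_words():
    assert top_words(["a", "b", "a"], 1) == [("a", 2)]
```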

Using Development Tools

Now we can use our development tools through uv:

# Format code with ruff
uv run ruff format .

# Run linting with automated fixes
uv run ruff check --fix .