
Essential pytest Plugins for Reliable, Fast Test Suites

A fresh pytest setup works well for small projects. Tests run, assertions report clearly, and fixtures handle setup and teardown. Then the suite grows past a few hundred tests and new problems appear: tests that pass locally but fail in CI because they depend on execution order, a single network call that hangs the entire run for ten minutes, flaky tests that fail once every twenty runs for reasons no one can reproduce, and performance regressions that slip in because no one benchmarked the hot path.

Four plugins handle these problems. Each targets a specific failure mode, loads automatically once installed, and needs little or no configuration.

Install the starter set

Add all four as dev dependencies with uv:

uv add --dev pytest-randomly pytest-timeout pytest-rerunfailures pytest-benchmark

pytest discovers installed plugins through Python entry points. There is no registration step and no import to add. Run uv run pytest and the plugins are active.

Verify they loaded by checking the session header:

$ uv run pytest -v --co
plugins: benchmark-5.2.3, timeout-2.4.0, randomly-4.1.0, rerunfailures-16.3

Catch hidden order dependencies with pytest-randomly

Tests should pass in any order. In practice, shared module-level state, leftover temp files, database rows from a previous test, or a mutated global variable can create invisible dependencies between tests. Test A passes only because test B ran first and populated a cache. pytest’s default order is deterministic (files in collection order, tests in the order they are defined), so the dependency stays hidden run after run. Then a new or renamed test file changes the order and the suite breaks, often for the first time in CI.

pytest-randomly shuffles test execution order on every run. It also resets the global random.seed() to a fixed value before each test and, when those libraries are installed, reseeds NumPy’s numpy.random and Faker, so that randomized test data does not depend on execution position.
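The effect of per-test reseeding can be sketched with the standard library alone. This illustrates the idea and is not the plugin's implementation; `fake_order_id`, `run_test_with_reseed`, and the seed value are hypothetical.

```python
import random

SEED = 2281747188  # any fixed integer works; chosen only for illustration


def fake_order_id() -> int:
    """Hypothetical test-data helper that draws from the global RNG."""
    return random.randint(1, 1_000_000)


def run_test_with_reseed(previous_draws: int) -> int:
    # Simulate earlier tests consuming random numbers.
    for _ in range(previous_draws):
        random.random()
    # Reseeding right before the test makes its data independent of
    # how many draws happened earlier in the session.
    random.seed(SEED)
    return fake_order_id()


# Same value whether the "test" runs first or after 500 earlier draws.
assert run_test_with_reseed(0) == run_test_with_reseed(500)
```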

Every run prints the seed it used:

$ uv run pytest
Using --randomly-seed=2281747188

When a shuffled run exposes a failure, reproduce it by passing the seed back:

uv run pytest --randomly-seed=2281747188

To temporarily disable reordering without disabling the plugin’s per-test reseeding:

uv run pytest --randomly-dont-reorganize

To disable the plugin entirely (both reordering and reseeding):

uv run pytest -p no:randomly

When to add it. Add pytest-randomly from the start. Order dependencies are cheapest to fix when the suite is small. On a large existing suite, the first shuffled run may expose several hidden dependencies at once. Fix them in a dedicated branch before merging, then leave the plugin enabled permanently.

Kill hanging tests with pytest-timeout

A test that makes a network call, waits on a subprocess, or hits a deadlock can hang indefinitely. On a developer’s machine, you notice and hit Ctrl-C. In CI, the job burns through its timeout budget, delays the pipeline, and produces no useful output.

pytest-timeout enforces a per-test time limit. What happens when a test exceeds it depends on the termination method.

Set a global default in pyproject.toml:

pyproject.toml
[tool.pytest.ini_options]
timeout = 10

Every test now has a ten-second ceiling. Override it on individual tests that legitimately need more time:

import pytest

@pytest.mark.timeout(60)
def test_large_data_import():
    ...

pytest-timeout supports two termination methods. On systems that support SIGALRM (macOS, Linux), the default is signal: it interrupts the test, raises an exception via pytest.fail(), and produces a traceback showing where execution was stuck. Teardown, fixtures, and JUnit XML output work normally. On Windows, the default is thread because SIGALRM is unavailable. You can also set it explicitly:

pyproject.toml
[tool.pytest.ini_options]
timeout = 10
timeout_method = "thread"

Warning

The thread method is less graceful. When its timeout expires, it dumps tracebacks and terminates the entire pytest process. Fixture teardown, subsequent tests, and JUnit XML output may be lost. Use signal whenever your platform supports it (macOS, Linux).
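How a SIGALRM-based limit can interrupt a stuck call and still leave the process running is sketched below with the standard library. This is a simplified illustration that assumes a Unix system and the main thread. It is not pytest-timeout's code, and `run_with_alarm` and `TimeoutExpired` are hypothetical names.

```python
import signal
import time


class TimeoutExpired(Exception):
    """Raised inside the running function when the alarm fires."""


def run_with_alarm(func, seconds: float):
    """Call func(), raising TimeoutExpired if it runs longer than `seconds`."""

    def on_alarm(signum, frame):
        # The exception surfaces at whatever line func() is executing,
        # which is why the traceback shows where execution was stuck.
        raise TimeoutExpired(f"exceeded {seconds} s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        # Cancel the timer and restore the old handler so later code
        # (teardown, the next test) is unaffected.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


try:
    run_with_alarm(lambda: time.sleep(5), 0.1)
except TimeoutExpired as exc:
    print(f"interrupted: {exc}")
```

Because the interruption is an ordinary exception, cleanup code still runs. The thread method has no equivalent way to interrupt the main thread, which is why it ends the process instead.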

When to add it. Add pytest-timeout as soon as your suite includes any I/O: network calls, file operations, subprocess launches, or database queries. A ten-second default catches most hangs without interfering with legitimately slow tests, whose limits you can raise individually with the timeout marker.

Retry genuine flakes with pytest-rerunfailures

Some test failures are not bugs in the code under test. A DNS lookup times out. A CI runner’s disk is slow for one second. A timing-sensitive assertion on a concurrent operation races. These tests pass 19 out of 20 runs.

pytest-rerunfailures retries failed tests a configurable number of times before marking them as failures. Each intermediate failed attempt is recorded as a “rerun” in the test report, so you can track which tests are flaky and fix them later. If a subsequent retry passes, the test counts as passed in the final result.
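The retry semantics described above can be sketched as a plain loop. This is a conceptual model, not the plugin's implementation; `run_with_reruns` and `flaky_test` are hypothetical names.

```python
import time
from typing import Callable


def run_with_reruns(test: Callable[[], None], reruns: int, delay: float = 0.0) -> tuple[str, int]:
    """Run `test` up to reruns + 1 times; return (outcome, reruns_used)."""
    for attempt in range(reruns + 1):
        try:
            test()
            return "passed", attempt  # a later success counts as a pass
        except Exception:
            if attempt < reruns:
                time.sleep(delay)  # optional pause before the next attempt
    return "failed", reruns  # every attempt failed


calls = {"n": 0}


def flaky_test():
    # Fails on the first call only, like a one-off environmental flake.
    calls["n"] += 1
    assert calls["n"] >= 2, "first attempt fails"


print(run_with_reruns(flaky_test, reruns=2))  # ('passed', 1)
```

The second element of the result corresponds to the "rerun" entries in the report: a test that passes with a nonzero count is one to investigate later.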

Enable it from the command line:

uv run pytest --reruns 2

Or set it in pyproject.toml:

pyproject.toml
[tool.pytest.ini_options]
addopts = "--reruns 2"

Add a delay between retries when the flake is timing-related:

uv run pytest --reruns 2 --reruns-delay 1

Mark specific tests that are known to be flaky:

import pytest

@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_external_api_response():
    ...

Warning

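Retries mask real bugs if applied too broadly. Use --reruns in CI pipelines where environmental flakes are expected, and avoid making it the default during local development, where a failure should mean “investigate,” not “retry.” Because addopts applies to every pytest invocation, pass --reruns on the CI command line instead if you want local runs to fail on the first attempt.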

When to add it. Add pytest-rerunfailures when your CI pipeline includes integration tests that touch external services, shared infrastructure, or timing-sensitive concurrency. For pure unit tests with no I/O, you are unlikely to need it.

Track performance regressions with pytest-benchmark

Code reviews catch correctness bugs. They rarely catch performance regressions. A refactor that changes an O(n) loop to O(n^2) passes all tests and ships. The slowdown surfaces weeks later when production data grows.

pytest-benchmark turns any test into a benchmark by injecting a benchmark fixture that measures execution time with statistical rigor: multiple rounds and outlier detection. Warmup iterations are configurable and auto-enabled only on PyPy by default.

def test_sort_performance(benchmark):
    data = list(range(10_000, 0, -1))
    result = benchmark(sorted, data)
    assert result == sorted(data)

Run only the benchmark tests (skipping non-benchmark tests):

uv run pytest --benchmark-only

The benchmark table shows timing statistics for each benchmarked function:

$ uv run pytest --benchmark-only
------------ benchmark: 1 tests ------------
Name (time in us)           Min     Max    Mean   Rounds
test_sort_performance     31.04   64.04   36.39    22120
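What the Min, Max, Mean, and Rounds columns summarize can be approximated with the standard library. This is a rough sketch, not how pytest-benchmark measures: the real plugin also calibrates how many iterations to run per round. The `measure` helper and the 1.5 × IQR outlier rule used here are assumptions made for illustration.

```python
import statistics
import time


def measure(func, *args, rounds: int = 200) -> dict:
    """Time func(*args) over several rounds and summarize the samples."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)

    # Quartiles give a simple outlier test: anything more than
    # 1.5 * IQR outside the middle half of the samples.
    q1, _median, q3 = statistics.quantiles(timings, n=4)
    iqr = q3 - q1
    outliers = sum(1 for t in timings if t < q1 - 1.5 * iqr or t > q3 + 1.5 * iqr)

    return {
        "min": min(timings),
        "max": max(timings),
        "mean": statistics.fmean(timings),
        "rounds": rounds,
        "iqr_outliers": outliers,
    }


stats = measure(sorted, list(range(10_000, 0, -1)))
print(stats)  # timings are in seconds and vary from machine to machine
```

A single timing is easily skewed by scheduler noise, so the mean across many rounds, read together with the minimum and the outlier count, is a more trustworthy signal than any one run.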

Save results for comparison:

uv run pytest --benchmark-only --benchmark-save=baseline

Compare against a saved baseline after a code change. The --benchmark-save option stores each run under a numeric run ID (0001, 0002, etc.). Pass that ID to --benchmark-compare:

uv run pytest --benchmark-only --benchmark-compare=0001

Skip benchmarks during regular test runs so they do not slow down the feedback loop:

pyproject.toml
[tool.pytest.ini_options]
addopts = "--benchmark-skip"

Run them explicitly when you want results:

uv run pytest --benchmark-only

--benchmark-only overrides --benchmark-skip, so the two coexist in the same project without conflict.

When to add it. Add pytest-benchmark when your project has performance-sensitive code paths: parsers, serializers, data pipelines, or anything processing user-provided input at scale. Start by benchmarking the two or three hottest functions rather than trying to benchmark everything.

Configure all four plugins together

A single pyproject.toml section configures pytest and all installed plugins:

pyproject.toml
[tool.pytest.ini_options]
timeout = 10
addopts = "--reruns 2 --benchmark-skip"

pytest-randomly shuffles test order automatically because it is installed. The timeout setting kills any test that exceeds ten seconds. The addopts line retries failures twice for environmental flakes and skips benchmarks by default. Run uv run pytest --benchmark-only when you want benchmark results.

Add more plugins as your project grows

These four plugins are the foundation. As your project grows, others earn their place:

  • pytest-xdist distributes tests across CPU cores. For suites that take more than thirty seconds, parallel execution is often the single biggest speedup.
  • pytest-cov integrates coverage measurement directly into pytest runs, reporting which lines your tests exercise.
  • pytest-mock provides a mocker fixture that wraps unittest.mock with a cleaner API and automatic cleanup.
  • pytest-asyncio lets you write tests for async code with async def test_ functions and await calls.

Each of these solves a narrower problem. The four in this guide apply to nearly every Python project regardless of its domain.
