py-spy: Sampling Profiler for Python

py-spy is a sampling profiler for Python programs. It visualizes where a Python process spends its time without restarting the process or modifying its code. py-spy runs in a separate process from the program it profiles and reads stack information out of the target’s memory, which keeps overhead low enough that the project documents it as safe to use against production workloads.

When to Use py-spy

Use py-spy when a Python process is already running and needs to be profiled in place: a web worker under load, a data-pipeline job mid-run, a stuck process that needs a stack dump, or any CPU-bound script whose profile should not be distorted by per-function tracing overhead. py-spy is a practical default for production CPU profiling because it requires no instrumentation and can be attached and detached at will. For line-by-line attribution inside a known hot function, reach for line_profiler; for memory rather than CPU, reach for memray.

Key Features

  • Out-of-process sampling: runs as a separate process and reads the target interpreter’s stacks via OS-level process inspection. The profiled program needs no import, no decorator, and no restart.
  • Three subcommands: record writes a profile to disk, top shows a live top-style view, and dump prints the current Python call stack for every thread (useful for diagnosing a hung process).
  • Multiple output formats: record emits flame-graph SVG by default, speedscope JSON via --format speedscope, or a raw format that can be post-processed. Speedscope output also opens in the Perfetto UI as a Chrome trace.
  • Native extension profiling: the --native flag captures C, C++, and Cython frames alongside Python frames. Native support covers Linux (x86_64, ARM, 32-bit ARM) and Windows (x86_64); the project does not claim native support on macOS.
  • Subprocess support: --subprocesses follows multiprocessing, subprocess.Popen, and forked workers so a process tree can be profiled as one.
  • GIL and idle filtering: --gil restricts samples to threads that hold the GIL; --idle also includes idle threads (for example, those waiting on I/O), which are otherwise left out. Together the two flags narrow or widen the samples to the kind of work being investigated.
  • Broad interpreter coverage: supports CPython 2.3 through 2.7 and 3.3 through 3.13 (per the project README as of v0.4.1).
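The subcommands and flags listed above combine on the command line roughly as follows. This is a minimal sketch rather than anything shipped with py-spy: record_attach_command and record_launch_command are hypothetical helper names, 12345 is a placeholder PID, and the output filenames are arbitrary. The helpers only assemble and print argument lists; they do not run py-spy.

```python
import shlex


def record_attach_command(pid, output, fmt="flamegraph", native=False,
                          subprocesses=False, gil=False, idle=False):
    """Build a `py-spy record` argv that attaches to a running process."""
    cmd = ["py-spy", "record", "--pid", str(pid),
           "--output", output, "--format", fmt]
    if native:
        cmd.append("--native")        # also capture C/C++/Cython frames
    if subprocesses:
        cmd.append("--subprocesses")  # follow child processes
    if gil:
        cmd.append("--gil")           # only samples that hold the GIL
    if idle:
        cmd.append("--idle")          # include idle threads
    return cmd


def record_launch_command(output, program_argv):
    """Build a `py-spy record` argv that launches the program itself."""
    return ["py-spy", "record", "--output", output, "--", *program_argv]


# 12345 is a placeholder PID; substitute the PID of the target process.
print(shlex.join(record_attach_command(12345, "profile.svg")))
print(shlex.join(record_attach_command(12345, "profile.json",
                                       fmt="speedscope", subprocesses=True)))
print(shlex.join(record_launch_command("profile.svg", ["python", "script.py"])))
print(shlex.join(["py-spy", "top", "--pid", "12345"]))
print(shlex.join(["py-spy", "dump", "--pid", "12345"]))
```

Each printed line can be pasted into a shell, or the list can be passed to subprocess.run once py-spy is on PATH and the permission requirements are met.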

Installation

The simplest install uses uv to place py-spy on PATH as a standalone tool:

uv tool install py-spy

Other options documented by the project:

pip install py-spy              # from PyPI
brew install py-spy             # macOS, via Homebrew
cargo install py-spy            # builds from source; needs libunwind-dev on Linux

The project also ships in the Arch User Repository (yay -S py-spy) and the Alpine edge testing repo.

Platform and Permission Notes

py-spy runs on Linux, macOS, Windows, and FreeBSD. Attaching to a running process requires OS-level permissions to read another process’s memory, which has several practical consequences:

  • macOS: the README states that attaching on macOS always requires running as root. The system Python at /usr/bin/python cannot be profiled because of System Integrity Protection; a non-system Python (installed via uv, Homebrew, or pyenv) is required.
  • Linux: profiling a process that py-spy launched itself (py-spy record -- python script.py) works without root. Attaching to an existing process by PID usually requires sudo unless the kernel.yama.ptrace_scope sysctl has been relaxed.
  • Docker: the default seccomp profile blocks process_vm_readv. Run the container with --cap-add SYS_PTRACE to allow py-spy to attach.
  • Kubernetes: add SYS_PTRACE to the pod’s securityContext.capabilities.add list.
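On Linux, the kernel.yama.ptrace_scope setting mentioned above can be inspected before trying to attach. The sketch below assumes the standard procfs location /proc/sys/kernel/yama/ptrace_scope; read_ptrace_scope and attach_hint are hypothetical helper names, and the hint strings are an informal summary of the Yama levels, not py-spy output.

```python
from pathlib import Path


def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the ptrace_scope level as an int, or None if it cannot be read
    (for example on macOS or Windows, or when Yama is not enabled)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def attach_hint(scope):
    """Map a ptrace_scope level to a rough expectation for attaching by PID."""
    if scope is None:
        return "ptrace_scope unavailable; this check does not apply here"
    if scope == 0:
        return "classic ptrace rules; attaching to your own processes may work without sudo"
    if scope == 1:
        return "restricted to descendants; attach by PID with sudo or CAP_SYS_PTRACE"
    if scope == 2:
        return "admin-only; attaching needs CAP_SYS_PTRACE, typically via sudo"
    return "attach disabled or unknown level; relaxing it may require a reboot"


print(attach_hint(read_ptrace_scope()))
```

Launching the program through py-spy (py-spy record -- python script.py) sidesteps this check at level 1, because the target is then a child of py-spy.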

Output Formats

Format        Flag                              Viewer
Flame graph   default, or --format flamegraph   Any browser (SVG)
Speedscope    --format speedscope               speedscope.app or Perfetto UI
Raw           --format raw                      Custom tooling
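As an illustration of the custom tooling the raw format allows, the sketch below totals samples per innermost frame. It assumes the raw output is collapsed-stack text: one line per unique stack, frames separated by semicolons with the root first, followed by a space and a sample count. That layout is an assumption to verify against the output of the installed py-spy version. leaf_sample_counts is a hypothetical helper, and the sample text is made up for the example, not real profiler output.

```python
from collections import Counter


def leaf_sample_counts(raw_text):
    """Sum sample counts by innermost frame from collapsed-stack text."""
    counts = Counter()
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split off the trailing sample count; frames may contain spaces.
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip lines that do not match the assumed layout
        leaf = stack.split(";")[-1]
        counts[leaf] += int(count)
    return counts


# Invented input in the assumed layout, for illustration only.
sample = """\
main (app.py:10);load (app.py:4) 3
main (app.py:10);compute (app.py:7) 12
main (app.py:10);compute (app.py:7) 5
"""

for frame, samples in leaf_sample_counts(sample).most_common():
    print(samples, frame)
```

Grouping by leaf frame gives a rough "self time" ranking; grouping by any frame in the stack instead would approximate cumulative time.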

Pros

  • Requires no code changes and no process restart, which makes it practical for production debugging.
  • Runs out of process, so the sampler’s own overhead is decoupled from the profiled program’s event loop or GIL.
  • Captures native extension frames on Linux and Windows when the extensions are built with symbols.
  • py-spy dump prints live stacks, which recovers useful state from a hung or looping process where a full profile is overkill.

Cons

  • Sampling profilers cannot attribute exact per-call costs the way a deterministic profiler like cProfile can. Very short functions may be under-sampled.
  • Attaching to an existing PID typically needs elevated privileges: always on macOS, and on Linux unless kernel.yama.ptrace_scope has been relaxed. Sandboxed environments (Docker, Kubernetes, restricted containers) also need explicit capability grants.
  • macOS does not support native-extension profiling per the project’s platform matrix.
  • The project is actively maintained but has long gaps between releases; v0.4.1 (July 2025) followed v0.4.0 (November 2024), which itself followed v0.3.14 (September 2022).
