
Try Free-Threaded Python with uv


Python threads usually don’t make CPU-bound code faster. The Global Interpreter Lock lets only one thread execute Python bytecode at a time, so four threads crunching through CPU work run no faster than one. Python 3.14 is the first release where the free-threaded build, proposed in PEP 703: Making the Global Interpreter Lock Optional in CPython, ships as officially supported under PEP 779: Criteria for supported status for free-threaded Python. This tutorial walks you through installing both the standard and free-threaded builds with uv, running the same threaded benchmark on each, and measuring the speedup.

If you want to adopt the free-threaded build in a real project instead of a throwaway benchmark, see How to use free-threaded Python in a uv project.

Prerequisites

  • uv installed.
  • A machine with at least four CPU cores.
  • About ten minutes.

Creating the project

Initialize a new project

uv init gil-demo
cd gil-demo

uv init creates a pyproject.toml, a .python-version file, and a starter main.py.

Install both Python builds

uv python install 3.14 3.14t

The t suffix requests the free-threaded build. Both variants share the same 3.14 language and standard library; only the interpreter implementation differs. uv keeps any number of Python versions side by side, so adding the free-threaded build does not disturb your existing 3.14.

Note

The free-threaded build is officially supported in 3.14, but it still runs about 10-15% slower on single-threaded code than the standard build, and C extension coverage across PyPI is uneven. This tutorial uses only the standard library so you can focus on the GIL difference.
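If you ever need to check which variant a given interpreter is, one way (a sketch of our own, not part of this tutorial's script) is to query the build configuration. Py_GIL_DISABLED is a build-time flag, so this tells you how the interpreter was compiled, not whether the GIL is currently active:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds, 0 on standard 3.13+ builds,
# and None on interpreters built before the flag existed.
flag = sysconfig.get_config_var("Py_GIL_DISABLED")
print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: Py_GIL_DISABLED = {flag}")
```

Note that a free-threaded build can still re-enable the GIL at runtime, so the benchmark script below uses a runtime check instead.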

Writing the benchmark

You need a workload that is pure Python and CPU-bound. Counting primes fits because most of the work happens in Python-level loops and arithmetic, not in blocking I/O. The GIL serializes that bytecode across threads in the standard build. The free-threaded build removes that serialization.

Create the benchmark script

Replace the contents of main.py with:

main.py
import sys
import threading
import time


def count_primes(start: int, end: int) -> None:
    count = 0  # kept per-thread; the benchmark only cares about elapsed time
    for n in range(start, end):
        if n < 2:
            continue
        is_prime = True
        for i in range(2, int(n**0.5) + 1):
            if n % i == 0:
                is_prime = False
                break
        if is_prime:
            count += 1


def main(num_threads: int = 4, upper: int = 2_000_000) -> None:
    chunk = upper // num_threads
    threads = []
    start = time.perf_counter()
    for i in range(num_threads):
        lo = i * chunk
        hi = upper if i == num_threads - 1 else (i + 1) * chunk
        t = threading.Thread(target=count_primes, args=(lo, hi))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    print(f"GIL enabled: {sys._is_gil_enabled()}")
    print(f"Threads:     {num_threads}")
    print(f"Elapsed:     {elapsed:.2f}s")


if __name__ == "__main__":
    main()

The script divides the range 0–2,000,000 into four equal chunks, starts one thread per chunk, waits for all of them to finish, and prints the total wall-clock time. sys._is_gil_enabled() was added in 3.13, so the script can report at runtime which build it is running on.
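Because sys._is_gil_enabled() is a private API that only exists on 3.13 and later, a script meant to also run on older interpreters can guard the call. This gil_enabled helper is our own addition, not part of the tutorial's script:

```python
import sys


def gil_enabled() -> bool:
    # On interpreters older than 3.13 the attribute does not exist,
    # but those builds always run with the GIL held.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


print(f"GIL enabled: {gil_enabled()}")
```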

Running on standard Python 3.14

uv run --python 3.14 main.py

Your output will look close to this:

GIL enabled: True
Threads:     4
Elapsed:     3.33s

Four threads, four cores, but only one thread at a time can execute Python bytecode. The other three wait. Total wall-clock time lands close to the single-threaded runtime plus a small overhead for creating threads and switching between them.

Running on free-threaded Python 3.14t

uv run --python 3.14t main.py

The same script, a different result:

GIL enabled: False
Threads:     4
Elapsed:     1.72s

With the GIL out of the way, the four threads can run at the same time on up to four CPU cores. The ~1.9x speedup falls short of the theoretical 4x for a few reasons: creating and scheduling threads adds overhead, the chunks do unequal amounts of work (trial division costs more for larger numbers, so the last thread finishes latest), and free-threaded Python does not remove every source of contention inside the interpreter.

Your exact numbers will depend on your CPU, but the pattern holds. With the GIL, adding threads barely helps CPU-bound pure-Python code. Without it, CPU-bound pure-Python workloads like this one can scale much closer to the number of cores.
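One way to see the pattern directly is to sweep the thread count on whichever build you are running. The sketch below reuses the same trial-division workload at a smaller upper bound so it finishes quickly; the timed_run helper is our own wrapper, and exact timings will vary by machine:

```python
import threading
import time


def count_primes(start: int, end: int) -> None:
    count = 0  # per-thread, as in the benchmark; only elapsed time matters
    for n in range(start, end):
        if n < 2:
            continue
        if all(n % i for i in range(2, int(n**0.5) + 1)):
            count += 1


def timed_run(num_threads: int, upper: int = 200_000) -> float:
    chunk = upper // num_threads
    threads = [
        threading.Thread(
            target=count_primes,
            args=(i * chunk, upper if i == num_threads - 1 else (i + 1) * chunk),
        )
        for i in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


for n in (1, 2, 4):
    print(f"{n} thread(s): {timed_run(n):.2f}s")
```

On the standard build the three timings stay roughly flat; on the free-threaded build they should shrink as threads are added.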

Review the results

You installed the standard and free-threaded builds of Python 3.14 side by side with uv, ran the same threaded workload on each, and observed the GIL’s effect on parallel performance.

This tutorial kept the setup minimal to isolate one variable. A real project has to handle dependency wheel compatibility (coverage across PyPI is uneven) and thread safety (the GIL provided implicit synchronization that free-threaded code has to replace with explicit locks). How to use free-threaded Python in a uv project walks through both, plus pinning the build in .python-version and the requires-python gotcha.
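As a taste of the thread-safety point: a read-modify-write like counter += 1 is not atomic, so shared mutable state across threads needs an explicit threading.Lock once you can no longer lean on the GIL's coarse serialization. A minimal sketch (our own example, not from the companion guide):

```python
import threading

counter = 0
lock = threading.Lock()


def add(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        # Without the lock, increments from different threads can
        # interleave and be lost, undercounting the total.
        with lock:
            counter += 1


threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 with the lock; possibly less without it
```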
