Try Free-Threaded Python with uv
Python threads usually don’t make CPU-bound code faster. The Global Interpreter Lock lets only one thread execute Python bytecode at a time, so four threads crunching through CPU work run no faster than one. Python 3.14 is the first release where the free-threaded build, proposed in PEP 703: Making the Global Interpreter Lock Optional in CPython, ships as officially supported under PEP 779: Criteria for supported status for free-threaded Python. This tutorial walks you through installing both the standard and free-threaded builds with uv, running the same threaded benchmark on each, and measuring the speedup.
If you want to adopt the free-threaded build in a real project instead of a throwaway benchmark, see How to use free-threaded Python in a uv project.
Prerequisites
- uv installed.
- A machine with at least four CPU cores.
- About ten minutes.
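You can check the core-count prerequisite from Python itself (a quick sketch; os.cpu_count can return None on unusual platforms, hence the fallback):

```python
# report the number of CPU cores visible to Python; the benchmark below
# wants at least four of them to show a meaningful speedup
import os

cores = os.cpu_count() or 0
print(f"CPU cores: {cores}")
```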
Creating the project
Initialize a new project
```shell
uv init gil-demo
cd gil-demo
```

uv init creates a pyproject.toml, a .python-version file, and a starter main.py.
Install both Python builds
```shell
uv python install 3.14 3.14t
```

The t suffix requests the free-threaded build. Both variants share the same 3.14 language and standard library; only the interpreter implementation differs. uv keeps any number of Python versions side by side, so adding the free-threaded build does not disturb your existing 3.14.
Note
The free-threaded build is officially supported in 3.14, but it still runs about 10-15% slower on single-threaded code than the standard build, and C extension coverage across PyPI is uneven. This tutorial uses only the standard library so you can focus on the GIL difference.
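If you ever need to check which variant an interpreter is outside this tutorial's script, the official free-threading HOWTO points at a build-time flag you can query (a small sketch; the flag is unset on builds that predate free-threading):

```python
# Py_GIL_DISABLED is 1 on free-threaded builds, and 0 or unset (None)
# on standard builds
import sysconfig

flag = sysconfig.get_config_var("Py_GIL_DISABLED")
print("free-threaded build" if flag == 1 else "standard build")
```

Note that this reports how the interpreter was built, not whether the GIL is active right now; the benchmark script below reports the latter.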
Writing the benchmark
You need a workload that is pure Python and CPU-bound. Counting primes fits because most of the work happens in Python-level loops and arithmetic, not in blocking I/O. The GIL serializes that bytecode across threads in the standard build. The free-threaded build removes that serialization.
Create the benchmark script
Replace the contents of main.py with:
```python
import sys
import threading
import time


def count_primes(start: int, end: int) -> None:
    count = 0  # kept per-thread; the benchmark only cares about elapsed time
    for n in range(start, end):
        if n < 2:
            continue
        is_prime = True
        for i in range(2, int(n**0.5) + 1):
            if n % i == 0:
                is_prime = False
                break
        if is_prime:
            count += 1


def main(num_threads: int = 4, upper: int = 2_000_000) -> None:
    chunk = upper // num_threads
    threads = []
    start = time.perf_counter()
    for i in range(num_threads):
        lo = i * chunk
        hi = upper if i == num_threads - 1 else (i + 1) * chunk
        t = threading.Thread(target=count_primes, args=(lo, hi))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    print(f"GIL enabled: {sys._is_gil_enabled()}")
    print(f"Threads: {num_threads}")
    print(f"Elapsed: {elapsed:.2f}s")


if __name__ == "__main__":
    main()
```

The script divides the range 0–2,000,000 into four equal chunks, starts one thread per chunk, waits for all of them to finish, and prints the total wall-clock time. sys._is_gil_enabled() was added in 3.13 so the script can report at runtime which build it is running on.
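The chunking arithmetic is worth a quick sanity check: the last thread absorbs any remainder, so the four ranges should tile 0–2,000,000 exactly. A standalone sketch of the same boundary logic:

```python
# verify the chunk boundaries used by main(): contiguous, non-overlapping,
# and covering the full range even if upper were not divisible by num_threads
upper, num_threads = 2_000_000, 4
chunk = upper // num_threads
bounds = [
    (i * chunk, upper if i == num_threads - 1 else (i + 1) * chunk)
    for i in range(num_threads)
]
print(bounds)
```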
Running on standard Python 3.14
```shell
uv run --python 3.14 main.py
```

Your output will look close to this:

```
GIL enabled: True
Threads: 4
Elapsed: 3.33s
```
Four threads, four cores, but only one thread at a time can execute Python bytecode. The other three wait. Total wall-clock time lands close to the single-threaded runtime plus a small overhead for creating threads and switching between them.
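You can see this "threads wait their turn" effect in a scaled-down sketch that runs the same arithmetic-heavy work serially and then across four threads (a hypothetical burn function standing in for count_primes; on a GIL build the two timings come out roughly equal):

```python
# on a standard (GIL) build, four threads of CPU-bound pure-Python work
# take about as long as running the same work serially in one thread
import threading
import time


def burn(n: int) -> None:
    total = 0
    for i in range(n):
        total += i * i


N = 200_000

start = time.perf_counter()
for _ in range(4):
    burn(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=burn, args=(N,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial: {serial:.3f}s, threaded: {threaded:.3f}s")
```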
Running on free-threaded Python 3.14t
```shell
uv run --python 3.14t main.py
```

The same script, a different result:

```
GIL enabled: False
Threads: 4
Elapsed: 1.72s
```
With the GIL out of the way, the four threads can run at the same time on up to four CPU cores. The ~1.9x speedup falls short of the theoretical 4x because threads still add overhead and free-threaded Python does not remove every source of contention inside the interpreter.
Your exact numbers will depend on your CPU, but the pattern holds. With the GIL, adding threads barely helps CPU-bound pure-Python code. Without it, CPU-bound pure-Python workloads like this one can scale much closer to the number of cores.
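The speedup quoted above is just the ratio of the two sample elapsed times:

```python
# speedup implied by the sample timings: standard-build elapsed time
# divided by free-threaded elapsed time
speedup = 3.33 / 1.72
print(f"{speedup:.2f}x")  # 1.94x
```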
Reviewing the results
You installed the standard and free-threaded builds of Python 3.14 side by side with uv, ran the same threaded workload on each, and observed the GIL’s effect on parallel performance.
This tutorial kept the setup minimal to isolate one variable. A real project has to handle dependency wheel compatibility (coverage across PyPI is uneven) and thread safety (the GIL provided implicit synchronization that free-threaded code has to replace with explicit locks). How to use free-threaded Python in a uv project walks through both, plus pinning the build in .python-version and the requires-python gotcha.
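The benchmark above sidesteps thread safety by keeping its counter per-thread, but most real code shares state. A minimal sketch of the explicit-lock pattern that replaces the GIL's implicit synchronization:

```python
# shared mutable state on the free-threaded build needs explicit
# synchronization; a lock-protected counter is the smallest example
import threading

total = 0
lock = threading.Lock()


def add(n: int) -> None:
    global total
    with lock:  # without this, concurrent += updates can be lost
        total += n


threads = [threading.Thread(target=add, args=(1,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # 8
```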
What to explore next
- How to use free-threaded Python in a uv project adopts the free-threaded build in a real uv project
- The official Python free-threading HOWTO covers identifying builds, known limitations, and porting guidance
- PEP 703: Making the Global Interpreter Lock Optional in CPython for the full proposal and design rationale
- Getting started with uv covers uv’s core features if this was your first time using it
- uv reference page for a concise overview of uv’s features