How to Install llama-cpp-python
PyPI has no prebuilt wheels for llama-cpp-python. A bare pip install downloads a source distribution and compiles it without GPU support, which means inference runs entirely on CPU. To get CUDA or Metal acceleration, the build needs specific CMake flags passed through environment variables, or the install must pull from a separate wheel index that the maintainer publishes.
Requirements
A C/C++ compiler and CMake are required for building from source. On Linux, either gcc or clang works; if the build complains that clang is missing (see Troubleshooting below), install it (e.g., apt install clang) or point CC and CXX at an installed compiler. On macOS, the Xcode command-line tools provide both the compiler and CMake support. On Windows, Visual Studio Build Tools with the “Desktop development with C++” workload are needed.
For CUDA builds, the NVIDIA CUDA Toolkit (version 12.1 through 12.4) must be installed, and nvcc must be on the PATH. For Metal builds on macOS, no extra dependencies are needed beyond Xcode.
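Before kicking off a source build, it can save time to confirm the prerequisites are actually discoverable. This sketch probes the PATH with the standard library; the check_build_tools helper is illustrative, not part of llama-cpp-python:

```python
import shutil

def check_build_tools() -> dict:
    """Report which build prerequisites are discoverable on PATH."""
    tools = ["cmake", "nvcc"]  # nvcc only matters for CUDA builds
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, found in check_build_tools().items():
        print(f"{tool}: {'found' if found else 'NOT FOUND'}")
```

If nvcc is reported missing but the CUDA Toolkit is installed, the toolkit's bin directory (often /usr/local/cuda/bin on Linux) likely needs to be added to PATH.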
Install with CUDA support
Option 1: Prebuilt wheels (recommended)
The maintainer publishes prebuilt CUDA wheels for Linux x86_64 at a custom index, covering Python 3.9 through 3.12 and CUDA 12.1 through 12.4.
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
Replace cu124 with cu121, cu122, or cu123 to match the CUDA version installed on the system. These wheels include the compiled CUDA backend, so no compiler toolchain is needed.
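The index suffix is simply the CUDA major and minor version with the dot removed. A small helper (hypothetical, for illustration only) can derive it from a version string like the one nvcc --version reports:

```python
def cuda_wheel_suffix(cuda_version: str) -> str:
    """Map a CUDA version like '12.4' to the wheel index suffix 'cu124'."""
    major, minor = cuda_version.split(".")[:2]
    suffix = f"cu{major}{minor}"
    # Indexes published at the time of writing; other versions have no wheels.
    supported = {"cu121", "cu122", "cu123", "cu124"}
    if suffix not in supported:
        raise ValueError(f"No prebuilt wheel index for CUDA {cuda_version}")
    return suffix

print(cuda_wheel_suffix("12.4"))  # → cu124
```

For an unsupported version such as CUDA 11.8, the helper raises, which mirrors reality: with no matching index, the only option is building from source.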
Option 2: Build from source
Set the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
This compiles the CUDA backend during installation. The CUDA Toolkit must be installed and nvcc available.
Warning
If llama-cpp-python was previously installed without CUDA, pip will reuse the cached wheel. Force a rebuild with pip install llama-cpp-python --force-reinstall --no-cache-dir.
Install with Metal support (macOS)
On Apple Silicon Macs, Metal provides acceleration on the integrated Apple GPU. Prebuilt Metal wheels are available for macOS 11.0+ on arm64:
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
To build from source instead:
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
Tip
On Apple Silicon, verify that the Python interpreter is arm64-native. Running an x86_64 Python under Rosetta will produce a build that cannot use Metal. Check with python -c "import platform; print(platform.machine())".
Install CPU-only
A plain install from PyPI compiles without GPU support:
pip install llama-cpp-python
Prebuilt CPU wheels (covering Linux, macOS arm64, and Windows) are also available:
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
Add to a uv project
Since the prebuilt wheels live on a custom index, configure uv to use it in pyproject.toml. This example sets up the CUDA 12.4 index on Linux and the Metal index on macOS:
[[tool.uv.index]]
name = "llama-cpp-python-cu124"
url = "https://abetlen.github.io/llama-cpp-python/whl/cu124"
explicit = true
[[tool.uv.index]]
name = "llama-cpp-python-metal"
url = "https://abetlen.github.io/llama-cpp-python/whl/metal"
explicit = true
[tool.uv.sources]
llama-cpp-python = [
{ index = "llama-cpp-python-cu124", marker = "sys_platform == 'linux'" },
{ index = "llama-cpp-python-metal", marker = "sys_platform == 'darwin'" },
]
Then add the dependency:
uv add llama-cpp-python
For background on why GPU Python packages require this kind of configuration, see Why Installing GPU Python Packages Is So Complicated.
Install with conda-forge or pixi
llama-cpp-python is available on conda-forge for Linux, macOS, and Windows:
conda install -c conda-forge llama-cpp-python
Or with pixi:
pixi add llama-cpp-python
The conda-forge package provides a CPU build. For GPU-accelerated builds, use the pip-based installation methods described above.
Verify the installation
After installing, confirm that the library loads and check which backends are available:
from llama_cpp import llama_supports_gpu_offload
print("GPU offload supported:", llama_supports_gpu_offload())
If this prints True, the CUDA or Metal backend compiled correctly. If it prints False after a CUDA or Metal install, the build fell back to CPU.
Troubleshooting
Build fails with “cmake not found”: Install CMake (apt install cmake, brew install cmake, or download from cmake.org) and ensure it is on the PATH.
Build fails with “Could not find compiler set in environment variable CC: clang”: The CC environment variable points at clang, but clang is not installed. Install it (apt install clang on Debian/Ubuntu), or set CC and CXX to an installed compiler before running pip install.
CUDA build completes but llama_supports_gpu_offload() returns False: The most likely cause is a cached CPU-only wheel. Reinstall with pip install llama-cpp-python --force-reinstall --no-cache-dir after setting CMAKE_ARGS.
“CUDA_HOME is not set” or “nvcc not found”: Install the CUDA Toolkit and verify that nvcc --version works from the shell. On Linux, the toolkit is often at /usr/local/cuda and needs to be added to PATH.
macOS build produces x86_64 binary on Apple Silicon: This happens when using an x86_64 Python interpreter under Rosetta. Install an arm64-native Python, for example via uv python install.
Windows build fails with compiler errors: Install Visual Studio Build Tools with the “Desktop development with C++” workload. If using MinGW, the project recommends w64devkit and passing explicit compiler paths in CMAKE_ARGS.
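The environment checks behind several of these failures can be bundled into one diagnostic script. This is a sketch, assuming a Unix-like or macOS shell environment; adapt the tool list for Windows:

```python
import os
import platform
import shutil

def diagnose() -> list:
    """Return human-readable warnings for common llama-cpp-python build problems."""
    warnings = []
    if shutil.which("cmake") is None:
        warnings.append("cmake not found on PATH")
    cc = os.environ.get("CC")
    if cc and shutil.which(cc) is None:
        warnings.append(f"CC is set to {cc!r} but that compiler is not on PATH")
    if platform.system() == "Darwin" and platform.machine() != "arm64":
        warnings.append("Python is not arm64-native; Metal builds will not work")
    if shutil.which("nvcc") is None:
        warnings.append("nvcc not found (only a problem for CUDA builds)")
    return warnings

for warning in diagnose():
    print("WARNING:", warning)
```

An empty result means none of these particular failure modes apply; it does not guarantee the build will succeed.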