How to Install llama-cpp-python
PyPI has no prebuilt wheels for llama-cpp-python. A bare pip install downloads a source distribution and compiles it without GPU support, which means inference runs entirely on CPU. To get CUDA or Metal acceleration, the build needs specific CMake flags passed through environment variables, or the install must pull from a separate wheel index that the maintainer publishes.
Requirements
A C/C++ compiler and CMake are required for building from source. On Linux, either gcc or clang works; if the build complains that clang is missing (see Troubleshooting below), install it (e.g., apt install clang) or point CC and CXX at an installed compiler. On macOS, the Xcode command-line tools provide both the compiler and CMake support. On Windows, Visual Studio Build Tools with the “Desktop development with C++” workload are needed.
For CUDA builds, the NVIDIA CUDA Toolkit (version 12.1 through 12.4) must be installed, and nvcc must be on the PATH. For Metal builds on macOS, no extra dependencies are needed beyond Xcode.
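Before kicking off a source build, it can save time to confirm the prerequisites are actually discoverable. This sketch probes the PATH with the standard library; the check_build_tools helper is illustrative, not part of llama-cpp-python:

```python
import shutil

def check_build_tools() -> dict:
    """Report which build prerequisites are discoverable on PATH."""
    tools = ["cmake", "nvcc"]  # nvcc only matters for CUDA builds
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, found in check_build_tools().items():
        print(f"{tool}: {'found' if found else 'NOT FOUND'}")
```

If nvcc is reported missing but the CUDA Toolkit is installed, the toolkit's bin directory (often /usr/local/cuda/bin on Linux) likely needs to be added to PATH.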
Install with CUDA support
Option 1: Prebuilt wheels (recommended)
The maintainer publishes prebuilt CUDA wheels for Linux x86_64 at a custom index, covering Python 3.9 through 3.12 and CUDA 12.1 through 12.4.
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
Replace cu124 with cu121, cu122, or cu123 to match the CUDA version installed on the system. These wheels include the compiled CUDA backend, so no compiler toolchain is needed.
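The index suffix is simply the CUDA major and minor version with the dot removed. A small helper (hypothetical, for illustration only) can derive it from a version string like the one nvcc --version reports:

```python
def cuda_wheel_suffix(cuda_version: str) -> str:
    """Map a CUDA version like '12.4' to the wheel index suffix 'cu124'."""
    major, minor = cuda_version.split(".")[:2]
    suffix = f"cu{major}{minor}"
    # Indexes published at the time of writing; other versions have no wheels.
    supported = {"cu121", "cu122", "cu123", "cu124"}
    if suffix not in supported:
        raise ValueError(f"No prebuilt wheel index for CUDA {cuda_version}")
    return suffix

print(cuda_wheel_suffix("12.4"))  # → cu124
```

For an unsupported version such as CUDA 11.8, the helper raises, which mirrors reality: with no matching index, the only option is building from source.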
Option 2: Build from source
Set the CMAKE_ARGS environment variable before installing:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
This compiles the CUDA backend during installation. The CUDA Toolkit must be installed and nvcc available.
Warning
If llama-cpp-python was previously installed without CUDA, pip will reuse the cached wheel. Force a rebuild with pip install llama-cpp-python --force-reinstall --no-cache-dir.
Install with Metal support (macOS)
On Apple Silicon Macs, Metal provides acceleration on the integrated Apple GPU. Prebuilt Metal wheels are available for macOS 11.0+ on arm64:
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
To build from source instead:
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
Tip
On Apple Silicon, verify that the Python interpreter is arm64-native. Running an x86_64 Python under Rosetta will produce a build that cannot use Metal. Check with python -c "import platform; print(platform.machine())".
Install CPU-only
A plain install from PyPI compiles without GPU support:
pip install llama-cpp-python
Prebuilt CPU wheels (covering Linux, macOS arm64, and Windows) are also available:
pip install llama-cpp-python \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
Add to a uv project
Since the prebuilt wheels live on a custom index, configure uv to use it in pyproject.toml. This example sets up the CUDA 12.4 index on Linux and the Metal index on macOS:
[[tool.uv.index]]
name = "llama-cpp-python-cu124"
url = "https://abetlen.github.io/llama-cpp-python/whl/cu124"
explicit = true
[[tool.uv.index]]
name = "llama-cpp-python-metal"
url = "https://abetlen.github.io/llama-cpp-python/whl/metal"
explicit = true
[tool.uv.sources]
llama-cpp-python = [
{ index = "llama-cpp-python-cu124", marker = "sys_platform == 'linux'" },
{ index = "llama-cpp-python-metal", marker = "sys_platform == 'darwin'" },
]
Then add the dependency:
uv add llama-cpp-python
For background on why GPU Python packages require this kind of configuration, see Why Installing GPU Python Packages Is So Complicated.
Install with conda-forge or pixi
llama-cpp-python is available on conda-forge for Linux, macOS, and Windows:
conda install -c conda-forge llama-cpp-python
Or with pixi:
pixi add llama-cpp-python
The conda-forge package provides a CPU build. For GPU-accelerated builds, use the pip-based installation methods described above.
Verify the installation
After installing, confirm that the library loads and check which backends are available:
from llama_cpp import llama_supports_gpu_offload
print("GPU offload supported:", llama_supports_gpu_offload())
If this prints True, the CUDA or Metal backend compiled correctly. If it prints False after a CUDA or Metal install, the build fell back to CPU.
Troubleshooting
Build fails with “cmake not found”: Install CMake (apt install cmake, brew install cmake, or download from cmake.org) and ensure it is on the PATH.
Build fails with “Could not find compiler set in environment variable CC: clang”: The CC environment variable points at clang, but clang is not installed. Install it (apt install clang on Debian/Ubuntu), or set CC and CXX to an installed compiler before running pip install.
CUDA build completes but llama_supports_gpu_offload() returns False: The most likely cause is a cached CPU-only wheel. Reinstall with pip install llama-cpp-python --force-reinstall --no-cache-dir after setting CMAKE_ARGS.
“CUDA_HOME is not set” or “nvcc not found”: Install the CUDA Toolkit and verify that nvcc --version works from the shell. On Linux, the toolkit is often at /usr/local/cuda and needs to be added to PATH.
macOS build produces x86_64 binary on Apple Silicon: This happens when using an x86_64 Python interpreter under Rosetta. Install an arm64-native Python, for example via uv python install.
Windows build fails with compiler errors: Install Visual Studio Build Tools with the “Desktop development with C++” workload. If using MinGW, the project recommends w64devkit and passing explicit compiler paths in CMAKE_ARGS.
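The environment checks behind several of these failures can be bundled into one diagnostic script. This is a sketch, assuming a Unix-like or macOS shell environment; adapt the tool list for Windows:

```python
import os
import platform
import shutil

def diagnose() -> list:
    """Return human-readable warnings for common llama-cpp-python build problems."""
    warnings = []
    if shutil.which("cmake") is None:
        warnings.append("cmake not found on PATH")
    cc = os.environ.get("CC")
    if cc and shutil.which(cc) is None:
        warnings.append(f"CC is set to {cc!r} but that compiler is not on PATH")
    if platform.system() == "Darwin" and platform.machine() != "arm64":
        warnings.append("Python is not arm64-native; Metal builds will not work")
    if shutil.which("nvcc") is None:
        warnings.append("nvcc not found (only a problem for CUDA builds)")
    return warnings

for warning in diagnose():
    print("WARNING:", warning)
```

An empty result means none of these particular failure modes apply; it does not guarantee the build will succeed.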