# How to Install llama-cpp-python


PyPI has no prebuilt [wheels](https://pydevtools.com/handbook/reference/wheel.md) for `llama-cpp-python`. A bare `pip install` downloads a source distribution and compiles it without GPU support, which means inference runs entirely on CPU. To get CUDA or Metal acceleration, the build needs specific CMake flags passed through environment variables, or the install must pull from a separate wheel index that the maintainer publishes.

## Requirements

A C/C++ compiler and CMake are required for building from source. The project's build system expects `clang`; on systems where only `gcc` is available, install `clang` as well (e.g., `apt install clang`). On macOS, the Xcode command-line tools provide `clang`; CMake is installed separately (e.g., `brew install cmake`). On Windows, Visual Studio Build Tools with the "Desktop development with C++" workload are needed.

For CUDA builds, the NVIDIA CUDA Toolkit (version 12.1 through 12.4) must be installed, and `nvcc` must be on the PATH. For Metal builds on macOS, no extra dependencies are needed beyond Xcode.
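
The requirements above can be checked before starting a build. The following is a minimal sketch that assumes the tools are invoked as `cmake`, `clang`, and `nvcc`. The helper names `find_build_tools` and `find_nvcc` are hypothetical, and the `CUDA_HOME` and `/usr/local/cuda` fallbacks are common Linux conventions rather than something the build itself guarantees.

```python
import os
import shutil
from pathlib import Path


def find_build_tools(names=("cmake", "clang")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def find_nvcc(env=None, default_root="/usr/local/cuda"):
    """Look for nvcc on PATH, then under CUDA_HOME, then under default_root."""
    env = os.environ if env is None else env
    on_path = shutil.which("nvcc", path=env.get("PATH"))
    if on_path:
        return on_path
    # Fallback locations are conventions (assumed), not guaranteed paths.
    for root in (env.get("CUDA_HOME"), default_root):
        if root:
            candidate = Path(root) / "bin" / "nvcc"
            if candidate.is_file():
                return str(candidate)
    return None


if __name__ == "__main__":
    for name, path in find_build_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc: not found (only needed for CUDA builds)")
    elif shutil.which("nvcc") is None:
        print(f"nvcc: found at {nvcc}, but its directory is not on PATH")
    else:
        print(f"nvcc: {nvcc}")
```

If `nvcc` is reported as found but not on `PATH`, adding its directory to `PATH` is usually enough for a CUDA build to pick it up.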

## Install with CUDA support

### Option 1: Prebuilt wheels (recommended)

The maintainer publishes prebuilt CUDA wheels for Linux x86_64 at a custom index, covering Python 3.9 through 3.12 and CUDA 12.1 through 12.4.

```sh
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
```

Replace `cu124` with `cu121`, `cu122`, or `cu123` to match the CUDA version installed on the system. These wheels include the compiled CUDA backend, so no compiler toolchain is needed.
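
Choosing the index suffix can be automated from the toolkit version. The sketch below assumes `nvcc --version` prints a line containing `release <major>.<minor>` and that only the four tags listed above exist. The helper names `cuda_tag` and `index_url` are hypothetical.

```python
import re
import subprocess
from typing import Optional

INDEX_BASE = "https://abetlen.github.io/llama-cpp-python/whl"
# Tags named in the text above; other CUDA versions are treated as unsupported.
SUPPORTED_TAGS = {"cu121", "cu122", "cu123", "cu124"}


def cuda_tag(nvcc_output: str) -> Optional[str]:
    """Turn the 'release 12.4' part of `nvcc --version` output into 'cu124'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    tag = f"cu{match.group(1)}{match.group(2)}"
    return tag if tag in SUPPORTED_TAGS else None


def index_url(tag: str) -> str:
    """Build the wheel index URL for a tag such as 'cu124'."""
    return f"{INDEX_BASE}/{tag}"


if __name__ == "__main__":
    try:
        output = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        output = ""  # nvcc missing or failing: no tag can be derived
    tag = cuda_tag(output)
    if tag:
        print(index_url(tag))
    else:
        print("No matching prebuilt wheel index found; consider building from source.")
```

The printed URL is the value to pass to `--extra-index-url`.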

### Option 2: Build from source

Set the `CMAKE_ARGS` environment variable before installing:

```sh
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```

On Windows (PowerShell):

```powershell
$env:CMAKE_ARGS="-DGGML_CUDA=on"
pip install llama-cpp-python
```

This compiles the CUDA backend during installation. The CUDA Toolkit must be installed and `nvcc` available.
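
The same build can be driven from Python, which avoids the shell-specific syntax for setting `CMAKE_ARGS`. This is a sketch under stated assumptions, not the project's own tooling: `build_command` is a hypothetical helper, it overwrites any existing `CMAKE_ARGS`, and it adds `--force-reinstall --no-cache-dir` so that a previously cached CPU-only wheel is not reused.

```python
import os
import subprocess
import sys


def build_command(backend_flag="-DGGML_CUDA=on"):
    """Return (argv, env) for a forced source rebuild with the given CMake flag."""
    env = dict(os.environ)
    env["CMAKE_ARGS"] = backend_flag  # replaces any existing CMAKE_ARGS value
    argv = [
        sys.executable, "-m", "pip", "install", "llama-cpp-python",
        "--force-reinstall", "--no-cache-dir",
    ]
    return argv, env


if __name__ == "__main__":
    argv, env = build_command()
    print("CMAKE_ARGS=" + env["CMAKE_ARGS"])
    print(" ".join(argv))
    # Uncomment to actually run the build; compilation can take several minutes.
    # subprocess.run(argv, env=env, check=True)
```

Using `sys.executable -m pip` ensures the package is installed into the interpreter that runs the script.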

> [!WARNING]
> If `llama-cpp-python` was previously installed without CUDA, pip may reuse the cached CPU-only wheel instead of rebuilding. Force a rebuild with `CMAKE_ARGS` still set: `CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir`.

## Install with Metal support (macOS)

On Apple Silicon Macs, the Metal backend runs inference on the built-in GPU. Prebuilt Metal wheels are available for macOS 11.0+ on arm64:

```sh
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
```

To build from source instead:

```sh
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
```

> [!TIP]
> On Apple Silicon, verify that the Python interpreter is arm64-native. Running an x86_64 Python under Rosetta will produce a build that cannot use Metal. Check with `python -c "import platform; print(platform.machine())"`, which should print `arm64`.

## Install CPU-only

A plain install from PyPI compiles without GPU support:

```sh
pip install llama-cpp-python
```

Prebuilt CPU wheels (covering Linux, macOS arm64, and Windows) are also available:

```sh
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```

## Add to a uv project

Since the prebuilt wheels live on a custom index, configure [uv](https://pydevtools.com/handbook/reference/uv.md) to use it in `pyproject.toml`. This example sets up the CUDA 12.4 index on Linux and the Metal index on macOS:

```toml
[[tool.uv.index]]
name = "llama-cpp-python-cu124"
url = "https://abetlen.github.io/llama-cpp-python/whl/cu124"
explicit = true

[[tool.uv.index]]
name = "llama-cpp-python-metal"
url = "https://abetlen.github.io/llama-cpp-python/whl/metal"
explicit = true

[tool.uv.sources]
llama-cpp-python = [
  { index = "llama-cpp-python-cu124", marker = "sys_platform == 'linux'" },
  { index = "llama-cpp-python-metal", marker = "sys_platform == 'darwin'" },
]
```

Then add the dependency:

```sh
uv add llama-cpp-python
```
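
The `sys_platform` markers in the configuration above are compared against the same value Python exposes as `sys.platform`. The sketch below only mirrors that mapping to show which source entry would apply on a given machine; it is not how uv resolves packages internally, and `index_for` is a hypothetical helper.

```python
import sys

# Index names copied from the pyproject.toml example above.
INDEX_BY_PLATFORM = {
    "linux": "llama-cpp-python-cu124",
    "darwin": "llama-cpp-python-metal",
}


def index_for(platform_name):
    """Return the index whose marker matches, or None if no marker matches."""
    return INDEX_BY_PLATFORM.get(platform_name)


if __name__ == "__main__":
    match = index_for(sys.platform)
    print(sys.platform, "->", match or "no matching source entry")
```

On Windows, where `sys.platform` is `win32`, neither marker matches, so neither custom index applies to that platform.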

For background on why GPU Python packages require this kind of configuration, see [Why Installing GPU Python Packages Is So Complicated](https://pydevtools.com/handbook/explanation/installing-cuda-python-packages.md).

## Install with conda-forge or pixi

`llama-cpp-python` is available on [conda-forge](https://pydevtools.com/handbook/reference/conda-forge.md) for Linux, macOS, and Windows:

```sh
conda install -c conda-forge llama-cpp-python
```

Or with [pixi](https://pydevtools.com/handbook/reference/pixi.md):

```sh
pixi add llama-cpp-python
```

The conda-forge package provides a CPU build. For GPU-accelerated builds, use the pip-based installation methods described above.

## Verify the installation

After installing, confirm that the library loads and check whether it was built with GPU offload support:

```python
from llama_cpp import llama_supports_gpu_offload
print("GPU offload supported:", llama_supports_gpu_offload())
```

If this reports `True`, the CUDA or Metal backend compiled correctly. If it reports `False` after a CUDA or Metal install, the build fell back to CPU.
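
For use in scripts or CI, the check can be wrapped so that a missing or broken install is reported instead of raising. This is a minimal sketch: `gpu_offload_status` is a hypothetical helper, and the set of exceptions caught on import is an assumption about how a failed load of the compiled library surfaces.

```python
import importlib


def gpu_offload_status():
    """Return True or False from llama_supports_gpu_offload(), or None if import fails."""
    try:
        llama_cpp = importlib.import_module("llama_cpp")
    except (ImportError, OSError, RuntimeError):
        # Assumed failure modes: package not installed, or shared library not loadable.
        return None
    return bool(llama_cpp.llama_supports_gpu_offload())


if __name__ == "__main__":
    status = gpu_offload_status()
    if status is None:
        print("llama_cpp could not be imported; check the installation.")
    elif status:
        print("GPU offload supported: the build includes a GPU backend.")
    else:
        print("GPU offload not supported: this is a CPU-only build.")
```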

## Troubleshooting

Build fails with "cmake not found": Install CMake (`apt install cmake`, `brew install cmake`, or download from cmake.org) and ensure it is on the PATH.

Build fails with "Could not find compiler set in environment variable CC: clang": The project expects `clang`. Install it with `apt install clang` on Debian/Ubuntu, or set `CC` and `CXX` to point to your preferred compiler before running pip install.

CUDA build completes but `llama_supports_gpu_offload()` returns `False`: The most likely cause is a cached CPU-only wheel. Set `CMAKE_ARGS` first, then reinstall with `pip install llama-cpp-python --force-reinstall --no-cache-dir`.

"CUDA_HOME is not set" or "nvcc not found": Install the CUDA Toolkit and verify that `nvcc --version` works from the shell. On Linux, the toolkit is often installed at `/usr/local/cuda`, in which case `/usr/local/cuda/bin` must be added to `PATH`.

macOS build produces x86_64 binary on Apple Silicon: This happens when using an x86_64 Python interpreter under Rosetta. Install an arm64-native Python, for example via `uv python install`.
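
Before rebuilding, the interpreter architecture can be checked from Python itself. This sketch uses the standard `platform` module; `is_arm64_macos` is a hypothetical helper name.

```python
import platform


def is_arm64_macos(system, machine):
    """True when the interpreter reports macOS on arm64 rather than x86_64."""
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    system, machine = platform.system(), platform.machine()
    print(f"{system} {machine}: arm64-native macOS = {is_arm64_macos(system, machine)}")
```

An x86_64 interpreter running under Rosetta reports `x86_64` here, which is the case to fix before building with Metal.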

Windows build fails with compiler errors: Install Visual Studio Build Tools with the "Desktop development with C++" workload. If using MinGW, the project recommends w64devkit and passing explicit compiler paths in `CMAKE_ARGS`.

## Related

- [Why Installing GPU Python Packages Is So Complicated](https://pydevtools.com/handbook/explanation/installing-cuda-python-packages.md)
- [How to Install PyTorch with uv](https://pydevtools.com/handbook/how-to/how-to-install-pytorch-with-uv.md)
- [uv vs pixi vs conda for Scientific Python](https://pydevtools.com/handbook/explanation/uv-vs-pixi-vs-conda-for-scientific-python.md)