
Build a Python library with a C extension

Python handles strings well. C handles them faster. In this tutorial, you’ll write three string functions in C, compile them into a shared library, and call them from Python using cffi. The result is a normal Python package that anyone can import.

Prerequisites

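You need two things installed:

  • uv, which creates the project and manages its dependencies
  • A C compiler, which builds the shared library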

Most macOS and Linux systems already have a C compiler. Check by running:

cc --version

If that fails, install one:

  • macOS: Run xcode-select --install
  • Ubuntu/Debian: Run sudo apt install build-essential
  • Fedora: Run sudo dnf install gcc

Creating the Project

Create a new library project with uv:

uv init string_utils --lib
cd string_utils

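This generates a src/string_utils/ directory containing an __init__.py file, along with a pyproject.toml. Its [project] table looks like this (the file also has a [build-system] table, which is what lets uv install your package so that import string_utils works later):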

pyproject.toml
[project]
name = "string-utils"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = []

Add cffi as a dependency:

uv add cffi

Writing the C Code

Create a file called string_ops.c inside src/string_utils/. This file contains three functions: one counts words, one reverses a string, and one checks for palindromes.

src/string_utils/string_ops.c
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

/* Windows DLLs export only the symbols that are explicitly marked for export */
#ifdef _WIN32
#define EXPORT __declspec(dllexport)
#else
#define EXPORT
#endif

EXPORT int word_count(const char *text) {
    int count = 0;
    int in_word = 0;

    while (*text) {
        if (isspace((unsigned char)*text)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
        text++;
    }
    return count;
}

EXPORT char *reverse(const char *text) {
    size_t len = strlen(text);
    char *result = (char *)malloc(len + 1);
    if (!result) return NULL;

    /* Reverse by UTF-8 character, not by byte: copy each character's
       1-4 bytes, in their original order, to the mirrored position.
       Continuation bytes have the bit pattern 10xxxxxx. */
    size_t i = 0;
    while (i < len) {
        size_t n = 1;
        while (i + n < len && ((unsigned char)text[i + n] & 0xC0) == 0x80) {
            n++;
        }
        memcpy(result + (len - i - n), text + i, n);
        i += n;
    }
    result[len] = '\0';
    return result;
}

EXPORT int is_palindrome(const char *text) {
    int len = strlen(text);
    /* Build a lowercase, letters-only copy */
    char *cleaned = (char *)malloc(len + 1);
    if (!cleaned) return 0;

    int j = 0;
    for (int i = 0; i < len; i++) {
        if (isalpha((unsigned char)text[i])) {
            cleaned[j++] = tolower((unsigned char)text[i]);
        }
    }
    cleaned[j] = '\0';

    int left = 0;
    int right = j - 1;
    while (left < right) {
        if (cleaned[left] != cleaned[right]) {
            free(cleaned);
            return 0;
        }
        left++;
        right--;
    }
    free(cleaned);
    return 1;
}

Each function takes a C string (const char *) as input. word_count walks the string character by character, counting transitions from whitespace to non-whitespace. reverse allocates a new string and fills it back-to-front. is_palindrome strips non-letter characters, lowercases the remainder, and checks whether it reads the same forwards and backwards.
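When testing C code, it can help to have a plain-Python version of the same logic to compare against. The sketch below is an assumption, not part of the tutorial’s package: the function names (word_count_py, reverse_py, is_palindrome_py) are hypothetical. It works on the UTF-8 bytes the way the C code does, assuming isspace() and isalpha() behave as they do in the C locale, so "whitespace" and "letters" mean their ASCII versions.

```python
"""Pure-Python reference versions of the string_ops.c functions.

Hypothetical test helper; it is not part of the string_utils package.
"""

# isspace() in the "C" locale accepts exactly these six bytes
C_WHITESPACE = b" \t\n\v\f\r"


def word_count_py(text: str) -> int:
    """Count words like word_count(): a word starts at any non-whitespace
    byte that follows whitespace or the start of the text."""
    count = 0
    in_word = False
    for byte in text.encode("utf-8"):
        if byte in C_WHITESPACE:
            in_word = False
        elif not in_word:
            in_word = True
            count += 1
    return count


def reverse_py(text: str) -> str:
    """Reverse by Unicode code point; for ASCII text this equals byte reversal."""
    return text[::-1]


def is_palindrome_py(text: str) -> bool:
    """Keep ASCII letters only, lowercase them, and compare with the reverse."""
    data = text.encode("utf-8")
    cleaned = bytes(b for b in data if bytes([b]).isalpha()).lower()
    return cleaned == cleaned[::-1]


if __name__ == "__main__":
    print(word_count_py("  spaced   out  "))  # 2
    print(reverse_py("Python"))  # nohtyP
    print(is_palindrome_py("Was it a car or a cat I saw"))  # True
```

A test could then loop over a batch of inputs and assert that string_utils.word_count(s) == word_count_py(s), and likewise for the other two functions.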

Note

The reverse function returns a pointer to newly allocated memory, and the caller (our Python wrapper) is responsible for freeing it. With cffi, the wrapper copies the result into a Python string first and then passes the pointer to free().

Compiling the C Code

Before Python can call these functions, the C source needs to be compiled into a shared library. Create a build script at the project root:

build_c.py
"""Compile string_ops.c into a shared library."""

import os
import subprocess
import sys


def main():
    src_dir = os.path.join(os.path.dirname(__file__), "src", "string_utils")
    c_file = os.path.join(src_dir, "string_ops.c")

    if sys.platform == "win32":
        lib_name = "string_ops.dll"
        cmd = ["cl", "/LD", "/O2", c_file,
               f"/Fe{os.path.join(src_dir, lib_name)}"]
    elif sys.platform == "darwin":
        lib_name = "string_ops.dylib"
        cmd = ["cc", "-shared", "-fPIC", "-O2", "-std=c99",
               "-o", os.path.join(src_dir, lib_name), c_file]
    else:
        lib_name = "string_ops.so"
        cmd = ["cc", "-shared", "-fPIC", "-O2", "-std=c99",
               "-o", os.path.join(src_dir, lib_name), c_file]

    print(f"Compiling {c_file} -> {lib_name}")
    subprocess.check_call(cmd)
    print("Done.")


if __name__ == "__main__":
    main()

The -shared flag tells the compiler to produce a shared library instead of an executable. The -fPIC flag generates position-independent code, which shared libraries require on Linux and macOS.
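The macOS and Linux branches of build_c.py run the identical cc command and differ only in the output file name. If you would rather keep the flags in one place, a sketch like the following works; compile_command is a hypothetical name, and the flags are exactly the ones the script already passes.

```python
import sys
from pathlib import Path


def compile_command(c_file: Path, out_file: Path, platform: str = sys.platform) -> list[str]:
    """Return the compiler argv that build_c.py runs for this platform."""
    if platform == "win32":
        # MSVC: /LD builds a DLL, /O2 optimizes for speed, /Fe names the output
        return ["cl", "/LD", "/O2", str(c_file), f"/Fe{out_file}"]
    # cc (GCC or Clang): shared library, position-independent, optimized, C99
    return ["cc", "-shared", "-fPIC", "-O2", "-std=c99",
            "-o", str(out_file), str(c_file)]


if __name__ == "__main__":
    src = Path("src") / "string_utils"
    print(compile_command(src / "string_ops.c", src / "string_ops.so", "linux"))
```

build_c.py would then only choose the output file name per platform and pass the result to subprocess.check_call().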

Run the build script:

uv run python build_c.py

You should see output like:

$ uv run python build_c.py
Compiling src/string_utils/string_ops.c -> string_ops.so
Done.

The compiled library now sits next to the C source file in src/string_utils/.
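The wrapper in the next section loads this file with ffi.dlopen(). If someone imports the package before running build_c.py, that call raises an OSError that says nothing about the missing build step. One option, sketched here with a hypothetical library_path() helper, is to compute the expected file name up front and fail with a hint instead.

```python
import sys
from pathlib import Path

# File names that build_c.py produces on each platform
_LIB_NAMES = {"win32": "string_ops.dll", "darwin": "string_ops.dylib"}


def library_path(directory: Path, platform: str = sys.platform) -> Path:
    """Return the compiled library's path, or raise ImportError with a hint."""
    name = _LIB_NAMES.get(platform, "string_ops.so")
    path = Path(directory) / name
    if not path.is_file():
        raise ImportError(
            f"{name} was not found in {directory}; "
            "run `uv run python build_c.py` to compile it"
        )
    return path
```

In the wrapper, this would replace the if/elif chain with a single line such as _lib = ffi.dlopen(str(library_path(Path(__file__).parent))). Raising ImportError keeps the failure where users expect it: at import string_utils.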

Writing the Python Wrapper

Replace the contents of src/string_utils/__init__.py with a wrapper that uses cffi to load the shared library and expose the C functions as Python functions:

src/string_utils/__init__.py
"""String utilities implemented in C, loaded via cffi."""

import os
import sys

import cffi

ffi = cffi.FFI()

# Declare the C function signatures
ffi.cdef("""
    int word_count(const char *text);
    char *reverse(const char *text);
    int is_palindrome(const char *text);
    void free(void *ptr);
""")

# Load the compiled shared library
_dir = os.path.dirname(__file__)
if sys.platform == "win32":
    _lib_path = os.path.join(_dir, "string_ops.dll")
elif sys.platform == "darwin":
    _lib_path = os.path.join(_dir, "string_ops.dylib")
else:
    _lib_path = os.path.join(_dir, "string_ops.so")

_lib = ffi.dlopen(_lib_path)


def word_count(text: str) -> int:
    """Count the number of words in a string."""
    return _lib.word_count(text.encode("utf-8"))


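def reverse(text: str) -> str:
    """Reverse a string."""
    result_ptr = _lib.reverse(text.encode("utf-8"))
    if result_ptr == ffi.NULL:  # the C function returns NULL if malloc fails
        raise MemoryError("reverse() could not allocate memory")
    try:
        return ffi.string(result_ptr).decode("utf-8")
    finally:
        _lib.free(result_ptr)  # free the C-allocated memory, even on errors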


def is_palindrome(text: str) -> bool:
    """Check whether a string is a palindrome (ignoring case and non-letters)."""
    return bool(_lib.is_palindrome(text.encode("utf-8")))

This wrapper does three things:

  1. ffi.cdef() tells cffi what C functions exist and what their signatures look like
  2. ffi.dlopen() loads the compiled shared library into memory
  3. Each Python function encodes the input string to bytes, calls the C function, and converts the result back to a Python type
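The encode step matters because the C functions see bytes, not characters. The short example below (plain Python, no C involved) shows the difference for a string with two accented letters, and why reversing raw UTF-8 bytes, instead of whole characters, produces data that no longer decodes.

```python
# What the wrapper hands to C: UTF-8 bytes, not Python characters
text = "naïve café"
data = text.encode("utf-8")
print(len(text), len(data))  # 10 12  ("ï" and "é" take two bytes each)

# Reversing the raw bytes splits each two-byte character apart
try:
    data[::-1].decode("utf-8")
    byte_reversal_ok = True
except UnicodeDecodeError:
    byte_reversal_ok = False
print(byte_reversal_ok)  # False

# Reversing whole characters keeps every character intact
reversed_text = text[::-1]
print(reversed_text)  # éfac evïan
```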

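The reverse wrapper also frees the memory that the C function allocated, preventing a memory leak. It calls free() through the same library handle as the other functions. That works on Linux and macOS, where the symbol lookup also searches the C library that string_ops links against, but a Windows DLL does not export free, so on Windows string_ops.c would need to export a small function of its own that calls free().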

Testing the Library

Try the library interactively:

uv run python -c "
import string_utils
print(string_utils.word_count('hello world'))
print(string_utils.reverse('Python'))
print(string_utils.is_palindrome('racecar'))
"

Expected output:

2
nohtyP
True

For repeatable tests, add pytest as a development dependency:

uv add --dev pytest

Create a test file at the project root:

test_string_utils.py
import string_utils


def test_word_count():
    assert string_utils.word_count("hello world") == 2
    assert string_utils.word_count("one") == 1
    assert string_utils.word_count("  spaced   out  ") == 2
    assert string_utils.word_count("") == 0


def test_reverse():
    assert string_utils.reverse("hello") == "olleh"
    assert string_utils.reverse("Python") == "nohtyP"
    assert string_utils.reverse("") == ""


def test_is_palindrome():
    assert string_utils.is_palindrome("racecar") is True
    assert string_utils.is_palindrome("A man a plan a canal Panama") is True
    assert string_utils.is_palindrome("hello") is False
    assert string_utils.is_palindrome("Was it a car or a cat I saw") is True

Run the tests:

uv run pytest test_string_utils.py -v

You should see output like:

$ uv run pytest test_string_utils.py -v
============================= test session starts ==============================
collected 3 items

test_string_utils.py::test_word_count PASSED                             [ 33%]
test_string_utils.py::test_reverse PASSED                                [ 66%]
test_string_utils.py::test_is_palindrome PASSED                          [100%]

============================== 3 passed in 0.05s ===============================

Project Structure

Your finished project looks like this:

    • string_utils/
        • pyproject.toml
        • build_c.py
        • test_string_utils.py
        • uv.lock
        • README.md
        • src/
            • string_utils/
                • __init__.py
                • string_ops.c
                • string_ops.so

The .so file (or .dylib on macOS, .dll on Windows) is the compiled shared library. The .c file ships with your project so anyone can recompile it for their platform.

How cffi Connects C and Python

The connection between C and Python happens through cffi’s ABI mode. Here’s the sequence:

  1. ffi.cdef() parses C declarations so cffi knows each function’s argument types and return type
  2. ffi.dlopen() loads the compiled shared library (.so, .dylib, or .dll) into the Python process
  3. When Python calls _lib.word_count(b"hello world"), cffi converts the Python bytes to a C const char *, calls the C function, and converts the C int result back to a Python int
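The same three steps show up in Python’s standard-library ctypes module, which is a separate FFI and not part of cffi. The sketch below uses it only because it runs without the compiled string_ops library: it assumes Linux or macOS, where the C library’s strlen() can be looked up, and it performs the first two steps in the opposite order, because ctypes loads the library before you describe a function’s signature.

```python
import ctypes
import ctypes.util

# Like ffi.dlopen(): load a shared library into the process.
# find_library() may return None; CDLL(None) then opens the running program,
# which on Linux and macOS can already resolve the C library's symbols.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Like ffi.cdef(): describe the signature  size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# The call: Python bytes go in as a const char *, the size_t comes back as int
n = libc.strlen("hello world".encode("utf-8"))
print(n)  # 11
```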

Strings need special handling because C strings are null-terminated byte arrays while Python strings are Unicode objects. The wrapper encodes Python strings to UTF-8 bytes before passing them to C, and decodes the results back to Python strings.
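Because C finds the end of a string by scanning for a zero byte, the C functions stop reading at the first "\0": anything after an embedded NUL in a Python string is ignored on the C side. The two hypothetical helpers below model the round trip in plain Python. They illustrate the rule and are not how cffi is implemented internally.

```python
def to_c_string(text: str) -> bytes:
    """Hypothetical helper: UTF-8 encode and add the NUL terminator C expects."""
    return text.encode("utf-8") + b"\x00"


def from_c_string(buffer: bytes) -> str:
    """Hypothetical helper: read up to the first NUL byte, then decode.

    This mirrors ffi.string(ptr).decode("utf-8") in the wrapper.
    """
    end = buffer.find(b"\x00")
    if end != -1:
        buffer = buffer[:end]
    return buffer.decode("utf-8")


print(from_c_string(to_c_string("héllo")))     # héllo
print(from_c_string(to_c_string("ab\x00cd")))  # ab  (C never sees "cd")
```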

Next Steps

This tutorial used cffi’s ABI mode (ffi.dlopen), which loads a pre-compiled shared library at runtime. cffi also supports an API mode (ffi.set_source), which generates a C extension module and compiles it at build time, for example when the package is built or installed from source. API-mode calls are faster because they call the C functions directly instead of going through libffi, as ABI-mode calls do.

For distributing your package, consider:

Learn More
