Building erllama

erllama is a single OTP application with a single NIF (erllama_nif.so). The first compile builds the vendored c_src/llama.cpp/ (~3 minutes on a fast machine), then compiles the small NIF surface and a CRC table. Subsequent builds reuse the cmake cache and finish in seconds.

Toolchain requirements

| Dependency | Required | Notes |
| --- | --- | --- |
| Erlang/OTP | 28 | rebar.config declares {minimum_otp_vsn, "28"}. |
| rebar3 | 3.25.0+ | Earlier 3.24.x is fine for compile, but the CI-pinned version is 3.25.0. |
| C++17 toolchain | clang 14+ or gcc 11+ | Apple clang as shipped on macOS works. |
| cmake | 3.20+ | llama.cpp's own minimum is 3.18; we set 3.20 for the FindErlang module. |
| pthreads | yes | Linked via CMake's Threads::Threads. |
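The table above can be sanity-checked before the first build. A minimal preflight sketch — the binary names are the common defaults, not probed from this project's build scripts:

```shell
# Hedged preflight sketch: flag any tool from the table above that is
# missing from PATH. Adjust the names for your platform/package manager.
missing=""
for tool in erl rebar3 cmake; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ]; then
  echo "toolchain ok"
else
  echo "missing:$missing"
fi
```

Note this only checks presence, not versions; the version floors in the table still apply.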

Build-time dependencies are platform-specific; the recipes below match what CI installs.

Linux (Ubuntu 24.04 amd64 / arm64)

sudo apt-get install -y build-essential cmake
# Erlang/OTP 28 from erlef setup-beam (manual install also fine).
asdf install erlang 28.0 && asdf local erlang 28.0
asdf install rebar 3.25.0 && asdf local rebar 3.25.0
rebar3 compile

OpenMP is intentionally disabled in c_src/CMakeLists.txt (set(GGML_OPENMP OFF ...)); the system libgomp.a ships without -fPIC on stock Ubuntu, which would break the shared NIF link with R_X86_64_TPOFF32 against hidden symbol gomp_tls_data. Disabling OpenMP at the ggml level avoids that entirely; the GPU paths (Metal/CUDA) are unaffected.

CUDA is off by default. Enable with:

ERLLAMA_OPTS=-DGGML_CUDA=ON rebar3 compile
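Before flipping that flag it can help to confirm a CUDA toolkit is actually visible. A convenience sketch — the `nvcc`-on-PATH heuristic is an assumption of this example, not something the erllama build performs:

```shell
# Sketch: only suggest GGML_CUDA=ON when an nvcc binary is on PATH.
# This check is illustrative; the build itself does not run it.
if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc found: try ERLLAMA_OPTS=-DGGML_CUDA=ON rebar3 compile"
else
  echo "no nvcc on PATH; leave CUDA off"
fi
```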

macOS (Apple Silicon and Intel)

brew install erlang@28 rebar3 cmake
echo 'export PATH="$(brew --prefix erlang@28)/bin:$PATH"' >> ~/.zshrc
rebar3 compile

Metal and Apple BLAS (Accelerate) are auto-detected and enabled by default. A compile takes ~30 s once the first ggml build is cached.

FreeBSD (14.2 / 14.4)

# The cached FreeBSD VM image (or a freshly-installed system) ships
# an older libpcre2 than the git package in the latest pkg repo
# expects (PCRE2_10.47 not defined). Refresh first so git can load.
pkg install -y pcre2

# erllama needs OTP 28+; the base `erlang` package is 26.x.
# erlang-runtime28 installs OTP 28 under /usr/local/lib/erlang28.
pkg install -y erlang-runtime28 cmake bash gmake git

export PATH="/usr/local/lib/erlang28/bin:/usr/local/bin:$PATH"

# llama.cpp's build-info cmake script invokes `git rev-parse`. When
# the build directory's owner differs from the user (typical inside
# CI VMs), git refuses with "dubious ownership" — allow the path.
git config --global --add safe.directory "$PWD"

# rebar3 isn't always available as a pkg; fetch it once.
fetch https://github.com/erlang/rebar3/releases/download/3.25.0/rebar3 -o rebar3
chmod +x rebar3

./rebar3 compile
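The `safe.directory` step in the recipe above can be verified without touching your real global config. A sketch using a throwaway config file — `GIT_CONFIG_GLOBAL` is standard git (2.32+); the temp-file approach is purely for illustration:

```shell
# Sketch: record and read back a safe.directory entry in a throwaway
# global config, so the real ~/.gitconfig stays untouched.
export GIT_CONFIG_GLOBAL="$(mktemp)"
git config --global --add safe.directory "$PWD"
git config --global --get-all safe.directory   # prints the recorded path
```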

Erlang ERTS detection

The build needs erl_nif.h from the Erlang installation. erllama uses c_src/CMake/FindErlang.cmake (adapted from erlang-rocksdb), which runs erl -noshell -eval to read code:lib_dir/0 / code:root_dir/0 and exports ERLANG_ERTS_INCLUDE_PATH. If the caller pre-sets the ERTS_INCLUDE_DIR environment variable, that takes precedence (useful for cross-compilation or pinned headers).
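The path that detection ends up with has a predictable shape: `<root_dir>/erts-<erts_version>/include`. A sketch assembling it from example values — the root and ERTS version below are placeholders, the real ones come from `erl` itself:

```shell
# Sketch: the ERTS include dir is <root_dir>/erts-<erts_version>/include.
# ROOT and ERTS_VSN are example values, not probed from a running erl.
ROOT="/usr/local/lib/erlang28"
ERTS_VSN="16.0"
ERTS_INCLUDE_DIR="$ROOT/erts-$ERTS_VSN/include"
echo "$ERTS_INCLUDE_DIR"
```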

What the build produces

  • priv/erllama_nif.so — the single NIF, statically linked against the vendored c_src/llama.cpp (libllama, libggml, ggml-cpu, plus the platform GPU/BLAS backends) and c_src/crc32c.c.
  • _build/default/lib/erllama/ebin/*.beam — Erlang modules.
  • _build/cmake/ — CMake build dir; cached for incremental builds.

Common build issues

  • 'erl_nif.h' file not found — ERTS_INCLUDE_DIR is wrong. FindErlang.cmake should resolve it automatically; if it fails, set the env var explicitly: ERTS_INCLUDE_DIR=$(erl -noshell -eval 'io:format("~s",[filename:join([code:root_dir(),"erts-"++erlang:system_info(version),"include"])]),halt().') rebar3 compile.
  • R_X86_64_TPOFF32 against hidden symbol gomp_tls_data — your libgomp.a is non-PIC. erllama's CMakeLists already sets GGML_OPENMP OFF to avoid this. If you re-enabled OpenMP, build a PIC libgomp or leave it off.
  • PCRE2_10.47 not defined when running git on FreeBSD — refresh pcre2 first: pkg install -y pcre2. The cached VM image lags the latest repo.
  • macOS Metal init slow on first model load — the lazy llama_backend_init runs on the first erllama:load_model/1 call and discovers Metal devices. eunit cases that load a model need a generator timeout >5 s; see test/erllama_nif_tests.erl:load_model_rejects_non_existent_path_test_/0 for the pattern.

Verifying the build

rebar3 fmt --check
rebar3 compile
rebar3 xref
rebar3 dialyzer
rebar3 lint
rebar3 eunit       # 162 tests, 0 failures
rebar3 ct          # 7 stub-backend cases pass; 6 real-model cases skip

End-to-end against a real GGUF:

LLAMA_TEST_MODEL=/path/to/tinyllama-1.1b-chat.gguf \
    rebar3 ct --suite=test/erllama_real_model_SUITE

Without the env var the suite skips, so default rebar3 ct stays green on machines without a model file.
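That skip-unless-configured behavior can be mirrored in a local wrapper script. A sketch — the gating logic here is illustrative; the real skip happens inside the suite, and the wrapper is not part of erllama:

```shell
# Sketch: run the real-model suite only when LLAMA_TEST_MODEL points at
# an existing file; otherwise report that the cases will skip.
if [ -n "${LLAMA_TEST_MODEL:-}" ] && [ -f "$LLAMA_TEST_MODEL" ]; then
  echo "running real-model suite against $LLAMA_TEST_MODEL"
  # rebar3 ct --suite=test/erllama_real_model_SUITE
else
  echo "LLAMA_TEST_MODEL unset or missing; real-model cases will skip"
fi
```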

Bumping the vendored llama.cpp

See UPDATE_LLAMA.md at the project root.