Updating the vendored llama.cpp

erllama vendors a pinned copy of llama.cpp under c_src/llama.cpp/. The current pin is b9093.

This file documents the bump procedure.
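
To check which tag is currently vendored without opening this file, the optional marker file written during a bump (step 4 under "Bumping" below) may be consulted when present:

cat c_src/llama.cpp/.version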

Why pin

  • Reproducible builds: every developer and CI run compiles the same source.
  • Hex.pm-friendly: published packages contain the full source so no network access is needed at install time.
  • Backend stability: llama.cpp moves fast, especially in the model zoo. We control when we adopt new architectures.

What we ship

We vendor only the parts we need. Currently:

c_src/llama.cpp/
  CMakeLists.txt            llama.cpp's top-level CMake
  LICENSE                   MIT (dual MIT/Apache; llama.cpp picks MIT)
  cmake/                    CMake helpers (toolchain files, etc)
  include/                  public headers (llama.h, etc)
  src/                      llama core (model.cpp, context.cpp, etc)
  ggml/
    CMakeLists.txt
    cmake/                  ggml CMake helpers (common.cmake, GitVars.cmake)
    include/                public ggml headers
    src/
      CMakeLists.txt
      ggml*.c, ggml*.cpp, ggml*.h  core ggml + frontends
      gguf.cpp                     GGUF file format
      ggml-cpu/                    CPU SIMD kernels (mandatory)
      ggml-metal/                  Apple GPU backend (Apple Silicon)
      ggml-cuda/                   NVIDIA GPU backend (Linux x86-64)
      ggml-blas/                   BLAS backend (OpenBLAS / Accelerate)

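A quick way to confirm a checkout matches this layout (paths taken from the tree above; illustrative only, not part of the build):

ls -d c_src/llama.cpp/include/llama.h \
      c_src/llama.cpp/ggml/include \
      c_src/llama.cpp/ggml/src/ggml-cpu
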
Excluded (unused or out-of-scope for v1):

  • tools/, examples/, tests/, docs/, models/, gguf-py/, benches/, ci/, scripts/, grammars/, vendor/, .git/, .github/, AUTHORS, .devops/
  • ggml backends we do not link: Vulkan, SYCL, OpenCL, CANN, Hexagon, HIP, MUSA, RPC, ZDNN, ZenDNN, Virtgpu, Webgpu, OpenVINO

If a user needs one of the excluded backends, they can build erllama against an unvendored llama.cpp by using a git rebar dependency instead of the Hex package; that path is possible but not supported by this scaffold.

Bumping

Pick a tag from https://github.com/ggml-org/llama.cpp/tags. Newer tags are usually fine; check the changelog for breaking C-API changes to llama_state_seq_* (the cache layer depends on those).
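
Once the scratch clone from step 1 below exists, one quick way to make that check (illustrative; it only compares declarations in llama.h) is:

diff <(grep "llama_state_seq_" c_src/llama.cpp/include/llama.h) \
     <(grep "llama_state_seq_" /tmp/llama.cpp.new/include/llama.h)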

# 1. Clone the new tag into a scratch directory.
cd /tmp
git clone --depth=1 --branch=<TAG> https://github.com/ggml-org/llama.cpp llama.cpp.new
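
# Optional: record the exact commit the tag points at; handy for the pin
# reference updated in step 4 (purely informational).
git -C llama.cpp.new rev-parse --short HEAD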

# 2. Sync the parts we vendor.
cd /path/to/erllama    # your local erllama checkout
rm -rf c_src/llama.cpp
mkdir -p c_src/llama.cpp/ggml/src

cp -r /tmp/llama.cpp.new/{src,include,cmake,CMakeLists.txt,LICENSE} \
      c_src/llama.cpp/
cp -r /tmp/llama.cpp.new/ggml/{include,cmake,CMakeLists.txt} \
      c_src/llama.cpp/ggml/
cp /tmp/llama.cpp.new/ggml/src/CMakeLists.txt \
   c_src/llama.cpp/ggml/src/
cp /tmp/llama.cpp.new/ggml/src/ggml*.c \
   /tmp/llama.cpp.new/ggml/src/ggml*.cpp \
   /tmp/llama.cpp.new/ggml/src/ggml*.h \
   /tmp/llama.cpp.new/ggml/src/gguf.cpp \
   c_src/llama.cpp/ggml/src/
cp -r /tmp/llama.cpp.new/ggml/src/{ggml-cpu,ggml-metal,ggml-cuda,ggml-blas} \
      c_src/llama.cpp/ggml/src/
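
# Optional sanity check: the vendored tree should stay small; a big jump
# in size usually means an unwanted directory (tools/, examples/, an
# excluded backend) was copied by mistake.
du -sh c_src/llama.cpp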

# 3. Rebuild and run the full test gauntlet.
rm -rf _build
rebar3 compile
rebar3 fmt --check && rebar3 lint && rebar3 xref \
    && rebar3 eunit && rebar3 proper && rebar3 ct

# 4. Update the pin reference in this file and in
#    c_src/llama.cpp/.version (if present).
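#    For example, when the marker file is used:
echo "<TAG>" > c_src/llama.cpp/.version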

# 5. Commit with a message naming the new tag.
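#    For example:
git add c_src/llama.cpp
git commit -m "vendor: bump llama.cpp to <TAG>"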

Configuration knobs

The CMake configure step honours CMake options passed via the ERLLAMA_OPTS environment variable (forwarded to do_cmake.sh):

ERLLAMA_OPTS="-DGGML_CUDA=ON"           # enable CUDA on Linux x86-64
ERLLAMA_OPTS="-DGGML_METAL=OFF"         # disable Metal on Darwin
ERLLAMA_OPTS="-DGGML_BLAS=OFF"          # disable BLAS
ERLLAMA_OPTS="-DCMAKE_BUILD_TYPE=Debug" # debug build

The build step honours ERLLAMA_BUILDOPTS (passed to cmake --build).
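
For instance, several options can be combined and build parallelism passed separately (a sketch; it assumes do_cmake.sh word-splits ERLLAMA_OPTS and that the host has a CUDA toolchain):

ERLLAMA_OPTS="-DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release" \
ERLLAMA_BUILDOPTS="--parallel 8" \
rebar3 compile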

Why we drop common/

llama.cpp's common/ carries HTTP / Hugging Face download helpers that pull in cpp-httplib (5 MB). erllama uses the public llama.h API directly and provides its own thin sampling / tokenization helpers in the NIF; nothing in common/ is on our critical path.