# LlamaCppEx v0.7.0 - Table of Contents

Elixir bindings for llama.cpp — run LLMs locally with Metal, CUDA, Vulkan, or CPU acceleration.

## Pages

- [LlamaCppEx](readme.md)
- [Changelog](changelog.md)
- [LICENSE](license.md)
- [Architecture](architecture.md)
- [Cross-Platform Builds](cross-platform-builds.md)
- [Examples](examples.md)
- [Performance Guide](performance.md)
- [Release Guide](release-guide.md)
- Architecture Decision Records
  - [ADR 001: C++ NIF Over Rustler](001-cpp-nif-over-rustler.md)
  - [ADR 002: fine for NIF Ergonomics](002-fine-for-nif-ergonomics.md)
  - [ADR 003: Static Linking of llama.cpp](003-static-linking.md)
  - [ADR 004: Streaming via enif_send](004-streaming-via-enif-send.md)
  - [ADR 005: Batching Architecture](005-batching-architecture.md)
  - [ADR 006: Continuous Batching](006-continuous-batching.md)
  - [ADR 007: Prefix Caching (Same-Slot KV Reuse)](007-prefix-caching.md)
  - [ADR 008: Pluggable Batching Strategies](008-batching-strategies.md)

## Modules

- [LlamaCppEx.ChatCompletion](LlamaCppEx.ChatCompletion.md): OpenAI-compatible chat completion response struct.
- [LlamaCppEx.ChatCompletionChunk](LlamaCppEx.ChatCompletionChunk.md): OpenAI-compatible streaming chat completion chunk struct.
- [LlamaCppEx.Thinking](LlamaCppEx.Thinking.md): Parser for `...` blocks in thinking model output.
- High-Level API
  - [LlamaCppEx](LlamaCppEx.md): Elixir bindings for llama.cpp.
- Core Modules
  - [LlamaCppEx.Chat](LlamaCppEx.Chat.md): Chat template formatting using llama.cpp's Jinja template engine.
  - [LlamaCppEx.Context](LlamaCppEx.Context.md): Inference context with KV cache.
  - [LlamaCppEx.Embedding](LlamaCppEx.Embedding.md): Generate embeddings from text using an embedding model.
  - [LlamaCppEx.Grammar](LlamaCppEx.Grammar.md): Converts JSON Schema to GBNF grammar for constrained generation.
  - [LlamaCppEx.Hub](LlamaCppEx.Hub.md): Download GGUF models from HuggingFace Hub.
  - [LlamaCppEx.Model](LlamaCppEx.Model.md): Model loading and introspection.
  - [LlamaCppEx.Sampler](LlamaCppEx.Sampler.md): Token sampling configuration.
  - [LlamaCppEx.Schema](LlamaCppEx.Schema.md): Converts Ecto schema modules to JSON Schema maps for structured output.
  - [LlamaCppEx.Server](LlamaCppEx.Server.md): GenServer for continuous batched multi-sequence inference.
  - [LlamaCppEx.Tokenizer](LlamaCppEx.Tokenizer.md): Text tokenization and detokenization.
- Batching Strategies
  - [LlamaCppEx.Server.BatchStrategy](LlamaCppEx.Server.BatchStrategy.md): Behaviour for batch building strategies.
  - [LlamaCppEx.Server.Strategy.Balanced](LlamaCppEx.Server.Strategy.Balanced.md): Balanced batching strategy.
  - [LlamaCppEx.Server.Strategy.DecodeMaximal](LlamaCppEx.Server.Strategy.DecodeMaximal.md): Decode-maximal batching strategy.
  - [LlamaCppEx.Server.Strategy.PrefillPriority](LlamaCppEx.Server.Strategy.PrefillPriority.md): Prefill-priority batching strategy.