LLMDB.Engine (LLM DB v2025.12.4)
Pure ETL pipeline for BUILD-TIME LLM model catalog generation.
Engine is a pure function: sources in, snapshot out. It processes ONLY the sources passed explicitly via options or, when none are passed, the configured sources.
This module is designed for BUILD-TIME use (e.g., mix tasks) to generate complete, unfiltered snapshots from remote/local sources that will be packaged into the library.
Pipeline Stages
- Ingest - Load data from configured sources
- Normalize - Apply normalization to providers and models per layer
- Validate - Validate schemas and log dropped records per layer
- Merge - Combine layers with precedence rules (last wins)
- Finalize - Enrich and nest models under providers
- Ensure viable - Verify catalog has content (warns if empty)
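The per-layer stages (normalize, then validate with logging of dropped records) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `LayerSketch` module, the required keys, and the id-trimming rule are hypothetical and are not taken from the LLMDB source.

```elixir
defmodule LayerSketch do
  require Logger

  # Hypothetical required keys; the real schema is not shown in this document.
  @required [:id, :provider]

  # Normalize: trim string ids so " gpt-x " and "gpt-x" compare equal.
  def normalize(records) do
    Enum.map(records, fn record ->
      case record do
        %{id: id} when is_binary(id) -> %{record | id: String.trim(id)}
        _ -> record
      end
    end)
  end

  # Validate: keep records that have every required key and
  # log how many records were dropped from this layer.
  def validate(records, layer_name) do
    {valid, dropped} =
      Enum.split_with(records, fn record ->
        Enum.all?(@required, &Map.has_key?(record, &1))
      end)

    if dropped != [] do
      Logger.warning("#{layer_name}: dropped #{length(dropped)} invalid record(s)")
    end

    valid
  end
end

# One layer with a valid record and a record that is missing :id.
layer = [%{id: " gpt-x ", provider: :acme}, %{provider: :acme}]
layer |> LayerSketch.normalize() |> LayerSketch.validate("local")
```

Running both stages per layer, before merging, means a bad record in one source is dropped without affecting the other sources.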
Architecture
Sources are processed in order with last-wins precedence:
- First source (lowest precedence)
- Second source
- ... (higher precedence)
- Last source (highest precedence)
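Last-wins precedence can be sketched as a left-to-right fold over the layers. This is an illustration only: `MergeSketch` is a hypothetical module, and the sketch replaces whole records on conflict; this page does not state whether the engine replaces records or merges them field by field.

```elixir
defmodule MergeSketch do
  # Each layer is a map of id => record, listed lowest precedence first.
  # Map.merge/2 keeps the value from its second argument on key conflicts,
  # so folding left to right lets the last layer win.
  def merge_layers(layers) do
    Enum.reduce(layers, %{}, fn layer, acc -> Map.merge(acc, layer) end)
  end
end

base = %{"gpt-x" => %{context: 8_000}, "gpt-y" => %{context: 4_000}}
override = %{"gpt-x" => %{context: 16_000}}

# "gpt-x" takes its record from `override`; "gpt-y" is kept from `base`.
MergeSketch.merge_layers([base, override])
```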
The engine coordinates data ingestion, normalization, validation, merging, and finalization to produce a complete v2 snapshot ready for JSON serialization.
Filtering and indexing are deferred to load-time - the snapshot contains
ALL data from sources. Runtime policies (allow/deny patterns, preferences)
are applied when the snapshot is loaded via LLMDB.load/1.
Functions
Applies allow/deny filters to models.
Deny patterns always win over allow patterns.
Parameters
- models - List of model maps
- filters - %{allow: compiled_patterns, deny: compiled_patterns}
Returns
Filtered list of models
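The "deny always wins" rule can be sketched as follows. The shape of the compiled patterns is not shown on this page, so this sketch assumes they are lists of `Regex` structs matched against `model.id`; `FilterSketch` and `filter_models/2` are hypothetical names, not the library's API.

```elixir
defmodule FilterSketch do
  # Assumption: allow and deny are lists of compiled Regex structs
  # that are matched against the model's :id string.
  def filter_models(models, %{allow: allow, deny: deny}) do
    Enum.filter(models, fn model ->
      # A model must match an allow pattern and no deny pattern,
      # so a deny match overrides any allow match.
      matches?(model.id, allow) and not matches?(model.id, deny)
    end)
  end

  defp matches?(id, patterns), do: Enum.any?(patterns, &Regex.match?(&1, id))
end

models = [%{id: "gpt-x"}, %{id: "gpt-x-preview"}, %{id: "other"}]
filters = %{allow: [~r/^gpt-/], deny: [~r/preview$/]}

FilterSketch.filter_models(models, filters)
# => [%{id: "gpt-x"}]
```

Here "gpt-x-preview" matches both lists and is removed, which is the deny-over-allow behavior described above.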
Builds the nested v2 provider structure for snapshot serialization.
Groups models by provider and nests them under their provider. Models are keyed by model.id for easy lookup.
Parameters
- providers - List of provider maps
- models - List of model maps
Returns
%{atom => %{provider fields + models: %{string => model}}}
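A minimal sketch of the grouping and nesting described above. It assumes each provider map has an atom `:id` and each model map has a string `:id` plus a `:provider` atom; those field names, and the `NestSketch` module itself, are assumptions for illustration and are not confirmed by this page.

```elixir
defmodule NestSketch do
  # Returns %{provider_id => provider fields + models: %{model_id => model}}.
  def nest(providers, models) do
    # Group the flat model list by the provider each model belongs to.
    by_provider = Enum.group_by(models, & &1.provider)

    Map.new(providers, fn provider ->
      # Key this provider's models by model id for direct lookup.
      nested =
        by_provider
        |> Map.get(provider.id, [])
        |> Map.new(fn model -> {model.id, model} end)

      {provider.id, Map.put(provider, :models, nested)}
    end)
  end
end

providers = [%{id: :acme, name: "Acme"}]
models = [%{id: "gpt-x", provider: :acme}, %{id: "gpt-y", provider: :acme}]

nested = NestSketch.nest(providers, models)
nested[:acme].models["gpt-x"]
# => %{id: "gpt-x", provider: :acme}
```

In this sketch, models whose provider is not in the provider list are silently left out; how the engine treats such models is not stated here.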
Runs the complete ETL pipeline to generate a model catalog snapshot.
Pure function that processes sources into a complete, unfiltered snapshot. BUILD-TIME only.
Options
- :sources - List of {module, opts} source tuples (optional, defaults to Config.sources!())
Note: :allow, :deny, :prefer, and :filters options are ignored.
Filtering is a load-time concern applied via LLMDB.load/1 and runtime config.
Returns
- {:ok, snapshot_map} - Success with v2 snapshot structure
- {:ok, snapshot_map} - Empty catalog (warns but succeeds if no sources)
- {:error, term} - Other error
Snapshot Structure (v2)
    %{
      version: 2,
      generated_at: String.t(),
      providers: %{atom => %{provider_fields... + models: %{String.t() => Model.t()}}}
    }
The snapshot contains ALL models from all sources. Indexes and filters are
built at load-time by LLMDB.load/1 using the LLMDB.Index module.