Sources and Engine
The Engine runs a build-time ETL pipeline that loads data from sources, normalizes, validates, merges, enriches, and indexes it, then writes priv/llm_db/snapshot.json. Runtime only loads the snapshot.
Source Behaviour
All sources implement the LLMDB.Source behaviour:
@callback load(opts :: map()) :: {:ok, data :: map()} | {:error, term()}
@callback pull(opts :: map()) :: :ok | {:error, term()} # Optional
Canonical Format
%{
  "providers" => %{
    openai: %{
      "id" => :openai,
      "name" => "OpenAI",
      "base_url" => "https://api.openai.com/v1",
      # ...
    }
  },
  "models" => [
    %{
      "id" => "gpt-4",
      "provider" => :openai,
      "name" => "GPT-4",
      # ...
    },
    # ...
  ]
}
The outer map uses string keys; provider keys are atoms; model IDs are strings. Use LLMDB.Source.assert_canonical!/1 for validation.
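To make the shape rules concrete, here is a minimal sketch of a structural check. CanonicalSketch and its error atoms are hypothetical names invented for illustration; this is not how LLMDB.Source.assert_canonical!/1 is implemented, only a plausible reading of the three rules stated above.

```elixir
defmodule CanonicalSketch do
  # Hypothetical helper, not part of LLMDB. Returns :ok or {:error, reason}.
  def check(%{"providers" => providers, "models" => models})
      when is_map(providers) and is_list(models) do
    cond do
      # Provider keys must be atoms.
      not Enum.all?(Map.keys(providers), &is_atom/1) ->
        {:error, :provider_keys_must_be_atoms}

      # Each model needs a string "id" and an atom "provider".
      not Enum.all?(models, &valid_model?/1) ->
        {:error, :invalid_model_entry}

      true ->
        :ok
    end
  end

  # Anything without string "providers" and "models" keys is rejected.
  def check(_other), do: {:error, :missing_top_level_keys}

  defp valid_model?(%{"id" => id, "provider" => provider}),
    do: is_binary(id) and is_atom(provider)

  defp valid_model?(_), do: false
end
```

A check like this can be handy in a custom source's own tests, before the data ever reaches the Engine.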
Built-in Sources
ModelsDev (Remote)
{LLMDB.Sources.ModelsDev, %{
  url: "https://models.dev/api/models",
  cache_path: "priv/llm_db/cache/models_dev.json"
}}
pull/1 downloads the remote data via Req and caches it. load/1 reads from the cache and transforms the models.dev schema to the canonical format (limit → limits, modality strings → atoms, unmapped fields → :extra).
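The three transformations named above could look roughly like the following. This is a sketch under assumptions: TransformSketch, the input field layout, and the list of known modalities are all hypothetical and not taken from LLMDB or from the models.dev schema itself.

```elixir
defmodule TransformSketch do
  # Modalities this sketch recognizes (an assumption, not the real list).
  @known_modalities %{"text" => :text, "image" => :image, "audio" => :audio}

  # Rename "limit" to "limits", atomize modality names, and collect
  # every field the sketch does not map under :extra.
  def transform(raw) when is_map(raw) do
    {limit, rest} = Map.pop(raw, "limit")
    {modalities, rest} = Map.pop(rest, "modalities", %{})
    {id, extra} = Map.pop(rest, "id")

    %{
      "id" => id,
      "limits" => limit,
      "modalities" =>
        Map.new(modalities, fn {direction, names} ->
          {direction, Enum.map(names, &to_modality/1)}
        end),
      :extra => extra
    }
  end

  # Unknown names stay strings so untrusted input never creates new atoms.
  defp to_modality(name), do: Map.get(@known_modalities, name, name)
end
```

Looking names up in a fixed table, instead of calling String.to_atom/1 on remote data, keeps the atom table bounded.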
Local (TOML)
{LLMDB.Sources.Local, %{dir: "priv/llm_db"}}
Structure: provider.toml + models/{provider}/*.toml. Atomizes keys and injects :provider from the directory name.
Configuring Sources
config :llm_db,
  sources: [
    {LLMDB.Sources.ModelsDev, %{}},
    {LLMDB.Sources.Local, %{dir: "priv/llm_db"}}
  ]
Sources are processed in order. Later sources override earlier ones.
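As an illustration of what "later overrides earlier" means for a single record, the sketch below applies the rules listed for the Merge stage: :aliases lists are unioned, other lists are replaced, and nested maps are merged recursively. MergeSketch is a hypothetical module; the real merge lives inside the Engine and may differ in detail.

```elixir
defmodule MergeSketch do
  # Merge `later` over `earlier` with last-wins precedence.
  def merge(earlier, later) when is_map(earlier) and is_map(later) do
    Map.merge(earlier, later, fn
      # :aliases is the one list that accumulates across sources.
      :aliases, a, b when is_list(a) and is_list(b) -> Enum.uniq(a ++ b)
      # Nested maps are merged key by key.
      _key, a, b when is_map(a) and is_map(b) -> merge(a, b)
      # Everything else, including other lists, is replaced.
      _key, _a, b -> b
    end)
  end
end
```

For example, merging a Local override that only sets a new context limit leaves every other field from the earlier source untouched.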
ETL Pipeline
LLMDB.Engine.run/1 executes 7 stages:
- Ingest: Load sources, validate canonical format, flatten nested provider data
- Normalize: Convert provider IDs to atoms, normalize modalities to atoms, parse dates
- Validate: Zoi validation via LLMDB.Validate, drop invalid, log warnings
- Merge: Last-wins precedence; :aliases are unioned, other lists replaced, maps deep merged
- Filter: Compile allow/deny patterns (deny wins, globs supported)
- Enrich: Derive :family, fill :provider_model_id, apply capability defaults
- Index: Build providers_by_id, models_by_key, models_by_provider, aliases_by_key, then v2 snapshot
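The Filter stage's "deny wins" rule with glob patterns can be sketched as follows. FilterSketch, the :all shorthand for an unrestricted allow list, and the choice to treat only * as a wildcard are assumptions made for this example, not documented LLMDB behaviour.

```elixir
defmodule FilterSketch do
  # Compile a glob such as "gpt-4*" into an anchored regex.
  # Only `*` is treated as a wildcard here (an assumption).
  def compile(glob) do
    source =
      glob
      |> String.split("*")
      |> Enum.map(&Regex.escape/1)
      |> Enum.join(".*")

    Regex.compile!("\\A" <> source <> "\\z")
  end

  # `allow` is a list of globs or the hypothetical :all shorthand.
  # A model is kept only if it is allowed and matches no deny pattern.
  def allowed?(model_id, allow, deny) do
    denied = Enum.any?(deny, &Regex.match?(compile(&1), model_id))
    permitted = allow == :all or Enum.any?(allow, &Regex.match?(compile(&1), model_id))
    permitted and not denied
  end
end
```

A production version would compile each pattern once up front, which is presumably why the stage is described as compiling patterns before applying them.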
Final check warns if zero providers/models.
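A rough sketch of the Index stage and the final check is shown below. IndexSketch is hypothetical, as are the {provider, id} key shape for models_by_key, the atom-keyed model maps, and the warning text; aliases_by_key and the snapshot format are omitted because the text above does not describe them.

```elixir
defmodule IndexSketch do
  require Logger

  # Build lookup tables from already merged, filtered data.
  def build(providers, models) when is_map(providers) and is_list(models) do
    # Final check: warn, but do not fail, on an empty result.
    if map_size(providers) == 0 or models == [] do
      Logger.warning("snapshot has zero providers or models")
    end

    %{
      providers_by_id: providers,
      # Keying by {provider, id} is an assumption of this sketch.
      models_by_key: Map.new(models, &{{&1.provider, &1.id}, &1}),
      models_by_provider: Enum.group_by(models, & &1.provider)
    }
  end
end
```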
Mix Tasks
- mix llm_db.pull - Fetch and cache remote sources
- mix llm_db.build - Run ETL, write priv/llm_db/snapshot.json and lib/llm_db/generated/valid_providers.ex
Custom Source Example
defmodule MyApp.InternalModels do
  @behaviour LLMDB.Source

  @impl true
  def load(_opts) do
    {:ok, %{
      "providers" => %{internal: %{"id" => :internal, "name" => "Internal"}},
      "models" => [%{"id" => "custom-gpt", "provider" => :internal, "capabilities" => %{"chat" => true}}]
    }}
  end
end

# config.exs
config :llm_db, sources: [{MyApp.InternalModels, %{}}]