LLMDB.Source behaviour (LLM DB v2025.12.4)
Unified data source interface for LLMDB.
Sources return provider and model data in canonical Zoi format, with no filtering and no excludes. Validation happens later in the Engine pipeline.
Output Format: Canonical Zoi v1
All sources MUST return data matching our canonical Zoi schema format.
External formats (e.g., models.dev) must be transformed to canonical format
before returning from load/1.
Type Specifications
- provider_id - Atom or string identifying a provider (e.g., :openai, "anthropic")
- model_id - String identifying a model (e.g., "gpt-4o")
- provider_map - Provider data map with atom keys matching Zoi Provider schema
- model_map - Model data map with atom keys matching Zoi Model schema
- data - Source output with providers map, each containing models list
Contract: Canonical Format Required
All source implementations must return {:ok, data} where data is:
%{
  "openai" => %{
    id: :openai,                  # REQUIRED: atom or string
    name: "OpenAI",               # Optional
    base_url: "...",              # Optional
    env: ["OPENAI_API_KEY"],      # Optional
    doc: "...",                   # Optional
    models: [                     # REQUIRED: list
      %{
        id: "gpt-4o",             # REQUIRED: string
        provider: :openai,        # REQUIRED: atom
        name: "GPT-4o",           # Optional
        limits: %{                # Optional: Zoi Limits schema
          context: 128000,
          output: 16384
        },
        cost: %{                  # Optional: Zoi Cost schema
          input: 2.50,
          output: 10.00,
          cache_read: 1.25
        },
        capabilities: %{          # Optional: Zoi Capabilities schema
          streaming: %{text: true},
          tools: %{enabled: true}
        },
        modalities: %{            # Optional
          input: [:text, :image],
          output: [:text]
        },
        ...                       # Other Zoi Model schema fields
      }
    ]
  },
  ...
}

Key requirements:
- Outer keys: strings (provider IDs as strings)
- Provider maps: atom keys, MUST include :id (atom/string) and :models (list)
- Model maps: atom keys matching Zoi Model schema
Return {:error, reason} only if the source cannot produce any data.
For partial failures (e.g., one file fails in a multi-file source), handle them internally, log warnings, and return the available data.
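The error-handling rule above can be sketched as follows. This is a minimal illustration, not LLMDB code: the module name MultiFileSourceSketch, the :paths and :reader options, and the :no_data reason are all hypothetical. The :reader function is assumed to return {:ok, provider_map} or {:error, reason} for each path.

```elixir
defmodule MultiFileSourceSketch do
  # Hypothetical module for illustration; not part of LLMDB.
  require Logger

  # opts is assumed to carry :paths (list of file paths) and :reader
  # (a function returning {:ok, provider_map} | {:error, reason}).
  def load(%{paths: paths, reader: reader}) do
    providers =
      Enum.reduce(paths, %{}, fn path, acc ->
        case reader.(path) do
          {:ok, %{id: id} = provider} ->
            # Outer keys are strings, even when :id is an atom.
            Map.put(acc, to_string(id), provider)

          {:error, reason} ->
            # Partial failure: warn and keep going with the remaining files.
            Logger.warning("skipping #{path}: #{inspect(reason)}")
            acc
        end
      end)

    if map_size(providers) == 0 do
      # Nothing could be loaded at all, so this is a fatal error.
      {:error, :no_data}
    else
      {:ok, providers}
    end
  end
end
```

With one readable file and one failing file, this sketch returns {:ok, data} containing the readable provider and logs a warning for the other; it returns {:error, :no_data} only when every file fails.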
Format Transformation
Sources that read external formats (e.g., models.dev JSON) should implement
a public transform/1 function to make the transformation explicit.
Call this from load/1 before returning.
Example:
def load(opts) do
  case read_external_data(opts) do
    {:ok, external_data} ->
      {:ok, transform(external_data)}

    error ->
      error
  end
end

def transform(external_data) do
  # Transform external format → canonical Zoi format
  ...
end

Testability
Sources should accept optional test hooks via opts parameter:
- :file_reader - Function for reading files (default: File.read!/1)
- :dir_reader - Function for listing directories (default: File.ls!/1)
This allows tests to inject stubs without filesystem access.
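As a rough sketch of how these hooks might be used, the hypothetical source below reads a directory in which each file is named after a provider and lists one model ID per line. The module name DirSourceSketch, the :dir option, and the file layout are assumptions made for illustration; only the :file_reader and :dir_reader option names and their defaults come from the text above.

```elixir
defmodule DirSourceSketch do
  # Hypothetical source for illustration; not part of LLMDB.
  def load(opts) do
    dir = Map.fetch!(opts, :dir)
    # Fall back to the real filesystem when no stub is injected.
    file_reader = Map.get(opts, :file_reader, &File.read!/1)
    dir_reader = Map.get(opts, :dir_reader, &File.ls!/1)

    providers =
      dir
      |> dir_reader.()
      |> Map.new(fn file_name ->
        provider_id = Path.rootname(file_name)
        # Sketch only: converting arbitrary strings to atoms is unsafe
        # for untrusted input.
        provider_atom = String.to_atom(provider_id)

        models =
          Path.join(dir, file_name)
          |> file_reader.()
          |> String.split("\n", trim: true)
          |> Enum.map(fn model_id -> %{id: model_id, provider: provider_atom} end)

        {provider_id, %{id: provider_atom, models: models}}
      end)

    {:ok, providers}
  end
end

# A test can inject stubs and never touch the filesystem.
stub_opts = %{
  dir: "fake",
  dir_reader: fn "fake" -> ["openai.txt"] end,
  file_reader: fn "fake/openai.txt" -> "gpt-4o\n" end
}

{:ok, %{"openai" => %{id: :openai, models: [%{id: "gpt-4o"}]}}} =
  DirSourceSketch.load(stub_opts)
```

Because the stubs pattern-match on their arguments, the test also checks that the source asks for exactly the expected directory and file.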
Types
Callbacks
Load data from this source.
For remote sources, this should read from locally cached data (no network calls).
Run mix llm_db.pull to fetch and cache remote data first.
Parameters
- opts - Source-specific options map
Returns
- {:ok, data} - Success with providers/models data
- {:error, term} - Fatal error (source cannot produce any data)
@callback pull(opts()) :: pull_result()
Pull remote data and cache it locally.
This callback is optional and only implemented by sources that fetch remote data. When implemented, it should:
- Fetch data from a remote endpoint (e.g., via Req)
- Cache the data locally in priv/llm_db/remote/
- Write a manifest file with metadata (URL, checksum, timestamp)
- Support conditional GET using ETag/Last-Modified headers
Parameters
- opts - Source-specific options map (may include :url, :cache_id, etc.)
Returns
- :noop - Data not modified (HTTP 304)
- {:ok, cache_path} - Successfully cached to the given path
- {:error, term} - Failed to fetch or cache
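The pull flow described above might look roughly like the sketch below. It is not the library's implementation. To stay self-contained it does not call Req; instead it takes a hypothetical :http_get function that receives the URL and a header list and returns {:ok, %{status: ..., body: ..., etag: ...}} or {:error, reason}. The :cache_dir option, the ".json" and ".manifest" file names, and the manifest encoding are likewise assumptions.

```elixir
defmodule PullSketch do
  # Hypothetical module for illustration; not part of LLMDB.
  # :url and :cache_id appear in the docs above; :http_get and :cache_dir
  # are assumed options introduced for this sketch.
  def pull(opts) do
    url = Map.fetch!(opts, :url)
    cache_id = Map.fetch!(opts, :cache_id)
    http_get = Map.fetch!(opts, :http_get)
    cache_dir = Map.get(opts, :cache_dir, "priv/llm_db/remote")

    cache_path = Path.join(cache_dir, cache_id <> ".json")
    manifest_path = Path.join(cache_dir, cache_id <> ".manifest")

    # Conditional GET: send the previously stored ETag, if any.
    headers =
      case read_manifest(manifest_path) do
        %{etag: etag} when is_binary(etag) -> [{"if-none-match", etag}]
        _ -> []
      end

    case http_get.(url, headers) do
      {:ok, %{status: 304}} ->
        :noop

      {:ok, %{status: 200, body: body} = response} ->
        manifest = %{
          url: url,
          etag: Map.get(response, :etag),
          sha256: Base.encode16(:crypto.hash(:sha256, body), case: :lower),
          fetched_at: DateTime.to_iso8601(DateTime.utc_now())
        }

        # Any {:error, reason} from the filesystem falls through unchanged.
        with :ok <- File.mkdir_p(cache_dir),
             :ok <- File.write(cache_path, body),
             :ok <- File.write(manifest_path, :erlang.term_to_binary(manifest)) do
          {:ok, cache_path}
        end

      {:ok, %{status: status}} ->
        {:error, {:unexpected_status, status}}

      {:error, reason} ->
        {:error, reason}
    end
  end

  # Returns the stored manifest map, or nil when none exists yet.
  defp read_manifest(path) do
    case File.read(path) do
      {:ok, binary} -> :erlang.binary_to_term(binary)
      {:error, _} -> nil
    end
  end
end
```

Injecting the HTTP function keeps the sketch testable without network access, in the same spirit as the :file_reader and :dir_reader hooks; a real source would presumably wrap its HTTP client behind a similar boundary.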
Functions
@spec assert_canonical!(data()) :: :ok
Validates that source data matches the canonical Zoi format.
This is a lightweight shape assertion to fail fast if a source forgets to transform external data. Full schema validation happens later in the Engine pipeline.
Checks
- Outer structure is a map
- Keys are strings (provider IDs)
- Values are provider maps with atom keys
- Provider maps have required :id and :models fields
- :models is a list
Examples
iex> data = %{"openai" => %{id: :openai, models: []}}
iex> Source.assert_canonical!(data)
:ok
iex> bad_data = %{"openai" => %{"id" => "openai"}}
iex> Source.assert_canonical!(bad_data)
** (ArgumentError) Source.load/1 must return canonical Zoi format
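For readers who want to see what such a shape assertion could involve, here is a minimal sketch that applies the five checks listed above. It is a hypothetical re-implementation under the name CanonicalShapeSketch; the actual LLMDB.Source.assert_canonical!/1 may differ in structure and error messages.

```elixir
defmodule CanonicalShapeSketch do
  # Hypothetical re-implementation for illustration only.
  @message "Source.load/1 must return canonical Zoi format"

  def assert_canonical!(data) when is_map(data) do
    Enum.each(data, fn {key, provider} ->
      valid? =
        # Outer keys are strings (provider IDs).
        is_binary(key) and
          # Values are provider maps with atom keys.
          is_map(provider) and
          Enum.all?(Map.keys(provider), &is_atom/1) and
          # Required fields: :id is present and :models is a list.
          Map.has_key?(provider, :id) and
          is_list(Map.get(provider, :models))

      if not valid? do
        raise ArgumentError, @message
      end
    end)

    :ok
  end

  # Anything that is not a map fails the outer-structure check.
  def assert_canonical!(_other) do
    raise ArgumentError, @message
  end
end
```

The check deliberately stops at the provider level: model contents are left to the full schema validation in the Engine pipeline.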