# `LLMDB.Engine`
[🔗](https://github.com/agentjido/llm_db/blob/main/lib/llm_db/engine.ex#L1)

Pure ETL pipeline for BUILD-TIME LLM model catalog generation.

Engine is a pure function: sources in, snapshot out. It processes ONLY
the sources passed explicitly via options or, when none are passed, the
configured sources.

This module is designed for BUILD-TIME use (e.g., mix tasks) to generate
complete, unfiltered snapshots from remote/local sources that will be
packaged into the library.

## Pipeline Stages

1. **Ingest** - Load data from configured sources
2. **Normalize** - Apply normalization to providers and models per layer
3. **Validate** - Validate schemas and log dropped records per layer
4. **Merge** - Combine layers with precedence rules (last wins)
5. **Finalize** - Enrich and nest models under providers
6. **Ensure viable** - Verify catalog has content (warns if empty)
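
The stage order above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the `PipelineSketch` module, the shape of a layer (a plain list of model maps with `:id` and `:provider`), and the validation rules are all assumptions made for illustration. Ingest is assumed to have already happened, and enrichment is omitted.

```elixir
defmodule PipelineSketch do
  require Logger

  # Hypothetical stand-in for the stages. Each layer is the list of model
  # maps one source produced at ingest, ordered lowest precedence first.
  def run(layers) when is_list(layers) do
    layers
    |> Enum.map(&normalize/1)
    |> Enum.map(&validate/1)
    |> merge()
    |> finalize()
    |> ensure_viable()
  end

  # Normalize: trim whitespace from string ids, leave everything else alone.
  defp normalize(models) do
    Enum.map(models, fn
      %{id: id} = model when is_binary(id) -> %{model | id: String.trim(id)}
      model -> model
    end)
  end

  # Validate: keep well-formed records and log how many were dropped.
  defp validate(models) do
    {valid, dropped} = Enum.split_with(models, &valid?/1)
    if dropped != [], do: Logger.warning("dropped #{length(dropped)} invalid record(s)")
    valid
  end

  defp valid?(%{id: id, provider: provider})
       when is_binary(id) and id != "" and is_atom(provider) and not is_nil(provider),
       do: true

  defp valid?(_other), do: false

  # Merge: later layers override fields of earlier ones (last wins).
  defp merge(layers) do
    for models <- layers, model <- models, reduce: %{} do
      acc -> Map.update(acc, {model.provider, model.id}, model, &Map.merge(&1, model))
    end
  end

  # Finalize: nest models under their provider, keyed by model id.
  defp finalize(merged) do
    Enum.reduce(merged, %{}, fn {{provider, id}, model}, acc ->
      entry = Map.get(acc, provider, %{id: provider, models: %{}})
      Map.put(acc, provider, put_in(entry, [:models, id], model))
    end)
  end

  # Ensure viable: an empty catalog only warns; the run still succeeds.
  defp ensure_viable(providers) do
    if providers == %{}, do: Logger.warning("catalog is empty")
    {:ok, %{version: 2, providers: providers}}
  end
end
```

Each stage is a plain function over data, which is what keeps the whole run a pure "sources in, snapshot out" transformation (logging aside).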

## Architecture

Sources are processed in order with last-wins precedence:
1. First source (lowest precedence)
2. Second source
3. ... (higher precedence)
4. Last source (highest precedence)
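
As a rough illustration of last-wins precedence, the sketch below folds layers together in order. It assumes each layer is a map of model id to model fields and that collisions are resolved field by field; whether the engine merges at field level or replaces whole records is not stated here, so treat that as an assumption. `PrecedenceSketch` is a hypothetical name.

```elixir
defmodule PrecedenceSketch do
  # Layers are listed lowest precedence first, highest precedence last.
  def merge_layers(layers) do
    Enum.reduce(layers, %{}, fn layer, acc ->
      # On an id collision, fields from the later layer override earlier ones.
      Map.merge(acc, layer, fn _id, earlier, later -> Map.merge(earlier, later) end)
    end)
  end
end

layers = [
  %{"m1" => %{name: "Model One", context: 1_000}},
  %{"m1" => %{context: 2_000}, "m2" => %{name: "Model Two"}}
]

PrecedenceSketch.merge_layers(layers)
# => %{"m1" => %{name: "Model One", context: 2000}, "m2" => %{name: "Model Two"}}
```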

The engine coordinates data ingestion, normalization, validation, merging,
and finalization to produce a complete v2 snapshot ready for JSON serialization.

**Filtering and indexing are deferred to load-time** - the snapshot contains
ALL data from sources. Runtime policies (allow/deny patterns, preferences)
are applied when the snapshot is loaded via `LLMDB.load/1`.

# `apply_filters`

```elixir
@spec apply_filters([map()], map()) :: [map()]
```

Applies allow/deny filters to models.

Deny patterns always win over allow patterns.

## Parameters

- `models` - List of model maps
- `filters` - `%{allow: compiled_patterns, deny: compiled_patterns}`

## Returns

Filtered list of models
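
The "deny wins" rule can be sketched as follows. The shape of the compiled patterns is not documented here, so this example assumes `allow` is either `:all` or a list of regexes and `deny` is a list of regexes, both matched against the model's `:id`. `FilterSketch` and the model ids are hypothetical.

```elixir
defmodule FilterSketch do
  # Assumed shapes (not confirmed by the docs): `allow` is `:all` or a list
  # of regexes, `deny` is a list of regexes, both matched against model.id.
  def apply_filters(models, %{allow: allow, deny: deny}) do
    Enum.filter(models, fn %{id: id} ->
      # A model must be allowed AND not denied, so deny always wins.
      allowed?(id, allow) and not matches_any?(id, deny)
    end)
  end

  defp allowed?(_id, :all), do: true
  defp allowed?(id, patterns), do: matches_any?(id, patterns)

  defp matches_any?(id, patterns), do: Enum.any?(patterns, &Regex.match?(&1, id))
end

models = [%{id: "alpha-large"}, %{id: "alpha-mini"}, %{id: "beta-large"}]
filters = %{allow: [~r/^alpha-/], deny: [~r/-mini$/]}

FilterSketch.apply_filters(models, filters)
# => [%{id: "alpha-large"}]
```

Here `alpha-mini` matches both an allow and a deny pattern and is still removed.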

# `build_nested_providers`

```elixir
@spec build_nested_providers([map()], [map()]) :: %{required(atom()) => map()}
```

Builds the nested v2 provider structure for snapshot serialization.

Groups models by provider and nests them under their provider.
Models are keyed by model.id for easy lookup.

## Parameters

- `providers` - List of provider maps
- `models` - List of model maps

## Returns

`%{atom => %{provider fields + models: %{string => model}}}`
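
A minimal sketch of the grouping and nesting described above, assuming each provider map carries an atom `:id` and each model map carries a `:provider` atom and a string `:id`. These field names and the `NestSketch` module are illustrative assumptions, not the library's code.

```elixir
defmodule NestSketch do
  # Assumes provider.id is an atom, model.provider is that atom, and
  # model.id is a string; the field names are illustrative.
  def build_nested_providers(providers, models) do
    models_by_provider = Enum.group_by(models, & &1.provider)

    Map.new(providers, fn provider ->
      nested =
        models_by_provider
        |> Map.get(provider.id, [])
        |> Map.new(&{&1.id, &1})

      # Keep the provider's own fields and add the keyed models under :models.
      {provider.id, Map.put(provider, :models, nested)}
    end)
  end
end

providers = [%{id: :acme, name: "Acme"}, %{id: :globex, name: "Globex"}]
models = [%{id: "alpha-large", provider: :acme}, %{id: "alpha-mini", provider: :acme}]

nested = NestSketch.build_nested_providers(providers, models)
Map.keys(nested.acme.models)
# => ["alpha-large", "alpha-mini"]
```

In this sketch a provider without models still appears, with an empty `:models` map, and models whose provider is not in the provider list are left out.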

# `run`

```elixir
@spec run(keyword()) :: {:ok, map()} | {:error, term()}
```

Runs the complete ETL pipeline to generate a model catalog snapshot.

Pure function that processes sources into a complete, unfiltered snapshot.
BUILD-TIME only.

## Options

- `:sources` - List of `{module, opts}` source tuples (optional; defaults to `Config.sources!()`)

Note: `:allow`, `:deny`, `:prefer`, and `:filters` options are ignored.
Filtering is a load-time concern applied via `LLMDB.load/1` and runtime config.

## Returns

- `{:ok, snapshot_map}` - Success with v2 snapshot structure. An empty catalog
  also returns `{:ok, snapshot_map}`: the engine warns but succeeds when there
  are no sources.
- `{:error, term}` - Any other failure

## Snapshot Structure (v2)

```elixir
%{
  version: 2,
  generated_at: String.t(),
  providers: %{atom => %{provider_fields... + models: %{String.t() => Model.t()}}}
}
```

The snapshot contains ALL models from all sources. Indexes and filters are
built at load-time by `LLMDB.load/1` using the `LLMDB.Index` module.
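
To make the nesting concrete, here is a hand-written snapshot in the v2 shape and two ways of reading it. All values (provider, model id, timestamp) are hypothetical placeholders, not output from a real run.

```elixir
# Hypothetical snapshot literal following the v2 structure shown above.
snapshot = %{
  version: 2,
  generated_at: "2025-01-01T00:00:00Z",
  providers: %{
    acme: %{
      id: :acme,
      name: "Acme",
      models: %{"alpha-large" => %{id: "alpha-large", provider: :acme}}
    }
  }
}

# Direct lookup: provider atom first, then the string model id.
model = get_in(snapshot, [:providers, :acme, :models, "alpha-large"])

# Flatten every model id across all providers.
all_ids =
  for {_provider_id, provider} <- snapshot.providers,
      {model_id, _model} <- provider.models,
      do: model_id
```

Because models are keyed by id under each provider, a single lookup needs no scan; anything that cuts across providers has to walk the nested maps, which is the kind of work an index built at load-time can avoid.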

