# `LLMDB.Source`
[🔗](https://github.com/agentjido/llm_db/blob/main/lib/llm_db/source.ex#L1)

Unified data source interface for LLMDB.

Sources return providers and models data in **canonical Zoi format**.
Sources do not filter or exclude anything; validation happens later in the Engine pipeline.

## Output Format: Canonical Zoi v1

All sources MUST return data matching our canonical Zoi schema format.
External formats (e.g., models.dev) must be transformed to canonical format
before returning from `load/1`.

## Type Specifications

- `provider_id` - Atom or string identifying a provider (e.g., `:openai`, `"anthropic"`)
- `model_id` - String identifying a model (e.g., `"gpt-4o"`)
- `provider_map` - Provider data map with atom keys matching Zoi Provider schema
- `model_map` - Model data map with atom keys matching Zoi Model schema
- `data` - Source output with providers map, each containing models list

## Contract: Canonical Format Required

All source implementations must return `{:ok, data}` where data is:

    %{
      "openai" => %{
        id: :openai,                    # REQUIRED: atom or string
        name: "OpenAI",                 # Optional
        base_url: "...",                # Optional
        env: ["OPENAI_API_KEY"],        # Optional
        doc: "...",                     # Optional
        models: [                       # REQUIRED: list
          %{
            id: "gpt-4o",               # REQUIRED: string
            provider: :openai,          # REQUIRED: atom
            name: "GPT-4o",             # Optional
            limits: %{                  # Optional: Zoi Limits schema
              context: 128000,
              output: 16384
            },
            cost: %{                    # Optional: Zoi Cost schema
              input: 2.50,
              output: 10.00,
              cache_read: 1.25
            },
            capabilities: %{            # Optional: Zoi Capabilities schema
              streaming: %{text: true},
              tools: %{enabled: true}
            },
            modalities: %{              # Optional
              input: [:text, :image],
              output: [:text]
            },
            ...                         # Other Zoi Model schema fields
          }
        ]
      },
      ...
    }

**Key requirements:**
- Outer keys: strings (provider IDs as strings)
- Provider maps: atom keys, MUST include `:id` (atom/string) and `:models` (list)
- Model maps: atom keys matching Zoi Model schema

Return `{:error, reason}` only if the source cannot produce any data.

For partial failures (e.g., one file fails in a multi-file source), handle the
failure internally, log a warning, and return the data that is available.
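
The partial-failure rule can be sketched as follows. This is a minimal illustration, not code from LLMDB: the module name and the `:paths` and `:reader` options are hypothetical, and the reader is assumed to return `{:ok, provider_map}` or `{:error, reason}` for each path.

```elixir
defmodule MultiFileSourceSketch do
  # Hypothetical source used only for illustration; not part of LLMDB.
  require Logger

  # `:paths` and `:reader` are assumed option names.
  def load(%{paths: paths, reader: reader}) do
    data =
      Enum.reduce(paths, %{}, fn path, acc ->
        case reader.(path) do
          {:ok, %{id: id} = provider} ->
            # Outer keys are strings, even when the provider ID is an atom.
            Map.put(acc, to_string(id), provider)

          {:error, reason} ->
            # Partial failure: warn, skip this file, keep the rest.
            Logger.warning("skipping #{path}: #{inspect(reason)}")
            acc
        end
      end)

    # Only a source that produced nothing at all reports a fatal error.
    if map_size(data) == 0 do
      {:error, :no_data}
    else
      {:ok, data}
    end
  end
end
```

With one readable and one unreadable path, this sketch returns `{:ok, data}` containing only the readable provider; it returns `{:error, :no_data}` only when every path fails.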

## Format Transformation

Sources that read external formats (e.g., models.dev JSON) should implement
a public `transform/1` function to make the transformation explicit.
Call this from `load/1` before returning.

Example:

    def load(opts) do
      case read_external_data(opts) do
        {:ok, external_data} ->
          {:ok, transform(external_data)}
        error ->
          error
      end
    end

    def transform(external_data) do
      # Transform external format → canonical Zoi format
      ...
    end
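
As a rough illustration of what a `transform/1` body might look like, the sketch below converts a string-keyed map into the canonical shape. The external shape used here (provider maps with `"name"` and a `"models"` list) is an assumption made for the example and is not the actual models.dev format; the module name is hypothetical.

```elixir
defmodule TransformSketch do
  # Illustrative only. The external shape handled here is an assumption.
  def transform(external) when is_map(external) do
    Map.new(external, fn {provider_key, raw} ->
      # String.to_atom/1 is used for brevity; for untrusted input, prefer
      # a fixed lookup table so arbitrary strings never become atoms.
      provider_id = String.to_atom(provider_key)

      models =
        raw
        |> Map.get("models", [])
        |> Enum.map(&transform_model(&1, provider_id))

      provider = drop_nils(%{id: provider_id, name: raw["name"], models: models})
      {provider_key, provider}
    end)
  end

  defp transform_model(raw, provider_id) do
    drop_nils(%{id: raw["id"], provider: provider_id, name: raw["name"]})
  end

  # Optional fields that are absent in the input are omitted, not set to nil.
  defp drop_nils(map) do
    for {key, value} <- map, not is_nil(value), into: %{}, do: {key, value}
  end
end
```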

## Testability

Sources should accept optional test hooks via the `opts` parameter:
- `:file_reader` - Function for reading files (default: `File.read!/1`)
- `:dir_reader` - Function for listing directories (default: `File.ls!/1`)

This allows tests to inject stubs without filesystem access.
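
A minimal sketch of how the hooks might be wired, assuming a hypothetical source that reads one file per provider from a directory. The module name, the `:dir` option, and the line-per-model file format are inventions for this example; only `:file_reader` and `:dir_reader` come from the text above.

```elixir
defmodule FileSourceSketch do
  # Hypothetical source illustrating the test hooks; `:dir` is an assumed option.
  def load(opts) do
    file_reader = Map.get(opts, :file_reader, &File.read!/1)
    dir_reader = Map.get(opts, :dir_reader, &File.ls!/1)
    dir = Map.fetch!(opts, :dir)

    data =
      dir
      |> dir_reader.()
      |> Map.new(fn file ->
        contents = file_reader.(Path.join(dir, file))
        provider_key = Path.rootname(file)
        {provider_key, parse(provider_key, contents)}
      end)

    {:ok, data}
  end

  # Stand-in parser: treats each non-empty line as a model ID.
  defp parse(provider_key, contents) do
    provider = String.to_atom(provider_key)

    models =
      contents
      |> String.split("\n", trim: true)
      |> Enum.map(fn id -> %{id: id, provider: provider} end)

    %{id: provider, models: models}
  end
end

# In a test, stubs replace the filesystem entirely.
stub_opts = %{
  dir: "providers",
  dir_reader: fn "providers" -> ["openai.txt"] end,
  file_reader: fn "providers/openai.txt" -> "gpt-4o\n" end
}

{:ok, _data} = FileSourceSketch.load(stub_opts)
```

Because both readers default to the real `File` functions, production callers pass only `:dir`, while tests supply both stubs and never touch disk.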

# `data`

```elixir
@type data() :: %{required(String.t()) => provider_map()}
```

# `model_id`

```elixir
@type model_id() :: String.t()
```

# `model_map`

```elixir
@type model_map() :: map()
```

# `opts`

```elixir
@type opts() :: map()
```

# `provider_id`

```elixir
@type provider_id() :: atom() | String.t()
```

# `provider_map`

```elixir
@type provider_map() :: map()
```

# `pull_result`

```elixir
@type pull_result() :: :noop | {:ok, String.t()} | {:error, term()}
```

# `load`

```elixir
@callback load(opts()) :: {:ok, data()} | {:error, term()}
```

Load data from this source.

For remote sources, this should read from locally cached data (no network calls).
Run `mix llm_db.pull` to fetch and cache remote data first.

## Parameters

- `opts` - Source-specific options map

## Returns

- `{:ok, data}` - Success with providers/models data
- `{:error, term}` - Fatal error (source cannot produce any data)

# `pull`
*optional* 

```elixir
@callback pull(opts()) :: pull_result()
```

Pull remote data and cache it locally.

This callback is optional and only implemented by sources that fetch remote data.
When implemented, it should:
- Fetch data from a remote endpoint (e.g., via Req)
- Cache the data locally in `priv/llm_db/remote/`
- Write a manifest file with metadata (URL, checksum, timestamp)
- Support conditional GET using ETag/Last-Modified headers

## Parameters

- `opts` - Source-specific options map (may include `:url`, `:cache_id`, etc.)

## Returns

- `:noop` - Data not modified (HTTP 304)
- `{:ok, cache_path}` - Successfully cached to the given path
- `{:error, term}` - Failed to fetch or cache
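
The mapping from HTTP outcomes to `pull_result()` can be sketched as below. This is a simplified illustration under stated assumptions, not LLMDB's implementation: the `:fetch`, `:etag`, and `:cache_dir` options are hypothetical, the HTTP call is injected as a function instead of calling a client such as Req directly, the `.json` file extension is a guess, and manifest writing is omitted.

```elixir
defmodule PullSketch do
  # Illustrative only. `:fetch`, `:etag`, and `:cache_dir` are assumed options.
  # `fetch` takes a URL and a header list and returns
  # `{:ok, %{status: integer, body: binary}}` or `{:error, reason}`.
  def pull(%{url: url, cache_id: cache_id, fetch: fetch} = opts) do
    cache_dir = Map.get(opts, :cache_dir, "priv/llm_db/remote")
    headers = conditional_headers(Map.get(opts, :etag))

    case fetch.(url, headers) do
      {:ok, %{status: 304}} ->
        # Not modified: the cached copy is still current.
        :noop

      {:ok, %{status: 200, body: body}} when is_binary(body) ->
        cache_path = Path.join(cache_dir, "#{cache_id}.json")

        # A failed mkdir or write falls through as `{:error, posix_reason}`.
        with :ok <- File.mkdir_p(cache_dir),
             :ok <- File.write(cache_path, body) do
          {:ok, cache_path}
        end

      {:ok, %{status: status}} ->
        {:error, {:unexpected_status, status}}

      {:error, reason} ->
        {:error, reason}
    end
  end

  # Send If-None-Match only when a previous ETag is known.
  defp conditional_headers(nil), do: []
  defp conditional_headers(etag), do: [{"if-none-match", etag}]
end
```

Injecting the fetch function keeps the sketch free of network access, in the same spirit as the `:file_reader` and `:dir_reader` hooks described for `load/1`.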

# `assert_canonical!`

```elixir
@spec assert_canonical!(data()) :: :ok
```

Validates that source data matches the canonical Zoi format.

This is a lightweight shape assertion to fail fast if a source
forgets to transform external data. Full schema validation happens
later in the Engine pipeline.

## Checks

- Outer structure is a map
- Keys are strings (provider IDs)
- Values are provider maps with atom keys
- Provider maps have required `:id` and `:models` fields
- `:models` is a list

## Examples

    iex> data = %{"openai" => %{id: :openai, models: []}}
    iex> Source.assert_canonical!(data)
    :ok

    iex> bad_data = %{"openai" => %{"id" => "openai"}}
    iex> Source.assert_canonical!(bad_data)
    ** (ArgumentError) Source.load/1 must return canonical Zoi format
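
For intuition, the listed checks can be approximated with a small predicate. This is a hand-written sketch with a hypothetical module name, not the library's code: it returns a boolean instead of raising, and the real function may check more or less than this.

```elixir
defmodule CanonicalShapeSketch do
  # Not the library's code: an approximation of the checks listed above.
  def canonical?(data) when is_map(data) and not is_struct(data) do
    Enum.all?(data, fn {key, provider} ->
      # Outer keys must be strings; values must look like provider maps.
      is_binary(key) and provider_ok?(provider)
    end)
  end

  def canonical?(_other), do: false

  # Requires atom keys throughout, plus `:id` and a `:models` list.
  defp provider_ok?(%{id: id, models: models} = provider) do
    (is_atom(id) or is_binary(id)) and is_list(models) and
      Enum.all?(Map.keys(provider), &is_atom/1)
  end

  defp provider_ok?(_other), do: false
end
```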
