# `Arcana.Embedder`

Behaviour for embedding providers used by Arcana.

Arcana accepts any module that implements this behaviour.
Built-in implementations are provided for:

- `Arcana.Embedder.Local` - Local Bumblebee models (e.g., `bge-small-en-v1.5`)
- `Arcana.Embedder.OpenAI` - OpenAI embeddings via Req.LLM

## Configuration

Configure your embedding provider in `config.exs`:

    # Default: Local Bumblebee with bge-small-en-v1.5 (384 dims)
    config :arcana, embedder: :local

    # Local with different HuggingFace model
    config :arcana, embedder: {:local, model: "BAAI/bge-large-en-v1.5"}

    # OpenAI via Req.LLM
    config :arcana, embedder: :openai
    config :arcana, embedder: {:openai, model: "text-embedding-3-large"}

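    # Custom function: receives the text and returns {:ok, embedding}
    # (MyApp.Embeddings.embed/1 is a placeholder for your own function)
    config :arcana, embedder: fn text -> MyApp.Embeddings.embed(text) end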

    # Custom module implementing this behaviour
    config :arcana, embedder: MyApp.CohereEmbedder
    config :arcana, embedder: {MyApp.CohereEmbedder, api_key: "..."}

## Implementing a Custom Embedder

Create a module that implements this behaviour:

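    defmodule MyApp.CohereEmbedder do
      @behaviour Arcana.Embedder

      @impl true
      def embed(text, opts) do
        # Prefer the key passed in the config tuple, then the environment.
        api_key = opts[:api_key] || System.get_env("COHERE_API_KEY")
        # request_embedding/2 is a placeholder for the Cohere API call. It must
        # return {:ok, embedding} (a list of floats) or {:error, reason}.
        request_embedding(text, api_key)
      end

      @impl true
      def dimensions(_opts), do: 1024

      # Replace this stub with the real HTTP request.
      defp request_embedding(_text, _api_key), do: {:error, :not_implemented}
    end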

Then configure:

    config :arcana, embedder: {MyApp.CohereEmbedder, api_key: "..."}
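
For test suites it can help to have an embedder that needs no model download or network access. The following is a minimal sketch of one, not something shipped with Arcana: `MyApp.HashEmbedder` and its `:dims` option are hypothetical names, and the vectors it produces are deterministic but carry no semantic meaning.

```elixir
defmodule MyApp.HashEmbedder do
  @moduledoc "Deterministic stand-in embedder for tests (hypothetical example)."
  @behaviour Arcana.Embedder

  @default_dims 8

  @impl true
  def embed(text, opts) when is_binary(text) do
    dims = dimensions(opts)

    # Hash the text together with each position so every component differs
    # but the same text always yields the same vector.
    embedding = for i <- 1..dims, do: :erlang.phash2({text, i}, 1_000) / 1_000

    {:ok, embedding}
  end

  def embed(_text, _opts), do: {:error, :invalid_text}

  @impl true
  # :dims is an assumed option name, passed through the {module, opts} tuple.
  def dimensions(opts), do: Keyword.get(opts, :dims, @default_dims)
end
```

Following the `{module, opts}` form shown above, it would be configured as `config :arcana, embedder: {MyApp.HashEmbedder, dims: 8}`.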

# `dimensions`

```elixir
@callback dimensions(opts :: keyword()) :: pos_integer()
```

Returns the number of dimensions in each embedding vector produced with the given options.
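
Because the callback receives `opts`, an embedder that supports several models can report a different size per model. A minimal sketch, assuming a hypothetical `MyApp.ModelDimensions` helper and a `:model` option shaped like the one in the configuration examples:

```elixir
defmodule MyApp.ModelDimensions do
  # Known output sizes for the two BGE models mentioned in the configuration section.
  @dimensions %{
    "BAAI/bge-small-en-v1.5" => 384,
    "BAAI/bge-large-en-v1.5" => 1024
  }

  @default_model "BAAI/bge-small-en-v1.5"

  # Looks up the vector size for the configured model and raises a KeyError
  # for unknown models, so misconfiguration fails early.
  def dimensions(opts) do
    model = Keyword.get(opts, :model, @default_model)
    Map.fetch!(@dimensions, model)
  end
end
```

A custom embedder's `dimensions/1` could delegate to a lookup like this instead of hard-coding a single value.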

# `embed`

```elixir
@callback embed(text :: String.t(), opts :: keyword()) ::
  {:ok, [float()]} | {:error, term()}
```

Embeds a single text string.

Returns `{:ok, embedding}` where embedding is a list of floats,
or `{:error, reason}` on failure.

# `embed_batch`
*optional* 

```elixir
@callback embed_batch(texts :: [String.t()], opts :: keyword()) ::
  {:ok, [[float()]]} | {:error, term()}
```

Embeds multiple texts in a single batch.

This callback is optional. When a module does not implement it, Arcana
falls back to calling `embed/2` for each text sequentially. Implement it
for providers that support native batch embedding.
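
As an illustration, an embedder backed by a batch-capable API might implement `embed_batch/2` and route `embed/2` through it. This is a sketch under assumptions: `MyApp.BatchEmbedder` is a hypothetical module, and the `:client` and `:dimensions` options are invented for the example. `:client` stands in for whatever function performs the actual request.

```elixir
defmodule MyApp.BatchEmbedder do
  @behaviour Arcana.Embedder

  @impl true
  def embed(text, opts) do
    # A single text is treated as a batch of one.
    with {:ok, [embedding]} <- embed_batch([text], opts) do
      {:ok, embedding}
    end
  end

  @impl true
  def embed_batch(texts, opts) do
    # :client is a hypothetical option: a 1-arity function that sends all
    # texts in one request and returns {:ok, [[float()]]} or {:error, reason}.
    client = Keyword.fetch!(opts, :client)

    case client.(texts) do
      {:ok, embeddings} when length(embeddings) == length(texts) ->
        {:ok, embeddings}

      {:ok, _mismatched} ->
        # Guard against a provider returning fewer or more vectors than texts.
        {:error, :embedding_count_mismatch}

      {:error, reason} ->
        {:error, reason}
    end
  end

  @impl true
  def dimensions(opts), do: Keyword.get(opts, :dimensions, 1024)
end
```

Checking that the number of returned vectors matches the number of inputs keeps a partial provider response from silently misaligning texts and embeddings.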

# `dimensions`

Returns the number of embedding dimensions for the configured embedder.

# `embed`

Embeds text using the configured embedder.

The embedder is a `{module, opts}` tuple where module implements
this behaviour.

## Options

  * `:intent` - The embedding intent, either `:query` or `:document`.
    Used by models like E5 that require different prefixes for
    search queries vs document content. Defaults to `:document`.

## Examples

    # Embed a search query (uses "query: " prefix for E5 models)
    Embedder.embed(embedder, "what is machine learning?", intent: :query)

    # Embed document content (uses "passage: " prefix for E5 models)
    Embedder.embed(embedder, "Machine learning is...", intent: :document)
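
Inside an embedder, the `:intent` option could be turned into the prefixes mentioned above with a small helper. This is a sketch only: `MyApp.E5Prefix` is a hypothetical module and is not how the built-in embedders are necessarily implemented.

```elixir
defmodule MyApp.E5Prefix do
  # Prepends the E5-style prefix that matches the embedding intent.
  # Defaults to :document, mirroring the documented default.
  def prefix(text, opts \\ []) do
    case Keyword.get(opts, :intent, :document) do
      :query -> "query: " <> text
      :document -> "passage: " <> text
    end
  end
end
```

A custom `embed/2` could call `MyApp.E5Prefix.prefix(text, opts)` before sending the text to the model.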

# `embed_batch`

Embeds multiple texts using the configured embedder.

Falls back to sequential embedding if the module doesn't implement
`embed_batch/2`.
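
The fallback behaviour can be pictured roughly as follows. This is an illustrative sketch, not Arcana's actual source: `MyApp.BatchFallback` is a hypothetical module that takes the same `{module, opts}` tuple and stops at the first error.

```elixir
defmodule MyApp.BatchFallback do
  # Uses the module's native embed_batch/2 when it exists, otherwise
  # embeds each text in turn with embed/2.
  def embed_batch({module, opts}, texts) do
    if Code.ensure_loaded?(module) and function_exported?(module, :embed_batch, 2) do
      module.embed_batch(texts, opts)
    else
      embed_sequentially(module, texts, opts)
    end
  end

  defp embed_sequentially(module, texts, opts) do
    texts
    |> Enum.reduce_while({:ok, []}, fn text, {:ok, acc} ->
      case module.embed(text, opts) do
        {:ok, embedding} -> {:cont, {:ok, [embedding | acc]}}
        # Halt on the first failure and return that error unchanged.
        {:error, _reason} = error -> {:halt, error}
      end
    end)
    |> case do
      # Embeddings were accumulated in reverse; restore input order.
      {:ok, reversed} -> {:ok, Enum.reverse(reversed)}
      error -> error
    end
  end
end
```

`Code.ensure_loaded?/1` is called first because `function_exported?/3` returns `false` for a module that has not been loaded yet.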

