Arcana.Embedder behaviour (Arcana v1.3.3)


Behaviour for embedding providers used by Arcana.

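Arcana accepts any module that implements this behaviour. Built-in implementations are provided for local Bumblebee models (:local) and OpenAI via Req.LLM (:openai). You can also supply a custom function or your own module, as shown under Configuration.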

Configuration

Configure your embedding provider in config.exs:

# Default: Local Bumblebee with bge-small-en-v1.5 (384 dims)
config :arcana, embedder: :local

# Local with different HuggingFace model
config :arcana, embedder: {:local, model: "BAAI/bge-large-en-v1.5"}

# OpenAI via Req.LLM
config :arcana, embedder: :openai
config :arcana, embedder: {:openai, model: "text-embedding-3-large"}

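# Custom function (MyApp.Embeddings.embed/1 is a placeholder for your own code;
# it must return {:ok, [float()]} or {:error, reason})
config :arcana, embedder: fn text -> MyApp.Embeddings.embed(text) end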

# Custom module implementing this behaviour
config :arcana, embedder: MyApp.CohereEmbedder
config :arcana, embedder: {MyApp.CohereEmbedder, api_key: "..."}

Implementing a Custom Embedder

Create a module that implements this behaviour:

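defmodule MyApp.CohereEmbedder do
  @behaviour Arcana.Embedder

  @impl true
  def embed(text, opts) do
    api_key = opts[:api_key] || System.get_env("COHERE_API_KEY")

    # Call Cohere's embed endpoint with Req (endpoint, model, and response
    # shape are assumptions; check Cohere's API docs before relying on them)
    case Req.post("https://api.cohere.ai/v1/embed",
           auth: {:bearer, api_key},
           json: %{texts: [text], model: "embed-english-v3.0", input_type: "search_document"}
         ) do
      {:ok, %{status: 200, body: %{"embeddings" => [embedding]}}} -> {:ok, embedding}
      {:ok, %{status: status, body: body}} -> {:error, {status, body}}
      {:error, reason} -> {:error, reason}
    end
  end

  @impl true
  # embed-english-v3.0 returns 1024-dimensional vectors
  def dimensions(_opts), do: 1024
end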

Then configure:

config :arcana, embedder: {MyApp.CohereEmbedder, api_key: "..."}
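For tests, a deterministic embedder avoids loading a model or calling an API. The sketch below is a hypothetical MyApp.FakeEmbedder (not part of Arcana) that derives every component from :erlang.phash2/2, so the same text always yields the same vector. The :dims option is an assumption of this sketch. It is written to run standalone, so the @behaviour Arcana.Embedder declaration is mentioned in a comment rather than declared.

```elixir
defmodule MyApp.FakeEmbedder do
  # Hypothetical test-only embedder. In an app that depends on Arcana, add
  # `@behaviour Arcana.Embedder` and mark both callbacks with `@impl true`.

  @default_dims 8

  # Deterministic: each component is a hash of {text, index} scaled into
  # [0.0, 1.0), so identical texts always produce identical vectors.
  def embed(text, opts) when is_binary(text) do
    dims = dimensions(opts)
    embedding = for i <- 1..dims, do: :erlang.phash2({text, i}, 1_000_000) / 1_000_000
    {:ok, embedding}
  end

  # The vector length comes from the hypothetical :dims option.
  def dimensions(opts), do: Keyword.get(opts, :dims, @default_dims)
end
```

It could then be selected in a test config, for example config :arcana, embedder: {MyApp.FakeEmbedder, dims: 384}.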

Summary

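Callbacks

dimensions(opts): Returns the embedding dimensions.

embed(text, opts): Embed a single text string.

embed_batch(texts, opts): Embed multiple texts in batch.

Functions

dimensions(arg): Returns the embedding dimensions for the configured embedder.

embed(arg, text, call_opts \\ []): Embeds text using the configured embedder.

embed_batch(arg, texts): Embeds multiple texts using the configured embedder.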

Callbacks

dimensions(opts)

@callback dimensions(opts :: keyword()) :: pos_integer()

Returns the embedding dimensions.

embed(text, opts)

@callback embed(text :: String.t(), opts :: keyword()) ::
  {:ok, [float()]} | {:error, term()}

Embed a single text string.

Returns {:ok, embedding} where embedding is a list of floats, or {:error, reason} on failure.
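One way to check that an implementation honours this return contract in a test is a small predicate such as the hypothetical MyApp.EmbeddingCheck.valid?/2 below. It also assumes, without this page stating it outright, that the vector length should equal the value returned by dimensions/1.

```elixir
defmodule MyApp.EmbeddingCheck do
  # Hypothetical test helper: true only for {:ok, list_of_floats} whose length
  # equals the expected number of dimensions.
  def valid?({:ok, embedding}, dims) when is_list(embedding) and is_integer(dims) do
    length(embedding) == dims and Enum.all?(embedding, &is_float/1)
  end

  # {:error, reason} and any other shape fail the check.
  def valid?(_result, _dims), do: false
end
```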

embed_batch(texts, opts)

(optional)
@callback embed_batch(texts :: [String.t()], opts :: keyword()) ::
  {:ok, [[float()]]} | {:error, term()}

Embed multiple texts in batch.

If a module does not implement this callback, Arcana.Embedder.embed_batch/2 falls back to calling embed/2 for each text sequentially. Implement it for providers that support native batch embedding.

Functions

dimensions(arg)

Returns the embedding dimensions for the configured embedder.

embed(arg, text, call_opts \\ [])

Embeds text using the configured embedder.

The embedder is a {module, opts} tuple where module implements this behaviour.
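The tuple dispatch can be sketched as follows. MyApp.EmbedderDispatch is hypothetical and not Arcana's source; in particular, merging the per-call options over the configured ones with Keyword.merge/2 is an assumption about how call_opts reach the module.

```elixir
defmodule MyApp.EmbedderDispatch do
  # Hypothetical sketch of {module, opts} dispatch; not Arcana's source.

  # The tuple carries the implementing module and its configured options.
  def dimensions({module, opts}) when is_atom(module), do: module.dimensions(opts)

  # Per-call options (e.g. :intent) override the configured options before
  # delegating to the module's embed/2 callback.
  def embed({module, opts}, text, call_opts \\ []) when is_atom(module) do
    module.embed(text, Keyword.merge(opts, call_opts))
  end
end
```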

Options

  • :intent - The embedding intent, either :query or :document. Used by models like E5 that require different prefixes for search queries vs document content. Defaults to :document.

Examples

# Embed a search query (uses "query: " prefix for E5 models)
Embedder.embed(embedder, "what is machine learning?", intent: :query)

# Embed document content (uses "passage: " prefix for E5 models)
Embedder.embed(embedder, "Machine learning is...", intent: :document)
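The prefixing that the :intent option describes for E5 models can be sketched as a small helper. MyApp.E5Prefix.prepare/2 is hypothetical and Arcana handles this internally; only the "query: " and "passage: " prefixes and the :document default come from this page.

```elixir
defmodule MyApp.E5Prefix do
  # Hypothetical helper: prepend the E5 prefix that matches the :intent option,
  # defaulting to :document as the option description states.
  def prepare(text, opts \\ []) when is_binary(text) do
    case Keyword.get(opts, :intent, :document) do
      :query -> "query: " <> text
      :document -> "passage: " <> text
    end
  end
end
```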

embed_batch(arg, texts)

Embeds multiple texts using the configured embedder.

Falls back to sequential embedding if the module doesn't implement embed_batch/2.
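The fallback can be sketched with function_exported?/3. MyApp.BatchDispatch is hypothetical, not Arcana's implementation, and stopping at the first {:error, reason} during sequential embedding is an assumption this page does not confirm.

```elixir
defmodule MyApp.BatchDispatch do
  # Hypothetical sketch of the embed_batch fallback; not Arcana's source.
  def embed_batch({module, opts}, texts) when is_list(texts) do
    # function_exported?/3 only sees loaded modules, so load the module first.
    Code.ensure_loaded(module)

    if function_exported?(module, :embed_batch, 2) do
      module.embed_batch(texts, opts)
    else
      embed_sequentially(module, texts, opts)
    end
  end

  # Embed each text in order with embed/2, stopping at the first error.
  defp embed_sequentially(module, texts, opts) do
    result =
      Enum.reduce_while(texts, {:ok, []}, fn text, {:ok, acc} ->
        case module.embed(text, opts) do
          {:ok, embedding} -> {:cont, {:ok, [embedding | acc]}}
          {:error, _reason} = error -> {:halt, error}
        end
      end)

    case result do
      {:ok, acc} -> {:ok, Enum.reverse(acc)}
      error -> error
    end
  end
end
```

Checking for the optional callback at call time keeps modules that only implement embed/2 and dimensions/1 valid embedders.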