Arcana.Graph.EntityExtractor behaviour (Arcana v1.3.3)

Behaviour for entity extraction in GraphRAG.

Entity extractors identify named entities (people, organizations, locations, etc.) in text. Arcana ships a built-in NER implementation, and you can also implement custom extractors for other approaches, such as LLM-based extraction.

Built-in Implementations

Arcana.Graph.EntityExtractor.NER - Local Bumblebee NER (default)

Configuration

Configure the entity extractor in config.exs using one of the following forms:

# Default: Local NER with distilbert-NER
config :arcana, :graph,
  entity_extractor: :ner

# Custom module implementing this behaviour
config :arcana, :graph,
  entity_extractor: MyApp.LLMEntityExtractor

# Custom module with options
config :arcana, :graph,
  entity_extractor: {MyApp.LLMEntityExtractor, model: "gpt-4"}

# Inline function (MyApp.EntityHelpers.extract_entities/1 is a placeholder for your own code)
config :arcana, :graph,
  entity_extractor: fn text, _opts -> {:ok, MyApp.EntityHelpers.extract_entities(text)} end
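Each configuration form above describes either a module with options or a function. The sketch below shows one way such a value could be normalized into the `{module, opts}` tuple or 2-arity function that `extract/2` accepts. It is an assumption for illustration, not Arcana's internal code, and `MyApp.ExtractorConfig.normalize/1` is a hypothetical helper.

```elixir
defmodule MyApp.ExtractorConfig do
  # Hypothetical helper: turns an :entity_extractor config value into either
  # a {module, opts} tuple or a 2-arity function. It mirrors the documented
  # config forms but is NOT Arcana's own implementation.

  # The :ner shorthand must come before the generic atom clause below.
  def normalize(:ner), do: {Arcana.Graph.EntityExtractor.NER, []}

  # Module with options, e.g. {MyApp.LLMEntityExtractor, model: "gpt-4"}
  def normalize({module, opts}) when is_atom(module) and is_list(opts), do: {module, opts}

  # Inline function: passed through unchanged.
  def normalize(fun) when is_function(fun, 2), do: fun

  # Bare module: no options.
  def normalize(module) when is_atom(module), do: {module, []}
end
```

Keeping every form as either a tuple or a function means the calling code only has to handle two shapes.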

Implementing a Custom Extractor

Create a module that implements this behaviour:

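defmodule MyApp.LLMEntityExtractor do
  @behaviour Arcana.Graph.EntityExtractor

  @impl true
  def extract(text, opts) do
    llm = opts[:llm] || raise "LLM required"
    # Prompt the LLM for the entities in `text` and convert its answer
    # into maps such as %{name: "OpenAI", type: :organization}.
    entities = call_llm(llm, text)
    {:ok, entities}
  end

  # Optional: implement for batch optimization
  @impl true
  def extract_batch(texts, opts) do
    llm = opts[:llm] || raise "LLM required"
    # One LLM request for all texts; it must yield one entity list per
    # input text, in the same order as `texts`.
    results = call_llm_batch(llm, texts)
    {:ok, results}
  end

  # Placeholders: replace with real LLM requests.
  defp call_llm(_llm, _text), do: []
  defp call_llm_batch(_llm, texts), do: Enum.map(texts, fn _text -> [] end)
end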
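For a self-contained illustration of the callback contract, the sketch below implements `extract/2` with plain dictionary matching instead of a model. `MyApp.DictionaryExtractor` and its `:dictionary` option are hypothetical. Because it omits `extract_batch/2`, batch calls would use the sequential fallback.

```elixir
defmodule MyApp.DictionaryExtractor do
  # Hypothetical extractor: finds exact occurrences of known names.
  # opts[:dictionary] maps each name to its entity type atom.
  @behaviour Arcana.Graph.EntityExtractor

  @impl true
  def extract(text, opts) do
    dictionary = Keyword.get(opts, :dictionary, %{})

    entities =
      for {name, type} <- dictionary,
          name != "",
          {byte_start, _byte_len} <- :binary.matches(text, name) do
        # :binary.matches/2 reports byte offsets; convert them to character
        # (grapheme) offsets so non-ASCII text is handled consistently.
        start = String.length(binary_part(text, 0, byte_start))

        %{
          name: name,
          type: type,
          span_start: start,
          span_end: start + String.length(name),
          score: 1.0
        }
      end

    # Return entities in the order they appear in the text.
    {:ok, Enum.sort_by(entities, & &1.span_start)}
  end
end
```

Usage: `MyApp.DictionaryExtractor.extract("Sam Altman leads OpenAI", dictionary: %{"Sam Altman" => :person, "OpenAI" => :organization})` returns both entities with their spans.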

Entity Format

Extractors must return entities as maps with at least:

:name - The entity name (required)
:type - Entity type as an atom: :person, :organization, :location, :concept, or :other

Optional fields:

:span_start - Character offset where entity starts
:span_end - Character offset where entity ends
:score - Confidence score (0.0-1.0)
:description - Brief description of the entity
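A custom extractor can check its output against this format before returning it. The sketch below is a hypothetical helper, `MyApp.EntityFormat.valid?/1`. It assumes the five listed types form a closed set and that `:score`, when present, must lie in 0.0..1.0.

```elixir
defmodule MyApp.EntityFormat do
  # Hypothetical helper that checks a map against the entity format:
  # required :name (string) and :type (one of the listed atoms), plus an
  # optional :score between 0.0 and 1.0. Other optional keys are not checked.
  @types [:person, :organization, :location, :concept, :other]

  def valid?(%{name: name, type: type} = entity)
      when is_binary(name) and type in @types do
    case Map.fetch(entity, :score) do
      {:ok, score} -> is_number(score) and score >= 0.0 and score <= 1.0
      :error -> true
    end
  end

  def valid?(_other), do: false
end
```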

Summary

Callbacks

extract(text, opts)

Extracts entities from a single text.

extract_batch(texts, opts)

Extracts entities from multiple texts in batch.

Functions

extract(fun, text)

Extracts entities using the configured extractor.

extract_batch(fun, texts)

Extracts entities from multiple texts using the configured extractor.

Callbacks

extract(text, opts)

@callback extract(text :: String.t(), opts :: keyword()) ::
  {:ok, [map()]} | {:error, term()}

Extracts entities from a single text.

Parameters

text - The text to extract entities from
opts - Options passed from the extractor configuration

Returns

{:ok, entities} - List of entity maps
{:error, reason} - On failure
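Callers must handle both return shapes. The sketch below is a hypothetical helper, `MyApp.EntitySummary.names_by_type/3`. It runs any function that follows the `extract/2` contract, groups the returned names by `:type`, and passes errors through unchanged.

```elixir
defmodule MyApp.EntitySummary do
  # Hypothetical helper: runs a function with the extract/2 contract and
  # groups the returned entity names by their :type.
  def names_by_type(extract_fun, text, opts \\ []) when is_function(extract_fun, 2) do
    case extract_fun.(text, opts) do
      {:ok, entities} -> {:ok, Enum.group_by(entities, & &1.type, & &1.name)}
      {:error, reason} -> {:error, reason}
    end
  end
end
```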

extract_batch(texts, opts)

(optional)

@callback extract_batch(texts :: [String.t()], opts :: keyword()) ::
  {:ok, [[map()]]} | {:error, term()}

Extracts entities from multiple texts in batch.

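If an extractor module does not implement this callback, Arcana calls its extract/2 for each text sequentially. Implement it when the extractor can process several texts more efficiently in one call. The result must contain one entity list per input text.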
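When no native batch API is available, one possible override runs `extract/2` concurrently instead of sequentially. The sketch below uses `Task.async_stream/3`, which keeps results in input order. `MyApp.ConcurrentExtractor` and its capitalized-word placeholder logic are hypothetical, and concurrency is a different technique from true native batching.

```elixir
defmodule MyApp.ConcurrentExtractor do
  @behaviour Arcana.Graph.EntityExtractor

  @impl true
  def extract(text, _opts) do
    # Placeholder logic: treat each capitalized word as an :other entity.
    entities =
      ~r/\b[A-Z]\w*/u
      |> Regex.scan(text)
      |> List.flatten()
      |> Enum.map(&%{name: &1, type: :other})

    {:ok, entities}
  end

  @impl true
  def extract_batch(texts, opts) do
    # Run extract/2 for all texts concurrently; async_stream keeps input order.
    results =
      texts
      |> Task.async_stream(&extract(&1, opts), timeout: 30_000)
      |> Enum.map(fn {:ok, result} -> result end)

    # Return the first error, if any; otherwise one entity list per text.
    case Enum.find(results, &match?({:error, _}, &1)) do
      nil -> {:ok, Enum.map(results, fn {:ok, entities} -> entities end)}
      error -> error
    end
  end
end
```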

Functions

extract(fun, text)

Extracts entities using the configured extractor.

The extractor can be:

A {module, opts} tuple where module implements this behaviour
A function (text, opts) -> {:ok, entities} | {:error, reason}

Examples

alias Arcana.Graph.EntityExtractor

# With module
extractor = {Arcana.Graph.EntityExtractor.NER, []}
{:ok, entities} = EntityExtractor.extract(extractor, "Sam Altman leads OpenAI")

# With inline function
extractor = fn _text, _opts -> {:ok, [%{name: "Test", type: :other}]} end
{:ok, entities} = EntityExtractor.extract(extractor, "some text")
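The two accepted extractor shapes can be dispatched with two function clauses. The sketch below is not Arcana's source. `MyApp.ExtractorDispatch` is hypothetical, and passing `[]` as `opts` to inline functions is an assumption.

```elixir
defmodule MyApp.ExtractorDispatch do
  # Hypothetical dispatch over the two documented extractor shapes.

  # {module, opts}: call the module's extract/2 callback with its options.
  def extract({module, opts}, text) when is_atom(module) and is_list(opts) do
    module.extract(text, opts)
  end

  # Inline function: call it directly (empty opts is an assumption).
  def extract(fun, text) when is_function(fun, 2) do
    fun.(text, [])
  end
end
```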

extract_batch(fun, texts)

Extracts entities from multiple texts using the configured extractor.

Falls back to sequential extraction if the module doesn't implement extract_batch/2.
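The fallback can be sketched with `function_exported?/3`, which only reports functions of loaded modules, so the module is loaded first with `Code.ensure_loaded?/1`. `MyApp.BatchDispatch` is hypothetical. Stopping at the first error and passing `[]` as `opts` to inline functions are assumptions, not documented Arcana behaviour.

```elixir
defmodule MyApp.BatchDispatch do
  # Hypothetical sketch: prefer a module's own extract_batch/2 and fall back
  # to one extract/2 call per text otherwise. Not Arcana's actual source.
  def extract_batch({module, opts}, texts) when is_atom(module) and is_list(opts) do
    # function_exported?/3 only sees loaded modules, so load the module first.
    if Code.ensure_loaded?(module) and function_exported?(module, :extract_batch, 2) do
      module.extract_batch(texts, opts)
    else
      sequential(texts, &module.extract(&1, opts))
    end
  end

  def extract_batch(fun, texts) when is_function(fun, 2) do
    sequential(texts, &fun.(&1, []))
  end

  # Calls extract_one on each text in order, stopping at the first error.
  defp sequential(texts, extract_one) do
    result =
      Enum.reduce_while(texts, {:ok, []}, fn text, {:ok, acc} ->
        case extract_one.(text) do
          {:ok, entities} -> {:cont, {:ok, [entities | acc]}}
          {:error, _reason} = error -> {:halt, error}
        end
      end)

    case result do
      {:ok, acc} -> {:ok, Enum.reverse(acc)}
      {:error, _reason} = error -> error
    end
  end
end
```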