# `Arcana.Graph.EntityExtractor`
[🔗](https://github.com/georgeguimaraes/arcana/blob/main/lib/arcana/graph/entity_extractor.ex#L1)

Behaviour for entity extraction in GraphRAG.

Entity extractors identify named entities (people, organizations, locations, etc.)
from text. Arcana provides a built-in NER implementation, but you can implement
custom extractors for different approaches.

## Built-in Implementations

- `Arcana.Graph.EntityExtractor.NER` - Local Bumblebee NER (default)

## Configuration

Configure your entity extractor in `config.exs`:

    # Default: Local NER with distilbert-NER
    config :arcana, :graph,
      entity_extractor: :ner

    # Custom module implementing this behaviour
    config :arcana, :graph,
      entity_extractor: MyApp.LLMEntityExtractor

    # Custom module with options
    config :arcana, :graph,
      entity_extractor: {MyApp.LLMEntityExtractor, model: "gpt-4"}

    # Inline function
    config :arcana, :graph,
      entity_extractor: fn text, _opts -> {:ok, extract_entities(text)} end

## Implementing a Custom Extractor

Create a module that implements this behaviour:

    defmodule MyApp.LLMEntityExtractor do
      @behaviour Arcana.Graph.EntityExtractor

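      @impl true
      def extract(text, opts) do
        llm = opts[:llm] || raise ArgumentError, "the :llm option is required"
        # Use LLM to extract entities...
        # (`call_llm/2` is a placeholder for your own LLM client code.)
        entities = call_llm(llm, text)
        {:ok, entities}
      end

      # Optional: implement for batch optimization
      @impl true
      def extract_batch(texts, opts) do
        # Batch LLM call...
        # (`call_llm_batch/2` is a placeholder; it should return one entity list per text.)
        results = call_llm_batch(opts[:llm], texts)
        {:ok, results}
      end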
    end
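
For comparison, here is a minimal sketch of an extractor that needs no LLM or model at all. It treats every run of capitalized words as an entity, which is a deliberately naive heuristic. The module name `MyApp.CapitalizedWordExtractor` and the `:type` option are assumptions made for illustration; neither is part of Arcana.

```elixir
defmodule MyApp.CapitalizedWordExtractor do
  # Hypothetical example module. In a project that depends on Arcana, the
  # @behaviour line lets the compiler check the callback below.
  @behaviour Arcana.Graph.EntityExtractor

  @impl true
  def extract(text, opts) when is_binary(text) do
    # Assumed option: the type assigned to every match (defaults to :other).
    type = Keyword.get(opts, :type, :other)

    entities =
      # One or more capitalized words separated by whitespace,
      # e.g. "Sam Altman" or "OpenAI".
      ~r/\p{Lu}\p{L}*(?:\s+\p{Lu}\p{L}*)*/u
      |> Regex.scan(text)
      |> Enum.map(fn [name] -> %{name: name, type: type} end)

    {:ok, entities}
  end
end
```

Calling `MyApp.CapitalizedWordExtractor.extract("Sam Altman leads OpenAI", [])` returns `{:ok, [%{name: "Sam Altman", type: :other}, %{name: "OpenAI", type: :other}]}`. A heuristic like this also matches sentence-initial words, so it is only useful as a starting point or a test double.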

## Entity Format

Extractors must return each entity as a map with at least the following keys:

- `:name` - The entity name (required)
- `:type` - Entity type as an atom: `:person`, `:organization`, `:location`, `:concept`, `:other`

Optional fields:

- `:span_start` - Character offset where entity starts
- `:span_end` - Character offset where entity ends
- `:score` - Confidence score (0.0-1.0)
- `:description` - Brief description of the entity
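
The format above can be checked with a small validator, which may be handy in tests for a custom extractor. This is a sketch under assumptions: `EntityFormat` is a hypothetical helper (not part of Arcana), it treats `:type` as required, it treats the five listed types as the complete set, and it expects `:name` to be a non-empty string.

```elixir
defmodule EntityFormat do
  # Hypothetical helper module, not part of Arcana.

  # Assumption: the types listed in the documentation are the full set.
  @types [:person, :organization, :location, :concept, :other]

  # An entity needs a non-empty string :name and a known :type.
  # If :score is present, it must be a number between 0.0 and 1.0.
  def valid?(%{name: name, type: type} = entity) when is_binary(name) and name != "" do
    type in @types and valid_score?(Map.get(entity, :score))
  end

  # Anything else (missing keys, non-map values) is rejected.
  def valid?(_other), do: false

  defp valid_score?(nil), do: true
  defp valid_score?(score), do: is_number(score) and score >= 0.0 and score <= 1.0
end
```

For example, `EntityFormat.valid?(%{name: "OpenAI", type: :organization, score: 0.97})` returns `true`, while `EntityFormat.valid?(%{name: "OpenAI"})` returns `false` because `:type` is missing.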

# `extract`

```elixir
@callback extract(text :: String.t(), opts :: keyword()) ::
  {:ok, [map()]} | {:error, term()}
```

Extracts entities from a single text.

## Parameters

- `text` - The text to extract entities from
- `opts` - Options passed from the extractor configuration

## Returns

- `{:ok, entities}` - List of entity maps
- `{:error, reason}` - On failure

# `extract_batch`
*optional* 

```elixir
@callback extract_batch(texts :: [String.t()], opts :: keyword()) ::
  {:ok, [[map()]]} | {:error, term()}
```

Extracts entities from multiple texts in batch.

This callback is optional. If a module does not implement it, `extract/2` is
called for each text sequentially. Implement it for extractors that support
native batch processing.

# `extract`

Extracts entities using the configured extractor.

The extractor can be:

- A `{module, opts}` tuple where `module` implements this behaviour
- A function `(text, opts) -> {:ok, entities} | {:error, reason}`

## Examples

    alias Arcana.Graph.EntityExtractor

    # With module
    extractor = {Arcana.Graph.EntityExtractor.NER, []}
    {:ok, entities} = EntityExtractor.extract(extractor, "Sam Altman leads OpenAI")

    # With inline function
    extractor = fn text, _opts -> {:ok, [%{name: "Test", type: :other}]} end
    {:ok, entities} = EntityExtractor.extract(extractor, "some text")
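
To make the two accepted extractor shapes concrete, the following sketch shows one way such a dispatch could be written with pattern matching. It is not Arcana's actual implementation: `ExtractorDispatch` and `UpcaseExtractor` are hypothetical modules, and passing an empty option list to inline functions is an assumption.

```elixir
defmodule ExtractorDispatch do
  # Hypothetical sketch; Arcana's own implementation may differ.

  # `{module, opts}` tuple: call the module's extract/2 with its options.
  def extract({module, opts}, text) when is_atom(module) and is_list(opts) do
    module.extract(text, opts)
  end

  # Two-argument function: call it directly.
  # Assumption: inline functions receive an empty option list.
  def extract(fun, text) when is_function(fun, 2) do
    fun.(text, [])
  end
end

defmodule UpcaseExtractor do
  # Toy extractor used only to exercise the dispatch above.
  def extract(text, opts) do
    {:ok, [%{name: String.upcase(text), type: Keyword.get(opts, :type, :other)}]}
  end
end
```

With these definitions, `ExtractorDispatch.extract({UpcaseExtractor, [type: :organization]}, "acme")` returns `{:ok, [%{name: "ACME", type: :organization}]}`, and passing a two-argument function instead of the tuple calls that function with the text.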

# `extract_batch`

Extracts entities from multiple texts using the configured extractor.

Falls back to sequential extraction if the module doesn't implement
`extract_batch/2`.
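
The fallback described above can be sketched with `function_exported?/3`. This is an illustration under assumptions rather than Arcana's code: `BatchFallback` and `SingleOnly` are hypothetical modules, and stopping at the first `{:error, reason}` is an assumed policy.

```elixir
defmodule BatchFallback do
  # Hypothetical sketch; not Arcana's actual code.

  def extract_batch({module, opts}, texts) when is_list(texts) do
    # function_exported?/3 does not load modules, so load it first.
    Code.ensure_loaded(module)

    if function_exported?(module, :extract_batch, 2) do
      module.extract_batch(texts, opts)
    else
      sequential(texts, fn text -> module.extract(text, opts) end)
    end
  end

  # Calls `extract_one` for each text in order.
  # Assumption: the first error aborts the whole batch.
  defp sequential(texts, extract_one) do
    texts
    |> Enum.reduce_while({:ok, []}, fn text, {:ok, acc} ->
      case extract_one.(text) do
        {:ok, entities} -> {:cont, {:ok, [entities | acc]}}
        {:error, _reason} = error -> {:halt, error}
      end
    end)
    |> case do
      # Results were accumulated in reverse; restore input order.
      {:ok, reversed} -> {:ok, Enum.reverse(reversed)}
      error -> error
    end
  end
end

defmodule SingleOnly do
  # Toy extractor that implements only extract/2.
  def extract("", _opts), do: {:error, :empty_text}
  def extract(text, _opts), do: {:ok, [%{name: text, type: :other}]}
end
```

Here `BatchFallback.extract_batch({SingleOnly, []}, ["a", "b"])` returns one entity list per input text, in input order, while a batch containing `""` returns `{:error, :empty_text}`.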

---

*Consult [api-reference.md](api-reference.md) for complete listing*