Arcana.Graph.EntityExtractor behaviour (Arcana v1.3.3)
Behaviour for entity extraction in GraphRAG.
Entity extractors identify named entities (people, organizations, locations, etc.) from text. Arcana provides a built-in NER implementation, but you can implement custom extractors for different approaches.
Built-in Implementations
Arcana.Graph.EntityExtractor.NER - Local Bumblebee NER (default)
Configuration
Configure your entity extractor in config.exs:
# Default: Local NER with distilbert-NER
config :arcana, :graph,
  entity_extractor: :ner
# Custom module implementing this behaviour
config :arcana, :graph,
  entity_extractor: MyApp.LLMEntityExtractor
# Custom module with options
config :arcana, :graph,
  entity_extractor: {MyApp.LLMEntityExtractor, model: "gpt-4"}
# Inline function
config :arcana, :graph,
  entity_extractor: fn text, opts -> {:ok, extract_entities(text)} end
Implementing a Custom Extractor
Create a module that implements this behaviour:
defmodule MyApp.LLMEntityExtractor do
  @behaviour Arcana.Graph.EntityExtractor
  @impl true
  def extract(text, opts) do
    llm = opts[:llm] || raise "LLM required"
    # Use LLM to extract entities...
    {:ok, entities}
  end
  # Optional: implement for batch optimization
  @impl true
  def extract_batch(texts, opts) do
    # Batch LLM call...
    {:ok, results}
  end
end
Entity Format
Extractors must return entities as maps with at least:
- :name - The entity name (required)
- :type - Entity type as atom: :person, :organization, :location, :concept, :other
Optional fields:
- :span_start - Character offset where entity starts
- :span_end - Character offset where entity ends
- :score - Confidence score (0.0-1.0)
- :description - Brief description of the entity
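To make the format concrete, here is a minimal sketch of a checker for entity maps. The module MyApp.EntityCheck and its valid?/1 function are hypothetical and not part of Arcana. The sketch assumes the five listed atoms are the complete set of allowed types and that :score, when present, must lie between 0.0 and 1.0.

```elixir
defmodule MyApp.EntityCheck do
  # Hypothetical helper, not part of Arcana.
  # Assumption: these five atoms are the only allowed entity types.
  @types [:person, :organization, :location, :concept, :other]

  # Returns true when the map has a binary :name, a known :type,
  # and any optional :score lies in 0.0..1.0.
  def valid?(%{name: name, type: type} = entity)
      when is_binary(name) and type in @types do
    case Map.get(entity, :score) do
      nil -> true
      score when is_number(score) -> score >= 0.0 and score <= 1.0
      _ -> false
    end
  end

  # Anything else (missing keys, wrong types) is rejected.
  def valid?(_other), do: false
end

# Illustrative values only; the offsets do not refer to a real text.
entity = %{
  name: "OpenAI",
  type: :organization,
  span_start: 17,
  span_end: 23,
  score: 0.97
}

MyApp.EntityCheck.valid?(entity)
# => true

MyApp.EntityCheck.valid?(%{name: "OpenAI"})
# => false (no :type)
```

A check like this could run in a custom extractor's tests to catch maps that lack the required keys before they reach the graph builder.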
Summary
Functions
Extracts entities using the configured extractor.
Extracts entities from multiple texts using the configured extractor.
Callbacks
@callback extract(text :: String.t(), opts :: keyword()) :: {:ok, [map()]} | {:error, term()}
Extracts entities from a single text.
Parameters
- text - The text to extract entities from
- opts - Options passed from the extractor configuration
Returns
- {:ok, entities} - List of entity maps
- {:error, reason} - On failure
@callback extract_batch(texts :: [String.t()], opts :: keyword()) :: {:ok, [[map()]]} | {:error, term()}
Extracts entities from multiple texts in batch.
Default implementation calls extract/2 for each text sequentially.
Override for extractors that support native batch processing.
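The sequential default described above could look roughly like the following sketch. MyApp.SequentialBatch is a hypothetical module, not Arcana's actual code, and stopping at the first error is an assumption; the source only says that extract/2 is called for each text in order.

```elixir
defmodule MyApp.SequentialBatch do
  # Hypothetical helper, not Arcana's actual code: runs a single-text
  # extract function over each text in order.
  # Assumption: the first {:error, reason} aborts the whole batch.
  def extract_batch(extract_fun, texts, opts) when is_function(extract_fun, 2) do
    texts
    |> Enum.reduce_while({:ok, []}, fn text, {:ok, acc} ->
      case extract_fun.(text, opts) do
        {:ok, entities} -> {:cont, {:ok, [entities | acc]}}
        {:error, reason} -> {:halt, {:error, reason}}
      end
    end)
    |> case do
      # Results were accumulated in reverse; restore input order.
      {:ok, reversed} -> {:ok, Enum.reverse(reversed)}
      {:error, _reason} = error -> error
    end
  end
end

# Toy extractor: every text yields one :other entity named after the text.
toy = fn text, _opts -> {:ok, [%{name: text, type: :other}]} end

MyApp.SequentialBatch.extract_batch(toy, ["a", "b"], [])
# => {:ok, [[%{name: "a", type: :other}], [%{name: "b", type: :other}]]}
```

The result is a list of lists, one inner list per input text, which matches the {:ok, [[map()]]} return type in the callback spec.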
Functions
Extracts entities using the configured extractor.
The extractor can be:
- A {module, opts} tuple where module implements this behaviour
- A function (text, opts) -> {:ok, entities} | {:error, reason}
Examples
# Alias so the short name EntityExtractor resolves
alias Arcana.Graph.EntityExtractor
# With module
extractor = {Arcana.Graph.EntityExtractor.NER, []}
{:ok, entities} = EntityExtractor.extract(extractor, "Sam Altman leads OpenAI")
# With inline function
extractor = fn text, _opts -> {:ok, [%{name: "Test", type: :other}]} end
{:ok, entities} = EntityExtractor.extract(extractor, "some text")
Extracts entities from multiple texts using the configured extractor.
Falls back to sequential extraction if the module doesn't implement extract_batch/2.
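One way such a fallback can be written is with function_exported?/3, as in the sketch below. MyApp.BatchDispatch and MyApp.SingleOnly are hypothetical names, and this is not Arcana's actual implementation. Passing an empty option list to the function form is an assumption.

```elixir
defmodule MyApp.BatchDispatch do
  # Hypothetical sketch, not Arcana's actual code.

  # {module, opts} form: prefer the module's own extract_batch/2 if defined.
  def extract_batch({module, opts}, texts) when is_atom(module) do
    # function_exported?/3 does not load modules, so load it first.
    Code.ensure_loaded(module)

    if function_exported?(module, :extract_batch, 2) do
      module.extract_batch(texts, opts)
    else
      sequential(&module.extract/2, texts, opts)
    end
  end

  # Function form: no batch variant exists, so go text by text.
  # Assumption: the function receives an empty option list.
  def extract_batch(fun, texts) when is_function(fun, 2) do
    sequential(fun, texts, [])
  end

  # Calls fun for every text, then returns the first error if any occurred.
  defp sequential(fun, texts, opts) do
    results = Enum.map(texts, &fun.(&1, opts))

    case Enum.find(results, &match?({:error, _}, &1)) do
      nil -> {:ok, Enum.map(results, fn {:ok, entities} -> entities end)}
      error -> error
    end
  end
end

defmodule MyApp.SingleOnly do
  # Toy extractor that defines extract/2 but not extract_batch/2.
  def extract(text, _opts), do: {:ok, [%{name: text, type: :other}]}
end

MyApp.BatchDispatch.extract_batch({MyApp.SingleOnly, []}, ["a", "b"])
# => {:ok, [[%{name: "a", type: :other}], [%{name: "b", type: :other}]]}
```

Because MyApp.SingleOnly exports only extract/2, the dispatcher takes the sequential branch; a module that also defines extract_batch/2 would be called once with the full list.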