Arcana.Graph.EntityExtractor.NER (Arcana v1.2.0)


Extracts named entities from text using Bumblebee NER.

Uses the dslim/distilbert-NER model to identify persons, organizations, locations, and miscellaneous entities. The model is loaded lazily on first use, so applications that don't use graph features pay no startup cost for it.
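To illustrate what loading on first use can look like, here is a minimal sketch that caches a model in :persistent_term the first time it is requested. The module LazyModelSketch and its load_model/0 stub are hypothetical and not part of Arcana; the stub stands in for the expensive Bumblebee load and downloads nothing. Arcana's actual loading mechanism may differ.

```elixir
defmodule LazyModelSketch do
  # Hypothetical sketch of lazy loading, not Arcana's implementation.
  # The model is loaded the first time get/0 is called and cached in
  # :persistent_term, so later calls return the cached value directly.
  @key {__MODULE__, :model}

  def get do
    case :persistent_term.get(@key, nil) do
      nil ->
        model = load_model()
        :persistent_term.put(@key, model)
        model

      model ->
        model
    end
  end

  # Hypothetical stand-in for the expensive step (loading the NER model).
  # For demonstration it sends itself a message each time it runs, so the
  # number of loads can be observed.
  defp load_model do
    send(self(), :model_loaded)
    %{name: "dslim/distilbert-NER"}
  end
end

first = LazyModelSketch.get()
second = LazyModelSketch.get()
first == second
# => true (load_model/0 ran only once)
```

Two processes calling get/0 at the same moment, before the cache is filled, could both run load_model/0; a production version would serialize the first load, for instance behind a GenServer.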

Usage

# As the configured extractor (e.g. in config/config.exs)
config :arcana, :graph,
  entity_extractor: :ner

# Direct usage
text = "Sam Altman is CEO of OpenAI."
{:ok, entities} = Arcana.Graph.EntityExtractor.NER.extract(text, [])

Summary

Functions

extract(text, opts): Extracts entities from text using the NER model.

extract_batch(texts, opts): Extracts entities from multiple texts.

map_label(label): Maps NER labels to entity types.

Types

entity()

@type entity() :: %{
  name: String.t(),
  type: String.t(),
  span_start: non_neg_integer(),
  span_end: non_neg_integer(),
  score: float()
}
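The span fields locate each entity in the input text. A minimal sketch, assuming span_start and span_end are byte offsets with an exclusive end (consistent with "Sam Altman" at 0..10 in the extract/2 example), recovers the matched substring with Kernel.binary_part/3. SpanCheck is a hypothetical helper, not part of Arcana. For ASCII text byte and character offsets coincide; for other text the offset unit is an assumption worth verifying.

```elixir
defmodule SpanCheck do
  # Hypothetical helper, not part of Arcana: returns the slice of `text`
  # that an entity's span points at, assuming byte offsets and an
  # exclusive span_end.
  def span_text(text, %{span_start: s, span_end: e}) do
    binary_part(text, s, e - s)
  end
end

text = "Sam Altman is CEO of OpenAI."
entity = %{name: "Sam Altman", type: "person", span_start: 0, span_end: 10, score: 0.99}

SpanCheck.span_text(text, entity)
# => "Sam Altman"
```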

Functions

extract(text, opts)

@spec extract(
  String.t(),
  keyword()
) :: {:ok, [entity()]} | {:error, term()}

Extracts entities from text using the NER model.

Returns {:ok, entities}, where each entity is a map with the keys :name, :type, :span_start, :span_end, and :score. Entities are deduplicated by name, keeping the first occurrence.

Examples

iex> NER.extract("Sam Altman is CEO of OpenAI.", [])
{:ok, [
  %{name: "Sam Altman", type: "person", span_start: 0, span_end: 10, score: 0.99},
  %{name: "OpenAI", type: "organization", span_start: 21, span_end: 27, score: 0.98}
]}
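Deduplication by name with the first occurrence kept behaves like Enum.uniq_by/2 keyed on :name, which keeps the first element seen for each key and preserves order. This is a sketch of that rule, not Arcana's code; it assumes names are compared as exact strings (so "OpenAI" and "openai" would stay separate), and the entity values are illustrative, taken from the sentence "OpenAI hired Sam Altman; OpenAI grew."

```elixir
# Hypothetical raw NER hits before deduplication (illustrative values).
raw = [
  %{name: "OpenAI", type: "organization", span_start: 0, span_end: 6, score: 0.97},
  %{name: "Sam Altman", type: "person", span_start: 13, span_end: 23, score: 0.99},
  %{name: "OpenAI", type: "organization", span_start: 25, span_end: 31, score: 0.95}
]

# Keep only the first entity seen for each name.
entities = Enum.uniq_by(raw, & &1.name)

Enum.map(entities, &{&1.name, &1.span_start})
# => [{"OpenAI", 0}, {"Sam Altman", 13}]
```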

extract_batch(texts, opts)

@spec extract_batch(
  [String.t()],
  keyword()
) :: {:ok, [[entity()]]}

Extracts entities from multiple texts.

Examples

iex> NER.extract_batch(["Sam Altman", "Elon Musk"], [])
{:ok, [[%{name: "Sam Altman", ...}], [%{name: "Elon Musk", ...}]]}
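The result of extract_batch/2 holds one inner list per input text, in input order. The sketch below, using a hypothetical BatchSketch module and a stand-in extractor, shows only that shape; Arcana's implementation may instead pass all texts to the model in a single batched call, which is usually the reason to prefer a batch function.

```elixir
defmodule BatchSketch do
  # Hypothetical sketch, not Arcana's implementation: run a single-text
  # extractor over each input and keep results aligned with the inputs,
  # so the i-th inner list holds the entities of the i-th text.
  # Assumes the extractor returns {:ok, entities}; anything else raises MatchError.
  def extract_batch(texts, extract_fun) when is_list(texts) do
    results =
      Enum.map(texts, fn text ->
        {:ok, entities} = extract_fun.(text)
        entities
      end)

    {:ok, results}
  end
end

# Stand-in extractor for illustration only: treats the whole text as one person.
fake_extract = fn text ->
  {:ok, [%{name: text, type: "person", span_start: 0, span_end: byte_size(text), score: 1.0}]}
end

{:ok, [[first], [second]]} = BatchSketch.extract_batch(["Sam Altman", "Elon Musk"], fake_extract)
{first.name, second.name}
# => {"Sam Altman", "Elon Musk"}
```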

map_label(label)

@spec map_label(String.t()) :: String.t()

Maps NER labels to entity types.

Label Mapping

  • PER, B-PER, I-PER → "person"
  • ORG, B-ORG, I-ORG → "organization"
  • LOC, B-LOC, I-LOC → "location"
  • MISC, B-MISC, I-MISC → "concept"
  • Any other label → "other"
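The mapping above amounts to stripping an optional B- or I- prefix and matching the remaining CoNLL-style tag. LabelMapSketch is a hypothetical re-implementation written from the table, not Arcana's source.

```elixir
defmodule LabelMapSketch do
  # Hypothetical re-implementation of the label table, not Arcana's source.
  # Strip an optional BIO prefix ("B-" or "I-"), then map the bare tag.
  def map_label("B-" <> tag), do: map_tag(tag)
  def map_label("I-" <> tag), do: map_tag(tag)
  def map_label(tag), do: map_tag(tag)

  defp map_tag("PER"), do: "person"
  defp map_tag("ORG"), do: "organization"
  defp map_tag("LOC"), do: "location"
  defp map_tag("MISC"), do: "concept"
  # Any other tag, including the "O" (outside) label, maps to "other".
  defp map_tag(_other), do: "other"
end

Enum.map(["B-PER", "I-ORG", "LOC", "MISC", "O"], &LabelMapSketch.map_label/1)
# => ["person", "organization", "location", "concept", "other"]
```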