Arcana.Graph.EntityExtractor.NER (Arcana v1.3.3)

Extracts named entities from text using Bumblebee NER.

Uses dslim/distilbert-NER to identify persons, organizations, locations, and miscellaneous entities. The model is lazy-loaded on first use to avoid startup overhead when graph features aren't needed.
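
The lazy-loading idea can be illustrated with a small cache around an expensive loader. This is only a sketch of the general pattern, not Arcana's actual implementation: the `LazyModelSketch` module, its `fetch/1` function, and the `:fake_serving` stand-in are all hypothetical, and a real loader would build the Bumblebee serving instead.

```elixir
defmodule LazyModelSketch do
  # Hypothetical helper, not part of Arcana: caches the result of an
  # expensive loader in :persistent_term so it runs only on first use.
  @key {__MODULE__, :serving}

  def fetch(loader) when is_function(loader, 0) do
    case :persistent_term.get(@key, :not_loaded) do
      :not_loaded ->
        # First call: run the loader and remember its result.
        model = loader.()
        :persistent_term.put(@key, model)
        model

      model ->
        # Later calls: reuse the cached value without calling the loader.
        model
    end
  end
end

# Stand-in loader that records each invocation in the caller's mailbox.
loader = fn ->
  send(self(), :loaded)
  :fake_serving
end

first = LazyModelSketch.fetch(loader)
second = LazyModelSketch.fetch(loader)
```

Both calls return the same value, and the loader runs only once. The sketch does not guard against two processes triggering the first load at the same time; a production version would usually serialize that step.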

Usage

# As configured extractor
config :arcana, :graph,
  entity_extractor: :ner

# Direct usage
{:ok, entities} = Arcana.Graph.EntityExtractor.NER.extract(text, [])

Summary

Functions

extract(text, opts)
Extracts entities from text using the NER model.

extract_batch(texts, opts)
Extracts entities from multiple texts.

map_label(label)
Maps NER labels to entity types.

Functions

extract(text, opts)

Extracts entities from text using the NER model.

Returns {:ok, entities}, where entities is a list of maps with the keys :name, :type, :span_start, :span_end, and :score. Entities are deduplicated by name; only the first occurrence of each name is kept.

Examples

iex> NER.extract("Sam Altman is CEO of OpenAI.", [])
{:ok, [
  %{name: "Sam Altman", type: "person", span_start: 0, span_end: 10, score: 0.99},
  %{name: "OpenAI", type: "organization", span_start: 21, span_end: 27, score: 0.98}
]}
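
The "first occurrence kept" deduplication can be reproduced with `Enum.uniq_by/2`. The snippet below is a minimal sketch on hand-written entity maps (the spans and scores are made up for illustration); it does not claim to be how the module deduplicates internally.

```elixir
# Hand-written sample data, not real model output.
entities = [
  %{name: "OpenAI", type: "organization", span_start: 0, span_end: 6, score: 0.98},
  %{name: "Sam Altman", type: "person", span_start: 17, span_end: 27, score: 0.99},
  %{name: "OpenAI", type: "organization", span_start: 40, span_end: 46, score: 0.97}
]

# Enum.uniq_by/2 keeps the first element seen for each distinct key,
# so the second "OpenAI" entry (span_start: 40) is dropped.
deduped = Enum.uniq_by(entities, & &1.name)
```

The result preserves the original order, so the surviving "OpenAI" entry is the one at span 0.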

extract_batch(texts, opts)

Extracts entities from multiple texts. Returns {:ok, results}, where results contains one list of entity maps per input text.

Examples

iex> NER.extract_batch(["Sam Altman", "Elon Musk"], [])
{:ok, [[%{name: "Sam Altman", ...}], [%{name: "Elon Musk", ...}]]}
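
One way to consume a batch result is to pair each input text with its entity list. The sketch below assumes the result lists come back in the same order as the inputs, which the example suggests but the documentation does not state; `results` is a hand-written stand-in for the value inside `{:ok, results}`.

```elixir
texts = ["Sam Altman", "Elon Musk"]

# Stand-in for the payload of {:ok, results}; real entity maps
# would also carry :span_start, :span_end, and :score.
results = [
  [%{name: "Sam Altman", type: "person"}],
  [%{name: "Elon Musk", type: "person"}]
]

# Build a map from each input text to the entity names found in it,
# assuming results are ordered like the inputs.
names_by_text =
  texts
  |> Enum.zip(results)
  |> Map.new(fn {text, entities} -> {text, Enum.map(entities, & &1.name)} end)
```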

map_label(label)

Maps an NER label, with or without a B- or I- prefix, to an entity type string.

Label Mapping

  • PER, B-PER, I-PER → "person"
  • ORG, B-ORG, I-ORG → "organization"
  • LOC, B-LOC, I-LOC → "location"
  • MISC, B-MISC, I-MISC → "concept"
  • Other → "other"
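
The mapping above can be written as a handful of guarded function clauses. This is an illustrative re-implementation of the table in a hypothetical `LabelMapSketch` module, not the module's source code.

```elixir
defmodule LabelMapSketch do
  # Illustrative only: mirrors the documented label table.
  def map_label(label) when label in ["PER", "B-PER", "I-PER"], do: "person"
  def map_label(label) when label in ["ORG", "B-ORG", "I-ORG"], do: "organization"
  def map_label(label) when label in ["LOC", "B-LOC", "I-LOC"], do: "location"
  def map_label(label) when label in ["MISC", "B-MISC", "I-MISC"], do: "concept"
  # Any label not listed above falls through to "other".
  def map_label(_label), do: "other"
end

types = Enum.map(["B-PER", "ORG", "I-LOC", "MISC", "O"], &LabelMapSketch.map_label/1)
```

Here `types` evaluates to `["person", "organization", "location", "concept", "other"]`.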