Arcana.Graph.EntityExtractor.NER (Arcana v1.2.0)
Extracts named entities from text using Bumblebee NER.
Uses dslim/distilbert-NER to identify persons, organizations, locations, and miscellaneous entities. The model is lazy-loaded on first use to avoid startup overhead when graph features aren't needed.
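The lazy-loading idea can be illustrated with a minimal sketch. This is not Arcana's implementation: the module `LazyModelSketch`, its `fetch/1` function, and the stand-in loader are hypothetical, and the choice of `:persistent_term` as the cache is an assumption made for the example.

```elixir
defmodule LazyModelSketch do
  # Hypothetical module, not part of Arcana: shows load-on-first-use caching.
  @key {__MODULE__, :model}

  # Returns the cached value, calling `loader` only when nothing is cached yet.
  def fetch(loader) when is_function(loader, 0) do
    case :persistent_term.get(@key, :not_loaded) do
      :not_loaded ->
        model = loader.()
        :persistent_term.put(@key, model)
        model

      model ->
        model
    end
  end
end

# The loader stands in for an expensive call such as loading model weights.
loader = fn -> %{name: "dslim/distilbert-NER", loaded_at: System.monotonic_time()} end

first = LazyModelSketch.fetch(loader)
second = LazyModelSketch.fetch(loader)
```

The second call returns the cached value without invoking the loader again. In this sketch, two processes calling `fetch/1` at the same moment could both run the loader; a production version would typically serialize loading through a single process.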
Usage
# As configured extractor
config :arcana, :graph,
  entity_extractor: :ner

# Direct usage
{:ok, entities} = Arcana.Graph.EntityExtractor.NER.extract(text, [])
Types
@type entity() :: %{
  name: String.t(),
  type: String.t(),
  span_start: non_neg_integer(),
  span_end: non_neg_integer(),
  score: float()
}
Functions
Extracts entities from text using the NER model.
Returns {:ok, entities}, where entities is a list of entity maps with the keys :name, :type, :span_start, :span_end, and :score. Entities are deduplicated by name; when a name occurs more than once, only the first occurrence is kept.
Examples
iex> NER.extract("Sam Altman is CEO of OpenAI.", [])
{:ok, [
  %{name: "Sam Altman", type: "person", span_start: 0, span_end: 10, score: 0.99},
  %{name: "OpenAI", type: "organization", span_start: 21, span_end: 27, score: 0.98}
]}
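Deduplication by name with the first occurrence kept can be sketched with `Enum.uniq_by/2`. The entity list below is illustrative data written for this example, not real model output, and whether the module uses `Enum.uniq_by/2` internally is an assumption.

```elixir
# Illustrative entities for the text
# "OpenAI hired Sam Altman, who now leads OpenAI." (scores are made up).
entities = [
  %{name: "OpenAI", type: "organization", span_start: 0, span_end: 6, score: 0.98},
  %{name: "Sam Altman", type: "person", span_start: 13, span_end: 23, score: 0.99},
  %{name: "OpenAI", type: "organization", span_start: 39, span_end: 45, score: 0.91}
]

# Enum.uniq_by/2 keeps the first element seen for each key, in original order.
deduped = Enum.uniq_by(entities, & &1.name)
```

Here `deduped` holds two entities, and the surviving "OpenAI" entry is the one with `span_start: 0`. Callers that need every mention, not just the first, would have to work from the undeduplicated output.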
Extracts entities from multiple texts.
Examples
iex> NER.extract_batch(["Sam Altman", "Elon Musk"], [])
{:ok, [[%{name: "Sam Altman", ...}], [%{name: "Elon Musk", ...}]]}
Maps NER labels to entity types.
Label Mapping
- PER, B-PER, I-PER → "person"
- ORG, B-ORG, I-ORG → "organization"
- LOC, B-LOC, I-LOC → "location"
- MISC, B-MISC, I-MISC → "concept"
- Other → "other"
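The mapping above can be expressed as a small function. This is a sketch only: the module name `LabelMapSketch` and the function name `to_entity_type/1` are hypothetical, and stripping the `B-`/`I-` prefix before matching is one possible way to implement the table, not necessarily how the library does it.

```elixir
defmodule LabelMapSketch do
  # Hypothetical names; only the label-to-type mapping comes from the table above.
  def to_entity_type(label) when is_binary(label) do
    # Drop the BIO prefix ("B-" or "I-") if present.
    base =
      case label do
        "B-" <> rest -> rest
        "I-" <> rest -> rest
        other -> other
      end

    case base do
      "PER" -> "person"
      "ORG" -> "organization"
      "LOC" -> "location"
      "MISC" -> "concept"
      _ -> "other"
    end
  end
end

person = LabelMapSketch.to_entity_type("B-PER")
concept = LabelMapSketch.to_entity_type("MISC")
fallback = LabelMapSketch.to_entity_type("O")
```

Any label outside the four known groups, including the `O` tag that BIO schemes use for non-entity tokens, falls through to "other".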