Named-entity recognition via Bumblebee.
Identifies named entities (people, organisations, locations, …) in
natural-language text and returns each as a span with type and
confidence. Backed by Bumblebee's
Bumblebee.Text.token_classification/3 serving with span
aggregation, loading a pre-trained transformer from Hugging Face.
Default model
The default is
Davlan/bert-base-multilingual-cased-ner-hrl,
trained on 10 high-resource languages (Arabic, German, English,
Spanish, French, Italian, Latvian, Dutch, Portuguese, Chinese)
with the standard four-class CoNLL-2003 tag set:
- :per (person)
- :org (organisation)
- :loc (location)
- :misc (miscellaneous)
Override with :model to use a language-specific or
domain-specific NER model.
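As a minimal sketch of the override, the wrapper below calls extract/2 once with no options and once with :model set. The module name NerModelExample and the model id "dslim/bert-base-NER" (an English-only model on Hugging Face) are illustrative choices, not taken from this module's documentation.

```elixir
defmodule NerModelExample do
  # Illustrative model id; any token-classification model that Bumblebee
  # can load should be usable here.
  @english_model "dslim/bert-base-NER"

  # Uses the default multilingual model.
  def multilingual(text), do: Text.NER.extract(text, [])

  # Same call, with the :model option overriding the default.
  def english(text), do: Text.NER.extract(text, model: @english_model)
end
```

Each distinct model id is expected to trigger its own download and compilation on first use, so switching models at runtime pays the cold-start cost again.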
Optional dependency
Like Text.POS, NER requires the optional :bumblebee and
:exla Hex packages. Without :bumblebee, extract/2 raises
with a helpful message.
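A mix.exs fragment along these lines would pull in the optional packages. The version requirements are assumptions and should be checked against the current releases on Hex.

```elixir
# mix.exs (fragment)
defp deps do
  [
    # ...the dependency that provides Text.NER goes here...
    # Version requirements below are assumed, not taken from the docs.
    {:bumblebee, "~> 0.6"},
    {:exla, "~> 0.9"}
  ]
end
```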
Cold start and caching
The first call downloads the model (~700 MB for the multilingual
default) and traces the inference graph. Subsequent calls reuse a
serving cached in :persistent_term. For production, start a named
serving at boot:
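    # model_info and tokenizer are assumed to come from
    # Bumblebee.load_model/2 and Bumblebee.load_tokenizer/2.
    serving =
      Bumblebee.Text.token_classification(model_info, tokenizer, aggregation: :same)
    {:ok, _pid} = Nx.Serving.start_link(serving: serving, name: MyApp.NER)
    Text.NER.extract(text, serving: MyApp.NER)
Result shape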
    %Text.NER.Entity{
      text: "Barack Obama",
      type: :per,
      start: 0,
      end: 12,
      score: 0.99
    }
Spans are byte offsets into the original input.
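Because the offsets count bytes rather than graphemes, slicing with String.slice/3 can return the wrong text once the input contains multi-byte characters. The sketch below uses binary_part/3 instead. It assumes :end is exclusive, which is inferred from the "Barack Obama" example (start 0, end 12, 12 bytes). SpanSlice is a hypothetical helper, and a plain map stands in for the entity struct so the example has no dependencies.

```elixir
defmodule SpanSlice do
  # Returns the bytes covered by a span. The map pattern also matches a
  # struct that has :start and :end fields.
  def slice(text, %{start: start, end: stop}) when is_binary(text) do
    binary_part(text, start, stop - start)
  end
end

# "\u00EB" (e with diaeresis) takes two bytes in UTF-8, so byte and
# grapheme offsets diverge after it.
text = "Zo\u00EB met Ana"

# Stand-in for an entity; offsets are byte positions of "Ana".
entity = %{text: "Ana", type: :per, start: 9, end: 12, score: 0.99}

# Byte-based slicing recovers the entity text: "Ana".
by_bytes = SpanSlice.slice(text, entity)

# Grapheme-based slicing with the same numbers is off by one: "na".
%{start: start, end: stop} = entity
by_graphemes = String.slice(text, start, stop - start)
```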
Functions
@spec extract(String.t(), keyword()) :: [Text.NER.Entity.t()]
Extracts named entities from text.
Arguments
- text is a UTF-8 binary.
Options
- :model is the Hugging Face model id. Defaults to the multilingual "Davlan/bert-base-multilingual-cased-ner-hrl".
- :serving is the name or pid of a pre-started Nx.Serving; passing it skips the lazy cache.
- :compile is the defn-compilation options. Defaults to [batch_size: 1, sequence_length: 256].
- :min_score filters out entities below this confidence. Defaults to 0.0 (keep everything).
Returns
- A list of Text.NER.Entity structs in document order.
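Since the result is an ordinary list, it can be post-processed with Enum. The sketch below groups entity texts by type and applies a confidence cut-off on the caller's side, which is one plausible reading of what :min_score does. EntitySummary is a hypothetical helper, and the entities are made-up stand-ins written as plain maps rather than real model output.

```elixir
defmodule EntitySummary do
  # Groups entity texts by type; order within each group follows the input.
  def texts_by_type(entities) do
    Enum.group_by(entities, & &1.type, & &1.text)
  end

  # Keeps entities whose score is at least min_score.
  def at_least(entities, min_score) do
    Enum.filter(entities, &(&1.score >= min_score))
  end
end

# Made-up spans and scores, shaped like the documented entity fields.
entities = [
  %{text: "Barack Obama", type: :per, start: 0, end: 12, score: 0.99},
  %{text: "Hawaii", type: :loc, start: 25, end: 31, score: 0.97},
  %{text: "Michelle", type: :per, start: 37, end: 45, score: 0.62}
]

grouped = EntitySummary.texts_by_type(entities)
confident = EntitySummary.at_least(entities, 0.9)
```

Filtering on the caller's side allows several thresholds to be tried against one result list without rerunning inference.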
@spec reset(String.t() | :all) :: :ok
Drops the cached Nx.Serving for the given model (or all models).
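A short sketch of both call forms allowed by the spec, wrapped in a hypothetical NerCacheExample module. The model id is the documented default.

```elixir
defmodule NerCacheExample do
  @default_model "Davlan/bert-base-multilingual-cased-ner-hrl"

  # Drops only the cached serving for the default model.
  def drop_default, do: Text.NER.reset(@default_model)

  # Drops every cached serving.
  def drop_all, do: Text.NER.reset(:all)
end
```

After a reset, the next extract/2 call for that model presumably rebuilds the serving and pays the cold-start cost again.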