Text.NER (Text v0.5.0)


Named-entity recognition via Bumblebee.

Identifies named entities (people, organisations, locations, …) in natural-language text and returns each one as a span with an entity type and a confidence score. Backed by a Bumblebee.Text.token_classification/3 serving with span aggregation, which loads a pre-trained transformer from Hugging Face.

Default model

The default is Davlan/bert-base-multilingual-cased-ner-hrl, trained on 10 high-resource languages (Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese, Chinese) with the standard four-class CoNLL-2003 tag set:

  • :per — person
  • :org — organisation
  • :loc — location
  • :misc — miscellaneous
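As a rough illustration of how a model's string tags relate to the atoms listed above, the sketch below strips an optional BIO prefix and looks the tag up in a fixed table. The module name `LabelSketch` and the exact tag strings are assumptions made for this example; it is not taken from the library's source.

```elixir
defmodule LabelSketch do
  # Hypothetical mapping from CoNLL-style tag names to the atoms used above.
  @types %{"PER" => :per, "ORG" => :org, "LOC" => :loc, "MISC" => :misc}

  @doc """
  Maps a tag such as "B-PER", "I-LOC" or "ORG" to `{:ok, type}`.
  Returns `:error` for tags outside the table (for example "O").
  """
  def to_type("B-" <> tag), do: to_type(tag)
  def to_type("I-" <> tag), do: to_type(tag)
  def to_type(tag) when is_binary(tag), do: Map.fetch(@types, tag)
end
```

A model selected with :model may emit tags outside this table, which is why the sketch returns `:error` instead of raising.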

Override with :model to use a language-specific or domain-specific NER model.

Optional dependency

Like Text.POS, NER requires the optional :bumblebee and :exla Hex packages. Without :bumblebee, extract/2 raises with a helpful message.
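One common way to guard an optional dependency in Elixir is to check at call time whether its module can be loaded. The sketch below shows that pattern in isolation; `OptionalDepSketch` and its message text are hypothetical and are not the library's actual implementation.

```elixir
defmodule OptionalDepSketch do
  @doc """
  Returns `:ok` when `module` can be loaded. Otherwise raises a
  `RuntimeError` that names the Hex package the caller should add.
  """
  def ensure_available!(module, package) when is_atom(module) and is_atom(package) do
    if Code.ensure_loaded?(module) do
      :ok
    else
      raise RuntimeError,
            "#{inspect(module)} is not available. Add #{inspect(package)} " <>
              "to the deps in mix.exs to use this feature."
    end
  end
end
```

A caller would run something like `OptionalDepSketch.ensure_available!(Bumblebee, :bumblebee)` before touching any Bumblebee function.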

Cold start and caching

The first call downloads the model (~700 MB for the multilingual default) and traces the inference graph, so it is slow. Subsequent calls reuse a serving cached in :persistent_term. In production, start a named serving at boot and pass it to extract/2:

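# model_info and tokenizer are the results of Bumblebee.load_model/2 and
# Bumblebee.load_tokenizer/2 for the chosen Hugging Face repository.
serving =
  Bumblebee.Text.token_classification(model_info, tokenizer, aggregation: :same)

{:ok, _pid} = Nx.Serving.start_link(serving: serving, name: MyApp.NER)

# Later calls use the named serving instead of the lazy cache.
Text.NER.extract(text, serving: MyApp.NER)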
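To make the lazy cache concrete, here is a minimal sketch of a build-once cache on top of :persistent_term, keyed by model id, with a reset in the spirit of reset/1. The module `LazyCacheSketch` is hypothetical and assumes nothing about how the library stores its servings beyond what is stated above.

```elixir
defmodule LazyCacheSketch do
  # Sentinel returned by :persistent_term.get/2 when the key is absent.
  @missing :__lazy_cache_sketch_missing__

  @doc "Returns the cached value for `key`, calling `build` only on first use."
  def fetch(key, build) when is_function(build, 0) do
    cache_key = {__MODULE__, key}

    case :persistent_term.get(cache_key, @missing) do
      @missing ->
        value = build.()
        :persistent_term.put(cache_key, value)
        value

      value ->
        value
    end
  end

  @doc "Drops the cached value for `key`, or every value when given `:all`."
  def reset(:all) do
    # :persistent_term.get/0 lists all {key, value} pairs; keep only ours.
    for {{__MODULE__, _} = cache_key, _value} <- :persistent_term.get() do
      :persistent_term.erase(cache_key)
    end

    :ok
  end

  def reset(key) do
    :persistent_term.erase({__MODULE__, key})
    :ok
  end
end
```

Reads from :persistent_term are cheap, but writes and erases can trigger a global garbage-collection pass, which suits a value that is built once and rarely dropped. Two processes racing on the first call may both run `build`; a production cache would need to serialise that step.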

Result shape

%Text.NER.Entity{
  text: "Barack Obama",
  type: :per,
  start: 0,
  end: 12,
  score: 0.99
}

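start and end are byte offsets into the original input, not grapheme or codepoint indices. As in the example above, end is exclusive, so end - start is the byte length of text.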
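Because the offsets count bytes, they should be applied with binary functions and not with String.slice/3, which counts graphemes. The sketch below uses a plain map as a stand-in for %Text.NER.Entity{} and assumes an exclusive end offset, as in the example above; `SpanSketch` and the sample offsets are made up for illustration.

```elixir
defmodule SpanSketch do
  @doc "Returns the bytes of `text` covered by the span (end is exclusive)."
  def slice(text, %{start: start, end: stop}) when is_binary(text) do
    binary_part(text, start, stop - start)
  end

  @doc "Converts the span's byte start offset into a grapheme index."
  def grapheme_start(text, %{start: start}) when is_binary(text) do
    text |> binary_part(0, start) |> String.length()
  end
end

# "\u00EB" (ë) occupies two bytes in UTF-8, so byte and grapheme offsets differ.
text = "Zo\u00EB met Barack Obama"
entity = %{text: "Barack Obama", type: :per, start: 9, end: 21}

SpanSketch.slice(text, entity)           # "Barack Obama"
SpanSketch.grapheme_start(text, entity)  # 8
String.slice(text, 9, 12)                # "arack Obama" (grapheme-based, off by one)
```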

Summary

Functions

extract(text, options \\ [])

Extracts named entities from text.

reset(model \\ "Davlan/bert-base-multilingual-cased-ner-hrl")

Drops the cached Nx.Serving for the given model (or all models).

Functions

extract(text, options \\ [])

@spec extract(
  String.t(),
  keyword()
) :: [Text.NER.Entity.t()]

Extracts named entities from text.

Arguments

  • text is a UTF-8 binary.

Options

  • :model — the Hugging Face model id. Defaults to the multilingual "Davlan/bert-base-multilingual-cased-ner-hrl".

  • :serving — pass a name or pid of a pre-started Nx.Serving to skip the lazy cache.

  • :compile — defn-compilation options. Defaults to [batch_size: 1, sequence_length: 256].

  • :min_score — filter out entities below this confidence. Defaults to 0.0 (keep everything).
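The effect of :min_score can be pictured as a plain filter over the result list. The sketch below uses maps with invented scores as stand-ins for entity structs and assumes that an entity whose score equals the threshold is kept; `ScoreFilterSketch` is a hypothetical name, not part of the library.

```elixir
defmodule ScoreFilterSketch do
  @doc "Keeps entities whose score is at least `min_score` (default 0.0)."
  def filter(entities, min_score \\ 0.0) when is_list(entities) and is_number(min_score) do
    Enum.filter(entities, fn %{score: score} -> score >= min_score end)
  end
end

# Illustrative data only; these scores were not produced by a model.
entities = [
  %{text: "Barack Obama", type: :per, score: 0.99},
  %{text: "Hawaii", type: :loc, score: 0.62}
]

ScoreFilterSketch.filter(entities)       # keeps both entities
ScoreFilterSketch.filter(entities, 0.8)  # keeps only "Barack Obama"
```

With the library loaded, the corresponding call would be Text.NER.extract(text, min_score: 0.8).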

Returns

  • A list of Text.NER.Entity structs, in the shape shown under Result shape.

reset(model \\ "Davlan/bert-base-multilingual-cased-ner-hrl")

@spec reset(String.t() | :all) :: :ok

Drops the cached Nx.Serving for the given model, or for all models when passed :all.