# `Text.NER`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/ner.ex#L1)

Named-entity recognition via [Bumblebee](https://hex.pm/packages/bumblebee).

Identifies named entities (people, organisations, locations, …) in
natural-language text and returns each as a span with type and
confidence. Backed by Bumblebee's
`Bumblebee.Text.token_classification/3` serving with span
aggregation, loading a pre-trained transformer from Hugging Face.

### Default model

The default is
[`Davlan/bert-base-multilingual-cased-ner-hrl`](https://huggingface.co/Davlan/bert-base-multilingual-cased-ner-hrl),
trained on 10 high-resource languages (Arabic, German, English,
Spanish, French, Italian, Latvian, Dutch, Portuguese, Chinese)
with the standard four-class CoNLL-2003 tag set:

* `:per` — person
* `:org` — organisation
* `:loc` — location
* `:misc` — miscellaneous

Override with `:model` to use a language-specific or
domain-specific NER model.

### Optional dependency

Like `Text.POS`, NER requires the optional `:bumblebee` and
`:exla` Hex packages. Without `:bumblebee`, `extract/2` raises
with a helpful message.
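
As a rough sketch, the optional packages can be declared next to the library in `mix.exs`. The version requirements below are assumptions for illustration only; check Hex for the releases that are actually compatible with each other.

```elixir
# mix.exs (fragment). All version requirements here are assumed,
# not taken from the library's documentation.
defp deps do
  [
    {:text, "~> 0.5"},
    # Optional packages needed by Text.NER.
    {:bumblebee, "~> 0.6"},
    {:exla, "~> 0.9"}
  ]
end
```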

### Cold start and caching

The first call downloads the model (~700 MB for the multilingual
default) and traces the inference graph. Subsequent calls reuse a
serving cached in `:persistent_term`. For production, start a named
serving at boot:

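    repo = {:hf, "Davlan/bert-base-multilingual-cased-ner-hrl"}
    {:ok, model_info} = Bumblebee.load_model(repo)
    # Assumes the repository ships a tokenizer Bumblebee can load.
    {:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

    serving = Bumblebee.Text.token_classification(model_info, tokenizer,
                aggregation: :same)
    {:ok, _pid} = Nx.Serving.start_link(serving: serving, name: MyApp.NER)

    Text.NER.extract(text, serving: MyApp.NER)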

### Result shape

    %Text.NER.Entity{
      text: "Barack Obama",
      type: :per,
      start: 0,
      end: 12,
      score: 0.99
    }

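`start` and `end` are byte offsets into the original input, not
grapheme or codepoint indices. In the example above, `end - start`
equals the 12-byte length of `"Barack Obama"`.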
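
Because the offsets count bytes, slicing the input with grapheme-based functions such as `String.slice/3` can return the wrong text when the input contains multi-byte characters. The sketch below uses `binary_part/3` instead. `SpanDemo` is a hypothetical helper, the entity is a plain map standing in for `Text.NER.Entity`, and `end` is assumed to be exclusive, as the example above suggests.

```elixir
defmodule SpanDemo do
  # Hypothetical helper, not part of Text.NER.
  # Assumes `start` is inclusive and `end` is exclusive, both in bytes.
  def slice(text, %{start: start, end: stop}) do
    binary_part(text, start, stop - start)
  end
end

text = "Café owner José Álvarez moved to Zürich."

# Locate the name by bytes so the offsets are not hand-counted.
{start, size} = :binary.match(text, "José Álvarez")
entity = %{start: start, end: start + size}

# Byte-based slicing recovers the entity text.
IO.inspect(SpanDemo.slice(text, entity))

# Grapheme-based slicing with the same numbers selects a different
# range, because "é" and "Á" each occupy two bytes in UTF-8.
IO.inspect(String.slice(text, start, size))
```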

# `extract`

```elixir
@spec extract(
  String.t(),
  keyword()
) :: [Text.NER.Entity.t()]
```

Extracts named entities from `text`.

### Arguments

* `text` is a UTF-8 binary.

### Options

* `:model` — the Hugging Face model id. Defaults to the
  multilingual `"Davlan/bert-base-multilingual-cased-ner-hrl"`.

* `:serving` — pass a name or pid of a pre-started `Nx.Serving`
  to skip the lazy cache.

* `:compile` — defn-compilation options. Defaults to
  `[batch_size: 1, sequence_length: 256]`.

* `:min_score` — filter out entities below this confidence.
  Defaults to `0.0` (keep everything).
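
The effect of `:min_score` can be pictured as a threshold filter over the result list. This is only a sketch of the described behaviour, not the library's code: `ScoreFilter` is a hypothetical module, the entities are plain maps, and an entity whose score equals the threshold is assumed to be kept.

```elixir
defmodule ScoreFilter do
  # Hypothetical helper, not part of Text.NER.
  # Keeps entities whose score is at or above the threshold, so the
  # default of 0.0 keeps everything.
  def keep(entities, min_score \\ 0.0) do
    Enum.filter(entities, fn entity -> entity.score >= min_score end)
  end
end

entities = [
  %{text: "Barack Obama", type: :per, score: 0.99},
  %{text: "Hawaii", type: :loc, score: 0.62},
  %{text: "May", type: :misc, score: 0.31}
]

# Only the two entities scoring at least 0.5 remain.
IO.inspect(ScoreFilter.keep(entities, 0.5))
```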

### Returns

* A list of `Text.NER.Entity` structs in document order.
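
Since the list is in document order, it can be regrouped without losing that order inside each group. A minimal sketch, using the hypothetical helper `EntityIndex` and plain maps in place of `Text.NER.Entity` structs:

```elixir
defmodule EntityIndex do
  # Hypothetical helper, not part of Text.NER.
  # Enum.group_by/3 keeps input order within each group, so the
  # document order of the entities is preserved per type.
  def by_type(entities) do
    Enum.group_by(entities, & &1.type, & &1.text)
  end
end

entities = [
  %{text: "Angela Merkel", type: :per},
  %{text: "Berlin", type: :loc},
  %{text: "Olaf Scholz", type: :per}
]

IO.inspect(EntityIndex.by_type(entities))
```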

# `reset`

```elixir
@spec reset(String.t() | :all) :: :ok
```

Drops the cached `Nx.Serving` for the given model (or all models).
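
To illustrate the general pattern of a lazy `:persistent_term` cache with a per-key and an `:all` reset, here is a minimal sketch. `LazyCache` and its key layout are hypothetical and are not the implementation used by `Text.NER`. The builder function stands in for the expensive step of loading a model and building a serving.

```elixir
defmodule LazyCache do
  # Hypothetical module, not part of Text.NER.

  # Returns the cached value for `model`, building and storing it on
  # the first call only.
  def fetch(model, builder) when is_function(builder, 0) do
    key = {__MODULE__, model}

    case :persistent_term.get(key, :missing) do
      :missing ->
        value = builder.()
        :persistent_term.put(key, value)
        value

      value ->
        value
    end
  end

  # Drops every entry stored by this module.
  def reset(:all) do
    for {{__MODULE__, _model} = key, _value} <- :persistent_term.get() do
      :persistent_term.erase(key)
    end

    :ok
  end

  # Drops the entry for a single model id.
  def reset(model) when is_binary(model) do
    :persistent_term.erase({__MODULE__, model})
    :ok
  end
end
```

Writes to and erasures from `:persistent_term` are comparatively costly on the BEAM, so a cache like this suits values that are built rarely and read often.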

