# `Text.POS`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/pos.ex#L1)

Part-of-speech tagging via [Bumblebee](https://hex.pm/packages/bumblebee).

Exposes `tag/2` which assigns a part-of-speech label
(`:noun`, `:verb`, `:adj`, …) to every word in an input sentence.
Backed by a pre-trained transformer loaded through Bumblebee's
`token_classification/3` serving — the default model is
[`vblagoje/bert-english-uncased-finetuned-pos`](https://huggingface.co/vblagoje/bert-english-uncased-finetuned-pos),
trained on the OntoNotes 5.0 / Penn Treebank tag set.

### Optional dependency

POS tagging requires the `:bumblebee` and (recommended) `:exla`
Hex packages. They are declared as optional dependencies of
`:text`; add them to your application's `mix.exs` to enable this
module:

    {:bumblebee, "~> 0.6"},
    {:exla, "~> 0.9"}

Without `:bumblebee`, calling `tag/2` raises with a clear "add
these to your deps" message.
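Optional dependencies like these are commonly guarded at runtime with `Code.ensure_loaded?/1`. The sketch below shows that general pattern. The module name `MyApp.OptionalDeps` and the message wording are hypothetical and are not taken from `Text.POS`'s source.

```elixir
defmodule MyApp.OptionalDeps do
  # Hypothetical helper: raise a descriptive error when an optional
  # dependency's module cannot be loaded at runtime.
  @spec ensure!(module(), String.t()) :: :ok
  def ensure!(module, dep_snippet) do
    if Code.ensure_loaded?(module) do
      :ok
    else
      raise "#{inspect(module)} is not available. Add #{dep_snippet} to your deps in mix.exs"
    end
  end
end
```

A caller would guard the Bumblebee code path with something like `MyApp.OptionalDeps.ensure!(Bumblebee, ~s({:bumblebee, "~> 0.6"}))` before touching any Bumblebee function.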

### Cold start and caching

The first call to `tag/2` downloads the model (~440 MB for the
default English model) from Hugging Face, traces the inference
graph, and compiles it under EXLA. Subsequent calls reuse a cached
`Nx.Serving` stored in `:persistent_term` and typically complete in
single-digit milliseconds, though latency depends on hardware and
input length.
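The lazy caching described above can be sketched with `:persistent_term`: look up a key, and build and store the value only on a miss. This is a minimal sketch, not `Text.POS`'s implementation; the module name and the `{MyApp.ServingCache, model_id}` key shape are assumptions, and `build_fun` stands in for the Bumblebee load-and-compile step.

```elixir
defmodule MyApp.ServingCache do
  # Hypothetical key scheme: one persistent_term entry per model id.
  defp key(model_id), do: {__MODULE__, model_id}

  @doc """
  Returns the cached value for `model_id`, calling `build_fun` only on a miss.
  """
  @spec fetch(String.t(), (-> term())) :: term()
  def fetch(model_id, build_fun) when is_function(build_fun, 0) do
    case :persistent_term.get(key(model_id), :__missing__) do
      :__missing__ ->
        # Cold path: the expensive build (download, trace, compile) runs here.
        value = build_fun.()
        :persistent_term.put(key(model_id), value)
        value

      value ->
        # Warm path: persistent_term reads do not copy the term to the caller's heap.
        value
    end
  end
end
```

Two processes that miss at the same moment will both run the builder; a production version would serialize the cold path, for example through a single GenServer.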

For production, prefer starting a named serving at boot:

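    # Load the model and tokenizer once, e.g. in Application.start/2.
    repo = {:hf, "vblagoje/bert-english-uncased-finetuned-pos"}
    {:ok, model_info} = Bumblebee.load_model(repo)
    {:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

    serving = Bumblebee.Text.token_classification(model_info, tokenizer)
    {:ok, _pid} = Nx.Serving.start_link(serving: serving, name: MyApp.POS)

    Text.POS.tag("the cat sat", serving: MyApp.POS)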

### Result shape

`tag/2` returns a list of `{token, tag, score}` triples. The
`tag` is an atom drawn from the model's label set; for the
default English model that means coarse tags such as
`:noun, :verb, :adj, :adv, :pron, :det, :punct, …`.
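Because the result is a plain list of triples, ordinary `Enum` functions and comprehensions are enough to post-process it. The module `MyApp.POSResults` is hypothetical, the sample uses hand-written values in place of real model output, and the 0.9 threshold is arbitrary.

```elixir
defmodule MyApp.POSResults do
  @type tagged_token :: {String.t(), atom(), float()}

  # Return the token strings whose tag equals `tag`.
  @spec words_with_tag([tagged_token()], atom()) :: [String.t()]
  def words_with_tag(results, tag) do
    for {token, ^tag, _score} <- results, do: token
  end

  # Keep only predictions whose score is at least `min_score`.
  @spec confident([tagged_token()], float()) :: [tagged_token()]
  def confident(results, min_score) do
    Enum.filter(results, fn {_token, _tag, score} -> score >= min_score end)
  end
end

# Hand-written sample, not actual model output.
sample = [{"the", :det, 0.99}, {"cat", :noun, 0.98}, {"sat", :verb, 0.62}]

MyApp.POSResults.words_with_tag(sample, :noun)
# => ["cat"]

MyApp.POSResults.confident(sample, 0.9)
# => [{"the", :det, 0.99}, {"cat", :noun, 0.98}]
```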

### Languages

The default model is English-only. For other languages, supply
a `:model` option naming a checkpoint fine-tuned for that
language, such as one of the language-specific BERT POS models on
Hugging Face. A multilingual base model alone is not enough:
`"QCRI/bert-base-multilingual-cased-pos-english"`, for example,
builds on multilingual BERT but is fine-tuned for English. The
result shape is the same; only the tag vocabulary changes.
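If you swap in a checkpoint with a finer-grained, Penn Treebank-style label set, a small mapping can fold its tags back into coarse categories. This sketch assumes the fine tags arrive as lowercase atoms such as `:nn`, `:vbd` or `:"prp$"`; that atom naming is an assumption, so check what your chosen model actually returns. `MyApp.TagNormalizer` is hypothetical.

```elixir
defmodule MyApp.TagNormalizer do
  # Penn Treebank tag prefixes mapped to coarse categories
  # (assumes tags arrive as lowercase atoms, e.g. :nns, :vbd).
  @prefixes [
    {"prp", :pron},
    {"nn", :noun},
    {"vb", :verb},
    {"jj", :adj},
    {"rb", :adv},
    {"dt", :det}
  ]

  # Returns the coarse tag, or :other when no prefix matches.
  @spec coarse(atom()) :: atom()
  def coarse(tag) when is_atom(tag) do
    name = Atom.to_string(tag)

    Enum.find_value(@prefixes, :other, fn {prefix, coarse_tag} ->
      if String.starts_with?(name, prefix), do: coarse_tag
    end)
  end
end
```

Tags with no entry (such as `:in` for prepositions) fall through to `:other`; extend `@prefixes` as needed for your model.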

# `tagged_token`

```elixir
@type tagged_token() :: {String.t(), atom(), float()}
```

A single token-and-tag entry in the result list.

# `reset`

```elixir
@spec reset(String.t() | :all) :: :ok
```

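Drops the cached `Nx.Serving` for the given model id, or for every
cached model when passed `:all`. The next `tag/2` call for a dropped
model goes through the cold-start path again.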
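Dropping cached entries maps naturally onto `:persistent_term.erase/1`. The sketch below assumes each serving is stored under a `{MyApp.ServingCache, model_id}` key; that key shape and the module name are hypothetical, not `Text.POS`'s actual scheme.

```elixir
defmodule MyApp.ServingCache.Reset do
  # Hypothetical owner tag used as the first element of every cache key.
  @owner MyApp.ServingCache

  # Drop every cached entry owned by the cache.
  @spec reset(String.t() | :all) :: :ok
  def reset(:all) do
    for {{@owner, _model_id} = key, _value} <- :persistent_term.get() do
      :persistent_term.erase(key)
    end

    :ok
  end

  # Drop a single model's entry; erase/1 returns false if it was absent.
  def reset(model_id) when is_binary(model_id) do
    _ = :persistent_term.erase({@owner, model_id})
    :ok
  end
end
```

Erasing a `:persistent_term` key triggers a global scan of processes that may hold references to the old value, so resets should be rare operations rather than part of a hot path.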

# `tag`

```elixir
@spec tag(
  String.t(),
  keyword()
) :: [tagged_token()]
```

Returns the part-of-speech tags for `text`.

### Arguments

* `text` is a UTF-8 binary.

### Options

* `:model` — the Hugging Face model id to use. Defaults to
  `"vblagoje/bert-english-uncased-finetuned-pos"`. Any sequence-tagging checkpoint
  compatible with `Bumblebee.Text.token_classification/3`
  works.

* `:serving` — pass a name or pid of a pre-started `Nx.Serving`
  to skip the lazy `:persistent_term` cache. Useful in
  production, especially for sharing a single serving across an
  application.

* `:compile` — defn-compilation options for the model. Defaults
  to `[batch_size: 1, sequence_length: 128]`.
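The option defaults listed above can be resolved with `Keyword.validate!/2` (Elixir 1.13+), which fills in defaults for missing keys and raises `ArgumentError` on unknown ones. This is a sketch of one way to do it; `MyApp.POSOptions` is hypothetical and `Text.POS` may handle its options differently.

```elixir
defmodule MyApp.POSOptions do
  @default_model "vblagoje/bert-english-uncased-finetuned-pos"
  @default_compile [batch_size: 1, sequence_length: 128]

  # Merge caller options over the documented defaults; unknown keys raise.
  @spec resolve(keyword()) :: keyword()
  def resolve(opts) do
    Keyword.validate!(opts,
      model: @default_model,
      serving: nil,
      compile: @default_compile
    )
  end
end
```

For example, `MyApp.POSOptions.resolve(model: "other/model")` keeps the default `:compile` shapes while overriding the model id. Read values back with `Keyword.get/2` rather than relying on key order.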

### Returns

* A list of `{token, tag, score}` triples.

---

*Consult [api-reference.md](api-reference.md) for the complete listing*
