Text.POS (Text v0.5.0)

Part-of-speech tagging via Bumblebee.

Exposes tag/2, which assigns a part-of-speech label (:noun, :verb, :adj, …) to every word in an input sentence. It is backed by a pre-trained transformer loaded through Bumblebee's token_classification/3 serving. The default model is vblagoje/bert-english-uncased-finetuned-pos, whose labels follow the Universal Dependencies (UPOS) tag set.

Optional dependency

POS tagging requires the :bumblebee Hex package; :exla is recommended for compiled inference. Both are declared as optional dependencies of :text, so add them to your application's mix.exs to enable this module:

{:bumblebee, "~> 0.6"},
{:exla, "~> 0.9"}

Without :bumblebee, calling tag/2 raises with a clear "add these to your deps" message.
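
If the optional dependency may be absent in some environments, one option is to check for it before calling tag/2 rather than rescuing the error. The sketch below is a hypothetical wrapper (MyApp.SafePOS is not part of :text); it relies only on the standard Code.ensure_loaded?/1 and on Bumblebee's top-level module being named Bumblebee.

```elixir
defmodule MyApp.SafePOS do
  # Hypothetical wrapper, not part of :text.
  # Returns {:ok, tags} when Bumblebee is available, otherwise
  # {:error, :bumblebee_missing} instead of letting tag/2 raise.
  def tag(text, tagger \\ &Text.POS.tag/1) do
    if Code.ensure_loaded?(Bumblebee) do
      {:ok, tagger.(text)}
    else
      {:error, :bumblebee_missing}
    end
  end
end
```

The tagger argument exists only so the wrapper can be exercised without downloading a model; in normal use the default is enough.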

Cold start and caching

The first call to tag/2 downloads the model (~440 MB for the default English model) from Hugging Face, traces the inference graph, and compiles it under EXLA. Subsequent calls hit a cached Nx.Serving in :persistent_term and run in single-digit milliseconds.
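
Because the first call pays for the download and compilation, it can help to trigger it deliberately (for example from a boot-time task) and log how long it took. The following is a minimal sketch; MyApp.POSWarmup and its function names are hypothetical, and the only :text call it assumes is tag/1 as documented here.

```elixir
defmodule MyApp.POSWarmup do
  # Hypothetical helper: runs `fun` and returns {elapsed_ms, result}.
  def timed(fun) when is_function(fun, 0) do
    {micros, result} = :timer.tc(fun)
    {micros / 1000, result}
  end

  # Tags a throwaway sentence so the cold-start cost is paid up front.
  # The tagger is injectable so the helper can be tried without a model.
  def warm_up(tagger \\ &Text.POS.tag/1) do
    timed(fn -> tagger.("warm up") end)
  end
end

# Usage: compare the first (cold) call against a later (cached) one.
# {cold_ms, _tags} = MyApp.POSWarmup.warm_up()
# {warm_ms, _tags} = MyApp.POSWarmup.warm_up()
```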

For production, prefer starting a named serving at boot:

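# Assumes the checkpoint repository also ships a tokenizer that Bumblebee can load.
repo = {:hf, "vblagoje/bert-english-uncased-finetuned-pos"}
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
serving = Bumblebee.Text.token_classification(model_info, tokenizer)
{:ok, _pid} = Nx.Serving.start_link(serving: serving, name: MyApp.POS)

Text.POS.tag("the cat sat", serving: MyApp.POS)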
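
In an OTP application, "at boot" usually means placing the serving under the supervision tree. The sketch below shows one way to do that with Nx.Serving's child spec. The module names, batch settings, and EXLA compiler option are illustrative assumptions, not what Text.POS configures internally, and it assumes the checkpoint repository ships a tokenizer that Bumblebee can load.

```elixir
defmodule MyApp.Application do
  use Application

  @repo {:hf, "vblagoje/bert-english-uncased-finetuned-pos"}

  @impl true
  def start(_type, _args) do
    children = [
      # Registers the serving under the name later passed to tag/2.
      {Nx.Serving, serving: pos_serving(), name: MyApp.POS, batch_timeout: 100}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end

  # Loads the model and tokenizer once, at application start.
  defp pos_serving do
    {:ok, model_info} = Bumblebee.load_model(@repo)
    {:ok, tokenizer} = Bumblebee.load_tokenizer(@repo)

    Bumblebee.Text.token_classification(model_info, tokenizer,
      # Illustrative values; tune for your workload.
      compile: [batch_size: 8, sequence_length: 128],
      defn_options: [compiler: EXLA]
    )
  end
end
```

With the serving supervised this way, callers pass serving: MyApp.POS to tag/2 and never hit the cold-start path inside a request.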

Result shape

tag/2 returns a list of {token, tag, score} triples. The tag is an atom drawn from the model's label set; for the default English model these are the Universal Dependencies (UPOS) tags, lowercased: :noun, :verb, :adj, :adv, :pron, :det, :punct, ….
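
Since the result is a plain list of triples, it can be processed with ordinary pattern matching. The sketch below uses a hand-written sample in the documented shape (the scores are made up, not model output), and MyApp.POSResult is a hypothetical helper module.

```elixir
defmodule MyApp.POSResult do
  # Keeps only tokens carrying `tag` with at least `min_score` confidence.
  def tokens_with(tagged, tag, min_score \\ 0.0) do
    for {token, ^tag, score} <- tagged, score >= min_score, do: token
  end

  # Counts how many tokens received each tag.
  def tag_counts(tagged) do
    Enum.frequencies_by(tagged, fn {_token, tag, _score} -> tag end)
  end
end

# Hand-written sample in the documented shape; scores are illustrative only.
tagged = [{"the", :det, 0.99}, {"cat", :noun, 0.98}, {"sat", :verb, 0.97}]

MyApp.POSResult.tokens_with(tagged, :noun)
# => ["cat"]

MyApp.POSResult.tag_counts(tagged)
# => %{det: 1, noun: 1, verb: 1}
```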

Languages

The default model is English-only. To use a different checkpoint, pass a :model option: for example, "QCRI/bert-base-multilingual-cased-pos-english" (a multilingual BERT base fine-tuned for English POS), or one of the language-specific BERT POS models on Hugging Face. The result shape is the same; only the tag vocabulary changes.
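
One way to keep checkpoint names out of call sites is a small lookup that builds the :model option. This is a sketch under assumptions: MyApp.POSModels and its keys are hypothetical, and the map lists only the two checkpoints named on this page.

```elixir
defmodule MyApp.POSModels do
  # Hypothetical application-level mapping; extend it with checkpoints
  # you have vetted yourself.
  @models %{
    english: "vblagoje/bert-english-uncased-finetuned-pos",
    multilingual_base: "QCRI/bert-base-multilingual-cased-pos-english"
  }

  # Builds the option list for tag/2; raises KeyError on an unknown key.
  def options(key), do: [model: Map.fetch!(@models, key)]

  # Delegates to Text.POS.tag/2; the tagger is injectable for testing.
  def tag(text, key, tagger \\ &Text.POS.tag/2) do
    tagger.(text, options(key))
  end
end

# Usage:
# MyApp.POSModels.tag("the cat sat", :multilingual_base)
```

Because tag vocabularies differ between checkpoints, code that matches on specific tag atoms should be checked against each model's label set.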

Summary

Types

tagged_token()

A single token-and-tag entry in the result list.

Functions

reset(model \\ "vblagoje/bert-english-uncased-finetuned-pos")

Drops the cached Nx.Serving for the given model (or all models).

tag(text, options \\ [])

Returns the part-of-speech tags for text.