Neural sentiment backend built on Bumblebee.

Loads a pre-trained multilingual sentiment classifier from Hugging
Face and serves predictions via `Nx.Serving`. The default model is
`cardiffnlp/twitter-xlm-roberta-base-sentiment`,
an XLM-RoBERTa model fine-tuned on multilingual Twitter data
(English, French, German, Italian, Spanish, Portuguese, Arabic, and
Hindi by training, with reasonable transfer to many more languages
via the underlying multilingual base).
This backend is opt-in:

  * The `:bumblebee` and `:exla` Hex packages must be added as
    dependencies of the host application.
  * Either pass `backend: Text.Sentiment.Backends.Bumblebee` to
    `Text.Sentiment.analyze/2`, or set the application config:
    `config :text, :sentiment_backend, Text.Sentiment.Backends.Bumblebee`.
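A minimal sketch of that setup (the version requirements are
illustrative placeholders; use whatever your application already
depends on):

```elixir
# mix.exs: version requirements below are placeholders, not pinned minimums.
defp deps do
  [
    {:bumblebee, "~> 0.5"},
    {:exla, "~> 0.7"}
  ]
end
```

```elixir
# config/config.exs: make Bumblebee the default sentiment backend.
config :text, :sentiment_backend, Text.Sentiment.Backends.Bumblebee
```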
Cold start

The first call to `analyze/2` downloads the model weights (~280 MB
on disk) from Hugging Face, traces the inference graph through
Bumblebee, and compiles it under EXLA (or `Nx.Defn.Evaluator` if
EXLA is not loaded). Expect a 10–30 second cold start. Subsequent
calls in the same VM run in single-digit milliseconds, because the
compiled serving is cached in `:persistent_term`.
For production deployments where cold start is unacceptable, start
a named serving process at boot via
`Bumblebee.Text.text_classification/3` plus `Nx.Serving.start_link/1`,
then pass `serving: <name_or_pid>` to `analyze/2` to skip the cache
entirely.
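A sketch of that boot-time setup, assuming a host application named
`MyApp`; the process name and `:compile` options are illustrative and
workload-dependent:

```elixir
# Typically placed in MyApp.Application.start/2.
{:ok, model_info} =
  Bumblebee.load_model({:hf, "cardiffnlp/twitter-xlm-roberta-base-sentiment"})

# The Cardiff model ships without tokenizer.json, so load the tokenizer
# from the base repo (see "Tokenizer override" below).
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "FacebookAI/xlm-roberta-base"})

serving =
  Bumblebee.Text.text_classification(model_info, tokenizer,
    compile: [batch_size: 8, sequence_length: 128],
    defn_options: [compiler: EXLA]
  )

children = [
  {Nx.Serving, serving: serving, name: MyApp.SentimentServing}
]

Supervisor.start_link(children, strategy: :one_for_one)
```

After boot, `Text.Sentiment.analyze(text, serving: MyApp.SentimentServing)`
routes straight to the running process and never touches the
`:persistent_term` cache.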
Tokenizer override

Some fine-tuned models on Hugging Face ship without the
`tokenizer.json` file Bumblebee expects; they carry only the raw
SentencePiece or WordPiece data. The default Cardiff sentiment
model is one of those, so this backend loads its tokenizer from
the base `FacebookAI/xlm-roberta-base` repo instead. Other models
fall back to the default behaviour of using the same repo as the model.

If you point `:model` at a fine-tune that itself lacks a
`tokenizer.json`, pass `:tokenizer_repo` to point at a repo that
has one (typically the base model the fine-tune was trained on),
as in the sketch below.
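For example (the fine-tune id here is hypothetical):

```elixir
Text.Sentiment.analyze("C'est vraiment excellent !",
  backend: Text.Sentiment.Backends.Bumblebee,
  # Hypothetical fine-tune that ships without tokenizer.json:
  model: "someuser/xlm-roberta-base-reviews-sentiment",
  # Borrow the tokenizer from the base model it was trained on:
  tokenizer_repo: "FacebookAI/xlm-roberta-base"
)
```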
Result shape

Returns the same map shape every backend produces:

  * `:label` - `:positive`, `:negative`, or `:neutral` (mapped from
    the model's textual labels).
  * `:compound` - P(positive) − P(negative), in `[-1.0, +1.0]`.
  * `:scores` - the full per-label probability map, e.g.
    `%{positive: 0.86, neutral: 0.10, negative: 0.04}`.
  * `:backend` - `Text.Sentiment.Backends.Bumblebee`.
  * `:model` - the model id used for the prediction.

Unlike the lexicon backend, no `:matched`/`:tokens` counts are
included; the underlying model is opaque about which tokens it used.
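An illustrative call and result (the scores are made up, but the
compound value follows the definition above: 0.86 − 0.04 = 0.82):

```elixir
Text.Sentiment.analyze("I love this library!",
  backend: Text.Sentiment.Backends.Bumblebee
)
#=> %{
#     label: :positive,
#     compound: 0.82,
#     scores: %{positive: 0.86, neutral: 0.10, negative: 0.04},
#     backend: Text.Sentiment.Backends.Bumblebee,
#     model: "cardiffnlp/twitter-xlm-roberta-base-sentiment"
#   }
```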
Language

XLM-RoBERTa is intrinsically multilingual, so no per-language routing
is required. The `:language` option is accepted (for API consistency
with the lexicon backend) but ignored by this backend; whatever
language is passed at scoring time is echoed back in the result map
for round-tripping.
Functions
`@spec reset(String.t() | :all) :: :ok`
Drops the cached `Nx.Serving` for the given model (or all models).

Useful in tests, or when switching `:defn_options` at runtime.
Arguments

  * `model` - a model id string. Defaults to the package default
    (`cardiffnlp/twitter-xlm-roberta-base-sentiment`). Pass `:all`
    to drop every cached serving.
Returns

`:ok`.
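For example, to guarantee a clean slate between tests (illustrative
usage):

```elixir
# Drop every cached serving:
Text.Sentiment.Backends.Bumblebee.reset(:all)

# Or drop only the default model's serving:
Text.Sentiment.Backends.Bumblebee.reset("cardiffnlp/twitter-xlm-roberta-base-sentiment")
```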