# `Text.Sentiment.Backends.Bumblebee`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/sentiment/backends/bumblebee.ex#L1)

Neural sentiment backend backed by
[Bumblebee](https://hex.pm/packages/bumblebee).

Loads a pre-trained multilingual sentiment classifier from Hugging
Face and serves predictions via `Nx.Serving`. The default model is
[`cardiffnlp/twitter-xlm-roberta-base-sentiment`](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment),
an XLM-RoBERTa model fine-tuned on multilingual Twitter data
(English, French, German, Italian, Spanish, Portuguese, Arabic, and
Hindi by training, with reasonable transfer to many more languages
via the underlying multilingual base).

This backend is **opt-in**:

* The `:bumblebee` and `:exla` Hex packages must be added as
  dependencies of the host application.

* Either pass `backend: Text.Sentiment.Backends.Bumblebee` to
  `Text.Sentiment.analyze/2`, or set the application config:

      config :text, :sentiment_backend, Text.Sentiment.Backends.Bumblebee
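A minimal `mix.exs` sketch of the opt-in dependencies. The version requirements shown are illustrative, not pinned by this package — check Hex for current releases:

```elixir
# mix.exs — version requirements are illustrative examples only.
defp deps do
  [
    {:text, "~> 0.5"},
    {:bumblebee, "~> 0.5"},
    # EXLA is optional but strongly recommended; without it the backend
    # falls back to Nx.Defn.Evaluator (see "Cold start" below).
    {:exla, "~> 0.7"}
  ]
end
```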

### Cold start

The first call to `analyze/2` downloads the model weights (~280 MB
on disk) from Hugging Face, traces the inference graph through
Bumblebee, and compiles it under EXLA (or `Nx.Defn.Evaluator` if
EXLA is not loaded). Expect a 10–30 second cold start. Subsequent
calls in the same VM run in single-digit milliseconds — the compiled
serving is cached in `:persistent_term`.

For production deployments where cold start is unacceptable, start
a named serving process at boot via
`Bumblebee.Text.text_classification/3` + `Nx.Serving.start_link/1`,
then pass `serving: <name_or_pid>` to `analyze/2` to skip the cache
entirely.
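A sketch of that boot-time setup, assuming a supervised application. The repo ids come from the docs above; the serving name `MyApp.SentimentServing` and the compile options are illustrative assumptions:

```elixir
# Load model and tokenizer once, at boot (downloads on first run).
{:ok, model_info} =
  Bumblebee.load_model({:hf, "cardiffnlp/twitter-xlm-roberta-base-sentiment"})

{:ok, tokenizer} =
  Bumblebee.load_tokenizer({:hf, "FacebookAI/xlm-roberta-base"})

serving =
  Bumblebee.Text.text_classification(model_info, tokenizer,
    # Compiling for fixed shapes up front avoids recompilation per request;
    # batch_size/sequence_length here are example values.
    compile: [batch_size: 8, sequence_length: 128],
    defn_options: [compiler: EXLA]
  )

# In your application's supervision tree:
children = [
  {Nx.Serving, serving: serving, name: MyApp.SentimentServing}
]

# Later calls bypass this backend's :persistent_term cache entirely:
Text.Sentiment.analyze("c'est magnifique", serving: MyApp.SentimentServing)
```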

### Tokenizer override

Some fine-tuned models on Hugging Face ship without the
`tokenizer.json` Bumblebee expects — they have only the raw
SentencePiece or WordPiece data. The default Cardiff sentiment
model is one of those, so this backend loads its tokenizer from
the base `FacebookAI/xlm-roberta-base` repo instead. For any other
model, the tokenizer defaults to the model's own repo.

If you point `:model` at a fine-tune that itself lacks a
`tokenizer.json`, pass `:tokenizer_repo` to point at a repo that
has one (typically the base model the fine-tune was trained on).
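For example (the fine-tune repo id below is hypothetical; only `FacebookAI/xlm-roberta-base` is a real repo):

```elixir
# :model points at a hypothetical fine-tune that ships without a
# tokenizer.json; :tokenizer_repo supplies one from its base model.
Text.Sentiment.analyze("what a day",
  backend: Text.Sentiment.Backends.Bumblebee,
  model: "someuser/my-sentiment-finetune",
  tokenizer_repo: "FacebookAI/xlm-roberta-base"
)
```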

### Result shape

Returns the same map shape every backend produces:

* `:label` — `:positive`, `:negative`, or `:neutral` (mapped from
  the model's textual labels).

* `:compound` — `P(positive) − P(negative)`, in `[-1.0, +1.0]`.

* `:scores` — the full per-label probability map, e.g.
  `%{positive: 0.86, neutral: 0.10, negative: 0.04}`.

* `:backend` — `Text.Sentiment.Backends.Bumblebee`.

* `:model` — the model id used for the prediction.

Unlike the lexicon backend, no `:matched`/`:tokens` counts are
included — the underlying model is opaque about which tokens it used.
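The `:compound` value falls straight out of `:scores`. Using the example probabilities above:

```elixir
# compound = P(positive) − P(negative), so it always lands in [-1.0, +1.0].
scores = %{positive: 0.86, neutral: 0.10, negative: 0.04}
compound = scores.positive - scores.negative
# ≈ 0.82 (up to floating-point rounding)
```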

### Language

XLM-RoBERTa is intrinsically multilingual; no per-language routing
is required. The `:language` option is accepted (for API consistency
with the lexicon backend) but ignored by this backend; the language
used at scoring time is reported in the result map for round-tripping.

# `reset`

```elixir
@spec reset(String.t() | :all) :: :ok
```

Drops the cached `Nx.Serving` for the given model (or all models).

Useful in tests, or when switching defn-options at runtime.

### Arguments

* `model` — a model id string. Defaults to the package default
  (`cardiffnlp/twitter-xlm-roberta-base-sentiment`). Pass `:all` to drop every cached
  serving.

### Returns

* `:ok`.
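Typical calls, following the spec above (e.g. from an ExUnit `setup` block):

```elixir
# Drop the cached serving for the default model…
:ok =
  Text.Sentiment.Backends.Bumblebee.reset(
    "cardiffnlp/twitter-xlm-roberta-base-sentiment"
  )

# …or clear every cached serving at once.
:ok = Text.Sentiment.Backends.Bumblebee.reset(:all)
```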

---

*Consult [api-reference.md](api-reference.md) for a complete listing.*
