# `Text.WordCloud.Backends.KeyBERT`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/word_cloud/backends/key_bert.ex#L1)

Neural keyword-extraction backend built on
[Bumblebee](https://hex.pm/packages/bumblebee).

Implements [KeyBERT](https://github.com/MaartenGr/KeyBERT)-style
scoring: embed the input document and each candidate phrase with a
multilingual sentence-transformer, then rank candidates by cosine
similarity to the document embedding. The intuition is that the
best keyword candidates are the phrases whose meaning is closest
to the document as a whole — exactly what neural sentence
embeddings capture.
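
The core of that ranking fits in a few lines. The sketch below assumes
you already have the document embedding and one embedding per candidate
phrase as `Nx` tensors; the module and function names are illustrative
and not part of this package's API.

```elixir
defmodule RankingSketch do
  @moduledoc false

  # Cosine similarity between two 1-D embedding tensors.
  def cosine_similarity(a, b) do
    Nx.dot(a, b)
    |> Nx.divide(Nx.multiply(Nx.LinAlg.norm(a), Nx.LinAlg.norm(b)))
    |> Nx.to_number()
  end

  # Rank candidate phrases by similarity to the whole-document embedding.
  def rank(doc_embedding, candidates, top_n \\ 10) do
    candidates
    |> Enum.map(fn {phrase, emb} -> {phrase, cosine_similarity(doc_embedding, emb)} end)
    |> Enum.sort_by(fn {_phrase, score} -> score end, :desc)
    |> Enum.take(top_n)
  end
end
```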

This backend is **opt-in**:

* The `:bumblebee` and (recommended) `:exla` Hex packages must be
  declared as dependencies of the host application.

* Either pass `scoring: :key_bert` to `Text.WordCloud.terms/2`,
  or use this module directly.
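
For example, a host application that wants this backend might declare
the extra dependencies in its `mix.exs` (the Bumblebee and EXLA version
requirements below are illustrative, not prescriptions):

```elixir
defp deps do
  [
    {:text, "~> 0.5"},
    # Opt-in neural backend:
    {:bumblebee, "~> 0.5"},
    # Recommended compiler for fast inference:
    {:exla, "~> 0.7"}
  ]
end
```

With those in place, `scoring: :key_bert` becomes available to
`Text.WordCloud.terms/2`; a fuller call example follows the options
list below.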

### Cold start

The first call downloads the default model (~470 MB —
[`sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2),
multilingual across ~50 languages) from Hugging Face, traces the
inference graph, and compiles it under EXLA. Subsequent calls hit a
cached `Nx.Serving` in `:persistent_term`. Pre-download via
`mix text.download_models --keybert` if your production environment
needs everything present at boot.
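
Roughly, the first call does something like the following (a sketch
using only public Bumblebee and Nx APIs; the real cache key and serving
options are internal to this module):

```elixir
repo = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
cache_key = {:keybert_serving, repo}

serving =
  case :persistent_term.get(cache_key, nil) do
    nil ->
      # Downloads the checkpoint on first use, then builds an embedding
      # serving compiled under EXLA.
      {:ok, model_info} = Bumblebee.load_model({:hf, repo})
      {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, repo})

      serving =
        Bumblebee.Text.text_embedding(model_info, tokenizer,
          defn_options: [compiler: EXLA]
        )

      :persistent_term.put(cache_key, serving)
      serving

    cached ->
      cached
  end

# Embedding a document (or a candidate phrase) is then a single call:
%{embedding: _tensor} = Nx.Serving.run(serving, "some document text")
```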

### When to use this backend

KeyBERT typically produces the highest-quality output of any
backend in this package, at the cost of a model download, GPU/EXLA
compilation, and substantially higher per-call latency than the
YAKE! backend. Use it when quality matters more than throughput, or
when YAKE!'s statistical features struggle with very short or highly
domain-specific text.

### Options

* `:model` — Hugging Face model id. Defaults to
  `"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"`.
  Any sentence-transformer model compatible with
  `Bumblebee.Text.text_embedding/3` works.

* `:tokenizer_repo` — overrides the tokenizer source repo (rarely
  needed for sentence-transformer models, which ship complete
  tokenizers).

* `:serving` — name or pid of a pre-started `Nx.Serving` to skip
  the lazy `:persistent_term` cache. Recommended for production.

* `:candidate_pool_size` — cap on the number of candidates
  embedded; large documents can produce hundreds of phrases and
  embedding all of them is wasteful. Defaults to `200`. Candidates
  are pre-filtered by raw frequency before embedding.

* `:ngram_range` — `{min, max}` candidate length. Defaults to
  `{1, 3}`.

Standard `Text.WordCloud` orchestrator options (`:language`,
`:stopwords`, `:case_fold`, `:locale`) are honoured.
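
Putting it together, a production setup might pre-start a serving under
the application's supervisor and pass its name in, then call
`Text.WordCloud.terms/2` with the options above. This is a sketch: it
assumes `terms/2` takes the document as its first argument, and
`keybert_serving` is built as in the cold-start example earlier.

```elixir
# In the application's supervision tree:
children = [
  {Nx.Serving, serving: keybert_serving, name: MyApp.KeyBERTServing, batch_timeout: 100}
]

# At call time:
document = "Elixir is a dynamic, functional language for building scalable applications."

Text.WordCloud.terms(document,
  scoring: :key_bert,
  serving: MyApp.KeyBERTServing,
  ngram_range: {1, 2},
  candidate_pool_size: 100
)
```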

# `reset`

```elixir
@spec reset(String.t() | :all) :: :ok
```

Drops the cached `Nx.Serving` for the given KeyBERT model.

### Arguments

* `model` — a model id string; defaults to the package's default
  model. Pass `:all` to drop every cached serving for this backend.

### Returns

* `:ok`.
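
For example, after swapping models in a long-running node you might
clear the old serving:

```elixir
alias Text.WordCloud.Backends.KeyBERT

# Drop the serving cached for one model id...
:ok = KeyBERT.reset("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# ...or every serving this backend has cached.
:ok = KeyBERT.reset(:all)
```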

---

*Consult [api-reference.md](api-reference.md) for complete listing*
