Neural keyword-extraction backend backed by Bumblebee.
Implements KeyBERT-style scoring: embed the input document and each candidate phrase with a multilingual sentence-transformer, then rank candidates by cosine similarity to the document embedding. The intuition is that the best keyword candidates are the phrases whose meaning is closest to the document as a whole — exactly what neural sentence embeddings capture.
This backend is opt-in:
- The `:bumblebee` and (recommended) `:exla` Hex packages must be declared as dependencies of the host application.
- Either pass `scoring: :key_bert` to `Text.WordCloud.terms/2`, or use this module directly.
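A minimal setup sketch under those two steps. The version requirements shown are illustrative, not pinned by this documentation, and `document` stands in for your input text:

```elixir
# mix.exs (host application) — versions are illustrative
defp deps do
  [
    {:bumblebee, "~> 0.5"},
    {:exla, "~> 0.7"}  # recommended: compiled inference
  ]
end

# Then opt in through the orchestrator:
Text.WordCloud.terms(document, scoring: :key_bert)
```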
Cold start
The first call downloads the default model (~470 MB,
`sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`,
multilingual across ~50 languages) from Hugging Face, traces the
inference graph, and compiles it under EXLA. Subsequent calls hit a
cached `Nx.Serving` in `:persistent_term`. Pre-download via
`mix text.download_models --keybert` if your production environment
needs everything present at boot.
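If the lazy first-call compilation is unacceptable in production, the serving can instead be built eagerly at application boot and handed to this backend via the `:serving` option. A sketch using Bumblebee's public API; the supervisor child name `MyApp.KeyBERTServing` and the `compile:` shapes are assumptions you should tune for your workload:

```elixir
repo = {:hf, "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

# Compile once at boot for fixed batch/sequence shapes under EXLA.
serving =
  Bumblebee.Text.text_embedding(model_info, tokenizer,
    compile: [batch_size: 8, sequence_length: 128],
    defn_options: [compiler: EXLA]
  )

# In your application's supervision tree:
children = [
  {Nx.Serving, serving: serving, name: MyApp.KeyBERTServing}
]
```

The registered name (or pid) can then be passed as `serving: MyApp.KeyBERTServing`.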
When to use this backend
KeyBERT typically produces the highest-quality output of any backend in this package — at the cost of a model download, GPU/EXLA compilation, and substantially higher per-call latency than YAKE! Use this when quality matters more than throughput, or when YAKE!'s statistical features struggle with very short or very domain-specific text.
Options
- `:model` — Hugging Face model id. Defaults to `"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"`. Any sentence-transformer model compatible with `Bumblebee.Text.text_embedding/3` works.
- `:tokenizer_repo` — overrides the tokenizer source repo (rarely needed for sentence-transformer models, which ship complete tokenizers).
- `:serving` — name or pid of a pre-started `Nx.Serving` to skip the lazy `:persistent_term` cache. Recommended for production.
- `:candidate_pool_size` — cap on the number of candidates embedded; large documents can produce hundreds of phrases and embedding all of them is wasteful. Defaults to `200`. Candidates are pre-filtered by raw frequency before embedding.
- `:ngram_range` — `{min, max}` candidate length. Defaults to `{1, 3}`.
Standard `Text.WordCloud` orchestrator options (`:language`,
`:stopwords`, `:case_fold`, `:locale`) are honoured.
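A call exercising several of the options above. The option values are illustrative, and the `language: :de` value form is an assumption about how the orchestrator expects languages to be named:

```elixir
Text.WordCloud.terms(document,
  scoring: :key_bert,
  model: "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
  ngram_range: {1, 2},
  candidate_pool_size: 100,
  language: :de
)
```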
Functions
@spec reset(String.t() | :all) :: :ok
Drops the cached Nx.Serving for the given KeyBERT model.
Arguments
- `model` — a model id string. Defaults to the package default. Pass `:all` to drop every cached serving for this backend.
Returns
`:ok`.
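Usage sketch. The backend's module name is not shown in this excerpt, so `Text.WordCloud.KeyBERT` below is a hypothetical stand-in:

```elixir
# Assuming the backend module is Text.WordCloud.KeyBERT (name hypothetical).
# Drop the cached serving for a specific model:
Text.WordCloud.KeyBERT.reset("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
#=> :ok

# Drop every cached serving for this backend:
Text.WordCloud.KeyBERT.reset(:all)
#=> :ok
```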