Pre-downloads every external model used by `:text` so that subsequent
calls run without network access.
On-demand downloads work fine for development, but most production environments want every artefact present at boot. This task fetches:
- `lid.176.bin` — fastText language identification (~126 MB), saved to `priv/lid_176/lid.176.bin` inside this project.
- The default Hugging Face model used by `Text.Sentiment.Backends.Bumblebee` (XLM-RoBERTa, ~1.1 GB on first download) plus the tokenizer it actually loads (`FacebookAI/xlm-roberta-base`).
- The default Hugging Face model used by `Text.POS` (English BERT, ~440 MB) plus its tokenizer (`google-bert/bert-base-uncased`).
- The default Hugging Face model used by `Text.NER` (multilingual BERT, ~700 MB) plus its tokenizer (`google-bert/bert-base-multilingual-cased`).
- The default Hugging Face model used by `Text.WordCloud.Backends.KeyBERT` (multilingual sentence-transformer, ~470 MB) plus its tokenizer.
Hugging Face artefacts land in Bumblebee's cache directory
(`~/.cache/bumblebee/` by default; override with `BUMBLEBEE_CACHE_DIR`
or `XDG_CACHE_HOME`). Once cached, the corresponding `Text.*` modules
load without any network round-trip.
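For release builds it can be convenient to point the cache at a project-local directory before running the task, so the downloaded artefacts ship with the release. A minimal sketch; the `priv/bumblebee_cache` path is illustrative, not something the task mandates:

```shell
# Point Bumblebee's cache at a project-local directory (path is
# illustrative) so downloaded artefacts are bundled with the release.
export BUMBLEBEE_CACHE_DIR="$PWD/priv/bumblebee_cache"
mkdir -p "$BUMBLEBEE_CACHE_DIR"

# mix text.download_models   # would now populate priv/bumblebee_cache
echo "caching into $BUMBLEBEE_CACHE_DIR"
```

Any process that later loads the models must see the same `BUMBLEBEE_CACHE_DIR` value, or Bumblebee will fall back to the default location and re-download.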
## Usage

    mix text.download_models               # download everything
    mix text.download_models --lid176      # just lid.176.bin
    mix text.download_models --sentiment   # just the sentiment stack
    mix text.download_models --pos --ner   # just POS + NER
    mix text.download_models --bumblebee   # all four Bumblebee stacks
    mix text.download_models --force       # re-download even if cached

## Options
- `--lid176` — fetch `lid.176.bin` (or `lid.176.ftz` with `--quantized`).
- `--sentiment` — fetch the default `Text.Sentiment.Backends.Bumblebee` model and tokenizer.
- `--pos` — fetch the default `Text.POS` model and tokenizer.
- `--ner` — fetch the default `Text.NER` model and tokenizer.
- `--keybert` — fetch the default `Text.WordCloud.Backends.KeyBERT` multilingual sentence-transformer model and tokenizer (~470 MB).
- `--bumblebee` — shorthand for `--sentiment --pos --ner --keybert`.
- `--all` — download every model. This is the default when no selection flag is given.
- `--force` — re-download `lid.176.bin` even if a cached copy is already present. Bumblebee artefacts are cached by ETag and refresh automatically when the upstream model updates, so this flag has no effect on the sentiment, POS, or NER stacks.
- `--quantized` — only meaningful with `--lid176`; downloads the quantized `.ftz` variant instead of the full `.bin`.
- `--model <repo>` — override the Hugging Face repo for the single selected model. Only valid when exactly one of `--sentiment`, `--pos`, or `--ner` is passed; mirrors the `:model` option each of those modules accepts.
- `--tokenizer <repo>` — pair with `--model` to override the tokenizer repo as well.
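Because `--force` only applies to the fastText file, one way to force a fresh Bumblebee download is to clear the cache directory by hand. A sketch, assuming Bumblebee resolves its cache from the same environment variables documented above:

```shell
# --force does not touch Bumblebee artefacts; deleting the cache
# directory forces a re-download on the next run. Resolve the path the
# same way the docs describe: BUMBLEBEE_CACHE_DIR, then XDG_CACHE_HOME,
# then ~/.cache (assumption about precedence).
CACHE_DIR="${BUMBLEBEE_CACHE_DIR:-${XDG_CACHE_HOME:-$HOME/.cache}/bumblebee}"
rm -rf "$CACHE_DIR"

# mix text.download_models --bumblebee   # would re-fetch all four stacks
echo "cleared $CACHE_DIR"
```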
## Bumblebee dependency

Downloading the sentiment, POS, NER, or KeyBERT models requires the
optional `:bumblebee` dependency to be present in the host application.
If it is missing, those steps are skipped with a warning; the
fastText download still proceeds.
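If the dependency is absent, adding it is a one-line change in the host application's `mix.exs`. A sketch; the version constraints are illustrative, not taken from this project:

```elixir
# In the host application's mix.exs — version constraints are illustrative.
defp deps do
  [
    {:text, "~> 0.1"},       # this library
    {:bumblebee, "~> 0.5"}   # optional; enables sentiment / POS / NER / KeyBERT
  ]
end
```

Run `mix deps.get` afterwards, then re-run `mix text.download_models` to fetch the previously skipped models.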