# `mix text.download_models`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/mix/tasks/text.download_models.ex#L1)

Pre-downloads every external model used by `:text` so that subsequent
calls run without network access.

On-demand downloads work fine for development, but most production
environments want every artefact present at boot. This task fetches:

* `lid.176.bin` — fastText language identification (~126 MB), saved to
  `priv/lid_176/lid.176.bin` inside this project.

* The default Hugging Face model used by `Text.Sentiment.Backends.Bumblebee`
  (XLM-RoBERTa, ~1.1 GB on first download) plus the tokenizer it
  actually loads (`FacebookAI/xlm-roberta-base`).

* The default Hugging Face model used by `Text.POS` (English BERT,
  ~440 MB) plus its tokenizer (`google-bert/bert-base-uncased`).

* The default Hugging Face model used by `Text.NER` (multilingual
  BERT, ~700 MB) plus its tokenizer (`google-bert/bert-base-multilingual-cased`).

Hugging Face artefacts land in Bumblebee's cache directory
(`~/.cache/bumblebee/` by default; override with `BUMBLEBEE_CACHE_DIR`
or `XDG_CACHE_HOME`). Once cached, the corresponding `Text.*` modules
load without any network round-trip.
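For a deployment, the cache can be pointed at an app-owned directory before running the task; the path below is purely illustrative:

```shell
# Put Bumblebee artefacts in a directory the release owns
# (the path is an example, not a convention of :text).
export BUMBLEBEE_CACHE_DIR=/opt/my_app/bumblebee_cache
mix text.download_models --bumblebee
```

Set the same variable in the runtime environment so the `Text.*` modules find the pre-downloaded artefacts.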

## Usage

    mix text.download_models                  # download everything
    mix text.download_models --lid176         # just lid.176.bin
    mix text.download_models --sentiment      # just the sentiment stack
    mix text.download_models --pos --ner      # just POS + NER
    mix text.download_models --bumblebee      # all four Bumblebee stacks
    mix text.download_models --force          # re-download even if cached

## Options

* `--lid176` — fetch `lid.176.bin` (or `lid.176.ftz` with `--quantized`).

* `--sentiment` — fetch the default `Text.Sentiment.Backends.Bumblebee`
  model and tokenizer.

* `--pos` — fetch the default `Text.POS` model and tokenizer.

* `--ner` — fetch the default `Text.NER` model and tokenizer.

* `--keybert` — fetch the default `Text.WordCloud.Backends.KeyBERT`
  multilingual sentence-transformer model and tokenizer
  (~470 MB).

* `--bumblebee` — shorthand for `--sentiment --pos --ner --keybert`.

* `--all` — download every model. This is the default when no
  selection flag is given.

* `--force` — re-download `lid.176.bin` even if a cached copy is
  already present. Bumblebee artefacts are cached by etag and
  refresh automatically when the upstream model updates, so this
  flag has no effect on the sentiment, POS, or NER stacks.

* `--quantized` — only meaningful with `--lid176`; downloads the
  `.ftz` quantized variant instead of the full `.bin`.

* `--model <repo>` — override the Hugging Face repo for the single
  selected model. Only valid when exactly one of `--sentiment`,
  `--pos`, or `--ner` is passed; mirrors the `:model` option each
  of those modules accepts.

* `--tokenizer <repo>` — pair with `--model` to override the tokenizer
  repo as well.
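For example, pre-fetching an alternative sentiment model might look like this (the repo name is illustrative, not a recommendation; it should match whatever you pass as the `:model` option to `Text.Sentiment.Backends.Bumblebee`):

```shell
# Override both the model and tokenizer repos for the sentiment stack.
mix text.download_models --sentiment \
  --model cardiffnlp/twitter-xlm-roberta-base-sentiment \
  --tokenizer cardiffnlp/twitter-xlm-roberta-base-sentiment
```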

### Bumblebee dependency

Downloading the sentiment, POS, or NER models requires the optional
`:bumblebee` dependency to be present in the host application. If
it is missing, those steps are skipped with a warning; the
fastText download still proceeds.
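A minimal sketch of the host application's `mix.exs` dependency list; the version requirements shown are assumptions, so check the current releases of each package:

```elixir
defp deps do
  [
    {:text, "~> 0.5"},
    # Optional: enables the sentiment, POS, NER and KeyBERT downloads.
    {:bumblebee, "~> 0.5"},
    # Optional numerical backend for Bumblebee (version is an assumption).
    {:exla, "~> 0.7"}
  ]
end
```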

---

*Consult [api-reference.md](api-reference.md) for the complete API listing.*
