mix text.gen_stopwords (Text v0.5.0)


Fetches the stopwords-iso bundle (a single JSON file mapping ISO 639-1 codes to lists of stopwords) and writes one plain-text file per language under priv/stopwords/.

The text files are checked into the repo and read by Text.Stopwords at compile time; @external_resource registers them so the module recompiles when a file changes. Run this task only when the upstream list is updated or when adding support for new languages.
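As a rough illustration of the compile-time pattern described above, the sketch below reads every text file in a directory while the module compiles and registers each one with @external_resource. It is not the actual Text.Stopwords source: the module name StopwordsSketch, the `<code>.txt` file naming, the one-word-per-line layout, and resolving priv/stopwords relative to the current working directory are all assumptions made for the example.

```elixir
defmodule StopwordsSketch do
  # Hypothetical stand-in for Text.Stopwords. The directory location and
  # the <code>.txt naming are assumptions, not the library's real layout.
  @dir Path.join(File.cwd!(), "priv/stopwords")

  for path <- Path.wildcard(Path.join(@dir, "*.txt")) do
    # Tell the compiler to recompile this module when the file changes.
    @external_resource path

    lang = Path.basename(path, ".txt")

    # Assumes one stopword per line; blank lines are dropped.
    words = path |> File.read!() |> String.split("\n", trim: true)

    # One function clause per language, with the word list inlined.
    def stopwords(unquote(lang)), do: unquote(words)
  end

  # Fallback for languages that have no file.
  def stopwords(_other), do: []
end
```

Because the lists are baked into the compiled module, lookups need no file I/O at runtime; the trade-off is that the files have to exist in the repo before compilation, which is why the generated output is checked in.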

Usage

mix text.gen_stopwords

Options

  • --source — URL to fetch the JSON from. Defaults to the upstream stopwords-iso.json on the master branch of stopwords-iso/stopwords-iso.

Source format: {"<iso 639-1>": ["word1", "word2", ...], ...}. The source data is MIT-licensed, which permits redistribution; see priv/stopwords/LICENSE for the upstream attribution.
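The following is a minimal sketch of how a JSON document in the source format above could be turned into per-language files, and how a --source flag could be read from the command line. It is not the task's real implementation. The module GenStopwordsSketch, both function names, the `<code>.txt` file naming, and the one-word-per-line output are hypothetical. Fetching over HTTP is left out, and JSON.decode!/1 assumes Elixir 1.18 or later (on older versions a library such as Jason would be needed).

```elixir
defmodule GenStopwordsSketch do
  # Hypothetical helper module, written only to illustrate the data flow.

  @doc "Returns the value of --source from argv, or nil when the flag is absent."
  def source_from_args(argv) do
    {opts, _rest, _invalid} = OptionParser.parse(argv, strict: [source: :string])
    Keyword.get(opts, :source)
  end

  @doc "Writes one <code>.txt file per language and returns the written paths."
  def write_files(json, out_dir) do
    # Built-in JSON module (Elixir 1.18+). The match asserts a top-level object.
    %{} = by_lang = JSON.decode!(json)
    File.mkdir_p!(out_dir)

    # Skip any entry whose value is not a list of words.
    for {lang, words} <- by_lang, is_list(words) do
      path = Path.join(out_dir, lang <> ".txt")
      # Assumed output layout: one stopword per line, trailing newline.
      File.write!(path, Enum.join(words, "\n") <> "\n")
      path
    end
  end
end
```

A call such as `GenStopwordsSketch.write_files(~s({"en": ["a", "the"]}), "priv/stopwords")` would create priv/stopwords/en.txt containing the two words on separate lines.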