Locates runtime data files used by Text modules, fetching them from upstream sources when permitted.
Most modules in this package ship a small bundled dataset (e.g.
Text.Hyphenation ships American English patterns, Text.Lemma
ships English lemmatization data), but also accept additional
language packs at runtime. This module is the single
configuration surface those modules share for finding, caching,
and optionally downloading those packs.
Configuration
```elixir
config :text,
  # Where on-demand-downloaded data is cached.
  # Default: "~/.cache/text".
  data_dir: "~/.cache/text",
  # Per-domain auto-download permissions. All default to false:
  # the package never reaches out to the network without an
  # explicit opt-in for the relevant domain.
  auto_download_hyphenation_data: true,
  auto_download_lemma_data: true,
  auto_download_wordfreq_data: false
```

Lookup order
When a Text module asks for a data file via fetch/3:
1. The configured `:data_dir` is checked first. Once a file has been cached there (whether downloaded by this package, copied manually, or shipped by a deployment script), it is reused on every subsequent call.
2. The bundled `priv/<domain>/` directory inside the `:text` package is checked next.
3. If a download URL is known and `auto_download_<domain>_data` is `true`, the file is downloaded to the cache directory and the cached path is returned.
4. Otherwise, `{:error, %ArgumentError{}}` is returned with a message describing how to resolve the situation (enable auto-download, or place the file manually).
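The four steps can be sketched as a single `cond`. This is a hypothetical `LookupSketch` module, not the actual `Text.Data` source: the cache and bundled roots, the permission flag, and the downloader are all passed in as options (`:data_dir`, `:priv_dir`, `:auto_download?`, `:download`, assumed names) so the sketch runs without the `:text` application or a network connection.

```elixir
defmodule LookupSketch do
  # Hypothetical sketch of the documented lookup order; not Text.Data's code.
  # Assumed opts: :data_dir, :priv_dir, :url, :auto_download? and :download
  # (a caller-supplied fn url, dest -> :ok end standing in for an HTTP client).
  def locate(domain, filename, opts) do
    subdir = Atom.to_string(domain)
    cache_path = Path.join([Keyword.fetch!(opts, :data_dir), subdir, filename])
    bundled_path = Path.join([Keyword.fetch!(opts, :priv_dir), subdir, filename])
    url = Keyword.get(opts, :url)

    cond do
      # 1. A file already in the cache is reused.
      File.exists?(cache_path) ->
        {:ok, cache_path}

      # 2. Otherwise fall back to the bundled priv/<domain>/ copy.
      File.exists?(bundled_path) ->
        {:ok, bundled_path}

      # 3. Download only when a URL is known AND the domain has opted in.
      url != nil and Keyword.get(opts, :auto_download?, false) ->
        File.mkdir_p!(Path.dirname(cache_path))
        download = Keyword.fetch!(opts, :download)
        :ok = download.(url, cache_path)
        {:ok, cache_path}

      # 4. Nothing worked: return an error that explains how to fix it.
      true ->
        {:error,
         %ArgumentError{
           message:
             "#{filename} not found for #{inspect(domain)}; enable " <>
               "auto_download_#{subdir}_data and pass :url, or place it at #{cache_path}"
         }}
    end
  end
end
```

Injecting the downloader keeps the sketch testable offline. How the real module performs the download is not described in this documentation.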
Domains
Each Text module that needs runtime data uses its own domain
atom — :hyphenation, :lemma, :wordfreq. Domains map
directly to subdirectory names under both priv/ and :data_dir,
and to per-domain permission keys.
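The mapping from one domain atom to the names it controls can be illustrated with a small hypothetical helper. `DomainSketch.names/1` is not part of `Text.Data`; it only restates the convention described above.

```elixir
defmodule DomainSketch do
  # Hypothetical helper: the three names a domain atom controls, following
  # the documented convention. Not part of Text.Data.
  def names(domain) when is_atom(domain) do
    subdir = Atom.to_string(domain)

    %{
      # subdirectory inside the :text package
      bundled_subdir: Path.join("priv", subdir),
      # subdirectory relative to the configured :data_dir
      cache_subdir: subdir,
      # application env key for the auto-download opt-in
      permission_key: String.to_atom("auto_download_#{subdir}_data")
    }
  end
end
```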
Functions
Returns true when auto-download is enabled for the named domain.
Arguments
- `domain` is an atom like `:hyphenation`. Looks up the application env key `:auto_download_<domain>_data`.
Returns
- A boolean. Defaults to `false` if the key is not set.
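A minimal sketch of this check, assuming it is a plain `Application.get_env/3` lookup with a `false` default. `AutoDownloadSketch.enabled?/1` is a hypothetical name, and treating only a literal `true` as enabled is an assumption of the sketch.

```elixir
defmodule AutoDownloadSketch do
  # Hypothetical re-implementation of the documented permission check.
  # String.to_atom/1 is acceptable here because domains are a fixed set.
  def enabled?(domain) when is_atom(domain) do
    key = String.to_atom("auto_download_#{domain}_data")
    # Only a literal `true` counts as enabled (assumption of this sketch).
    Application.get_env(:text, key, false) == true
  end
end
```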
Returns the bundled path where a file would live (whether it does or not).
Returns the cache path where a file would live (whether it does or not).
Useful for tooling that wants to inspect or warm the cache.
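As an illustration of such tooling, the hypothetical `CacheStatusSketch.status/2` below reports which `{domain, filename}` pairs are already present under a data directory. It rebuilds paths from the documented `<data_dir>/<domain>/<filename>` layout instead of calling `Text.Data`, so it is an assumption-based sketch rather than a use of the real API.

```elixir
defmodule CacheStatusSketch do
  # Hypothetical tooling helper: maps each {domain, filename} entry to
  # {:cached, path} or {:missing, path} under the given data directory.
  def status(data_dir, entries) do
    root = Path.expand(data_dir)

    Map.new(entries, fn {domain, filename} = entry ->
      path = Path.join([root, Atom.to_string(domain), filename])
      {entry, if(File.exists?(path), do: {:cached, path}, else: {:missing, path})}
    end)
  end
end
```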
@spec data_dir() :: String.t()
Returns the absolute path to the configured data directory.
Returns
- The expanded path. The directory is not created until the first download writes to it.
Examples
```elixir
iex> is_binary(Text.Data.data_dir())
true
```
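The behaviour described for `data_dir/0` amounts to reading `:data_dir` with its documented default and expanding it, without touching the filesystem. The sketch below assumes exactly that. `DataDirSketch` is a hypothetical stand-in, not the module's code.

```elixir
defmodule DataDirSketch do
  # Hypothetical: read :data_dir (documented default "~/.cache/text") and
  # expand it to an absolute path. No directory is created here.
  def data_dir do
    :text
    |> Application.get_env(:data_dir, "~/.cache/text")
    |> Path.expand()
  end
end
```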
@spec fetch(atom(), String.t(), keyword()) :: {:ok, String.t()} | {:error, Exception.t()}
Locates a data file for a domain, downloading it on demand when permitted.
Arguments
- `domain` is an atom naming the data category (e.g. `:hyphenation`). It is used as the subdirectory name in both `:data_dir` and `priv/`, and to build the auto-download permission key `:auto_download_<domain>_data`.
- `filename` is the name of the file to locate (e.g. `"hyph-de-1996.tex"`).
Options
- `:url` is the upstream URL to download from when the file is not already cached. It is required for auto-download to work; without it, `fetch/3` only consults the cache and bundled directories.
Returns
- `{:ok, path}`: the file is available at `path`.
- `{:error, %ArgumentError{}}`: the file could not be located, and either no `:url` was provided or auto-download is disabled for the domain. The error message explains how to resolve it.
Examples
```elixir
iex> {:ok, path} = Text.Data.fetch(:hyphenation, "hyph-en-us.tex")
iex> File.exists?(path)
true
```
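A caller that passes `:url` has to handle both return shapes. The hypothetical `HyphenationLoaderSketch` below shows one way to do it. The fetch function is injected so the sketch runs standalone; in application code you would pass `&Text.Data.fetch/3`, which matches the documented spec. The `example.invalid` URL in the usage is a placeholder, not a real upstream source.

```elixir
defmodule HyphenationLoaderSketch do
  # Hypothetical caller. `fetch` is any 3-arity function with the shape of
  # Text.Data.fetch/3, e.g. &Text.Data.fetch/3 in real code.
  def load_patterns(fetch, filename, url) do
    case fetch.(:hyphenation, filename, url: url) do
      {:ok, path} ->
        {:ok, File.read!(path)}

      # Surface the explanatory message instead of raising.
      {:error, %ArgumentError{} = error} ->
        {:error, Exception.message(error)}
    end
  end
end
```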