Text.Data (Text v0.5.0)


Locates runtime data files used by Text modules, fetching them from upstream sources when permitted.

Most modules in this package ship a small bundled dataset (e.g. Text.Hyphenation ships American English patterns, Text.Lemma ships English lemmatization data) but also accept additional language packs at runtime. This module is the single configuration surface those modules share for finding, caching, and optionally downloading those packs.

Configuration

config :text,
  # Where on-demand-downloaded data is cached.
  # Default: "~/.cache/text".
  data_dir: "~/.cache/text",

  # Per-domain auto-download permissions. All default to false:
  # the package never reaches out to the network without an
  # explicit opt-in for the relevant domain. This example opts in
  # for hyphenation and lemma data only.
  auto_download_hyphenation_data: true,
  auto_download_lemma_data: true,
  auto_download_wordfreq_data: false
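
For deployments where the cache location or the opt-in differs per environment, the same keys could be set from the environment at boot. The following is a sketch only: the variable names TEXT_DATA_DIR and TEXT_AUTO_DOWNLOAD are hypothetical, and it assumes the keys are read at runtime rather than at compile time, which this page does not confirm.

```elixir
# config/runtime.exs
import Config

config :text,
  # TEXT_DATA_DIR is a hypothetical variable name; falls back to the documented default.
  data_dir: System.get_env("TEXT_DATA_DIR", "~/.cache/text"),

  # TEXT_AUTO_DOWNLOAD is also hypothetical. Any value other than "true" keeps downloads off.
  auto_download_hyphenation_data: System.get_env("TEXT_AUTO_DOWNLOAD") == "true"
```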

Lookup order

When a Text module asks for a data file via fetch/3:

  1. The configured :data_dir is checked first. Once a file has been cached there (whether downloaded by this package, copied manually, or shipped by a deployment script) it is reused on every subsequent call.

  2. The bundled priv/<domain>/ directory inside the :text package is checked next.

  3. If a download URL is known and auto_download_<domain>_data is true, the file is downloaded to the cache directory and the cached path is returned.

  4. Otherwise {:error, %ArgumentError{}} is returned with a message describing how to resolve the situation (enable auto-download, or place the file manually).
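
The four steps above can be sketched as a single function. This is an illustration of the documented order, not the package's source: the module name LookupSketch, the option names (:priv_dir, :auto_download, :download), and the error message text are all assumptions made for the example.

```elixir
defmodule LookupSketch do
  # Hypothetical re-creation of the documented lookup order.
  # Not Text.Data's implementation; every option name here is illustrative.

  def locate(domain, filename, opts) do
    subdir = Atom.to_string(domain)
    cached = Path.join([Keyword.fetch!(opts, :data_dir), subdir, filename])
    bundled = Path.join([Keyword.fetch!(opts, :priv_dir), subdir, filename])
    url = Keyword.get(opts, :url)
    allowed? = Keyword.get(opts, :auto_download, false)

    cond do
      # 1. A cached copy always wins.
      File.exists?(cached) ->
        {:ok, cached}

      # 2. Fall back to the data bundled with the package.
      File.exists?(bundled) ->
        {:ok, bundled}

      # 3. Download only when a URL is known and the domain is opted in.
      is_binary(url) and allowed? ->
        download = Keyword.fetch!(opts, :download)
        File.mkdir_p!(Path.dirname(cached))
        :ok = download.(url, cached)
        {:ok, cached}

      # 4. Otherwise explain how to resolve the situation.
      true ->
        {:error,
         %ArgumentError{
           message:
             "#{filename} not found; enable auto-download for " <>
               "#{inspect(domain)} or place the file at #{cached}"
         }}
    end
  end
end
```

Passing the downloader in as a function keeps the sketch free of any HTTP client assumption; a real implementation would also need to decide how to report a failed download, which the steps above do not specify.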

Domains

Each Text module that needs runtime data uses its own domain atom — :hyphenation, :lemma, :wordfreq. Domains map directly to subdirectory names under both priv/ and :data_dir, and to per-domain permission keys.
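
To make the mapping concrete, here is a minimal sketch of how a domain atom could be turned into a subdirectory and a permission key. DomainSketch and its function names are hypothetical; only the naming scheme (auto_download_<domain>_data, one subdirectory per domain) comes from the description above.

```elixir
defmodule DomainSketch do
  # Illustrative only; these helpers are not part of Text.Data.

  # :lemma -> :auto_download_lemma_data
  def permission_key(domain) when is_atom(domain) do
    String.to_atom("auto_download_#{domain}_data")
  end

  # ("priv", :hyphenation) -> "priv/hyphenation"
  def subdir(root, domain) when is_atom(domain) do
    Path.join(root, Atom.to_string(domain))
  end

  # Reads the flag from the :text application env, treating an unset key as false.
  def auto_download?(domain) do
    Application.get_env(:text, permission_key(domain), false) == true
  end
end
```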

Summary

Functions

auto_download?(domain)

Returns true when auto-download is enabled for the named domain.

bundled_path(domain, filename)

Returns the bundled path where a file would live (whether it does or not).

cache_path(domain, filename)

Returns the cache path where a file would live (whether it does or not).

data_dir()

Returns the absolute path to the configured data directory.

fetch(domain, filename, options \\ [])

Locates a data file for a domain, downloading it on demand when permitted.

Functions

auto_download?(domain)

@spec auto_download?(atom()) :: boolean()

Returns true when auto-download is enabled for the named domain.

Arguments

  • domain is an atom like :hyphenation. Looks up the application env key :auto_download_<domain>_data.

Returns

  • A boolean. Defaults to false if the key is not set.

bundled_path(domain, filename)

@spec bundled_path(atom(), String.t()) :: String.t()

Returns the path under the package's bundled priv/<domain>/ directory where a file would live, whether or not it exists there.

cache_path(domain, filename)

@spec cache_path(atom(), String.t()) :: String.t()

Returns the path under the <domain> subdirectory of :data_dir where a file would live, whether or not it has been cached yet.

Useful for tooling that wants to inspect or warm the cache.
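
As one way to warm the cache, a deployment script could copy a locally available file to the path that cache_path/2 reports, so that later lookups find it without any download. The helper below is a hypothetical sketch (CacheWarm is not part of the package) and takes the destination as a plain argument so that it runs without the library loaded.

```elixir
defmodule CacheWarm do
  # Hypothetical helper: copies `source` to `dest`, creating parent
  # directories first, and returns `dest`.
  def install(source, dest) do
    File.mkdir_p!(Path.dirname(dest))
    File.cp!(source, dest)
    dest
  end
end

# With the :text package loaded, the destination would come from cache_path/2:
#
#   dest = Text.Data.cache_path(:hyphenation, "hyph-de-1996.tex")
#   CacheWarm.install("vendor/hyph-de-1996.tex", dest)
#
# "vendor/hyph-de-1996.tex" is a placeholder for wherever the file is kept locally.
```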

data_dir()

@spec data_dir() :: String.t()

Returns the absolute path to the configured data directory.

Returns

  • The expanded path. The directory is not created until the first download writes to it.

Examples

iex> is_binary(Text.Data.data_dir())
true

fetch(domain, filename, options \\ [])

@spec fetch(atom(), String.t(), keyword()) ::
  {:ok, String.t()} | {:error, Exception.t()}

Locates a data file for a domain, downloading it on demand when permitted.

Arguments

  • domain is an atom naming the data category (e.g. :hyphenation). Used as the subdirectory name in both :data_dir and priv/, and as the prefix of the auto-download permission key.

  • filename is the name of the file to locate (e.g. "hyph-de-1996.tex").

Options

  • :url is the upstream URL to download from when the file is not already cached. Required for auto-download to work; without it, fetch/3 only consults the cache and bundled directories.

Returns

  • {:ok, path} if the file is available at path.

  • {:error, %ArgumentError{}} if the file could not be located and either no URL was provided or auto-download is disabled for the domain. The error message explains how to resolve the problem.

Examples

iex> {:ok, path} = Text.Data.fetch(:hyphenation, "hyph-en-us.tex")
iex> File.exists?(path)
true
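
For a file that is not bundled, callers will usually want to pass :url and handle the error tuple. The sketch below shows one possible pattern: a small hypothetical helper, FetchResult.unwrap!/1, that returns the path or raises the returned exception. The helper is not part of the package, and the URL in the usage comment is a placeholder rather than a real upstream location.

```elixir
defmodule FetchResult do
  # Hypothetical convenience for callers that prefer raising over matching.
  def unwrap!({:ok, path}) when is_binary(path), do: path
  def unwrap!({:error, error}) when is_exception(error), do: raise(error)
end

# Usage with the :text package loaded (placeholder URL):
#
#   path =
#     Text.Data.fetch(:hyphenation, "hyph-de-1996.tex",
#       url: "https://example.com/hyph-de-1996.tex"
#     )
#     |> FetchResult.unwrap!()
```

Raising suits boot-time loading where a missing pack is fatal; request-time code may prefer to match on {:error, error} and log Exception.message(error) instead.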