# `Text.Data`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/data.ex#L1)

Locates runtime data files used by Text modules, fetching them
from upstream sources when permitted.

Most modules in this package ship a small bundled dataset (e.g.
`Text.Hyphenation` ships American English patterns, `Text.Lemma`
ships English lemmatization data) but also accept additional
language packs at runtime. This module is the single
configuration surface those modules share for finding, caching,
and optionally downloading those packs.

## Configuration

    config :text,
      # Where on-demand-downloaded data is cached.
      # Default: "~/.cache/text".
      data_dir: "~/.cache/text",

      # Per-domain auto-download permissions. All default to false:
      # the package never reaches out to the network without an
      # explicit opt-in for the relevant domain. This example opts
      # in for hyphenation and lemma data only.
      auto_download_hyphenation_data: true,
      auto_download_lemma_data: true,
      auto_download_wordfreq_data: false
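
Where the cache location differs per environment, the same keys could be set in `config/runtime.exs`. The fragment below is a sketch that assumes the settings are read at runtime rather than at compile time. The environment variable names `TEXT_DATA_DIR` and `TEXT_AUTO_DOWNLOAD` are hypothetical and are not defined by the package.

```elixir
# config/runtime.exs
import Config

# Hypothetical environment variables, chosen for this example only.
allow_download? = System.get_env("TEXT_AUTO_DOWNLOAD") == "true"

config :text,
  # Falls back to the documented default when the variable is unset.
  data_dir: System.get_env("TEXT_DATA_DIR", "~/.cache/text"),
  auto_download_hyphenation_data: allow_download?,
  auto_download_lemma_data: allow_download?
```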

## Lookup order

When a Text module asks for a data file via `fetch/3`:

1. The configured `:data_dir` is checked first. Once a file has
   been cached there (whether downloaded by this package, copied
   manually, or shipped by a deployment script) it is reused on
   every subsequent call.

2. The bundled `priv/<domain>/` directory inside the `:text`
   package is checked next.

3. If a download URL is known and `auto_download_<domain>_data`
   is `true`, the file is downloaded to the cache directory and
   the cached path is returned.

4. Otherwise `{:error, %ArgumentError{}}` is returned with a
   message describing how to resolve the situation (enable
   auto-download, or place the file manually).

## Domains

Each Text module that needs runtime data uses its own domain
atom: `:hyphenation`, `:lemma`, or `:wordfreq`. A domain maps
directly to a subdirectory name under both `priv/` and `:data_dir`,
and to the permission key `auto_download_<domain>_data`.

# `auto_download?`

```elixir
@spec auto_download?(atom()) :: boolean()
```

Returns `true` when auto-download is enabled for the named domain.

### Arguments

* `domain` is an atom like `:hyphenation`. The function looks up
  the application environment key `:auto_download_<domain>_data`
  under the `:text` application.

### Returns

* A boolean. Defaults to `false` if the key is not set.
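
The key derivation and the `false` default can be illustrated with a stand-alone sketch. `PermissionSketch` is a hypothetical module written for this example; it assumes the lookup is a plain application-environment read and does not claim to match the package's source.

```elixir
defmodule PermissionSketch do
  # Hypothetical re-creation of the documented behaviour:
  # :lemma -> :auto_download_lemma_data, read from the :text app env.
  def enabled?(domain) when is_atom(domain) do
    key = String.to_atom("auto_download_#{domain}_data")
    # Anything other than an explicit `true` counts as disabled.
    Application.get_env(:text, key, false) == true
  end
end

Application.put_env(:text, :auto_download_lemma_data, true)

PermissionSketch.enabled?(:lemma)
# => true

PermissionSketch.enabled?(:wordfreq)
# => false (key not set, so the default applies)
```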

# `bundled_path`

```elixir
@spec bundled_path(atom(), String.t()) :: String.t()
```

Returns the path where a file would live in the bundled `priv/<domain>/` directory, whether or not the file exists.

# `cache_path`

```elixir
@spec cache_path(atom(), String.t()) :: String.t()
```

Returns the path where a file would live in the cache directory, whether or not the file exists.

Useful for tooling that wants to inspect or warm the cache.
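
As one possible use in tooling, the sketch below reports which files are not yet present in the cache. `CacheReport` is a hypothetical helper. Its `path_fun` argument defaults to `Text.Data.cache_path/2` (which requires the `:text` package) and exists only so the helper can be exercised without the package installed.

```elixir
defmodule CacheReport do
  # Returns the filenames from `filenames` that have no file at the
  # path computed by `path_fun` for `domain`.
  def missing(domain, filenames, path_fun \\ &Text.Data.cache_path/2) do
    Enum.reject(filenames, fn filename ->
      File.exists?(path_fun.(domain, filename))
    end)
  end
end
```

With the package available, `CacheReport.missing(:hyphenation, ["hyph-en-us.tex", "hyph-de-1996.tex"])` would list the pattern files a deployment script still needs to copy into the cache.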

# `data_dir`

```elixir
@spec data_dir() :: String.t()
```

Returns the absolute path to the configured data directory.

### Returns

* The expanded path. The directory is not created until the first
  download writes to it.

### Examples

    iex> is_binary(Text.Data.data_dir())
    true

# `fetch`

```elixir
@spec fetch(atom(), String.t(), keyword()) ::
  {:ok, String.t()} | {:error, Exception.t()}
```

Locates a data file for a domain, downloading it on demand when
permitted.

### Arguments

* `domain` is an atom naming the data category (e.g.
  `:hyphenation`). Used as the subdirectory name in both
  `:data_dir` and `priv/`, and as the prefix of the
  auto-download permission key.

* `filename` is the name of the file to locate (e.g.
  `"hyph-de-1996.tex"`).

### Options

* `:url` is the upstream URL to download from when the file is
  found in neither the cache nor the bundled directory. Required
  for auto-download to work; without it, `fetch/3` only consults
  the cache and bundled directories.

### Returns

* `{:ok, path}` — the file is available at `path`.

* `{:error, %ArgumentError{}}` — the file could not be located
  and either no URL was provided or auto-download is disabled
  for the domain. The error message explains how to resolve.

### Examples

    iex> {:ok, path} = Text.Data.fetch(:hyphenation, "hyph-en-us.tex")
    iex> File.exists?(path)
    true
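
For a file that is not bundled, a caller would typically pass `:url` and handle both result shapes. The following is a minimal sketch, assuming the `:text` package is available and `auto_download_hyphenation_data` is enabled. `GermanPatterns` is a hypothetical module, and the URL is a placeholder rather than a real upstream location.

```elixir
defmodule GermanPatterns do
  @filename "hyph-de-1996.tex"
  # Placeholder URL for illustration; substitute the real upstream source.
  @url "https://example.com/hyph-de-1996.tex"

  # Requires the :text package at runtime.
  def locate do
    :hyphenation
    |> Text.Data.fetch(@filename, url: @url)
    |> handle()
  end

  # The file is available (cached, bundled, or freshly downloaded).
  def handle({:ok, path}), do: {:ok, path}

  # The spec allows any exception struct; surface its message so the
  # caller sees the resolution hint.
  def handle({:error, error}) when is_exception(error) do
    {:error, Exception.message(error)}
  end
end
```

Returning the message string rather than the exception struct is a choice made for this sketch; callers that need to re-raise can keep the original struct instead.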

