# `TesseractJs.Models`
[🔗](https://github.com/alexdont/tesseract_js/blob/v0.1.0/lib/tesseract_js/models.ex#L1)

Model registry — single source of truth for the languages and core WASM files
that `tesseract_js` knows about. Both CDN mode and local mode are driven from
this module.

The curated list below covers ~20 common languages with checksums and approximate
sizes for the `:standard` tessdata tier. Any language code (e.g. `"hin"`,
`"chi_tra+chi_sim"`) that is *not* in the curated list still works at runtime —
it just falls through to the URL template without checksum verification.
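
The checksum fallthrough can be pictured with the minimal sketch below. It assumes a registry entry is a map with an atom `:sha256` key holding a hex digest (matching the `%{name, size_mb, sha256}` shape described under `list`), and that `nil` stands for an uncurated language. `ChecksumSketch` and `verify/2` are hypothetical names, not part of this package's API.

```elixir
defmodule ChecksumSketch do
  # Hypothetical helper: checks downloaded traineddata bytes against a
  # curated registry entry. Uncurated languages (entry = nil) and entries
  # without a digest are reported as :unverified instead of failing.
  @spec verify(binary(), map() | nil) :: :ok | :unverified | {:error, :checksum_mismatch}
  def verify(_data, nil), do: :unverified
  def verify(_data, %{sha256: nil}), do: :unverified

  def verify(data, %{sha256: expected}) do
    # SHA-256 of the raw file bytes, as lowercase hex.
    actual = :crypto.hash(:sha256, data) |> Base.encode16(case: :lower)

    if actual == String.downcase(expected) do
      :ok
    else
      {:error, :checksum_mismatch}
    end
  end
end
```

A caller would typically pass the result of `TesseractJs.Models.get(lang)` as the entry, so `"hin"` or any other uncurated code simply yields `:unverified`.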

Use the helpers to resolve URLs and paths:

    TesseractJs.Models.cdn_url("eng")
    TesseractJs.Models.cdn_url("jpn_vert", :best)
    TesseractJs.Models.local_path("eng")
    TesseractJs.Models.filename("eng")

The tessdata version pinned by this package is `4.0.0` (the tessdata bundle
format used by tesseract.js 5.x). Bump it in lockstep with the tesseract.js-core release.

# `cdn_url`

```elixir
@spec cdn_url(String.t(), atom()) :: String.t()
```

Builds the jsDelivr URL for a language's traineddata file.

    iex> TesseractJs.Models.cdn_url("eng")
    "https://cdn.jsdelivr.net/npm/@tesseract.js-data/eng@1.0.0/4.0.0/eng.traineddata.gz"

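For a `+`-combined string such as `"eng+fra"`, only the URL for the *first*
language is returned; the consumer is expected to download each language
separately (for example, by calling `cdn_url/2` on each code returned by
`split_langs/1`). In local mode, all combined languages must be present in the
same directory.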

## Tiers

  * `:standard` — full LSTM+legacy combined model, ~11 MB/lang gzipped.
  * `:best` — smaller LSTM-only model (the `_best_int` variant on jsDelivr),
    ~3 MB/lang gzipped, similar accuracy to standard for most languages.

> The `:fast` tier (`tessdata_fast`) requires uncompressed `.traineddata`
> files served from a different source and isn't supported in v0.1.
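
The URL shape can be reconstructed from the documented `"eng"` example. The sketch below is a hypothetical re-implementation, not the package's code: the `@1.0.0` data-package version is copied from that example, and the `4.0.0_best_int` directory for `:best` is an assumption based on the `_best_int` variant mentioned above. It also shows the first-language rule for `+`-combined strings and how a caller might build one URL per language.

```elixir
defmodule CdnUrlSketch do
  # Hypothetical reconstruction of the jsDelivr template; the data-package
  # version and the "_best_int" suffix are assumptions, not the real config.
  @base "https://cdn.jsdelivr.net/npm/@tesseract.js-data"
  @data_pkg_version "1.0.0"
  @tessdata_version "4.0.0"

  @spec cdn_url(String.t(), :standard | :best) :: String.t()
  def cdn_url(lang, tier \\ :standard) do
    # For "+"-combined strings, only the first language is used.
    [first | _] = String.split(lang, "+", trim: true)
    "#{@base}/#{first}@#{@data_pkg_version}/#{tier_dir(tier)}/#{first}.traineddata.gz"
  end

  # Unsupported tiers (e.g. :fast) raise FunctionClauseError here.
  defp tier_dir(:standard), do: @tessdata_version
  defp tier_dir(:best), do: @tessdata_version <> "_best_int"

  # Downloading a combined string means one URL per language.
  @spec cdn_urls(String.t(), :standard | :best) :: [String.t()]
  def cdn_urls(langs, tier \\ :standard) do
    langs
    |> String.split("+", trim: true)
    |> Enum.map(&cdn_url(&1, tier))
  end
end
```

Keeping the tier as a path segment means switching tiers never changes the filename, so the local copy is always `<lang>.traineddata.gz`.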

# `core_cdn_url`

```elixir
@spec core_cdn_url(atom()) :: String.t()
```

jsDelivr URL for the tesseract.js-core WASM bundle.

# `core_filename`

```elixir
@spec core_filename(atom()) :: String.t()
```

Filename for a WASM core variant.

# `core_local_path`

```elixir
@spec core_local_path(atom(), Path.t()) :: String.t()
```

Local path for the WASM core.

# `core_version`

Returns the tesseract.js-core version this package is pinned to.

# `filename`

```elixir
@spec filename(String.t()) :: String.t()
```

Filename for a language's traineddata file.

# `get`

```elixir
@spec get(String.t()) :: map() | nil
```

Returns the registry entry for a language, or `nil` if not curated.

# `list`

```elixir
@spec list() :: %{required(String.t()) => map()}
```

Returns the curated registry as a map of `lang => %{name, size_mb, sha256}`.
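
The relationship between `list/0` and `get/1` can be sketched as a plain map lookup. This is a hypothetical stand-in: the single entry uses placeholder values (`nil` size and digest), since the real names, sizes, and SHA-256 digests live in `TesseractJs.Models`, and atom keys are assumed from the `%{name, size_mb, sha256}` shape above.

```elixir
defmodule RegistrySketch do
  # Placeholder registry: the real package ships ~20 curated entries with
  # actual sizes and checksums; nil values here are deliberate stand-ins.
  @registry %{
    "eng" => %{name: "English", size_mb: nil, sha256: nil}
  }

  @spec list() :: %{required(String.t()) => map()}
  def list, do: @registry

  # Uncurated codes such as "hin" return nil rather than raising.
  @spec get(String.t()) :: map() | nil
  def get(lang), do: Map.get(@registry, lang)
end
```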

# `local_path`

```elixir
@spec local_path(String.t(), Path.t()) :: String.t()
```

Path (relative to a Phoenix app's `priv/static/` directory, despite the leading
`/`) where the Mix download task writes a language file.

    iex> TesseractJs.Models.local_path("eng")
    "/assets/vendor/tesseract/eng.traineddata.gz"
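
A minimal sketch of how the returned value could be derived, assuming the second argument is the base directory and defaults to `"/assets/vendor/tesseract"` as in the example above. `LocalPathSketch` and `disk_path/2` are hypothetical; `disk_path/2` only illustrates where the file would sit under an app's `priv/static/`.

```elixir
defmodule LocalPathSketch do
  # Assumed default base directory, taken from the documented example.
  @default_dir "/assets/vendor/tesseract"

  @spec filename(String.t()) :: String.t()
  def filename(lang), do: lang <> ".traineddata.gz"

  @spec local_path(String.t(), Path.t()) :: String.t()
  def local_path(lang, dir \\ @default_dir), do: Path.join(dir, filename(lang))

  # Hypothetical: resolve the path against <app_root>/priv/static on disk.
  @spec disk_path(String.t(), Path.t()) :: String.t()
  def disk_path(lang, app_root) do
    Path.join([app_root, "priv/static", String.trim_leading(local_path(lang), "/")])
  end
end
```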

# `split_langs`

```elixir
@spec split_langs(String.t()) :: [String.t()]
```

Splits a `+`-combined lang string into individual lang codes.
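
A sketch of the split, plus a check for the local-mode requirement that every language of a combined string sits in the same directory. Dropping empty segments (e.g. from `"eng++fra"`) is an assumption about the real function, and `missing_langs/2` is a hypothetical helper that assumes the `<lang>.traineddata.gz` naming used elsewhere on this page.

```elixir
defmodule SplitLangsSketch do
  # Split on "+" and drop empty segments; trimming is an assumption.
  @spec split_langs(String.t()) :: [String.t()]
  def split_langs(langs), do: String.split(langs, "+", trim: true)

  # Hypothetical: list the languages whose traineddata file is absent
  # from `dir`, since local mode needs all of them in one directory.
  @spec missing_langs(String.t(), Path.t()) :: [String.t()]
  def missing_langs(langs, dir) do
    langs
    |> split_langs()
    |> Enum.reject(&File.exists?(Path.join(dir, &1 <> ".traineddata.gz")))
  end
end
```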

# `tessdata_version`

Returns the tessdata version (`4.0.0`) this package is pinned to.

---

*Consult [api-reference.md](api-reference.md) for the complete listing*
