Model registry — single source of truth for the languages and core WASM files
that tesseract_js knows about. Both CDN mode and local mode are driven from
this module.
The curated list below covers ~20 common languages with checksums and approximate
sizes for the :standard tessdata tier. Any language code (e.g. "hin",
"chi_tra+chi_sim") that is not in the curated list still works at runtime —
it just falls through to the URL template without checksum verification.
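The fall-through described above can be pictured as a lookup with a default. This is only a minimal sketch under stated assumptions: `RegistrySketch`, `checksum_policy/1`, and the placeholder checksum are hypothetical names and values for illustration, not part of TesseractJs.Models.

```elixir
defmodule RegistrySketch do
  # Hypothetical stand-in for the curated registry. The size and sha256
  # values are placeholders, not real data.
  @curated %{
    "eng" => %{name: "English", size_mb: 11, sha256: "placeholder-sha256"}
  }

  # Curated languages carry a checksum to verify against. Any other code
  # still resolves through the URL template, but nothing is verified.
  def checksum_policy(lang) do
    case Map.fetch(@curated, lang) do
      {:ok, %{sha256: sha}} -> {:verify, sha}
      :error -> :skip_verification
    end
  end
end
```

With this shape, `RegistrySketch.checksum_policy("hin")` takes the unverified branch because "hin" is not a key in the placeholder map.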
Use the helpers to resolve URLs and paths:
TesseractJs.Models.cdn_url("eng")
TesseractJs.Models.cdn_url("jpn_vert", :best)
TesseractJs.Models.local_path("eng")
TesseractJs.Models.filename("eng")
The tessdata version pinned by this package is 4.0.0 (the tessdata bundle
format used by tesseract.js 5.x). Bump it in lockstep with the JS core release.
Functions
Builds the jsDelivr URL for a language's traineddata file.
iex> TesseractJs.Models.cdn_url("eng")
"https://cdn.jsdelivr.net/npm/@tesseract.js-data/eng@1.0.0/4.0.0/eng.traineddata.gz"
For +-combined langs, returns the URL for the first lang only; the
consumer is expected to download each lang separately. (For local mode, all
combined langs must be present in the same dir.)
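Because only the first lang's URL is returned for a combined string, a consumer has to expand the string itself. One way to do that is sketched below. `CombinedLangSketch` is a hypothetical module; its URL template is copied from the shape of the doctest above, and it assumes every language package is published at `@1.0.0`, which may not hold.

```elixir
defmodule CombinedLangSketch do
  # Assumed CDN prefix, taken from the doctest output above.
  @base "https://cdn.jsdelivr.net/npm/@tesseract.js-data"

  # Builds the :standard-tier URL for a single language code.
  def url(lang) do
    "#{@base}/#{lang}@1.0.0/4.0.0/#{lang}.traineddata.gz"
  end

  # Splits "a+b" into ["a", "b"] and builds one URL per language,
  # so each traineddata file can be downloaded separately.
  def urls(combined) do
    combined
    |> String.split("+", trim: true)
    |> Enum.map(&url/1)
  end
end
```

Calling `CombinedLangSketch.urls("chi_tra+chi_sim")` yields a two-element list, one URL per language, in the order the codes appear.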
Tiers
:standard - full LSTM+legacy combined model, ~11 MB/lang gzipped.
:best - smaller LSTM-only model (the _best_int variant on jsDelivr), ~3 MB/lang gzipped, similar accuracy to standard for most languages.
The :fast tier (tessdata_fast) requires uncompressed .traineddata files served from a different source and isn't supported in v0.1.
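The tier choice can be thought of as selecting a directory segment in the URL. The sketch below is illustrative only: `TierSketch` is hypothetical, "4.0.0" comes from the doctest above, and "4.0.0_best_int" is an assumed directory name inferred from the _best_int variant mentioned for :best.

```elixir
defmodule TierSketch do
  # Assumed mapping from tier to the version directory inside the package.
  def tier_dir(:standard), do: {:ok, "4.0.0"}
  def tier_dir(:best), do: {:ok, "4.0.0_best_int"}

  # Anything else, including :fast, is reported as unsupported rather
  # than silently falling back to another tier.
  def tier_dir(other), do: {:error, {:unsupported_tier, other}}
end
```

Returning a tagged tuple instead of raising lets a caller decide whether an unsupported tier such as :fast is a hard error or something to report to the user.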
jsDelivr URL for the tesseract.js-core WASM bundle.
Filename for a WASM core variant.
Local path for the WASM core.
Returns the tesseract.js-core version this package is pinned to.
Filename for a language's traineddata file.
Returns the registry entry for a language, or nil if not curated.
Returns the curated registry as a map of lang => %{name, size_mb, sha256}.
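A registry entry's sha256 field could be checked against downloaded bytes roughly as follows. This is a sketch, not the package's own verification code: `ChecksumSketch` is a hypothetical module, and it assumes the registry stores the digest as a hex string.

```elixir
defmodule ChecksumSketch do
  # Lowercase hex SHA-256 of a binary, via Erlang's :crypto module.
  def sha256_hex(data) when is_binary(data) do
    :crypto.hash(:sha256, data) |> Base.encode16(case: :lower)
  end

  # A nil entry means the language is not curated, so there is nothing
  # to compare against.
  def verify(_data, nil), do: :unverified

  # Compares case-insensitively in case the stored digest is uppercase.
  def verify(data, %{sha256: expected}) do
    if sha256_hex(data) == String.downcase(expected) do
      :ok
    else
      {:error, :checksum_mismatch}
    end
  end
end
```

The `nil` clause mirrors the documented behavior of the registry lookup returning nil for languages outside the curated list.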
Local filesystem path (relative to a Phoenix app's priv/static/) where the
Mix download task will write a language file.
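Since the returned path is relative to priv/static/ even though it begins with a slash, turning it into an on-disk location means joining it onto that directory. A minimal sketch, where `LocalPathSketch` and the default `static_root` are assumptions for illustration:

```elixir
defmodule LocalPathSketch do
  # web_path is a value such as "/assets/vendor/tesseract/eng.traineddata.gz".
  # static_root is an assumed location of the app's priv/static directory.
  # Path.join/2 drops the leading slash of the second segment, so the
  # result stays inside static_root.
  def disk_path(web_path, static_root \\ "priv/static") do
    Path.join(static_root, web_path)
  end
end
```

In a release, the static root would more likely come from something like `Application.app_dir/2` than from a relative string; the default here only keeps the sketch self-contained.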
iex> TesseractJs.Models.local_path("eng")
"/assets/vendor/tesseract/eng.traineddata.gz"
Splits a +-combined lang string into individual lang codes.
Returns the tessdata version (4.0.0) this package is pinned to.