# `Text.Language.Classifier.Fasttext.Dictionary`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/language/classifier/fasttext/dictionary.ex#L1)

Vocabulary and label table parsed from a fastText model file.

Mirrors the C++ `fasttext::Dictionary` data written by `Dictionary::save`.
Each entry is a `Text.Language.Classifier.Fasttext.Entry` carrying the
surface form (UTF-8 string), occurrence count from training, and a
word/label tag.

Entries are stored in two collections:

* `entries` holds the entries in file order. For word entries, index `i`
  in this list is also the row index that fastText uses for that word in
  the input matrix.

* `word_to_index` is a precomputed map from surface form to entry index.
  It is built once at load time so that feature extraction can resolve a
  word with a single map lookup instead of scanning `entries`.
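
As a rough illustration of how the two collections relate, the sketch below derives an index map from a list of entries. The module name `WordIndexSketch` and the `:word` field name are assumptions made for this example; they are not taken from the library, whose `Entry` struct may use different field names.

```elixir
defmodule WordIndexSketch do
  # Hypothetical helper, not part of the library.
  # Assumes each entry is a map (or struct) with a :word key that holds
  # the surface form.
  def build(entries) do
    entries
    |> Enum.with_index()
    |> Map.new(fn {entry, index} -> {entry.word, index} end)
  end
end

entries = [
  %{word: "hello", count: 7, type: :word},
  %{word: "__label__en", count: 3, type: :label}
]

index = WordIndexSketch.build(entries)

# Resolve a surface form back to its position in `entries`.
Map.fetch(index, "hello")
```

Looking up a form that is absent from the map returns `:error`, which lets a caller fall back to whatever out-of-vocabulary handling it needs.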

See `docs/lid176_binary_format.md` (Section 3) for the byte layout.

# `t`

```elixir
@type t() :: %Text.Language.Classifier.Fasttext.Dictionary{
  entries: [Text.Language.Classifier.Fasttext.Entry.t()],
  nlabels: non_neg_integer(),
  ntokens: non_neg_integer(),
  nwords: non_neg_integer(),
  pruneidx: %{required(integer()) => integer()},
  pruneidx_size: non_neg_integer(),
  size: non_neg_integer(),
  word_to_index: %{required(String.t()) => non_neg_integer()}
}
```

# `decode`

```elixir
@spec decode(binary()) :: {:ok, t(), binary()} | {:error, term()}
```

Decodes the dictionary section of a fastText model file.

### Arguments

* `binary` is the raw byte sequence positioned at the start of the
  dictionary block (immediately after the args block).

### Returns

* `{:ok, dictionary, rest}` where `dictionary` is a `t:t/0` struct and
  `rest` is the binary remainder positioned at the start of the
  `quant_input` flag byte.

* `{:error, reason}` if the input is truncated or malformed (e.g. an
  unterminated word string, an out-of-range entry type byte).
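
The following is a minimal sketch of what decoding such a block can look like, not the library's implementation. It assumes the field order and widths written by the C++ `Dictionary::save` as I understand them: three 32-bit integers (`size`, `nwords`, `nlabels`), two 64-bit integers (`ntokens`, `pruneidx_size`), then `size` entries, each a NUL-terminated string followed by a 64-bit count and a one-byte type. Little-endian byte order is assumed. The module name `DictionaryDecodeSketch`, the plain-map entries, and the error atoms are all hypothetical, and the trailing `pruneidx` pairs are deliberately left undecoded.

```elixir
defmodule DictionaryDecodeSketch do
  # Hypothetical module for illustration only.

  # Header: size, nwords, nlabels (int32), then ntokens, pruneidx_size (int64).
  def decode(<<size::little-signed-32, nwords::little-signed-32,
               nlabels::little-signed-32, ntokens::little-signed-64,
               pruneidx_size::little-signed-64, rest::binary>>) do
    with {:ok, entries, rest} <- decode_entries(rest, size, []) do
      dictionary = %{
        size: size,
        nwords: nwords,
        nlabels: nlabels,
        ntokens: ntokens,
        pruneidx_size: pruneidx_size,
        entries: entries
      }

      # NOTE: pruneidx pairs, if any, are still at the front of `rest`.
      {:ok, dictionary, rest}
    end
  end

  def decode(_other), do: {:error, :truncated_header}

  defp decode_entries(rest, 0, acc), do: {:ok, Enum.reverse(acc), rest}

  defp decode_entries(binary, remaining, acc) do
    # Each entry: NUL-terminated word, int64 count, int8 type tag.
    with [word, after_word] <- :binary.split(binary, <<0>>),
         <<count::little-signed-64, tag::signed-8, rest::binary>> <- after_word,
         {:ok, type} <- entry_type(tag) do
      entry = %{word: word, count: count, type: type}
      decode_entries(rest, remaining - 1, [entry | acc])
    else
      [_unterminated] -> {:error, :unterminated_word}
      {:error, _reason} = error -> error
      _short -> {:error, :truncated_entry}
    end
  end

  # Assumed tag values: 0 for a word, 1 for a label.
  defp entry_type(0), do: {:ok, :word}
  defp entry_type(1), do: {:ok, :label}
  defp entry_type(other), do: {:error, {:invalid_entry_type, other}}
end
```

Returning the unconsumed remainder alongside the parsed value, as `decode/1` is documented to do, lets the caller hand `rest` straight to the parser for the next section of the file.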

# `labels`

```elixir
@spec labels(t()) :: [String.t()]
```

Returns the labels (in file order) with the `__label__` prefix stripped.

For `lid.176` this produces a 176-element list of language codes such as
`"en"`, `"fr"`, and `"zh"`. Index `i` in the returned list corresponds
to row `i` of the output matrix.

### Arguments

* `dictionary` is a parsed `t:t/0`.

### Returns

* A list of `nlabels` strings.
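
The prefix stripping described above can be sketched as follows. This is an illustration under assumptions, not the library's code: `LabelsSketch` is a hypothetical module, and the `:type` and `:word` keys stand in for whatever fields the real `Entry` struct uses.

```elixir
defmodule LabelsSketch do
  # Hypothetical helper for illustration only.
  @prefix "__label__"

  # Keeps label entries in their original order and strips the prefix.
  # Entries whose :type is not :label are skipped by the pattern match.
  def labels(entries) do
    for %{type: :label, word: word} <- entries do
      String.replace_prefix(word, @prefix, "")
    end
  end
end

entries = [
  %{word: "hello", count: 7, type: :word},
  %{word: "__label__en", count: 3, type: :label},
  %{word: "__label__fr", count: 2, type: :label}
]

LabelsSketch.labels(entries)
```

Because the comprehension preserves order, position `i` in the result still lines up with the `i`-th label entry in the file.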

