# `Text.Language.Classifier.Fasttext.Hash`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/language/classifier/fasttext/hash.ex#L1)

Bit-exact port of fastText's string hash function.

fastText uses a variant of 32-bit Fowler–Noll–Vo (FNV-1a) with one quirk: each
input byte is reinterpreted as a signed 8-bit integer (`int8_t`) before being
widened to an unsigned 32-bit integer. Bytes with the high bit set (`0x80` to
`0xFF`) therefore contribute their sign-extended value to the XOR step, not
their unsigned value. For example, `0xC3` is mixed in as `0xFFFFFFC3`, not
`0x000000C3`. This is documented in `src/dictionary.cc` (`Dictionary::hash`)
of the fastText source as a deliberate compatibility decision, so that
already-released models continue to hash identically.
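The widening can be illustrated with a small helper. `SignedByteSketch.widen/1` is hypothetical and not part of this module. It uses Elixir's `signed-8` binary modifier to read a byte as `int8_t`, then masks with `0xFFFFFFFF` to obtain the two's-complement `uint32_t` bit pattern.

```elixir
defmodule SignedByteSketch do
  import Bitwise

  # Hypothetical helper: mirrors C++'s uint32_t(int8_t(byte)).
  def widen(byte) when byte in 0..255 do
    # Reinterpret the unsigned byte as a signed 8-bit integer (-128..127).
    <<signed::signed-8>> = <<byte>>
    # Masking a negative integer yields its 32-bit two's-complement pattern.
    band(signed, 0xFFFFFFFF)
  end
end
```

With this helper, `SignedByteSketch.widen(0xC3)` returns `4294967235` (`0xFFFFFFC3`). An ASCII byte such as `0x61` comes back unchanged as `97`.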

The reference C++ implementation, which this module translates literally, is:

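    #include <cstdint>
    #include <string>

    uint32_t hash(const std::string& str) {  // Dictionary::hash in src/dictionary.cc
      uint32_t h = 2166136261;               // 32-bit FNV offset basis
      for (size_t i = 0; i < str.size(); i++) {
        h = h ^ uint32_t(int8_t(str[i]));    // sign-extend the byte, then XOR
        h = h * 16777619;                    // 32-bit FNV prime; wraps mod 2^32
      }
      return h;
    }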
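For comparison, here is a minimal Elixir sketch of the same loop that assumes only `Bitwise` from the standard library. The module name `FnvSignedSketch` is hypothetical, and this is an illustration rather than this module's actual source. The `signed-8` binary modifier performs the `int8_t` reinterpretation. Masking with `0xFFFFFFFF` emulates `uint32_t` wraparound, which is needed because Elixir integers have arbitrary precision.

```elixir
defmodule FnvSignedSketch do
  import Bitwise

  @offset_basis 2_166_136_261
  @prime 16_777_619
  @mask 0xFFFFFFFF

  # Hypothetical port for illustration; not this module's actual source.
  def hash(binary) when is_binary(binary) do
    # `signed-8` reads each byte as int8_t, e.g. 0xC3 becomes -61.
    for <<byte::signed-8 <- binary>>, reduce: @offset_basis do
      h ->
        # band(byte, @mask) is the uint32_t(int8_t(...)) widening.
        widened = band(byte, @mask)
        # XOR, multiply by the FNV prime, then truncate to 32 bits.
        band(bxor(h, widened) * @prime, @mask)
    end
  end
end
```

`FnvSignedSketch.hash("the")` returns `3020861980`, which agrees with the `hash/1` example below.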

The two constants are the canonical FNV offset basis and FNV prime for
32-bit FNV-1a.
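In hex, the constants are `0x811C9DC5` and `0x01000193`. The prime equals 2^24 + 2^8 + 0x93, so multiplying by it modulo 2^32 can also be written as shifts and adds. The sketch below checks both facts. `FnvConstantsSketch` is a hypothetical module for illustration only and is not taken from fastText or this library.

```elixir
defmodule FnvConstantsSketch do
  import Bitwise

  # 32-bit FNV constants written in hex.
  @offset_basis 0x811C9DC5
  @prime 0x01000193
  @mask 0xFFFFFFFF

  def offset_basis, do: @offset_basis
  def prime, do: @prime

  # Since @prime == 2^24 + 2^8 + 0x93, h * @prime mod 2^32 can be computed
  # with two shifts, one small multiply, and a final 32-bit truncation.
  def mul_prime(h) when h >= 0 do
    band((h <<< 24) + (h <<< 8) + h * 0x93, @mask)
  end
end
```

For example, `FnvConstantsSketch.mul_prime(0x811C9DA4)` (the offset basis XOR `0x61`) gives `0xE40C292C`, which is `3826002220`, the `hash("a")` value in the examples.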

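Any deviation from the reference here silently produces wrong subword
indices and degrades the model's predictions for non-ASCII scripts. ASCII
bytes (`0x00` to `0x7F`) have the same value whether read as signed or
unsigned, so the quirk only affects the bytes of multi-byte UTF-8 sequences.
The hash is exercised by golden tests against fastText's own `get_subwords/1`
output for a large corpus of words.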
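The sketch below shows why only non-ASCII input is affected. It computes both the fastText variant and textbook unsigned FNV-1a. `FnvVariantSketch` and its two functions are hypothetical names and are not part of this library. For an ASCII string such as `"the"`, the two variants agree. For `"é"` (UTF-8 bytes `0xC3 0xA9`), they diverge.

```elixir
defmodule FnvVariantSketch do
  import Bitwise

  @offset_basis 2_166_136_261
  @prime 16_777_619
  @mask 0xFFFFFFFF

  # fastText-style: each byte is read as int8 and sign-extended to 32 bits.
  def signed_hash(binary) when is_binary(binary) do
    for <<b::signed-8 <- binary>>, reduce: @offset_basis do
      h -> band(bxor(h, band(b, @mask)) * @prime, @mask)
    end
  end

  # Textbook FNV-1a: each byte is read as an unsigned value 0..255.
  def unsigned_hash(binary) when is_binary(binary) do
    for <<b::unsigned-8 <- binary>>, reduce: @offset_basis do
      h -> band(bxor(h, b) * @prime, @mask)
    end
  end
end
```

A port that used the textbook unsigned reading would pass every ASCII-only test. It would only fail on words containing bytes at or above `0x80`, which is why the golden tests need non-ASCII words.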

# `hash`

```elixir
@spec hash(binary()) :: non_neg_integer()
```

Returns the 32-bit fastText hash of a binary: FNV-1a with each byte
sign-extended before the XOR step.

### Arguments

* `binary` is any UTF-8 string or arbitrary byte sequence. fastText
  operates on UTF-8 byte sequences, so passing a `t:String.t/0` is the
  typical use.

### Returns

* A non-negative integer in `[0, 2^32 - 1]`.

### Examples

    iex> Text.Language.Classifier.Fasttext.Hash.hash("")
    2166136261

    iex> Text.Language.Classifier.Fasttext.Hash.hash("a")
    3826002220

    iex> Text.Language.Classifier.Fasttext.Hash.hash("the")
    3020861980

