ExDataSketch.Hash (ExDataSketch v0.7.1)


Stable 64-bit hash interface for ExDataSketch.

All sketch algorithms require a deterministic hash function that maps arbitrary Elixir terms to 64-bit unsigned integers. This module provides that interface with automatic backend selection and a pure-Elixir fallback.

Hash Properties

  • Output range: 0..2^64-1 (unsigned 64-bit integer).
  • Deterministic: the same input always produces the same output within the same runtime configuration.
  • Uniform distribution: output bits are well distributed, which the sketches rely on for accuracy.

Auto-detection

When no custom :hash_fn is provided, hash64/2 automatically selects the best available hash implementation:

  • XXHash3 (NIF): When the Rust NIF is loaded, hash64/2 uses XXHash3, which produces native 64-bit hashes with no Elixir-side mixing step. XXHash3 output is stable across platforms.

  • phash2 + mix64 (pure): When the NIF is not available, hash64/2 falls back to :erlang.phash2/2 with a fixnum-safe 64-bit mixer. The mixer uses 16-bit partial products to avoid bigint heap allocations while preserving full 64-bit output quality.
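The 16-bit partial-product idea can be sketched as follows. This is a hypothetical illustration, not the library's mix64: it assumes a SplitMix64-style finalizer (xor-shifts by 30, 27 and 31 with the standard SplitMix64 multipliers), and the module name MixSketch is made up. It shows how a 64-bit multiplication mod 2^64 can be computed so that every partial product and column sum stays well inside the BEAM's small-integer (fixnum) range, which is about 60 bits on 64-bit systems.

```elixir
defmodule MixSketch do
  import Bitwise

  @mask16 0xFFFF
  @mask64 0xFFFF_FFFF_FFFF_FFFF

  # (a * b) mod 2^64 using 16-bit limbs: each limb product fits in 32 bits
  # and each column sum in about 34 bits, so no intermediate is a bignum.
  def mul64(a, b) do
    {a0, a1, a2, a3} = limbs(a)
    {b0, b1, b2, b3} = limbs(b)

    # Only the four lowest columns contribute to bits 0..63.
    c0 = a0 * b0
    c1 = a0 * b1 + a1 * b0
    c2 = a0 * b2 + a1 * b1 + a2 * b0
    c3 = a0 * b3 + a1 * b2 + a2 * b1 + a3 * b0

    # Propagate carries one 16-bit limb at a time; the carry out of the
    # top limb is dropped, which is the mod 2^64 reduction.
    r0 = c0 &&& @mask16
    t1 = c1 + (c0 >>> 16)
    r1 = t1 &&& @mask16
    t2 = c2 + (t1 >>> 16)
    r2 = t2 &&& @mask16
    t3 = c3 + (t2 >>> 16)
    r3 = t3 &&& @mask16

    # The assembled 64-bit result itself may exceed the fixnum range.
    r0 ||| (r1 <<< 16) ||| (r2 <<< 32) ||| (r3 <<< 48)
  end

  defp limbs(x) do
    {x &&& @mask16, (x >>> 16) &&& @mask16, (x >>> 32) &&& @mask16, (x >>> 48) &&& @mask16}
  end

  # SplitMix64-style finalizer. Only the multiplications are fixnum-safe here;
  # the xor-shift steps still act on full 64-bit values. A fully fixnum-safe
  # version would keep the state split into limbs throughout.
  def mix64(x) do
    z = x &&& @mask64
    z = mul64(bxor(z, z >>> 30), 0xBF58476D1CE4E5B9)
    z = mul64(bxor(z, z >>> 27), 0x94D049BB133111EB)
    bxor(z, z >>> 31)
  end
end
```

Feeding a 32-bit :erlang.phash2/2 value through such a finalizer spreads its entropy across all 64 output bits, which is the role the fallback's mixer plays.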

The NIF availability check is performed once and cached in :persistent_term, so every subsequent lookup is a cheap constant-time read.
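A minimal sketch of this compute-once caching pattern, together with the strategy selection described above. The cache key, the module name NifCacheSketch, and the probe (Code.ensure_loaded?/1 on the made-up module HypotheticalHashNif) are all assumptions; the library's actual key and check are not shown in these docs. The probe is injectable only so that the caching behavior can be exercised.

```elixir
defmodule NifCacheSketch do
  # Hypothetical cache key.
  @key {__MODULE__, :nif_available?}

  # Runs the probe on the first call only; later calls read the cached value.
  def nif_available?(probe \\ &default_probe/0) do
    case :persistent_term.get(@key, :unset) do
      :unset ->
        result = probe.()
        :persistent_term.put(@key, result)
        result

      cached ->
        cached
    end
  end

  # Mirrors the documented rule: :xxhash3 when the NIF is loaded, :phash2 otherwise.
  def default_hash_strategy(probe \\ &default_probe/0) do
    if nif_available?(probe), do: :xxhash3, else: :phash2
  end

  # Stand-in probe: checks whether a hypothetical NIF module can be loaded.
  defp default_probe do
    Code.ensure_loaded?(HypotheticalHashNif)
  end
end
```

If several processes make the first call concurrently, each may run the probe and write the result; for a deterministic boolean check every writer stores the same value, so the race is harmless.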

Pluggable Hash

Pass hash_fn: fn term -> non_neg_integer end to override the default. The custom function must return values in 0..2^64-1.
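As an illustration, here is one possible custom hash function plus a small checker for the documented contract (integer output in 0..2^64-1, deterministic). The module and function names are hypothetical, and SHA-256 from OTP's :crypto is used only because it is readily available, not because the library recommends it.

```elixir
defmodule CustomHashSketch do
  @max64 0xFFFF_FFFF_FFFF_FFFF

  # Hypothetical custom hash: SHA-256 of the term's external term format,
  # truncated to its first 64 bits (big-endian), so the result is in 0..2^64-1.
  def sha256_hash64(term) do
    <<h::unsigned-big-integer-size(64), _rest::binary>> =
      :crypto.hash(:sha256, :erlang.term_to_binary(term))

    h
  end

  # Checks the contract on sample inputs: every result is an integer in
  # 0..2^64-1 and a second call on the same input returns the same value.
  def valid_hash_fn?(fun, samples) when is_function(fun, 1) do
    Enum.all?(samples, fn s ->
      h = fun.(s)
      is_integer(h) and h >= 0 and h <= @max64 and fun.(s) == h
    end)
  end
end
```

Such a function would be passed as hash_fn: &CustomHashSketch.sha256_hash64/1, for example ExDataSketch.Hash.hash64("hello", hash_fn: &CustomHashSketch.sha256_hash64/1). A cryptographic hash is much slower than XXHash3, and, per validate_merge_hash_compat!/3, sketches built with a custom :hash_fn cannot be merged. If the output must survive OTP upgrades, consider hashing a canonical binary encoding you control rather than relying on :erlang.term_to_binary/1.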

Stability

:erlang.phash2/2 output is not guaranteed to be stable across OTP major versions, whereas XXHash3 output is stable across platforms. For cross-version stability, use the NIF build (XXHash3) or supply a custom :hash_fn whose output is itself version-stable.

Summary

Functions

default_hash_strategy(): Returns the default hash strategy based on NIF availability.

hash64(term, opts \\ []): Hashes an arbitrary Elixir term to a 64-bit unsigned integer.

hash64_binary(binary, opts \\ []): Hashes a raw binary to a 64-bit unsigned integer.

nif_available?(): Returns whether the NIF is available for hashing.

validate_merge_hash_compat!(opts_a, opts_b, sketch_type): Validates that two sets of sketch options have compatible hashing configuration.

xxhash3_64(data): Hashes a binary using XXHash3 (64-bit) via Rust NIF.

xxhash3_64(data, seed): Hashes a binary using XXHash3 (64-bit) with a seed via Rust NIF.

Types

hash64()

@type hash64() :: non_neg_integer()

hash_opt()

@type hash_opt() ::
  {:seed, non_neg_integer()}
  | {:hash_fn, (term() -> hash64())}
  | {:hash_strategy, hash_strategy()}

hash_strategy()

@type hash_strategy() :: :phash2 | :xxhash3 | :custom

opts()

@type opts() :: [hash_opt()]

Functions

default_hash_strategy()

@spec default_hash_strategy() :: :xxhash3 | :phash2

Returns the default hash strategy based on NIF availability.

Returns :xxhash3 when the NIF is loaded, :phash2 otherwise.

hash64(term, opts \\ [])

@spec hash64(term(), opts()) :: hash64()

Hashes an arbitrary Elixir term to a 64-bit unsigned integer.

When no :hash_fn is provided, automatically uses XXHash3 via NIF if available, otherwise falls back to phash2 with fixnum-safe bit mixing.

Options

  • :seed - seed value for the hash (default: 0). It is combined with the base hash.
  • :hash_fn - custom hash function (term -> 0..2^64-1). When provided, :seed is ignored and the function is called directly.
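The precedence between these two options can be sketched as below. This is a hypothetical dispatch, not the library's code: the module name is made up, the base hash is a stand-in built from two 32-bit :erlang.phash2/2 calls, and the way :seed is folded in is an assumption. Only the rule "a :hash_fn is called directly and :seed is ignored" comes from the options above.

```elixir
defmodule HashOptsSketch do
  import Bitwise

  def hash64(term, opts \\ []) do
    case Keyword.fetch(opts, :hash_fn) do
      # A custom :hash_fn is called directly; :seed is ignored.
      {:ok, fun} when is_function(fun, 1) ->
        fun.(term)

      # Otherwise fold :seed (default 0) into the input of a stand-in base hash
      # made of two 32-bit phash2 values concatenated into 64 bits.
      :error ->
        seed = Keyword.get(opts, :seed, 0)
        hi = :erlang.phash2({seed, term}, 1 <<< 32)
        lo = :erlang.phash2({term, seed}, 1 <<< 32)
        (hi <<< 32) ||| lo
    end
  end
end
```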

Examples

iex> h = ExDataSketch.Hash.hash64("hello")
iex> is_integer(h) and h >= 0
true

iex> ExDataSketch.Hash.hash64("hello") == ExDataSketch.Hash.hash64("hello")
true

iex> ExDataSketch.Hash.hash64("hello") != ExDataSketch.Hash.hash64("world")
true

iex> ExDataSketch.Hash.hash64("test", seed: 42) != ExDataSketch.Hash.hash64("test", seed: 0)
true

hash64_binary(binary, opts \\ [])

@spec hash64_binary(binary(), opts()) :: hash64()

Hashes a raw binary to a 64-bit unsigned integer.

Operates directly on binary bytes without term encoding overhead. Useful when the input is already binary data (e.g., from external sources).

When no :hash_fn is provided, automatically uses XXHash3 via NIF if available, otherwise falls back to phash2 with fixnum-safe bit mixing.

Options

Same as hash64/2.

Examples

iex> h = ExDataSketch.Hash.hash64_binary(<<1, 2, 3>>)
iex> is_integer(h) and h >= 0
true

iex> ExDataSketch.Hash.hash64_binary(<<"abc">>) == ExDataSketch.Hash.hash64_binary(<<"abc">>)
true

nif_available?()

@spec nif_available?() :: boolean()

Returns whether the NIF is available for hashing.

The result is computed once and cached in :persistent_term.

validate_merge_hash_compat!(opts_a, opts_b, sketch_type)

@spec validate_merge_hash_compat!(Keyword.t(), Keyword.t(), String.t()) :: :ok

Validates that two sets of sketch options have compatible hashing configuration.

Raises ExDataSketch.Errors.IncompatibleSketchesError if:

  • Either sketch uses a custom :hash_fn (closures cannot be compared)
  • Hash strategies differ (e.g., :xxhash3 vs. :phash2)
  • Seeds differ (default is 0)
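The three rules above can be restated as a short sketch. It is hypothetical: the module name is made up, it raises ArgumentError instead of ExDataSketch.Errors.IncompatibleSketchesError so it runs without the library, and it compares a missing :hash_strategy as nil, whereas the real function may resolve it to default_hash_strategy() first.

```elixir
defmodule MergeCompatSketch do
  def validate!(opts_a, opts_b, sketch_type) do
    cond do
      # Closures cannot be compared, so any custom :hash_fn blocks merging.
      Keyword.has_key?(opts_a, :hash_fn) or Keyword.has_key?(opts_b, :hash_fn) ->
        raise ArgumentError, "#{sketch_type}: custom :hash_fn cannot be compared"

      # Different strategies produce unrelated hash values.
      Keyword.get(opts_a, :hash_strategy) != Keyword.get(opts_b, :hash_strategy) ->
        raise ArgumentError, "#{sketch_type}: hash strategies differ"

      # A missing :seed counts as the default, 0.
      Keyword.get(opts_a, :seed, 0) != Keyword.get(opts_b, :seed, 0) ->
        raise ArgumentError, "#{sketch_type}: seeds differ"

      true ->
        :ok
    end
  end
end
```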

xxhash3_64(data)

@spec xxhash3_64(binary()) :: hash64()

Hashes a binary using XXHash3 (64-bit) via Rust NIF.

Returns a deterministic 64-bit hash that is stable across platforms and versions when the Rust NIF is available. Falls back to the phash2-based hash if the NIF is not loaded; the fallback is NOT stable across OTP major versions (see module docs).

This function operates on raw binary data. For Elixir terms, convert to binary first (e.g., using :erlang.term_to_binary/1 or to_string/1).
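The two conversions mentioned above produce different bytes, and so different hashes, for the same term. A small illustration with a made-up module name:

```elixir
defmodule TermBytesSketch do
  # Convention A: to_string/1. Compact, but distinct terms can collide:
  # the atom :abc and the string "abc" both become "abc".
  def via_string(term), do: to_string(term)

  # Convention B: external term format. Longer (version byte, type tag,
  # length), but keeps type information, so :abc and "abc" stay distinct.
  def via_etf(term), do: :erlang.term_to_binary(term)
end

TermBytesSketch.via_string(:abc) == TermBytesSketch.via_string("abc")
# => true

TermBytesSketch.via_etf("abc")
# => <<131, 109, 0, 0, 0, 3, 97, 98, 99>>
```

Either result can be passed to xxhash3_64/1. What matters is applying one convention consistently to every value inserted into sketches you intend to compare or merge.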

Examples

iex> h = ExDataSketch.Hash.xxhash3_64("hello")
iex> is_integer(h) and h >= 0
true

iex> ExDataSketch.Hash.xxhash3_64("hello") == ExDataSketch.Hash.xxhash3_64("hello")
true

xxhash3_64(data, seed)

@spec xxhash3_64(binary(), non_neg_integer()) :: hash64()

Hashes a binary using XXHash3 (64-bit) with a seed via Rust NIF.

Falls back to the phash2-based hash if the NIF is not available; as with xxhash3_64/1, that fallback is not stable across OTP major versions.

Examples

iex> h = ExDataSketch.Hash.xxhash3_64("hello", 42)
iex> is_integer(h) and h >= 0
true

iex> ExDataSketch.Hash.xxhash3_64("hello", 0) != ExDataSketch.Hash.xxhash3_64("hello", 42)
true