This guide covers the full API, configuration options, backend system, serialization formats, and error handling for ExDataSketch.

Options

HLL Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :p | integer | 14 | Precision parameter. Valid range: 4..16. Higher values use more memory but give better accuracy. Register count = 2^p. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

Memory usage: 2^p bytes for registers (e.g., p=14 uses 16 KiB).

Relative error: approximately 1.04 / sqrt(2^p).
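
A minimal end-to-end flow, using the HLL functions that also appear in the use cases later in this guide (new/1, update/2, update_many/2, estimate/1, merge/2):

```elixir
# Create a sketch with the default precision (p: 14, ~0.81% relative error)
sketch = ExDataSketch.HLL.new(p: 14)

# Updates return a new sketch (sketches are immutable values)
sketch = ExDataSketch.HLL.update(sketch, "user:42")
sketch = ExDataSketch.HLL.update_many(sketch, ["user:1", "user:2", "user:1"])

# Approximate number of distinct items observed (3 distinct so far)
estimate = ExDataSketch.HLL.estimate(sketch)

# Sketches built with the same p merge losslessly
other = ExDataSketch.HLL.update(ExDataSketch.HLL.new(p: 14), "user:99")
merged = ExDataSketch.HLL.merge(sketch, other)
```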

ULL Options

UltraLogLog (Ertl, 2023) provides approximately 20% better accuracy than HLL at the same memory footprint. It uses the same register array layout but stores a different value per register and applies the FGRA estimator.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :p | integer | 14 | Precision parameter. Valid range: 4..26. Higher values use more memory but give better accuracy. Register count = 2^p. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

Memory usage: 8 + 2^p bytes (e.g., p=14 uses ~16 KiB).

Relative error: approximately 0.835 / sqrt(2^p).

HLL vs ULL Comparison

| p | Memory | HLL Error | ULL Error | ULL Improvement |
| --- | --- | --- | --- | --- |
| 10 | ~1 KiB | 3.25% | 2.61% | ~20% |
| 12 | ~4 KiB | 1.63% | 1.30% | ~20% |
| 14 | ~16 KiB | 0.81% | 0.65% | ~20% |
| 16 | ~64 KiB | 0.41% | 0.33% | ~20% |

sketch = ExDataSketch.ULL.new(p: 14)
sketch = ExDataSketch.ULL.update_many(sketch, items)
ExDataSketch.ULL.estimate(sketch)

# Merge two sketches
merged = ExDataSketch.ULL.merge(sketch_a, sketch_b)

# Build from enumerable
sketch = ExDataSketch.ULL.from_enumerable(stream, p: 14)

CMS Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :width | pos_integer | 2048 | Number of counters per row. Higher values reduce error. |
| :depth | pos_integer | 5 | Number of hash functions (rows). Higher values reduce failure probability. |
| :counter_width | 32 or 64 | 32 | Bit width of each counter. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

Memory usage: width * depth * (counter_width / 8) bytes.

Error bound: e * total_count / width with probability 1 - (1/2)^depth, where e is Euler's number.
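
For the defaults (width: 2048, depth: 5, counter_width: 32) this works out to 2048 * 5 * 4 = 40 KiB of memory, with per-item overcount at most e * total_count / 2048 (~0.13% of the stream length) with probability 1 - (1/2)^5 ≈ 96.9%. A minimal usage sketch with the CMS functions used in the rate-limiter example later in this guide:

```elixir
cms = ExDataSketch.CMS.new(width: 2048, depth: 5)

# Record three events; estimates can only overcount, never undercount
cms =
  ["error:500", "error:404", "error:500"]
  |> Enum.reduce(cms, fn key, acc -> ExDataSketch.CMS.update(acc, key) end)

ExDataSketch.CMS.estimate(cms, "error:500")  # >= 2
```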

Theta Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :k | pos_integer | 4096 | Nominal number of entries. Controls accuracy. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |
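
Theta sketches are typically built from collections and combined with merge/2; intersection size then follows from inclusion-exclusion (the same pattern as the A/B overlap example later in this guide):

```elixir
alias ExDataSketch.Theta

a = Theta.from_enumerable(Enum.map(1..5_000, &"user:#{&1}"), k: 4096)
b = Theta.from_enumerable(Enum.map(2_500..7_500, &"user:#{&1}"), k: 4096)

union = Theta.merge(a, b)

# |A intersect B| ~= |A| + |B| - |A union B|
intersection =
  max(Theta.estimate(a) + Theta.estimate(b) - Theta.estimate(union), 0.0)
```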

KLL Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :k | integer | 200 | Accuracy parameter. Valid range: 8..65535. Higher values use more memory but give better accuracy. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

Rank error: approximately 1.65 / k.
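
For the default k: 200 that bound is roughly 0.8% of normalized rank. A quick sketch using the KLL functions listed in the backend section of this guide:

```elixir
kll = ExDataSketch.KLL.new(k: 200)
kll = ExDataSketch.KLL.update_many(kll, Enum.map(1..10_000, &(&1 * 1.0)))

ExDataSketch.KLL.quantile(kll, 0.5)  # approximate median, close to 5000.0
ExDataSketch.KLL.rank(kll, 2_500.0)  # approximate normalized rank, close to 0.25
ExDataSketch.KLL.count(kll)          # exact item count: 10000
```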

DDSketch Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :alpha | float | 0.01 | Relative accuracy parameter. Controls bucket width. Must be in (0.0, 1.0). |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

Value-relative error: the returned quantile value v satisfies v_true * (1 - alpha) <= v <= v_true * (1 + alpha).

| alpha | Relative Error | Bucket Count (for 1ms..10s range) |
| --- | --- | --- |
| 0.05 | 5% | ~185 |
| 0.01 | 1% | ~920 |
| 0.005 | 0.5% | ~1840 |
| 0.001 | 0.1% | ~9200 |
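
Putting the guarantee together: with alpha: 0.01, a reported p99 of 142ms means the true p99 lies in [140.58, 143.42]. A minimal sketch:

```elixir
dd = ExDataSketch.DDSketch.new(alpha: 0.01)

# e.g. latencies in milliseconds (DDSketch rejects negative values)
samples = Enum.map(1..10_000, fn _ -> :rand.uniform() * 500.0 end)
dd = ExDataSketch.DDSketch.update_many(dd, samples)

p99 = ExDataSketch.DDSketch.quantile(dd, 0.99)
# Guarantee: true_p99 * 0.99 <= p99 <= true_p99 * 1.01
```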

FrequentItems Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :k | pos_integer | 10 | Maximum number of counters. Controls how many distinct items can be tracked. Higher values give more accurate frequency estimates. |
| :key_encoding | atom/tuple | :binary | Key encoding policy: :binary (raw binaries), :int (64-bit LE integers), or {:term, :external} (Erlang external term format). |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

FrequentItems uses the SpaceSaving algorithm to track the top-k most frequent items in a data stream. It maintains at most k counters, each storing an item, its estimated count, and a maximum overcount error.

When a new item arrives and all k slots are full, the item with the minimum count is evicted (ties broken by lexicographically smallest key). The new item inherits the evicted count plus one, and the evicted count is recorded as the error bound.

Frequency estimates are always upper bounds -- the true frequency may be lower, but never higher. The lower field gives a guaranteed lower bound: max(estimate - error, 0).
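
A sketch of that workflow. The new/1 and update/2 calls appear in the trending-queries example later in this guide; the top_k/2 reader and the shape of its entries (item, estimate, lower fields) are assumptions here — check the module docs for the exact query API.

```elixir
fi = ExDataSketch.FrequentItems.new(k: 64)

fi =
  ~w(search search login search logout login)
  |> Enum.reduce(fi, fn item, acc -> ExDataSketch.FrequentItems.update(acc, item) end)

# Hypothetical reader: estimate is an upper bound, lower a guaranteed floor
for %{item: item, estimate: est, lower: lower} <- ExDataSketch.FrequentItems.top_k(fi, 3) do
  IO.puts("#{item}: between #{lower} and #{est}")
end
```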

Bloom Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :capacity | pos_integer | 10,000 | Expected number of items to insert. Used to derive optimal bit_count and hash_count. |
| :false_positive_rate | float | 0.01 | Target false positive probability. Must be in (0.0, 1.0). |
| :seed | non_neg_integer | 0 | Hash seed. Filters with different seeds cannot be merged. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

Bloom filters provide probabilistic membership testing: member?/2 returns true if an item was probably inserted, or false if it was definitely not. False positives are possible; false negatives are not.

Parameters are derived automatically:

  • bit_count = ceil(-capacity * ln(fpr) / ln(2)^2)
  • hash_count = max(1, round(bit_count / capacity * ln(2))), clamped to 1..30

For capacity=100,000 and fpr=0.01: bit_count=958,506, hash_count=7, ~117 KB.
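
The derivation can be checked with plain :math calls (no library required):

```elixir
capacity = 100_000
fpr = 0.01

# bit_count = ceil(-capacity * ln(fpr) / ln(2)^2)
bit_count = ceil(-capacity * :math.log(fpr) / :math.pow(:math.log(2), 2))

# hash_count = round(bit_count / capacity * ln(2)), clamped to 1..30
hash_count =
  (bit_count / capacity * :math.log(2))
  |> round()
  |> max(1)
  |> min(30)

bit_count   # => 958506
hash_count  # => 7
```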

Cuckoo Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :capacity | pos_integer | 10,000 | Expected number of items. Used to derive bucket count. |
| :fingerprint_size | integer | 8 | Fingerprint width in bits. Valid values: 8, 12, 16. Larger fingerprints reduce false positive rate. |
| :bucket_size | integer | 4 | Slots per bucket. Valid values: 2, 4. Bucket size 4 gives better space efficiency. |
| :max_kicks | integer | 500 | Maximum relocation attempts before reporting full. Valid range: 100..2000. |
| :seed | non_neg_integer | 0 | Hash seed. Filters with different seeds are not compatible. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

Cuckoo filters provide membership testing with deletion support. Unlike Bloom filters, items can be removed after insertion without introducing false negatives. Uses partial-key cuckoo hashing where fingerprints are stored in one of two candidate buckets.

put/2 returns {:ok, cuckoo} on success or {:error, :full} when the filter cannot relocate fingerprints after max_kicks attempts. delete/2 returns {:ok, cuckoo} on success or {:error, :not_found} if the fingerprint is not present.

False positive rate: approximately 2 * bucket_size / 2^fingerprint_size. For fingerprint_size=8, bucket_size=4: ~3.1%. For fingerprint_size=16, bucket_size=4: ~0.012%.
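
A short sketch of the insert/query/delete cycle, using the put/2 and delete/2 contracts above (member?/2 is assumed to follow the same shape as the other membership filters in this guide):

```elixir
cuckoo = ExDataSketch.Cuckoo.new(capacity: 10_000, fingerprint_size: 16)

{:ok, cuckoo} = ExDataSketch.Cuckoo.put(cuckoo, "session:abc")
ExDataSketch.Cuckoo.member?(cuckoo, "session:abc")  # true (no false negatives)

# Unlike Bloom, insertion can be undone without corrupting the filter
{:ok, cuckoo} = ExDataSketch.Cuckoo.delete(cuckoo, "session:abc")
{:error, :not_found} = ExDataSketch.Cuckoo.delete(cuckoo, "session:abc")
```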

Quotient Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :q | integer | 16 | Quotient bits. Determines number of slots (2^q). Valid range: 1..28. |
| :r | integer | 8 | Remainder bits stored per slot. Valid range: 1..32. Constraint: q + r <= 64. |
| :seed | non_neg_integer | 0 | Hash seed. Filters with different seeds cannot be merged. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

Quotient filters split each fingerprint into a quotient (slot index) and remainder (stored value). Three metadata bits per slot (is_occupied, is_continuation, is_shifted) enable linear probing with cluster tracking.

Supports safe deletion and merge without re-hashing. Merge combines two filters with matching parameters by iterating runs from both filters.

False positive rate: approximately 1 / 2^r. For r=8: ~0.4%. For r=16: ~0.0015%.

Memory usage: 2^q * (r + 3) / 8 bytes (3 metadata bits per slot).
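
For the defaults (q: 16, r: 8) that is 2^16 * (8 + 3) / 8 = 90,112 bytes (88 KiB):

```elixir
q = 16
r = 8

# r remainder bits plus 3 metadata bits per slot, 2^q slots
bytes = div(Integer.pow(2, q) * (r + 3), 8)
# bytes => 90112
```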

CQF Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :q | integer | 16 | Quotient bits. Determines number of slots (2^q). Valid range: 1..28. |
| :r | integer | 8 | Remainder bits stored per slot. Valid range: 1..32. Constraint: q + r <= 64. |
| :seed | non_neg_integer | 0 | Hash seed. Filters with different seeds cannot be merged. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

CQF extends the quotient filter with variable-length counter encoding to track item multiplicities. Use estimate_count/2 to query how many times an item has been inserted. Counts are approximate -- never underestimated but may be overestimated.

Merge sums counts across filters. Deletion decrements the count for an item (safe no-op if the item is not present).

cqf = ExDataSketch.CQF.new(q: 16, r: 8)
cqf = cqf |> ExDataSketch.CQF.put("event") |> ExDataSketch.CQF.put("event")
ExDataSketch.CQF.estimate_count(cqf, "event")  # >= 2

XorFilter Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :fingerprint_bits | integer | 8 | Fingerprint width. Valid values: 8, 16. |
| :seed | non_neg_integer | 0 | Hash seed. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

XorFilter is a static, immutable membership filter. All items must be provided at construction time via build/2. After construction, only member?/2 queries are supported -- no insertion, deletion, or merge.

XorFilter offers the smallest memory footprint and fastest lookups of all membership filters in this library. Use it when the set is known ahead of time and will not change.

items = MapSet.new(["alice", "bob", "carol"])
{:ok, xor} = ExDataSketch.XorFilter.build(items, fingerprint_bits: 8)
ExDataSketch.XorFilter.member?(xor, "alice")  # true
ExDataSketch.XorFilter.member?(xor, "dave")   # false (probably)

False positive rate: approximately 1 / 2^fingerprint_bits. For 8-bit: ~0.4%. For 16-bit: ~0.0015%.

IBLT Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :cell_count | pos_integer | 1,000 | Number of cells. Valid range: 1..16,777,216. More cells improve decode success rate. |
| :hash_count | pos_integer | 3 | Number of hash functions. Valid range: 1..10. |
| :seed | non_neg_integer | 0 | Hash seed. IBLTs with different seeds cannot be subtracted or merged. |
| :backend | module | ExDataSketch.Backend.Pure | Backend module for computation. |

IBLT (Invertible Bloom Lookup Table) extends Bloom filters with the ability to list entries and find set differences. Two parties build IBLTs from their respective sets, exchange them, and subtract to discover items present in one set but not the other.

Supports two modes:

  • Set mode: put/2 and delete/2 operate on items (value_hash = 0).
  • Key-value mode: put/3 and delete/3 operate on key-value pairs.

subtract/2 performs cell-wise subtraction. list_entries/1 peels the resulting IBLT to recover entries, returning {:ok, %{positive: [...], negative: [...]}} or {:error, :decode_failed} if the difference is too large for the cell count.

iblt_a = ExDataSketch.IBLT.new(cell_count: 1000) |> ExDataSketch.IBLT.put_many(set_a)
iblt_b = ExDataSketch.IBLT.new(cell_count: 1000) |> ExDataSketch.IBLT.put_many(set_b)

diff = ExDataSketch.IBLT.subtract(iblt_a, iblt_b)
{:ok, %{positive: only_in_a, negative: only_in_b}} = ExDataSketch.IBLT.list_entries(diff)

Rule of thumb: set cell_count to at least 3x the expected number of differences for reliable decoding.

FilterChain Options

FilterChain composes multiple membership filters into a pipeline with capability-aware stage validation.

chain =
  ExDataSketch.FilterChain.new()
  |> ExDataSketch.FilterChain.add_stage(ExDataSketch.Cuckoo.new(capacity: 10_000))
  |> ExDataSketch.FilterChain.add_stage(ExDataSketch.Bloom.new(capacity: 100_000))

Stage positions:

| Position | Allowed types | Notes |
| --- | --- | --- |
| Front / Middle | Bloom, Cuckoo, Quotient, CQF | Dynamic filters supporting :put |
| Terminal | XorFilter | Static filter, must be last stage |
| Adjunct | IBLT | Not in query path, used for reconciliation |

Operations:

  • put/2 -- Inserts into all stages supporting :put (skips XorFilter).
  • member?/2 -- Queries stages in order, short-circuits on false.
  • delete/2 -- Deletes from all stages. Raises UnsupportedOperationError if any stage does not support :delete.

Lifecycle-tier pattern example:

# Hot tier (absorbs writes) -> Cold tier (compacted snapshot)
chain =
  ExDataSketch.FilterChain.new()
  |> ExDataSketch.FilterChain.add_stage(ExDataSketch.Cuckoo.new(capacity: 50_000))

# Later, compact hot tier into a static XorFilter
{:ok, xor} = ExDataSketch.XorFilter.build(compacted_items, fingerprint_bits: 16)
chain = ExDataSketch.FilterChain.add_stage(chain, xor)

Membership Filter Comparison

| Property | Bloom | Cuckoo | Quotient | CQF | XorFilter | IBLT |
| --- | --- | --- | --- | --- | --- | --- |
| Insert | yes | yes | yes | yes | build-only | yes |
| Delete | no | yes | yes | yes | no | yes |
| Merge | yes | no | yes | yes | no | yes |
| Count items | no | no | no | yes (multiset) | no | no |
| Static | no | no | no | no | yes | no |
| Reconciliation | no | no | no | no | no | yes |
| Space efficiency | good | better at low FPR | moderate | moderate | best | depends on diff size |

REQ Options

The Relative Error Quantile (REQ) sketch provides relative-error rank accuracy. Unlike KLL (rank error) or DDSketch (value-relative error), REQ concentrates accuracy near the extremes of the distribution.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :k | integer | 12 | Accuracy parameter. Higher k means more accuracy but more memory. |
| :hra | boolean | true | High Rank Accuracy. When true, accuracy is better at high ranks (p99, p999). |
| :backend | module | configured | Backend module. |

sketch = ExDataSketch.REQ.new(k: 12, hra: true)
sketch = ExDataSketch.REQ.update_many(sketch, Enum.map(1..10_000, &(&1 * 1.0)))
ExDataSketch.REQ.quantile(sketch, 0.99)
ExDataSketch.REQ.rank(sketch, 9500.0)
ExDataSketch.REQ.cdf(sketch, [1000.0, 5000.0, 9000.0])
ExDataSketch.REQ.pmf(sketch, [1000.0, 5000.0, 9000.0])

MisraGries Options

The Misra-Gries algorithm provides deterministic heavy-hitter detection. Unlike FrequentItems (SpaceSaving), which always returns an upper-bound estimate, Misra-Gries returns 0 for items it is not currently tracking.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| :k | integer | 10 | Number of counters. Detects items with frequency > n/k. |
| :key_encoding | atom/tuple | :binary | Key encoding policy: :binary (raw binaries), :int (signed 64-bit LE integers), or {:term, :external} (Erlang external term format). |
| :backend | module | configured | Backend module. |

# Default binary keys
sketch = ExDataSketch.MisraGries.new(k: 10)
sketch = ExDataSketch.MisraGries.update_many(sketch, items)
ExDataSketch.MisraGries.estimate(sketch, "frequent_item")
ExDataSketch.MisraGries.top_k(sketch, 10)

# Integer keys
sketch = ExDataSketch.MisraGries.new(k: 10, key_encoding: :int)
sketch = ExDataSketch.MisraGries.update_many(sketch, [100, 200, 100, 300, 100])
ExDataSketch.MisraGries.top_k(sketch, 3)
# => [{100, 3}, {200, 1}, {300, 1}]

# Arbitrary Erlang terms
sketch = ExDataSketch.MisraGries.new(k: 10, key_encoding: {:term, :external})
sketch = ExDataSketch.MisraGries.update_many(sketch, [{:user, 1}, {:user, 2}, {:user, 1}])
ExDataSketch.MisraGries.estimate(sketch, {:user, 1})
# => 2

XXHash3

ExDataSketch includes a Rust NIF implementation of XXHash3 for high-quality 64-bit hashing. This is used internally by the membership filters and is also available directly.

ExDataSketch.Hash.xxhash3_64("some data")
ExDataSketch.Hash.xxhash3_64("some data", 42)

KLL vs DDSketch

Both are quantile sketches available through ExDataSketch.Quantiles, but they provide different accuracy guarantees:

| Property | KLL | DDSketch |
| --- | --- | --- |
| Error type | Rank error | Value-relative error |
| Best for | General rank queries | Latency percentiles, SLOs |
| Parameter | k (default 200) | alpha (default 0.01) |
| Guarantee | True rank within ~1.65/k | True value within factor (1 +/- alpha) |
| Negative values | Supported | Rejected |
| Memory | Fixed by k | Grows with log(max/min) |

Use KLL when you need rank accuracy (e.g., "what fraction of values are below X?"). Use DDSketch when you need value accuracy (e.g., "p99 latency is 142ms +/- 1%").

Quantiles Facade

The ExDataSketch.Quantiles module provides a type-dispatched facade:

# KLL (default)
sketch = ExDataSketch.Quantiles.new(type: :kll, k: 200)
sketch = ExDataSketch.Quantiles.update_many(sketch, 1..1000)
ExDataSketch.Quantiles.quantile(sketch, 0.5)  # approximate median

# DDSketch
sketch = ExDataSketch.Quantiles.new(type: :ddsketch, alpha: 0.01)
sketch = ExDataSketch.Quantiles.update_many(sketch, latency_samples)
ExDataSketch.Quantiles.quantile(sketch, 0.99)  # p99 with relative accuracy

Backend System

ExDataSketch uses a backend system to allow swapping computation engines without changing the public API.

Available Backends

  • ExDataSketch.Backend.Pure -- Pure Elixir implementation. Always available. Default backend.
  • ExDataSketch.Backend.Rust -- Rust NIF acceleration. Precompiled binaries are downloaded automatically on supported platforms. Falls back to Pure if the NIF is not available.

Precompiled NIF Binaries

The Rust NIF is distributed as precompiled binaries for the following platforms:

| Platform | Architecture |
| --- | --- |
| macOS | ARM64 (Apple Silicon), x86_64 |
| Linux (glibc) | x86_64, aarch64 |
| Linux (musl) | x86_64, aarch64 |

On these platforms, mix compile automatically downloads the correct binary. No Rust toolchain is required.

To force compilation from source (requires Rust):

EX_DATA_SKETCH_BUILD=1 mix compile

On unsupported platforms, the library operates in pure Elixir mode automatically. No configuration is needed.

Check availability at runtime:

ExDataSketch.Backend.Rust.available?()
# => true or false

Selecting a Backend

Per-sketch:

sketch = ExDataSketch.HLL.new(backend: ExDataSketch.Backend.Rust)

Global default (in config):

config :ex_data_sketch, backend: ExDataSketch.Backend.Rust

The per-sketch option always takes precedence over the global config. If Backend.Rust is configured but the NIF is not available, it automatically falls back to Backend.Pure.

Rust Backend Details

The Rust backend accelerates batch and traversal operations via NIFs:

| Rust NIF | Pure fallback |
| --- | --- |
| hll_update_many, hll_merge, hll_estimate | hll_new, hll_update |
| cms_update_many, cms_merge | cms_new, cms_update, cms_estimate |
| theta_update_many, theta_merge | theta_new, theta_update, theta_compact, theta_estimate |
| kll_update_many, kll_merge | kll_new, kll_update, kll_quantile, kll_rank, kll_count, kll_min, kll_max |

Dirty Scheduler Thresholds

Batch operations automatically use dirty CPU schedulers when input size exceeds configurable thresholds:

| Operation | Default threshold |
| --- | --- |
| hll_update_many | 10,000 hashes |
| cms_update_many | 10,000 pairs |
| theta_update_many | 10,000 hashes |
| kll_update_many | 10,000 values |
| cms_merge | 100,000 total counters |
| theta_merge | 50,000 combined entries |
| kll_merge | 50,000 combined items |

Override globally:

config :ex_data_sketch, :dirty_thresholds, %{
  hll_update_many: 5_000,
  cms_update_many: 20_000
}

Or per-call via opts:

HLL.update_many(sketch, items, dirty_threshold: 5_000)

Backend Guarantees

  • Both backends produce identical results for the same inputs.
  • Serialized state is identical regardless of which backend produced it.
  • The public API does not change between backends.

Serialization

ExDataSketch-Native Format (EXSK)

All sketches support the native binary format:

binary = ExDataSketch.HLL.serialize(sketch)
{:ok, sketch} = ExDataSketch.HLL.deserialize(binary)

The EXSK format structure:

| Field | Size | Description |
| --- | --- | --- |
| Magic | 4 bytes | "EXSK" |
| Version | 1 byte | Format version (currently 1) |
| Sketch ID | 1 byte | Identifies sketch type (HLL=1, CMS=2, Theta=3, KLL=4, DDSketch=5, FrequentItems=6, Bloom=7) |
| Params length | 4 bytes | Little-endian u32, byte length of params |
| Params | variable | Sketch-specific parameters |
| State length | 4 bytes | Little-endian u32, byte length of state |
| State | variable | Raw sketch state |
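
The layout maps directly onto an Elixir binary pattern match. An illustrative header decode — deserialize/1 does this for you, and the frame below is hand-built rather than produced by the library:

```elixir
# A toy EXSK frame: version 1, sketch ID 1 (HLL), 2 params bytes, 3 state bytes
frame = <<"EXSK", 1, 1, 2::little-32, "pp", 3::little-32, "xyz">>

<<"EXSK", version::8, sketch_id::8,
  params_len::little-32, params::binary-size(params_len),
  state_len::little-32, state::binary-size(state_len)>> = frame

{version, sketch_id, params, state}
# => {1, 1, "pp", "xyz"}
```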

DataSketches Interop

Selected sketch types support the Apache DataSketches binary format for cross-language interoperability:

# Theta sketch interop (priority target)
binary = ExDataSketch.Theta.serialize_datasketches(theta_sketch)
{:ok, sketch} = ExDataSketch.Theta.deserialize_datasketches(binary)

Interop priority order: Theta (CompactSketch), then HLL, then KLL. KLL DataSketches interop is stubbed but not yet implemented.

Hashing

ExDataSketch uses a deterministic 64-bit hash function for all sketch operations. The hash module provides:

# Hash any Elixir term
ExDataSketch.Hash.hash64("hello")

# Hash raw binary data
ExDataSketch.Hash.hash64_binary(<<1, 2, 3>>)

Auto-detection

hash64/2 automatically selects the best available hash implementation:

  • XXHash3 (NIF): When the Rust NIF is loaded, hash64/2 uses XXHash3 via NIF, producing native 64-bit hashes with zero Elixir-side overhead. XXHash3 output is stable across platforms.

  • phash2 + fixnum-safe mix64 (pure): When the NIF is not available, hash64/2 falls back to :erlang.phash2/2 with a 64-bit mixer that uses 16-bit partial products to avoid bigint heap allocations. Every intermediate value stays under 35 bits (well within the BEAM's 60-bit fixnum limit), eliminating the 3+ transient bigint allocations per hash call that a naive 64-bit multiply would incur.

The NIF availability check is performed once and cached in :persistent_term for zero-cost subsequent lookups.

Why not 32-bit hashes

32-bit hashes degrade sketch accuracy above approximately 10M items due to birthday-paradox collisions. The fixnum-safe mixer preserves full 64-bit output quality at the cost of roughly 20 fixnum operations per hash -- still faster than 3+ bigint heap allocations from naive 64-bit multiplications.

Hash strategy tagging

Sketches record which hash function was used at creation time (:xxhash3, :phash2, or :custom). This tag is persisted through serialization in the EXSK params section and controls runtime hash dispatch:

  • :phash2 -- always uses the pure Elixir mix64 path, even when the NIF is available.
  • :xxhash3 -- requires the Rust NIF; raises ArgumentError if unavailable.
  • :custom -- cannot be deserialized (the original function is not recoverable from bytes); deserialize/1 returns {:error, %DeserializationError{}}.

Merge operations validate that both sketches share the same strategy and seed.

Pluggable hash

Users can override the default hash with a custom function:

HLL.new(p: 14, hash_fn: fn term -> my_hash(term) end)

When :hash_fn is provided, the sketch records hash_strategy: :custom.

Stability

:erlang.phash2/2 output is not guaranteed stable across OTP major versions. XXHash3 output is stable across platforms. For cross-version or cross-system stability, use the NIF build (XXHash3) or supply a custom :hash_fn.

Use Cases

Algorithm Selection Guide

| Use Case | Algorithm | Key Option | What You Get |
| --- | --- | --- | --- |
| Count distinct users / IPs / sessions | HLL | p: 14 | Cardinality estimate with ~0.8% error |
| Frequency of specific items (query counts, error codes) | CMS | width: 2048, depth: 5 | Per-item count estimates (always >= true count) |
| Set intersection / union cardinality | Theta | k: 4096 | Jaccard similarity, union/intersection sizes |
| Percentiles and rank queries (general) | KLL | k: 200 | Median, P95, P99 with rank-error guarantees |
| Latency percentiles and SLO monitoring | DDSketch | alpha: 0.01 | P99 with value-relative error (e.g., 142ms +/- 1%) |
| Top-k / heavy hitter detection | FrequentItems | k: 64 | Most frequent items with count and error bounds |
| Membership testing (seen before?) | Bloom | capacity: 100_000 | "Probably yes" or "definitely no" with tunable FPR |
| Membership with deletion | Cuckoo | capacity: 100_000 | Bloom-like but supports delete |
| Membership with deletion + merge | Quotient | q: 16, r: 8 | Mergeable filter with safe deletion |
| Multiset counting (how many times?) | CQF | q: 16, r: 8 | Approximate per-item multiplicity |
| Static set membership | XorFilter | fingerprint_bits: 8 | Smallest footprint, fastest lookup |
| Set reconciliation (what's different?) | IBLT | cell_count: 1000 | Find symmetric difference between sets |
| Improved cardinality estimation | ULL | p: 14 | ~20% better accuracy than HLL at the same memory |

HLL: Real-time unique visitor counting

A web analytics service needs to count unique visitors per page per day without storing every visitor ID. HLL provides bounded-memory cardinality estimation that can be merged across time windows and server instances.

defmodule Analytics.UniqueVisitors do
  alias ExDataSketch.HLL

  # Each page gets its own HLL sketch, stored in ETS or Redis
  def record_visit(page_id, visitor_id) do
    sketch = get_or_create_sketch(page_id)
    updated = HLL.update(sketch, visitor_id)
    store_sketch(page_id, updated)
  end

  def unique_count(page_id) do
    page_id
    |> get_or_create_sketch()
    |> HLL.estimate()
    |> round()
  end

  # Merge hourly sketches into a daily rollup
  def daily_rollup(page_id, date) do
    0..23
    |> Enum.map(&get_hourly_sketch(page_id, date, &1))
    |> Enum.reject(&is_nil/1)
    |> HLL.merge_many()
  end

  # Compare unique visitors across two pages
  def overlap_estimate(page_a, page_b) do
    sketch_a = get_or_create_sketch(page_a)
    sketch_b = get_or_create_sketch(page_b)
    merged = HLL.merge(sketch_a, sketch_b)

    count_a = HLL.estimate(sketch_a)
    count_b = HLL.estimate(sketch_b)
    count_union = HLL.estimate(merged)

    # Inclusion-exclusion: |A intersect B| = |A| + |B| - |A union B|
    overlap = max(count_a + count_b - count_union, 0)
    %{page_a: round(count_a), page_b: round(count_b), overlap: round(overlap)}
  end

  defp get_or_create_sketch(page_id) do
    case :ets.lookup(:hll_sketches, page_id) do
      [{_, bin}] ->
        {:ok, sketch} = HLL.deserialize(bin)
        sketch
      [] ->
        HLL.new(p: 14)
    end
  end

  defp store_sketch(page_id, sketch) do
    :ets.insert(:hll_sketches, {page_id, HLL.serialize(sketch)})
  end

  defp get_hourly_sketch(page_id, date, hour) do
    key = {page_id, date, hour}
    case :ets.lookup(:hll_hourly, key) do
      [{_, bin}] -> {:ok, s} = HLL.deserialize(bin); s
      [] -> nil
    end
  end
end

CMS: API rate limiting with frequency tracking

An API gateway needs to track request counts per API key to enforce rate limits. CMS provides constant-memory frequency estimation -- counts are always at least the true count, so rate limits are never under-enforced.

defmodule Gateway.RateLimiter do
  use GenServer

  alias ExDataSketch.CMS

  @rate_limit 1000  # max requests per window
  @window_ms 60_000  # 1-minute windows

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def check_and_record(api_key) do
    GenServer.call(__MODULE__, {:check, api_key})
  end

  def top_consumers do
    GenServer.call(__MODULE__, :top_consumers)
  end

  @impl true
  def init(_opts) do
    schedule_rotation()
    {:ok, %{sketch: CMS.new(width: 4096, depth: 5), keys_seen: MapSet.new()}}
  end

  @impl true
  def handle_call({:check, api_key}, _from, state) do
    current_count = CMS.estimate(state.sketch, api_key)

    if current_count >= @rate_limit do
      {:reply, {:error, :rate_limited, current_count}, state}
    else
      updated_sketch = CMS.update(state.sketch, api_key)
      updated_keys = MapSet.put(state.keys_seen, api_key)
      {:reply, {:ok, current_count + 1}, %{state | sketch: updated_sketch, keys_seen: updated_keys}}
    end
  end

  @impl true
  def handle_call(:top_consumers, _from, state) do
    # Check counts for all observed keys
    top =
      state.keys_seen
      |> Enum.map(fn key -> {key, CMS.estimate(state.sketch, key)} end)
      |> Enum.sort_by(fn {_k, count} -> -count end)
      |> Enum.take(10)

    {:reply, top, state}
  end

  @impl true
  def handle_info(:rotate, _state) do
    schedule_rotation()
    {:noreply, %{sketch: CMS.new(width: 4096, depth: 5), keys_seen: MapSet.new()}}
  end

  defp schedule_rotation, do: Process.send_after(self(), :rotate, @window_ms)
end

Theta: A/B test audience overlap analysis

A product team needs to measure how much overlap exists between users exposed to experiment A vs experiment B. Theta sketches support set operations (union, intersection) on cardinalities, giving overlap estimates without storing every user ID.

defmodule Experiments.OverlapAnalysis do
  alias ExDataSketch.Theta

  def analyze_overlap(experiment_a_users, experiment_b_users) do
    sketch_a = Theta.from_enumerable(experiment_a_users, k: 4096)
    sketch_b = Theta.from_enumerable(experiment_b_users, k: 4096)

    union = Theta.merge(sketch_a, sketch_b)

    count_a = Theta.estimate(sketch_a)
    count_b = Theta.estimate(sketch_b)
    count_union = Theta.estimate(union)

    # Inclusion-exclusion principle
    count_intersection = max(count_a + count_b - count_union, 0.0)

    jaccard = if count_union > 0, do: count_intersection / count_union, else: 0.0

    %{
      group_a_size: round(count_a),
      group_b_size: round(count_b),
      union_size: round(count_union),
      intersection_size: round(count_intersection),
      jaccard_similarity: Float.round(jaccard, 4),
      overlap_pct_of_a: Float.round(count_intersection / max(count_a, 1) * 100, 1),
      overlap_pct_of_b: Float.round(count_intersection / max(count_b, 1) * 100, 1)
    }
  end

  # Compare overlap across multiple experiment cohorts
  def pairwise_overlap(cohorts) when is_map(cohorts) do
    sketches =
      Map.new(cohorts, fn {name, users} ->
        {name, Theta.from_enumerable(users, k: 4096)}
      end)

    names = Map.keys(sketches)

    for a <- names, b <- names, a < b do
      union = Theta.merge(sketches[a], sketches[b])
      est_a = Theta.estimate(sketches[a])
      est_b = Theta.estimate(sketches[b])
      est_union = Theta.estimate(union)
      intersection = max(est_a + est_b - est_union, 0.0)

      {a, b, round(intersection)}
    end
  end
end

KLL: Monitoring response size distributions

A CDN needs to track the distribution of response body sizes to plan cache shard capacity. KLL provides rank-accurate quantiles -- useful when you care about "what fraction of responses are below X bytes?"

defmodule CDN.ResponseSizeMonitor do
  use GenServer

  alias ExDataSketch.KLL

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def record_response(size_bytes) do
    GenServer.cast(__MODULE__, {:record, size_bytes})
  end

  def record_batch(sizes) do
    GenServer.cast(__MODULE__, {:record_batch, sizes})
  end

  def distribution_report do
    GenServer.call(__MODULE__, :report)
  end

  @impl true
  def init(_opts) do
    {:ok, %{sketch: KLL.new(k: 200)}}
  end

  @impl true
  def handle_cast({:record, size}, state) do
    {:noreply, %{state | sketch: KLL.update(state.sketch, size)}}
  end

  @impl true
  def handle_cast({:record_batch, sizes}, state) do
    {:noreply, %{state | sketch: KLL.update_many(state.sketch, sizes)}}
  end

  @impl true
  def handle_call(:report, _from, state) do
    sketch = state.sketch
    n = KLL.count(sketch)

    report =
      if n > 0 do
        %{
          count: n,
          min: KLL.min(sketch),
          p25: KLL.quantile(sketch, 0.25),
          median: KLL.quantile(sketch, 0.5),
          p75: KLL.quantile(sketch, 0.75),
          p90: KLL.quantile(sketch, 0.90),
          p99: KLL.quantile(sketch, 0.99),
          max: KLL.max(sketch),
          # What fraction of responses are under 1 MB?
          pct_under_1mb: Float.round(KLL.rank(sketch, 1_048_576) * 100, 1)
        }
      else
        %{count: 0}
      end

    {:reply, report, state}
  end
end

# Usage:
# CDN.ResponseSizeMonitor.record_batch(response_sizes)
# CDN.ResponseSizeMonitor.distribution_report()
# => %{count: 1_500_000, median: 24576, p99: 4_194_304, pct_under_1mb: 92.3, ...}

DDSketch: SLO compliance monitoring

A platform team needs to track API latency percentiles for SLO reporting. DDSketch provides value-relative error -- when the SLO says "P99 < 200ms", DDSketch guarantees the reported P99 is within 1% of the true value, not just within a rank tolerance.

defmodule Platform.SLOMonitor do
  use GenServer

  alias ExDataSketch.DDSketch

  @slo_rules [
    {:p50, 0.5, 50},    # P50 < 50ms
    {:p95, 0.95, 150},  # P95 < 150ms
    {:p99, 0.99, 200}   # P99 < 200ms
  ]

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def record_latency(endpoint, latency_ms) do
    GenServer.cast(__MODULE__, {:record, endpoint, latency_ms})
  end

  def record_latencies(endpoint, latencies) do
    GenServer.cast(__MODULE__, {:record_batch, endpoint, latencies})
  end

  def slo_report(endpoint) do
    GenServer.call(__MODULE__, {:report, endpoint})
  end

  @impl true
  def init(_opts) do
    {:ok, %{sketches: %{}}}
  end

  @impl true
  def handle_cast({:record, endpoint, latency_ms}, state) do
    sketch = Map.get(state.sketches, endpoint, DDSketch.new(alpha: 0.01))
    updated = DDSketch.update(sketch, latency_ms)
    {:noreply, %{state | sketches: Map.put(state.sketches, endpoint, updated)}}
  end

  @impl true
  def handle_cast({:record_batch, endpoint, latencies}, state) do
    sketch = Map.get(state.sketches, endpoint, DDSketch.new(alpha: 0.01))
    updated = DDSketch.update_many(sketch, latencies)
    {:noreply, %{state | sketches: Map.put(state.sketches, endpoint, updated)}}
  end

  @impl true
  def handle_call({:report, endpoint}, _from, state) do
    case Map.fetch(state.sketches, endpoint) do
      {:ok, sketch} ->
        n = DDSketch.count(sketch)

        checks =
          Enum.map(@slo_rules, fn {label, quantile, threshold_ms} ->
            actual = DDSketch.quantile(sketch, quantile)
            %{
              metric: label,
              threshold_ms: threshold_ms,
              actual_ms: Float.round(actual, 2),
              passing: actual <= threshold_ms
            }
          end)

        all_passing = Enum.all?(checks, & &1.passing)

        report = %{
          endpoint: endpoint,
          sample_count: n,
          min_ms: Float.round(DDSketch.min(sketch), 2),
          max_ms: Float.round(DDSketch.max(sketch), 2),
          slo_checks: checks,
          compliant: all_passing
        }

        {:reply, {:ok, report}, state}

      :error ->
        {:reply, {:error, :no_data}, state}
    end
  end
end

# Usage:
# Platform.SLOMonitor.record_latencies("/api/search", latency_samples)
# Platform.SLOMonitor.slo_report("/api/search")
# => {:ok, %{compliant: true, slo_checks: [
#      %{metric: :p99, threshold_ms: 200, actual_ms: 142.37, passing: true}, ...]}}

FrequentItems: Real-time trending queries

A search engine needs to identify the most popular queries in real time to populate autocomplete suggestions and trending lists. FrequentItems (SpaceSaving) tracks the top-k items with bounded memory, providing count estimates and error bounds.

defmodule Search.TrendingQueries do
  use GenServer

  alias ExDataSketch.FrequentItems

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def record_query(query) do
    GenServer.cast(__MODULE__, {:query, query})
  end

  def record_queries(queries) do
    GenServer.cast(__MODULE__, {:queries, queries})
  end

  def trending(limit \\ 10) do
    GenServer.call(__MODULE__, {:trending, limit})
  end

  def query_estimate(query) do
    GenServer.call(__MODULE__, {:estimate, query})
  end

  @impl true
  def init(_opts) do
    {:ok, %{sketch: FrequentItems.new(k: 128)}}
  end

  @impl true
  def handle_cast({:query, query}, state) do
    normalized = query |> String.downcase() |> String.trim()
    {:noreply, %{state | sketch: FrequentItems.update(state.sketch, normalized)}}
  end

  @impl true
  def handle_cast({:queries, queries}, state) do
    normalized = Enum.map(queries, &(&1 |> String.downcase() |> String.trim()))
    {:noreply, %{state | sketch: FrequentItems.update_many(state.sketch, normalized)}}
  end

  @impl true
  def handle_call({:trending, limit}, _from, state) do
    top =
      state.sketch
      |> FrequentItems.top_k(limit: limit)
      |> Enum.map(fn entry ->
        %{
          query: entry.item,
          estimated_count: entry.count,
          min_count: entry.lower,
          max_count: entry.upper
        }
      end)

    {:reply, top, state}
  end

  @impl true
  def handle_call({:estimate, query}, _from, state) do
    normalized = query |> String.downcase() |> String.trim()
    result = FrequentItems.estimate(state.sketch, normalized)
    {:reply, result, state}
  end
end

# Usage:
# Search.TrendingQueries.record_queries(batch_of_queries)
# Search.TrendingQueries.trending(5)
# => [%{query: "elixir genserver", estimated_count: 4821, min_count: 4750, max_count: 4821},
#     %{query: "phoenix liveview", estimated_count: 3102, min_count: 3010, max_count: 3102}, ...]
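
The lower/upper bounds make it possible to distinguish guaranteed heavy hitters from merely possible ones: an item whose lower bound already clears a threshold is certainly above it, regardless of estimation error. A sketch of that filter, assuming the entry shape (:item, :count, :lower, :upper) returned by top_k above:

```elixir
defmodule Search.HeavyHitters do
  alias ExDataSketch.FrequentItems

  # Split tracked queries by a popularity threshold using the error
  # bounds: lower >= threshold is a guarantee the true count clears
  # the bar; upper >= threshold alone is only a possibility.
  def split(sketch, threshold) do
    entries = FrequentItems.top_k(sketch, limit: 128)

    %{
      guaranteed: for(e <- entries, e.lower >= threshold, do: e.item),
      possible: for(e <- entries, e.lower < threshold and e.upper >= threshold, do: e.item)
    }
  end
end
```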

ULL: Multi-node distributed cardinality with improved accuracy

An ad-tech platform counts unique ad impressions per campaign across a cluster. Each node maintains a local ULL sketch, serializes it, and sends it to a central aggregator that merges and reports campaign reach. ULL is chosen over HLL for its ~20% better accuracy at the same memory -- significant when reporting to advertisers who pay per unique impression.

defmodule AdTech.CampaignReach do
  @moduledoc """
  Distributed unique impression counting per campaign using ULL sketches.

  Each application node tracks impressions locally and periodically
  flushes serialized sketches to a central aggregator for merge.
  """

  alias ExDataSketch.ULL

  # --- Per-node: local impression tracking ---

  def new_tracker(precision \\ 14) do
    %{sketches: %{}, precision: precision}
  end

  def record_impression(tracker, campaign_id, user_id) do
    sketch =
      Map.get_lazy(tracker.sketches, campaign_id, fn ->
        ULL.new(p: tracker.precision)
      end)

    updated = ULL.update(sketch, user_id)
    put_in(tracker.sketches[campaign_id], updated)
  end

  def record_impressions(tracker, campaign_id, user_ids) do
    sketch =
      Map.get_lazy(tracker.sketches, campaign_id, fn ->
        ULL.new(p: tracker.precision)
      end)

    updated = ULL.update_many(sketch, user_ids)
    put_in(tracker.sketches[campaign_id], updated)
  end

  @doc "Serialize all campaign sketches for network transport."
  def flush(tracker) do
    payloads =
      Map.new(tracker.sketches, fn {campaign_id, sketch} ->
        {campaign_id, ULL.serialize(sketch)}
      end)

    {payloads, %{tracker | sketches: %{}}}
  end

  # --- Central aggregator: merge from all nodes ---

  def new_aggregator, do: %{sketches: %{}}

  def ingest(aggregator, payloads) do
    Enum.reduce(payloads, aggregator, fn {campaign_id, binary}, acc ->
      {:ok, remote} = ULL.deserialize(binary)

      merged =
        case Map.fetch(acc.sketches, campaign_id) do
          {:ok, existing} -> ULL.merge(existing, remote)
          :error -> remote
        end

      put_in(acc.sketches[campaign_id], merged)
    end)
  end

  @doc "Report unique reach per campaign."
  def report(aggregator) do
    Map.new(aggregator.sketches, fn {campaign_id, sketch} ->
      {campaign_id, %{
        unique_impressions: ULL.estimate(sketch),
        sketch_bytes: ULL.size_bytes(sketch)
      }}
    end)
  end

  @doc "Merge hourly aggregators into a daily rollup."
  def rollup(hourly_aggregators) do
    Enum.reduce(hourly_aggregators, new_aggregator(), fn hourly, daily ->
      Enum.reduce(hourly.sketches, daily, fn {campaign_id, sketch}, acc ->
        merged =
          case Map.fetch(acc.sketches, campaign_id) do
            {:ok, existing} -> ULL.merge(existing, sketch)
            :error -> sketch
          end

        put_in(acc.sketches[campaign_id], merged)
      end)
    end)
  end
end

# Usage:
# Node A records impressions
# tracker = AdTech.CampaignReach.new_tracker(14)
# tracker = AdTech.CampaignReach.record_impressions(tracker, "camp_42", user_ids)
# {payloads, tracker} = AdTech.CampaignReach.flush(tracker)
# send(aggregator_node, {:impressions, payloads})
#
# Aggregator merges from all nodes
# agg = AdTech.CampaignReach.new_aggregator()
# agg = AdTech.CampaignReach.ingest(agg, payloads_from_node_a)
# agg = AdTech.CampaignReach.ingest(agg, payloads_from_node_b)
# AdTech.CampaignReach.report(agg)
# => %{"camp_42" => %{unique_impressions: 1_247_823, sketch_bytes: 16392}}
#
# ULL at p=14 gives ~0.65% relative error vs HLL's ~0.81%
# -- a meaningful improvement when billing per unique impression.
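
The accuracy figures quoted above follow directly from the two error formulas in the Options tables (1.04 / sqrt(2^p) for HLL, 0.835 / sqrt(2^p) for ULL); a few lines of plain Elixir reproduce the comparison table:

```elixir
# Relative standard error (as a percentage) at precision p,
# per the formulas in the HLL and ULL Options tables.
hll_err = fn p -> Float.round(1.04 / :math.sqrt(:math.pow(2, p)) * 100, 2) end
ull_err = fn p -> Float.round(0.835 / :math.sqrt(:math.pow(2, p)) * 100, 2) end

for p <- [10, 12, 14, 16], do: {p, hll_err.(p), ull_err.(p)}
# => [{10, 3.25, 2.61}, {12, 1.63, 1.3}, {14, 0.81, 0.65}, {16, 0.41, 0.33}]
```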

Bloom: Deduplication in event processing

An event pipeline receives millions of events per hour and needs to filter duplicates without storing every event ID. A Bloom filter provides space-efficient "have I seen this before?" checks -- false positives mean an occasional duplicate slips through, but no event is ever wrongly dropped.

defmodule Events.Deduplicator do
  use GenServer

  alias ExDataSketch.Bloom

  @capacity 10_000_000  # expected events per window
  @fpr 0.001            # 0.1% false positive rate

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Returns :new if not seen before, :duplicate if probably seen
  def check_and_insert(event_id) do
    GenServer.call(__MODULE__, {:check, event_id})
  end

  # Batch check for Broadway/Flow pipelines
  def filter_duplicates(event_ids) do
    GenServer.call(__MODULE__, {:filter_batch, event_ids})
  end

  def stats do
    GenServer.call(__MODULE__, :stats)
  end

  @impl true
  def init(_opts) do
    {:ok, %{
      bloom: Bloom.new(capacity: @capacity, false_positive_rate: @fpr),
      checked: 0,
      duplicates: 0
    }}
  end

  @impl true
  def handle_call({:check, event_id}, _from, state) do
    if Bloom.member?(state.bloom, event_id) do
      {:reply, :duplicate, %{state | checked: state.checked + 1, duplicates: state.duplicates + 1}}
    else
      updated_bloom = Bloom.put(state.bloom, event_id)
      {:reply, :new, %{state | bloom: updated_bloom, checked: state.checked + 1}}
    end
  end

  @impl true
  def handle_call({:filter_batch, event_ids}, _from, state) do
    {new_ids, bloom, dup_count} =
      Enum.reduce(event_ids, {[], state.bloom, 0}, fn id, {acc, bloom, dups} ->
        if Bloom.member?(bloom, id) do
          {acc, bloom, dups + 1}
        else
          {[id | acc], Bloom.put(bloom, id), dups}
        end
      end)

    new_state = %{state |
      bloom: bloom,
      checked: state.checked + length(event_ids),
      duplicates: state.duplicates + dup_count
    }

    {:reply, Enum.reverse(new_ids), new_state}
  end

  @impl true
  def handle_call(:stats, _from, state) do
    reply = %{
      checked: state.checked,
      duplicates: state.duplicates,
      approximate_unique: Bloom.count(state.bloom),
      filter_size_bytes: Bloom.size_bytes(state.bloom),
      configured_fpr: Bloom.error_rate(state.bloom)
    }

    {:reply, reply, state}
  end
end

# Usage in a Broadway pipeline:
defmodule MyPipeline do
  use Broadway

  @impl true
  def handle_batch(_batcher, messages, _batch_info, _context) do
    ids = Enum.map(messages, & &1.data.event_id)
    new_ids = Events.Deduplicator.filter_duplicates(ids)
    new_id_set = MapSet.new(new_ids)

    messages
    |> Enum.filter(fn msg -> MapSet.member?(new_id_set, msg.data.event_id) end)
    |> Enum.each(&process_event/1)

    # Return every message so duplicates are acked rather than redelivered.
    messages
  end

  # Application-specific event handling.
  defp process_event(_message), do: :ok
end

Cuckoo: Session revocation cache

A session management system needs to maintain a revoked-session cache that supports both addition and removal of session IDs. Cuckoo filters support deletion, making them suitable for caches where items expire or are reinstated.

defmodule Auth.RevocationCache do
  alias ExDataSketch.Cuckoo

  def new_cache(capacity \\ 100_000) do
    Cuckoo.new(capacity: capacity, fingerprint_size: 16)
  end

  def revoke(cache, session_id) do
    case Cuckoo.put(cache, session_id) do
      {:ok, updated} -> {:ok, updated}
      {:error, :full} -> {:error, :cache_full}
    end
  end

  def reinstate(cache, session_id) do
    case Cuckoo.delete(cache, session_id) do
      {:ok, updated} -> {:ok, updated}
      {:error, :not_found} -> {:ok, cache}
    end
  end

  def revoked?(cache, session_id) do
    Cuckoo.member?(cache, session_id)
  end
end
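
A usage sketch in the style of the other examples (the final lookup assumes no fingerprint collision, which would make revoked?/2 keep reporting true for a deleted ID):

```elixir
# Usage:
# cache = Auth.RevocationCache.new_cache(100_000)
# {:ok, cache} = Auth.RevocationCache.revoke(cache, "sess_abc123")
# Auth.RevocationCache.revoked?(cache, "sess_abc123")
# # => true
#
# {:ok, cache} = Auth.RevocationCache.reinstate(cache, "sess_abc123")
# Auth.RevocationCache.revoked?(cache, "sess_abc123")
# # => false (barring a fingerprint collision with another stored ID)
```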

IBLT: Replica set reconciliation

Two database replicas need to synchronize their key sets without exchanging full inventories. Each node builds an IBLT from its keys; the nodes exchange these compact digests and subtract one from the other to recover only the differing keys.

defmodule Sync.Reconciler do
  alias ExDataSketch.IBLT

  # Each node builds an IBLT from its local keys
  def build_digest(keys, cell_count \\ 3000) do
    iblt = IBLT.new(cell_count: cell_count, hash_count: 3)
    IBLT.put_many(iblt, keys)
  end

  # Find keys that differ between two nodes
  def find_differences(local_digest, remote_digest) do
    diff = IBLT.subtract(local_digest, remote_digest)

    case IBLT.list_entries(diff) do
      {:ok, %{positive: only_local, negative: only_remote}} ->
        {:ok, %{push: only_local, pull: only_remote}}

      {:error, :decode_failed} ->
        {:error, :too_many_differences}
    end
  end
end
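
A usage sketch; decoding only succeeds while the symmetric difference is small relative to cell_count (roughly 1.3-1.5x headroom over the expected difference count is a common rule of thumb for 3 hash functions), so the failure branch matters:

```elixir
# Usage:
# local_digest  = Sync.Reconciler.build_digest(local_keys)
# remote_digest = ...received from the peer node...
#
# case Sync.Reconciler.find_differences(local_digest, remote_digest) do
#   {:ok, %{push: keys_to_send, pull: keys_to_fetch}} ->
#     # Transfer only the differing keys in each direction.
#   {:error, :too_many_differences} ->
#     # Retry with a larger cell_count, or fall back to a full exchange.
# end
```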

FilterChain: Hot/cold lifecycle tiers

A membership service uses a hot Cuckoo filter for active writes, periodically compacting archived snapshots into a cold, immutable XorFilter stage.

defmodule Membership.TieredFilter do
  alias ExDataSketch.{FilterChain, Cuckoo, XorFilter}

  def new_hot(capacity \\ 50_000) do
    FilterChain.new()
    |> FilterChain.add_stage(Cuckoo.new(capacity: capacity))
  end

  def ingest(chain, items) do
    Enum.reduce(items, chain, fn item, acc ->
      {:ok, updated} = FilterChain.put(acc, item)
      updated
    end)
  end

  def compact(chain, archived_items) do
    {:ok, xor} = XorFilter.build(archived_items, fingerprint_bits: 16)
    FilterChain.add_stage(chain, xor)
  end

  def seen?(chain, item) do
    FilterChain.member?(chain, item)
  end
end
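
A usage sketch following the lifecycle described above:

```elixir
# Usage:
# chain = Membership.TieredFilter.new_hot()
# chain = Membership.TieredFilter.ingest(chain, active_member_ids)
#
# # Periodic compaction freezes an archived snapshot into an
# # immutable XorFilter stage appended to the chain:
# chain = Membership.TieredFilter.compact(chain, archived_member_ids)
#
# Membership.TieredFilter.seen?(chain, member_id)
# # => true if the id is present in any stage, hot or cold
```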

Error Handling

ExDataSketch uses tagged tuples for recoverable errors:

{:error, %ExDataSketch.Errors.InvalidOptionError{message: "p must be between 4 and 16"}}
{:error, %ExDataSketch.Errors.DeserializationError{message: "invalid magic bytes"}}

Functions that cannot fail return values directly (no :ok tuple wrapping). Functions that validate external input (deserialization, option parsing) return {:ok, result} | {:error, error} tuples.
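
In practice, deserialization of untrusted payloads is the main place to pattern-match on these tuples. A defensive sketch, assuming the error structs shown above:

```elixir
defmodule Metrics.SketchLoader do
  require Logger

  alias ExDataSketch.{HLL, Errors}

  # Deserialize an untrusted payload; on failure, log and fall back
  # to an empty sketch rather than crashing the caller.
  def load(binary, p \\ 14) do
    case HLL.deserialize(binary) do
      {:ok, sketch} ->
        sketch

      {:error, %Errors.DeserializationError{message: msg}} ->
        Logger.warning("discarding undecodable sketch payload: " <> msg)
        HLL.new(p: p)
    end
  end
end
```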

Merging in Distributed Systems

Sketches are designed for distributed aggregation. A common pattern:

  1. Each node maintains a local sketch.
  2. Periodically serialize and send sketches to an aggregator.
  3. The aggregator deserializes and merges all sketches.
  4. Query the merged sketch for global estimates.

Requirements for merging:

  • Both sketches must be the same type (e.g., both HLL).
  • Both sketches must have the same parameters (e.g., same p value for HLL).
  • Attempting to merge incompatible sketches returns an error.
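
The four steps condensed into a sketch (this assumes merge/2 returns the merged sketch directly for compatible inputs, as in the ULL examples earlier in this guide):

```elixir
defmodule Metrics.GlobalAggregator do
  alias ExDataSketch.HLL

  # Steps 3-4: deserialize the per-node payloads and fold them into
  # one global sketch. Every node must have used the same p.
  def global_estimate(serialized_sketches) when serialized_sketches != [] do
    serialized_sketches
    |> Enum.map(fn binary ->
      {:ok, sketch} = HLL.deserialize(binary)
      sketch
    end)
    |> Enum.reduce(fn sketch, acc -> HLL.merge(acc, sketch) end)
    |> HLL.estimate()
  end
end
```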

Benchmarking

Run all benchmarks with mix bench, or individual benchmark files with mix run bench/<name>.exs.

Interpreting Benchee Memory Numbers

Benchee's "Memory usage" metric measures total heap allocation during the benchmarked function call. This is the sum of every byte allocated on the process heap, including transient garbage that is immediately collectible (intermediate values, short-lived binary copies, list cons cells, etc.). It is not peak memory, resident memory, or the size of the final data structure.

This distinction matters when benchmarking probabilistic sketches against exact data structures like MapSet:

  • Processing allocation is O(n) for both approaches. Every item must be processed, so both MapSet and HLL/ULL allocate proportionally to the number of items inserted. The pure-Elixir hash mixer uses 16-bit partial products that stay within the BEAM's fixnum limit, avoiding the transient bigint allocations that earlier versions incurred. When the Rust NIF is available, hashing moves entirely into native code with zero Elixir heap allocation per item.

  • Result size is where sketches win. An HLL sketch at p=12 occupies a fixed 4 KB regardless of whether 1,000 or 100,000,000 items were inserted. A MapSet must store every unique element and grows linearly with cardinality.

To measure the actual memory advantage, compare the size of the finished data structures rather than Benchee's total allocation metric:

mapset = Enum.into(data, MapSet.new())
hll = HLL.new(p: 12) |> HLL.update_many(data)

:erlang.external_size(mapset)   # grows with cardinality
:erlang.external_size(hll.state) # fixed at ~4 KB for p=12

See bench/hhl_v_naive.exs for a complete example that includes both a Benchee throughput/allocation comparison and a result-size comparison showing the fixed sketch footprint.

Backend Considerations

The Rust backend (ExDataSketch.Backend.Rust) moves batch sketch operations (update_many, merge, estimate) into a NIF for maximum throughput. When the NIF is loaded, hash64/2 also automatically uses XXHash3 for hashing, so the full pipeline runs in native code. When benchmarking, specify the backend explicitly so results are reproducible:

HLL.new(p: 12, backend: ExDataSketch.Backend.Rust)

The Pure backend processes items one at a time in update_many, avoiding intermediate list allocation but performing all arithmetic on the Elixir heap. The fixnum-safe hash mixer eliminates bigint overhead that earlier versions incurred, keeping per-item allocation to a single unavoidable 64-bit return value. The pure backend is useful for environments where the Rust NIF is not available.