All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
[0.7.1] - 2026-03-22
Added
- Move hashing into NIF batch calls:
update_manyfor HLL, ULL, Theta, and CMS now sends raw items to Rust and hashes inside the NIF, eliminating per-item Elixir heap allocations (94.6% memory reduction at 10M items). (#202) - Wire
:hash_fnand:seedoptions through HLL, ULL, Theta, and CMS, enabling custom hash functions and reproducible seeded hashing. (#198) - Merge hash-compatibility validation:
merge/2on HLL, ULL, Theta, and CMS now raisesIncompatibleSketchesErrorwhen hash strategy or seed differs between sketches. (#205) - Pure backend
hll_update_manyoptimization: pre-aggregate map with sorted binary splice replaces tuple-based per-hash full-tuple copies, reducing transient allocation from O(n * m) to O(n + m). - ListIterator-based NIF item decoding for zero-copy Erlang list iteration in Rust.
- Test infrastructure: configurable coverage baselines, Rust CI coverage reporting, 39 new tests covering deserialization edge cases, custom hash_fn paths, helper functions, and merge validation.
Fixed
- Quotient filter wrap-around bug:
extract_allin both Pure and Rust backends failed when a cluster wrapped from slot N-1 to slot 0, producing nil quotients (Pure crash) or silent corruption (Rust). (#203, #204) - CMS merge validation: replaced flawed
Keyword.delete(opts, :hash_strategy)comparison with explicit width/depth/counter_width checks. - Pure backend
update_manyregression: restored chunk + batch*_update_manypath for HLL, ULL, and Theta (was incorrectly using per-item*_update). - Seed clamping: all raw NIF functions now clamp seed values to u64 range before passing to Rust.
- Hash-dependent vector tests tagged with
@tag :rust_nifto prevent failures in pure-only CI (vectors were generated with xxhash3).
[0.7.0] - 2026-03-11
Added
- UltraLogLog sketch (
ExDataSketch.ULL) for improved cardinality estimation with ~20% lower relative error than HLL at the same memory footprint. ULL1 binary state format with 8-byte header + 2^p registers. EXSK serialization (sketch ID 15). - ULL register encoding from Ertl 2023:
register_value = 2 * geometric_rank - sub_bitdoubles the information per register compared to HLL. - FGRA estimator (Ertl 2017 sigma/tau convergence) for ULL cardinality estimation.
- Rust NIF acceleration for ULL:
update_many,merge, andestimateoperations with dirty scheduler thresholds. - Precision parameter
psupports range 4..26 (vs 4..16 for HLL), allowing higher accuracy at larger memory budgets. - Full API:
new,update,update_many,merge,estimate,count,serialize,deserialize,from_enumerable,merge_many,reducer,merger,size_bytes. - ULL test vectors, parity tests, merge law property tests, and benchmark suite.
[0.6.0] - 2026-03-11
Added
- REQ sketch (
ExDataSketch.REQ) for relative-error quantile estimation with configurable high-rank accuracy. REQ1 binary state format. EXSK serialization (sketch ID 13). - Misra-Gries sketch (
ExDataSketch.MisraGries) for deterministic heavy-hitter detection with configurable key encoding (:binary,:int,{:term, :external}). MG01 binary state format. EXSK serialization (sketch ID 14). - XXHash3 NIF integration (
ExDataSketch.Hash.xxhash3_64/1,2) for fast, cross-platform stable hashing via Rust NIF with phash2-based fallback. - KLL
cdf/2andpmf/2for cumulative distribution and probability mass functions. - DDSketch
rank/2for normalized rank queries. - Rust NIF acceleration for all membership filters: Bloom, Cuckoo, Quotient, CQF, XorFilter, and IBLT. Batch operations (
put_many,merge,build) automatically use compiled Rust NIFs when available, with dirty scheduler thresholds for large inputs. - Parity tests verifying byte-identical serialization between Pure Elixir and Rust NIF backends for all sketch algorithms.
- Benchmark suites for REQ sketch, Misra-Gries, and XXHash3 NIF throughput.
Quantilesfacade for unified quantile sketch API across KLL and DDSketch.
[0.5.0] - 2026-03-10
Added
- Cuckoo filter (
ExDataSketch.Cuckoo) with Pure Elixir backend. CKO1 binary state format. Partial-key cuckoo hashing with configurable fingerprint size, bucket size, and max kicks. Supports insertion, safe deletion, and membership testing. EXSK serialization (sketch ID 8). - Quotient filter (
ExDataSketch.Quotient) with Pure Elixir backend. QOT1 binary state format. Quotient/remainder fingerprint splitting with linear probing and metadata bits (is_occupied, is_continuation, is_shifted). Supports insertion, safe deletion, merge, and membership testing. EXSK serialization (sketch ID 9). - Counting Quotient Filter (
ExDataSketch.CQF) with Pure Elixir backend. CQF1 binary state format. Extends quotient filter with variable-length counter encoding for multiset membership and approximate counting viaestimate_count/2. Supports insertion, deletion, merge. EXSK serialization (sketch ID 10). - XorFilter (
ExDataSketch.XorFilter) with Pure Elixir backend. XOR1 binary state format. Static build-once immutable filter constructed viabuild/2with 8-bit or 16-bit fingerprints. Supports membership testing only. EXSK serialization (sketch ID 11). - IBLT (
ExDataSketch.IBLT) with Pure Elixir backend. IBL1 binary state format. Invertible Bloom Lookup Table for set reconciliation viasubtract/2andlist_entries/1. Supports set mode and key-value mode, insertion, deletion, merge. EXSK serialization (sketch ID 12). - FilterChain (
ExDataSketch.FilterChain) for capability-aware membership filter composition. FCN1 binary state format. Lifecycle-tier patterns (hot/warm/cold) with stage position enforcement. Supportsadd_stage/2,put/2,member?/2,delete/2. Serializes all stages in order. - Benchmark suites for Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain (
bench/*.exs). UnsupportedOperationErrorfor operations not supported by a structure (used by FilterChain).InvalidChainCompositionErrorfor invalid FilterChain stage composition.capabilities/0function on Bloom, Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain modules.- Cuckoo, Quotient, CQF, XorFilter, IBLT, and FilterChain backend callbacks on
ExDataSketch.Backend.
[0.4.0] - 2026-03-06
Added
- Bloom filter (
ExDataSketch.Bloom) with Pure Elixir backend. - BLM1 binary state format (40-byte header + LSB-first packed bitset).
- Double hashing (Kirsch-Mitzenmacher) deriving k bit positions from a single 64-bit hash.
- Bloom backend callbacks:
bloom_new/1,bloom_put/3,bloom_put_many/3,bloom_member?/3,bloom_merge/3,bloom_count/2. - Automatic parameter derivation from capacity and false_positive_rate options.
- Bloom merge via bitwise OR with validation of matching bit_count, hash_count, and seed.
- Bloom popcount-based cardinality estimation.
- Bloom serialization via EXSK envelope (sketch ID 7).
- Bloom property tests (no false negatives, merge commutativity/associativity/identity, serialization round-trip).
- Bloom statistical validation tests (observed FPR within 2x of target).
- Bloom merge law properties in merge_laws_test.exs.
- Bloom parity test stubs for future Rust NIF backend.
- Bloom benchmark suite (
bench/bloom_bench.exs). - Bloom options section in usage guide.
[0.3.0] - 2026-03-06
Added
- FrequentItems sketch (
ExDataSketch.FrequentItems) using the SpaceSaving algorithm with Pure Elixir and Rust NIF backends. - FI1 binary state format (32-byte header + variable-length sorted entries).
- FrequentItems backend callbacks:
fi_new/1,fi_update/3,fi_update_many/3,fi_merge/3,fi_estimate/3,fi_top_k/3,fi_count/2,fi_entry_count/2. - Batch optimization via pre-aggregation (
Enum.frequencies/1) with weighted updates. - Deterministic tie-breaking on eviction (lexicographically smallest item_bytes).
- Key encoding policies:
:binary,:int(signed 64-bit LE),{:term, :external}. - Commutative merge via additive count combination and canonical replay.
- Rust NIF acceleration for
fi_update_manyandfi_mergewith dirty scheduler support. - FrequentItems support in
ExDataSketch.update_many/2facade. - FrequentItems merge law properties (commutativity, identity, count conservation).
- FrequentItems golden vector test fixtures.
- FrequentItems parity tests ensuring byte-identical output between Pure and Rust backends.
- FrequentItems benchmark suite (
bench/frequent_items_bench.exs). - EXSK codec sketch ID 6 for FrequentItems.
- FrequentItems usage documentation in usage guide with SpaceSaving algorithm overview.
- Mox test dependency for backend contract testing.
- Theta
compact/1function for explicit compaction.
[0.2.1] - 2026-03-05
Added
- DDSketch quantiles sketch (
ExDataSketch.DDSketch) with Pure Elixir and Rust NIF backends. - DDSketch backend callbacks:
ddsketch_new/1,ddsketch_update/3,ddsketch_update_many/3,ddsketch_merge/3,ddsketch_quantile/3,ddsketch_count/2,ddsketch_min/2,ddsketch_max/2. - Rust NIF acceleration for
ddsketch_update_manyandddsketch_mergewith dirty scheduler support. - DDSketch support in
ExDataSketch.Quantilesfacade (type: :ddsketch). - DDSketch merge law properties (commutativity, identity, count additivity, min/max preservation).
- DDSketch golden vector test fixtures (empty, single, small_set, merge, zeros).
- DDSketch parity tests ensuring byte-identical output between Pure and Rust backends.
- DDSketch benchmark suite (
bench/ddsketch_bench.exs). - EXSK codec sketch ID 5 for DDSketch.
- DDSketch usage documentation in usage guide with KLL vs DDSketch comparison table.
[0.2.0] - 2026-03-04
Added
- KLL quantiles sketch (
ExDataSketch.KLL) with Pure Elixir and Rust NIF backends. ExDataSketch.Quantilesfacade module for type-dispatched quantile sketch access.- KLL backend callbacks:
kll_new/1,kll_update/3,kll_update_many/3,kll_merge/3,kll_quantile/3,kll_rank/3,kll_count/2,kll_min/2,kll_max/2. - Rust NIF acceleration for
kll_update_manyandkll_mergewith dirty scheduler support. - KLL merge law properties (associativity, commutativity, identity, count additivity, min/max preservation).
- KLL golden vector test fixtures (empty, single, small_set, merge).
- KLL parity tests ensuring byte-identical output between Pure and Rust backends.
- KLL benchmark suite (
bench/kll_bench.exs). - EXSK codec sketch ID 4 for KLL.
[0.1.1] - 2026-03-04
Added
- Deterministic golden vector test fixtures for HLL, CMS, and Theta (JSON format with versioned schema).
- Pure vs Rust parity test suite ensuring byte-identical serialization and estimates.
- Merge-law property tests (associativity, commutativity, identity, chunking equivalence) for all sketch types.
- Compatibility and Stability section in README documenting serialization and parity guarantees.
- CI regression tracking for coverage and benchmark baselines.
Changed
- Stabilized benchmark scripts with deterministic datasets and JSON output.
- Clarified HLL and CMS DataSketches interop stubs as intentionally unimplemented (not "future").
- Removed stale "Phase 2" language from module documentation.
[0.1.0] - 2026-03-02
Added
- Precompiled Rust NIF binaries for macOS (ARM64, x86_64) and Linux (x86_64 gnu/musl, aarch64 gnu/musl).
- Optional Rust NIF acceleration backend (
ExDataSketch.Backend.Rust). - Rust NIFs for HLL (update_many, merge, estimate), CMS (update_many, merge), and Theta (update_many, merge).
- Normal and dirty CPU scheduler NIF variants with configurable thresholds.
- Automatic fallback to Pure Elixir backend when Rust NIF is unavailable.
- Cross-backend parity tests ensuring byte-identical output between Pure and Rust.
- Side-by-side Pure vs Rust benchmark scenarios.
- CI jobs for Rust NIF compilation and testing (
test-rust,bench-rust). - Pure Elixir Theta sketch implementation (new, update, update_many, compact, merge, estimate).
- Apache DataSketches CompactSketch codec (serialize/deserialize) for Theta interop.
- MurmurHash3 seed hash computation for DataSketches compatibility.
- Deterministic test vectors for Theta sketch.
- Cross-language vector harness specification for Java interop testing.
- Theta Benchee benchmarks.
- Pure Elixir HLL implementation (new, update, update_many, merge, estimate).
- Pure Elixir CMS implementation (new, update, update_many, merge, estimate).
- Deterministic test vectors for HLL and CMS.
- Real Benchee benchmarks for HLL and CMS.
- Project skeleton with directory structure and dependencies.
- Public API stubs for HLL, CMS, and Theta sketch modules.
- ExDataSketch-native binary codec (EXSK format).
- Hash module with stable 64-bit hash interface.
- Backend behaviour with Pure Elixir stub implementation.
- Quick Start and Usage Guide documentation.
- GitHub Actions CI workflow.
- Integration convenience functions (
from_enumerable/2,merge_many/1,reducer/1,merger/1) on all sketch modules. - Integration guide with ecosystem examples (Flow, Broadway, Explorer, Nx, ex_arrow/ExZarr).
- Documented merge properties (associativity, commutativity) for HLL, CMS, and Theta.