Production-grade streaming data sketching algorithms for Elixir.
ExDataSketch provides probabilistic data structures for approximate counting and frequency estimation on streaming data. All sketch state is stored as Elixir-owned binaries, enabling straightforward serialization, distribution, and persistence.
Sketch Families
- ExDataSketch.HLL -- HyperLogLog for cardinality (distinct count) estimation.
- ExDataSketch.CMS -- Count-Min Sketch for frequency estimation.
- ExDataSketch.Theta -- Theta Sketch for set operations on cardinalities.
- ExDataSketch.KLL -- KLL Sketch for rank and quantile estimation.
- ExDataSketch.DDSketch -- DDSketch for value-relative-accuracy quantile estimation.
- ExDataSketch.FrequentItems -- SpaceSaving for approximate heavy-hitter detection.
- ExDataSketch.Bloom -- Bloom filter for probabilistic membership testing.
- ExDataSketch.Cuckoo -- Cuckoo filter for membership testing with deletion support.
- ExDataSketch.Quotient -- Quotient filter for membership testing with deletion and merge.
- ExDataSketch.CQF -- Counting Quotient Filter for multiset membership with approximate counting.
- ExDataSketch.XorFilter -- Xor filter for static, immutable membership testing.
- ExDataSketch.IBLT -- Invertible Bloom Lookup Table for set reconciliation.
- ExDataSketch.FilterChain -- Capability-aware composition framework for membership filters.
- ExDataSketch.REQ -- REQ Sketch for relative error quantiles with tail accuracy.
- ExDataSketch.MisraGries -- Misra-Gries for deterministic heavy hitter detection.
- ExDataSketch.Quantiles -- Facade for quantile sketch algorithms.
Architecture
- Binary state: All sketch state is held in canonical Elixir binaries; there are no opaque NIF resources.
- Backend system: Computation is dispatched through backend modules. ExDataSketch.Backend.Pure (pure Elixir) is always available. ExDataSketch.Backend.Rust (optional, precompiled binaries provided) provides NIF acceleration.
- Serialization: ExDataSketch-native format (EXSK) for all sketches, plus Apache DataSketches interop for Theta CompactSketch.
- Deterministic hashing: ExDataSketch.Hash provides a stable 64-bit hash interface for reproducible results.
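Because sketch state consists of plain binaries rather than NIF resources, a sketch should survive a round trip through Erlang's generic term encoding. The following is a minimal sketch of that idea, assuming the library is already a dependency of the project. It uses :erlang.term_to_binary/1 from the standard library, not the library's native EXSK format, whose function names are not shown in this overview.

```elixir
# Build a small HLL sketch using the functions shown elsewhere on this page.
sketch =
  ExDataSketch.HLL.new(p: 14)
  |> ExDataSketch.update_many(["alice", "bob", "alice"])

# Encode the whole struct with Erlang's generic term format.
# Assumption: this is for illustration only; it is not the EXSK format.
encoded = :erlang.term_to_binary(sketch)

# Decoding yields a struct equal to the original, since the state holds
# only ordinary Elixir terms and no process-bound NIF resources.
decoded = :erlang.binary_to_term(encoded)
decoded == sketch
```

The same property is what allows a sketch to be sent between nodes or written to storage without special handling.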
Quick Example
# Cardinality estimation with HLL
sketch = ExDataSketch.HLL.new(p: 14)
sketch = ExDataSketch.update_many(sketch, ["alice", "bob", "alice"])
ExDataSketch.HLL.estimate(sketch)
# Frequency estimation with CMS
sketch = ExDataSketch.CMS.new(width: 2048, depth: 5)
sketch = ExDataSketch.update_many(sketch, ["page_a", "page_a", "page_b"])
ExDataSketch.CMS.estimate(sketch, "page_a")
Integration Patterns
Each sketch module provides convenience functions for ecosystem integration:
- from_enumerable/2 -- build a sketch from any Enumerable in one call.
- merge_many/1 -- merge a collection of sketches (e.g. from parallel workers).
- reducer/1 -- returns a 2-arity function for use with Enum.reduce/3, Flow, etc.
- merger/1 -- returns a 2-arity function for merging sketches in reduce operations.
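The parallel-worker use of merge_many/1 might look like the following. This is an illustrative sketch, not taken from the library's guides: it assumes the library is already a project dependency, that merge_many/1 accepts a plain list of sketches, and that sketches must be created with the same options (here p: 14) to be mergeable. The chunks list is hypothetical data standing in for each worker's input.

```elixir
# Hypothetical input: each inner list stands in for what one worker sees.
chunks = [
  ["alice", "bob"],
  ["bob", "carol"],
  ["carol", "dave", "alice"]
]

# Build one HLL sketch per chunk in parallel.
# Assumption: identical options (p: 14) keep the sketches mergeable.
partials =
  chunks
  |> Task.async_stream(fn chunk ->
    ExDataSketch.HLL.new(p: 14)
    |> ExDataSketch.update_many(chunk)
  end)
  |> Enum.map(fn {:ok, sketch} -> sketch end)

# Combine the per-worker sketches, then read the approximate distinct count.
merged = ExDataSketch.HLL.merge_many(partials)
ExDataSketch.HLL.estimate(merged)
```

Merging partial sketches avoids funnelling every raw item through a single process; only the fixed-size sketch binaries need to be collected.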
See the Integration Guide for examples with Flow, Broadway, Explorer, Nx, and other ecosystem libraries.
See the Quick Start guide for more examples.
Functions
@spec update_many(
        ExDataSketch.HLL.t()
        | ExDataSketch.CMS.t()
        | ExDataSketch.Theta.t()
        | ExDataSketch.KLL.t()
        | ExDataSketch.DDSketch.t()
        | ExDataSketch.FrequentItems.t()
        | ExDataSketch.Bloom.t()
        | ExDataSketch.Cuckoo.t()
        | ExDataSketch.Quotient.t()
        | ExDataSketch.CQF.t()
        | ExDataSketch.IBLT.t()
        | ExDataSketch.REQ.t()
        | ExDataSketch.MisraGries.t(),
        Enumerable.t()
      ) ::
        ExDataSketch.HLL.t()
        | ExDataSketch.CMS.t()
        | ExDataSketch.Theta.t()
        | ExDataSketch.KLL.t()
        | ExDataSketch.DDSketch.t()
        | ExDataSketch.FrequentItems.t()
        | ExDataSketch.Bloom.t()
        | ExDataSketch.Cuckoo.t()
        | ExDataSketch.Quotient.t()
        | ExDataSketch.CQF.t()
        | ExDataSketch.IBLT.t()
        | ExDataSketch.REQ.t()
        | ExDataSketch.MisraGries.t()
Updates a sketch with multiple items in a single pass.
Delegates to the appropriate sketch module's update_many/2 based on
the struct type.
Examples
iex> sketch = ExDataSketch.HLL.new(p: 10)
iex> sketch = ExDataSketch.update_many(sketch, ["a", "b"])
iex> ExDataSketch.HLL.estimate(sketch) > 0.0
true
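Since the second argument is typed as Enumerable.t(), a long input can be fed in batches rather than as one large list. The following is a rough sketch of that usage, assuming the library is already a project dependency; the generated "user_N" keys and the batch size of 1,000 are hypothetical values chosen for illustration.

```elixir
# Hypothetical stream of 10_000 events drawn from 500 distinct user keys.
events = Stream.map(1..10_000, fn i -> "user_#{rem(i, 500)}" end)

# Fold the stream into a single sketch, one batch of 1_000 items at a time.
sketch =
  events
  |> Stream.chunk_every(1_000)
  |> Enum.reduce(ExDataSketch.HLL.new(p: 14), fn batch, acc ->
    ExDataSketch.update_many(acc, batch)
  end)

# Approximate number of distinct keys seen across all batches.
ExDataSketch.HLL.estimate(sketch)
```

Batching keeps memory use bounded by the batch size while still letting each call process many items in one pass.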