ExDataSketch (ExDataSketch v0.7.1)

Production-grade streaming data sketching algorithms for Elixir.

ExDataSketch provides probabilistic data structures for approximate counting and frequency estimation on streaming data. All sketch state is stored as Elixir-owned binaries, enabling straightforward serialization, distribution, and persistence.

Sketch Families

Architecture

  • Binary state: All sketch state is held in canonical Elixir binaries, with no opaque NIF resources.
  • Backend system: Computation is dispatched through backend modules. ExDataSketch.Backend.Pure (pure Elixir) is always available. ExDataSketch.Backend.Rust is optional, ships with precompiled binaries, and adds NIF acceleration.
  • Serialization: ExDataSketch-native format (EXSK) for all sketches, plus Apache DataSketches interop for Theta CompactSketch.
  • Deterministic hashing: ExDataSketch.Hash provides a stable 64-bit hash interface for reproducible results.
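
To illustrate why plain binary state makes merging and persistence straightforward, here is a toy register array in pure Elixir. It is not ExDataSketch's actual layout or API: the ToyRegisters module and its functions are hypothetical, and the byte-wise maximum merge simply mirrors how HyperLogLog-style registers are usually combined.

```elixir
defmodule ToyRegisters do
  # Hypothetical module for illustration only; not part of ExDataSketch.

  # Create 2^p one-byte registers, all zero, held in a single flat binary.
  def new(p), do: :binary.copy(<<0>>, Integer.pow(2, p))

  # Store `value` at `index` if it is larger than the current register value.
  def put_max(registers, index, value) do
    <<head::binary-size(index), old, tail::binary>> = registers
    <<head::binary, max(old, value), tail::binary>>
  end

  # Merge two register binaries of equal size by taking the byte-wise maximum.
  def merge(a, b) when byte_size(a) == byte_size(b) do
    Enum.zip_with(:binary.bin_to_list(a), :binary.bin_to_list(b), &max/2)
    |> :binary.list_to_bin()
  end
end

a = ToyRegisters.new(4) |> ToyRegisters.put_max(3, 5)
b = ToyRegisters.new(4) |> ToyRegisters.put_max(3, 2) |> ToyRegisters.put_max(7, 9)
merged = ToyRegisters.merge(a, b)

# Because the state is an ordinary binary, it can be encoded, stored,
# or sent to another node without any special handling.
restored = merged |> Base.encode64() |> Base.decode64!()
```

Since the state is just bytes, nothing here depends on a live process or a native resource, which is the property the Binary state bullet describes.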

Quick Example

# Cardinality estimation with HLL
sketch = ExDataSketch.HLL.new(p: 14)
sketch = ExDataSketch.update_many(sketch, ["alice", "bob", "alice"])
ExDataSketch.HLL.estimate(sketch)

# Frequency estimation with CMS
sketch = ExDataSketch.CMS.new(width: 2048, depth: 5)
sketch = ExDataSketch.update_many(sketch, ["page_a", "page_a", "page_b"])
ExDataSketch.CMS.estimate(sketch, "page_a")

Integration Patterns

Each sketch module provides convenience functions for ecosystem integration:

  • from_enumerable/2 — build a sketch from any Enumerable in one call.
  • merge_many/1 — merge a collection of sketches (e.g. from parallel workers).
  • reducer/1 — returns a 2-arity function for use with Enum.reduce/3, Flow, etc.
  • merger/1 — returns a 2-arity function for merging sketches in reduce operations.
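
As a rough sketch of the parallel-worker pattern mentioned in the list above, each worker can build its own sketch with from_enumerable/2 and the partial sketches can then be combined with merge_many/1. The argument shapes are assumptions that this page does not confirm: the enumerable is passed first, the second argument takes the same options as new/1, and merge_many/1 accepts a list of sketches. The snippet also assumes ExDataSketch is already a project dependency.

```elixir
# Assumes ExDataSketch is already a dependency of the project.
alias ExDataSketch.HLL

# Each inner list stands in for the items seen by one worker.
chunks = [
  ["alice", "bob"],
  ["bob", "carol"],
  ["alice", "dave"]
]

# Build one partial sketch per chunk, concurrently.
# Assumption: from_enumerable/2 takes (enumerable, opts) with new/1-style options.
partials =
  chunks
  |> Task.async_stream(fn chunk -> HLL.from_enumerable(chunk, p: 12) end)
  |> Enum.map(fn {:ok, sketch} -> sketch end)

# Assumption: merge_many/1 accepts a list of sketches built with the same options.
merged = HLL.merge_many(partials)
estimate = HLL.estimate(merged)
```

All partial sketches use the same p value here, since sketches with different parameters generally cannot be merged.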

See the Integration Guide for examples with Flow, Broadway, Explorer, Nx, and other ecosystem libraries.

See the Quick Start guide for more examples.

Functions

update_many(sketch, items)

Updates a sketch with multiple items in a single pass.

Delegates to the appropriate sketch module's update_many/2 based on the struct type.

Examples

iex> sketch = ExDataSketch.HLL.new(p: 10)
iex> sketch = ExDataSketch.update_many(sketch, ["a", "b"])
iex> ExDataSketch.HLL.estimate(sketch) > 0.0
true
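
Delegation based on the struct type can be written in Elixir by pattern matching on the struct's module. The following is a minimal sketch of that idea using hypothetical Toy modules; it does not show how ExDataSketch itself implements update_many/2.

```elixir
defmodule Toy.Counter do
  # Hypothetical structure: counts every item it is given.
  defstruct count: 0

  def update_many(%__MODULE__{count: count} = state, items) do
    %{state | count: count + Enum.count(items)}
  end
end

defmodule Toy.Seen do
  # Hypothetical structure: remembers the distinct items it is given.
  defstruct items: MapSet.new()

  def update_many(%__MODULE__{items: set} = state, items) do
    %{state | items: Enum.into(items, set)}
  end
end

defmodule Toy do
  # Bind the struct's module name and forward the call to that module.
  def update_many(%module{} = state, items) do
    module.update_many(state, items)
  end
end

counter = Toy.update_many(%Toy.Counter{}, ["a", "b", "a"])
seen = Toy.update_many(%Toy.Seen{}, ["a", "b", "a"])
```

The caller uses a single entry point, and each structure keeps its own update logic in its own module.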