ExDataSketch provides streaming probabilistic data structures for Elixir. This guide covers the basics of using HLL (cardinality estimation), CMS (frequency estimation), and Theta (set operations on cardinalities).
HyperLogLog (HLL) -- Cardinality Estimation
HLL answers: "approximately how many distinct items have I seen?"
# Create a new HLL sketch with default precision (p=14)
sketch = ExDataSketch.HLL.new()
# Add items
sketch = ExDataSketch.HLL.update(sketch, "user_123")
sketch = ExDataSketch.HLL.update(sketch, "user_456")
sketch = ExDataSketch.HLL.update(sketch, "user_123") # duplicate
# Estimate distinct count (will be approximately 2.0)
ExDataSketch.HLL.estimate(sketch)
# Add many items at once
sketch = ExDataSketch.HLL.update_many(sketch, ["a", "b", "c", "d"])Count-Min Sketch (CMS) -- Frequency Estimation
CMS answers: "approximately how many times have I seen this item?"
# Create a new CMS with default width and depth
sketch = ExDataSketch.CMS.new()
# Add items (default increment of 1)
sketch = ExDataSketch.CMS.update(sketch, "page_home")
sketch = ExDataSketch.CMS.update(sketch, "page_home")
sketch = ExDataSketch.CMS.update(sketch, "page_about")
# Estimate frequency
ExDataSketch.CMS.estimate(sketch, "page_home") # approximately 2
ExDataSketch.CMS.estimate(sketch, "page_about") # approximately 1
ExDataSketch.CMS.estimate(sketch, "page_other") # approximately 0Theta Sketch -- Set Operations on Cardinalities
Theta answers: "approximately how many distinct items, with support for set union and intersection?"
# Create a new Theta sketch with default k=4096
sketch = ExDataSketch.Theta.new()
# Add items
sketch = ExDataSketch.Theta.update(sketch, "user_123")
sketch = ExDataSketch.Theta.update(sketch, "user_456")
# Estimate distinct count
ExDataSketch.Theta.estimate(sketch)
# Add many items at once
sketch = ExDataSketch.Theta.update_many(sketch, ["a", "b", "c", "d"])
# Compact for serialization or merging
sketch = ExDataSketch.Theta.compact(sketch)Merging Sketches
HLL, CMS, and Theta all support merging, which is essential for distributed systems:
# Merge HLL sketches from different nodes
combined = ExDataSketch.HLL.merge(sketch_from_node1, sketch_from_node2)
ExDataSketch.HLL.estimate(combined)
# Merge CMS sketches
combined = ExDataSketch.CMS.merge(sketch_from_node1, sketch_from_node2)
ExDataSketch.CMS.estimate(combined, "some_item")
# Merge Theta sketches
combined = ExDataSketch.Theta.merge(sketch_from_node1, sketch_from_node2)
ExDataSketch.Theta.estimate(combined)
# Merge many sketches at once (works with all sketch types)
merged = ExDataSketch.HLL.merge_many([sketch1, sketch2, sketch3])Serialization
Sketches can be serialized to binaries for storage or transmission:
# ExDataSketch-native format (all sketch types)
binary = ExDataSketch.HLL.serialize(sketch)
{:ok, sketch} = ExDataSketch.HLL.deserialize(binary)
# Apache DataSketches CompactSketch format (Theta only, for cross-language interop)
binary = ExDataSketch.Theta.serialize_datasketches(theta_sketch)
{:ok, sketch} = ExDataSketch.Theta.deserialize_datasketches(binary)Next Steps
See the Usage Guide for detailed documentation on options, backends, serialization formats, and error handling.