KLL (Karnin-Lang-Liberty) quantiles sketch for rank and quantile estimation.
KLL maintains multiple levels of sorted float64 items to provide approximate quantile queries with guaranteed rank accuracy. Unlike HLL, CMS, and Theta sketches, KLL operates on raw numeric values rather than hashed items.
Accuracy
The rank error bound is approximately 1.65 / k, where k is the accuracy
parameter. For the default k = 200, this gives roughly 0.8% rank error.
| k | ~Rank Error |
|---|---|
| 50 | 3.30% |
| 100 | 1.65% |
| 200 | 0.83% |
| 500 | 0.33% |
Binary State Layout (v1)
All multi-byte fields are little-endian.
Offset Size Field
------ ------ -----
0 1 Version (u8, currently 1)
1 4 k parameter (u32 little-endian)
5 8 n total items seen (u64 little-endian)
13 8 min_val (f64 little-endian, NaN = empty)
21 8 max_val (f64 little-endian, NaN = empty)
29 1 num_levels (u8)
30 ceil(num_levels/8) compaction parity bits (1 bit per level)
30+P num_levels * 4 level_sizes (u32 little-endian each)
30+P+L sum(level_sizes) * 8 items (f64 little-endian, level 0 first)Options
:k- accuracy parameter, integer 8..65535 (default: 200). Higher values use more memory but give better accuracy.:backend- backend module (default:ExDataSketch.Backend.Pure).
Merge Properties
KLL merge is associative and commutative at the estimate level. Quantile query results from merged sketches are equivalent regardless of merge order, though internal state may differ due to compaction parity.
Summary
Functions
Returns the Cumulative Distribution Function (CDF) at the given split points.
Returns the total number of items inserted into the sketch.
Deserializes an EXSK binary into a KLL sketch.
Deserializes an Apache DataSketches KLL binary.
Creates a new KLL sketch from an enumerable of numeric items.
Returns the maximum value seen by the sketch, or nil if empty.
Merges two KLL sketches.
Merges a non-empty enumerable of KLL sketches into one.
Returns a 2-arity merge function suitable for combining sketches.
Returns the minimum value seen by the sketch, or nil if empty.
Creates a new KLL sketch.
Returns the Probability Mass Function (PMF) at the given split points.
Returns the approximate value at the given normalized rank.
Returns approximate values at multiple normalized ranks.
Returns the approximate normalized rank of the given value.
Returns a 2-arity reducer function suitable for Enum.reduce/3 and similar.
Serializes the sketch to the ExDataSketch-native EXSK binary format.
Serializes the sketch to Apache DataSketches KLL format.
Returns the size of the sketch state in bytes.
Updates the sketch with a single numeric value.
Updates the sketch with multiple numeric values in a single pass.
Types
Functions
Returns the Cumulative Distribution Function (CDF) at the given split points.
Given split points [s1, s2, ..., sm], returns [rank(s1), rank(s2), ..., rank(sm)].
Each value is the approximate fraction of items less than or equal to the
corresponding split point.
Returns nil if the sketch is empty.
Examples
iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> cdf = ExDataSketch.KLL.cdf(sketch, [25.0, 50.0, 75.0])
iex> length(cdf)
3
@spec count(t()) :: non_neg_integer()
Returns the total number of items inserted into the sketch.
Examples
iex> ExDataSketch.KLL.new() |> ExDataSketch.KLL.count()
0
@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}
Deserializes an EXSK binary into a KLL sketch.
Returns {:ok, sketch} on success or {:error, reason} on failure.
Examples
iex> ExDataSketch.KLL.deserialize(<<"invalid">>)
{:error, %ExDataSketch.Errors.DeserializationError{message: "deserialization failed: invalid magic bytes, expected EXSK"}}
@spec deserialize_datasketches(binary()) :: {:ok, t()} | {:error, Exception.t()}
Deserializes an Apache DataSketches KLL binary.
Not implemented. See serialize_datasketches/1 for details.
Examples
iex> try do
...> ExDataSketch.KLL.deserialize_datasketches(<<>>)
...> rescue
...> e in ExDataSketch.Errors.NotImplementedError -> e.message
...> end
"ExDataSketch.KLL.deserialize_datasketches is not yet implemented"
@spec from_enumerable( Enumerable.t(), keyword() ) :: t()
Creates a new KLL sketch from an enumerable of numeric items.
Equivalent to new(opts) |> update_many(enumerable).
Options
Same as new/1.
Examples
iex> sketch = ExDataSketch.KLL.from_enumerable([1.0, 2.0, 3.0], k: 200)
iex> ExDataSketch.KLL.count(sketch)
3
Returns the maximum value seen by the sketch, or nil if empty.
Examples
iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many([3.0, 1.0, 2.0])
iex> ExDataSketch.KLL.max_value(sketch)
3.0
Merges two KLL sketches.
Both sketches must have the same k parameter. The result is a sketch
whose quantile estimates approximate the union of both input multisets.
Returns the merged sketch. Raises ExDataSketch.Errors.IncompatibleSketchesError
if the sketches have different parameters.
Examples
iex> a = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..50)
iex> b = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(51..100)
iex> merged = ExDataSketch.KLL.merge(a, b)
iex> ExDataSketch.KLL.count(merged)
100
@spec merge_many(Enumerable.t()) :: t()
Merges a non-empty enumerable of KLL sketches into one.
Raises Enum.EmptyError if the enumerable is empty.
Examples
iex> sketches = Enum.map(1..3, fn i ->
...> ExDataSketch.KLL.new() |> ExDataSketch.KLL.update(i * 1.0)
...> end)
iex> merged = ExDataSketch.KLL.merge_many(sketches)
iex> ExDataSketch.KLL.count(merged)
3
Returns a 2-arity merge function suitable for combining sketches.
The returned function calls merge/2 on two sketches.
Examples
iex> is_function(ExDataSketch.KLL.merger(), 2)
true
Returns the minimum value seen by the sketch, or nil if empty.
Examples
iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many([3.0, 1.0, 2.0])
iex> ExDataSketch.KLL.min_value(sketch)
1.0
Creates a new KLL sketch.
Options
:k- accuracy parameter, integer 8..65535 (default: 200). Higher values use more memory but give better accuracy.:backend- backend module (default:ExDataSketch.Backend.Pure).
Examples
iex> sketch = ExDataSketch.KLL.new(k: 200)
iex> sketch.opts
[k: 200]
iex> ExDataSketch.KLL.count(sketch)
0
Returns the Probability Mass Function (PMF) at the given split points.
Given split points [s1, s2, ..., sm], returns m+1 values representing
the approximate fraction of items in the intervals
(-inf, s1], (s1, s2], ..., (sm, +inf).
The returned values sum to 1.0.
Returns nil if the sketch is empty.
Examples
iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> pmf = ExDataSketch.KLL.pmf(sketch, [50.0])
iex> length(pmf)
2
Returns the approximate value at the given normalized rank.
The rank must be in the range [0.0, 1.0], where 0.0 is the minimum
and 1.0 is the maximum. For example, quantile(sketch, 0.5) returns
the approximate median.
Returns nil if the sketch is empty.
Examples
iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> median = ExDataSketch.KLL.quantile(sketch, 0.5)
iex> abs(median - 50.0) < 5.0
true
Returns approximate values at multiple normalized ranks.
Convenience wrapper around quantile/2 for batch queries.
Examples
iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> [q25, q50, q75] = ExDataSketch.KLL.quantiles(sketch, [0.25, 0.5, 0.75])
iex> q25 < q50 and q50 < q75
true
Returns the approximate normalized rank of the given value.
The result is in the range [0.0, 1.0]. For example, if rank(sketch, x)
returns 0.75, approximately 75% of the values in the sketch are <= x.
Returns nil if the sketch is empty.
Examples
iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> r = ExDataSketch.KLL.rank(sketch, 50.0)
iex> abs(r - 0.5) < 0.05
true
Returns a 2-arity reducer function suitable for Enum.reduce/3 and similar.
The returned function calls update/2 on each item.
Examples
iex> is_function(ExDataSketch.KLL.reducer(), 2)
true
Serializes the sketch to the ExDataSketch-native EXSK binary format.
The serialized binary includes magic bytes, version, sketch type,
parameters, and state. See ExDataSketch.Codec for format details.
Examples
iex> sketch = ExDataSketch.KLL.new()
iex> binary = ExDataSketch.KLL.serialize(sketch)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true
Serializes the sketch to Apache DataSketches KLL format.
Not implemented. Apache DataSketches KLL interop is planned for a future
release. For KLL serialization, use serialize/1 (ExDataSketch-native
EXSK format).
Examples
iex> try do
...> sketch = %ExDataSketch.KLL{state: <<>>, opts: [k: 200], backend: nil}
...> ExDataSketch.KLL.serialize_datasketches(sketch)
...> rescue
...> e in ExDataSketch.Errors.NotImplementedError -> e.message
...> end
"ExDataSketch.KLL.serialize_datasketches is not yet implemented"
@spec size_bytes(t()) :: non_neg_integer()
Returns the size of the sketch state in bytes.
Examples
iex> sketch = ExDataSketch.KLL.new()
iex> ExDataSketch.KLL.size_bytes(sketch) > 0
true
Updates the sketch with a single numeric value.
The value is converted to float64. Unlike HLL/CMS/Theta, KLL does not hash the input -- it stores the actual numeric value for quantile estimation.
Examples
iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update(42.0)
iex> ExDataSketch.KLL.count(sketch)
1
@spec update_many(t(), Enumerable.t()) :: t()
Updates the sketch with multiple numeric values in a single pass.
More efficient than calling update/2 repeatedly because it minimizes
intermediate binary allocations.
Examples
iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many([1.0, 2.0, 3.0])
iex> ExDataSketch.KLL.count(sketch)
3