ExDataSketch.KLL (ExDataSketch v0.7.1)

Copy Markdown View Source

KLL (Karnin-Lang-Liberty) quantiles sketch for rank and quantile estimation.

KLL maintains multiple levels of sorted float64 items to provide approximate quantile queries with guaranteed rank accuracy. Unlike HLL, CMS, and Theta sketches, KLL operates on raw numeric values rather than hashed items.

Accuracy

The rank error bound is approximately 1.65 / k, where k is the accuracy parameter. For the default k = 200, this gives roughly 0.8% rank error.

k~Rank Error
503.30%
1001.65%
2000.83%
5000.33%

Binary State Layout (v1)

All multi-byte fields are little-endian.

Offset  Size                    Field
------  ------                  -----
0       1                       Version (u8, currently 1)
1       4                       k parameter (u32 little-endian)
5       8                       n total items seen (u64 little-endian)
13      8                       min_val (f64 little-endian, NaN = empty)
21      8                       max_val (f64 little-endian, NaN = empty)
29      1                       num_levels (u8)
30      ceil(num_levels/8)      compaction parity bits (1 bit per level)
30+P    num_levels * 4          level_sizes (u32 little-endian each)
30+P+L  sum(level_sizes) * 8    items (f64 little-endian, level 0 first)

Options

  • :k - accuracy parameter, integer 8..65535 (default: 200). Higher values use more memory but give better accuracy.
  • :backend - backend module (default: ExDataSketch.Backend.Pure).

Merge Properties

KLL merge is associative and commutative at the estimate level. Quantile query results from merged sketches are equivalent regardless of merge order, though internal state may differ due to compaction parity.

Summary

Functions

Returns the Cumulative Distribution Function (CDF) at the given split points.

Returns the total number of items inserted into the sketch.

Deserializes an EXSK binary into a KLL sketch.

Deserializes an Apache DataSketches KLL binary.

Creates a new KLL sketch from an enumerable of numeric items.

Returns the maximum value seen by the sketch, or nil if empty.

Merges two KLL sketches.

Merges a non-empty enumerable of KLL sketches into one.

Returns a 2-arity merge function suitable for combining sketches.

Returns the minimum value seen by the sketch, or nil if empty.

Creates a new KLL sketch.

Returns the Probability Mass Function (PMF) at the given split points.

Returns the approximate value at the given normalized rank.

Returns approximate values at multiple normalized ranks.

Returns the approximate normalized rank of the given value.

Returns a 2-arity reducer function suitable for Enum.reduce/3 and similar.

Serializes the sketch to the ExDataSketch-native EXSK binary format.

Serializes the sketch to Apache DataSketches KLL format.

Returns the size of the sketch state in bytes.

Updates the sketch with a single numeric value.

Updates the sketch with multiple numeric values in a single pass.

Types

t()

@type t() :: %ExDataSketch.KLL{backend: module(), opts: keyword(), state: binary()}

Functions

cdf(kll, split_points)

@spec cdf(t(), [number()]) :: [float()] | nil

Returns the Cumulative Distribution Function (CDF) at the given split points.

Given split points [s1, s2, ..., sm], returns [rank(s1), rank(s2), ..., rank(sm)]. Each value is the approximate fraction of items less than or equal to the corresponding split point.

Returns nil if the sketch is empty.

Examples

iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> cdf = ExDataSketch.KLL.cdf(sketch, [25.0, 50.0, 75.0])
iex> length(cdf)
3

count(kll)

@spec count(t()) :: non_neg_integer()

Returns the total number of items inserted into the sketch.

Examples

iex> ExDataSketch.KLL.new() |> ExDataSketch.KLL.count()
0

deserialize(binary)

@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}

Deserializes an EXSK binary into a KLL sketch.

Returns {:ok, sketch} on success or {:error, reason} on failure.

Examples

iex> ExDataSketch.KLL.deserialize(<<"invalid">>)
{:error, %ExDataSketch.Errors.DeserializationError{message: "deserialization failed: invalid magic bytes, expected EXSK"}}

deserialize_datasketches(binary)

@spec deserialize_datasketches(binary()) :: {:ok, t()} | {:error, Exception.t()}

Deserializes an Apache DataSketches KLL binary.

Not implemented. See serialize_datasketches/1 for details.

Examples

iex> try do
...>   ExDataSketch.KLL.deserialize_datasketches(<<>>)
...> rescue
...>   e in ExDataSketch.Errors.NotImplementedError -> e.message
...> end
"ExDataSketch.KLL.deserialize_datasketches is not yet implemented"

from_enumerable(enumerable, opts \\ [])

@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: t()

Creates a new KLL sketch from an enumerable of numeric items.

Equivalent to new(opts) |> update_many(enumerable).

Options

Same as new/1.

Examples

iex> sketch = ExDataSketch.KLL.from_enumerable([1.0, 2.0, 3.0], k: 200)
iex> ExDataSketch.KLL.count(sketch)
3

max_value(kll)

@spec max_value(t()) :: float() | nil

Returns the maximum value seen by the sketch, or nil if empty.

Examples

iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many([3.0, 1.0, 2.0])
iex> ExDataSketch.KLL.max_value(sketch)
3.0

merge(sketch, kll)

@spec merge(t(), t()) :: t()

Merges two KLL sketches.

Both sketches must have the same k parameter. The result is a sketch whose quantile estimates approximate the union of both input multisets.

Returns the merged sketch. Raises ExDataSketch.Errors.IncompatibleSketchesError if the sketches have different parameters.

Examples

iex> a = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..50)
iex> b = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(51..100)
iex> merged = ExDataSketch.KLL.merge(a, b)
iex> ExDataSketch.KLL.count(merged)
100

merge_many(sketches)

@spec merge_many(Enumerable.t()) :: t()

Merges a non-empty enumerable of KLL sketches into one.

Raises Enum.EmptyError if the enumerable is empty.

Examples

iex> sketches = Enum.map(1..3, fn i ->
...>   ExDataSketch.KLL.new() |> ExDataSketch.KLL.update(i * 1.0)
...> end)
iex> merged = ExDataSketch.KLL.merge_many(sketches)
iex> ExDataSketch.KLL.count(merged)
3

merger(opts \\ [])

@spec merger(keyword()) :: (t(), t() -> t())

Returns a 2-arity merge function suitable for combining sketches.

The returned function calls merge/2 on two sketches.

Examples

iex> is_function(ExDataSketch.KLL.merger(), 2)
true

min_value(kll)

@spec min_value(t()) :: float() | nil

Returns the minimum value seen by the sketch, or nil if empty.

Examples

iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many([3.0, 1.0, 2.0])
iex> ExDataSketch.KLL.min_value(sketch)
1.0

new(opts \\ [])

@spec new(keyword()) :: t()

Creates a new KLL sketch.

Options

  • :k - accuracy parameter, integer 8..65535 (default: 200). Higher values use more memory but give better accuracy.
  • :backend - backend module (default: ExDataSketch.Backend.Pure).

Examples

iex> sketch = ExDataSketch.KLL.new(k: 200)
iex> sketch.opts
[k: 200]
iex> ExDataSketch.KLL.count(sketch)
0

pmf(kll, split_points)

@spec pmf(t(), [number()]) :: [float()] | nil

Returns the Probability Mass Function (PMF) at the given split points.

Given split points [s1, s2, ..., sm], returns m+1 values representing the approximate fraction of items in the intervals (-inf, s1], (s1, s2], ..., (sm, +inf).

The returned values sum to 1.0.

Returns nil if the sketch is empty.

Examples

iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> pmf = ExDataSketch.KLL.pmf(sketch, [50.0])
iex> length(pmf)
2

quantile(kll, rank)

@spec quantile(t(), float()) :: float() | nil

Returns the approximate value at the given normalized rank.

The rank must be in the range [0.0, 1.0], where 0.0 is the minimum and 1.0 is the maximum. For example, quantile(sketch, 0.5) returns the approximate median.

Returns nil if the sketch is empty.

Examples

iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> median = ExDataSketch.KLL.quantile(sketch, 0.5)
iex> abs(median - 50.0) < 5.0
true

quantiles(sketch, ranks)

@spec quantiles(t(), [float()]) :: [float() | nil]

Returns approximate values at multiple normalized ranks.

Convenience wrapper around quantile/2 for batch queries.

Examples

iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> [q25, q50, q75] = ExDataSketch.KLL.quantiles(sketch, [0.25, 0.5, 0.75])
iex> q25 < q50 and q50 < q75
true

rank(kll, value)

@spec rank(t(), number()) :: float() | nil

Returns the approximate normalized rank of the given value.

The result is in the range [0.0, 1.0]. For example, if rank(sketch, x) returns 0.75, approximately 75% of the values in the sketch are <= x.

Returns nil if the sketch is empty.

Examples

iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100)
iex> r = ExDataSketch.KLL.rank(sketch, 50.0)
iex> abs(r - 0.5) < 0.05
true

reducer()

@spec reducer() :: (number(), t() -> t())

Returns a 2-arity reducer function suitable for Enum.reduce/3 and similar.

The returned function calls update/2 on each item.

Examples

iex> is_function(ExDataSketch.KLL.reducer(), 2)
true

serialize(kll)

@spec serialize(t()) :: binary()

Serializes the sketch to the ExDataSketch-native EXSK binary format.

The serialized binary includes magic bytes, version, sketch type, parameters, and state. See ExDataSketch.Codec for format details.

Examples

iex> sketch = ExDataSketch.KLL.new()
iex> binary = ExDataSketch.KLL.serialize(sketch)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true

serialize_datasketches(kll)

@spec serialize_datasketches(t()) :: binary()

Serializes the sketch to Apache DataSketches KLL format.

Not implemented. Apache DataSketches KLL interop is planned for a future release. For KLL serialization, use serialize/1 (ExDataSketch-native EXSK format).

Examples

iex> try do
...>   sketch = %ExDataSketch.KLL{state: <<>>, opts: [k: 200], backend: nil}
...>   ExDataSketch.KLL.serialize_datasketches(sketch)
...> rescue
...>   e in ExDataSketch.Errors.NotImplementedError -> e.message
...> end
"ExDataSketch.KLL.serialize_datasketches is not yet implemented"

size_bytes(kll)

@spec size_bytes(t()) :: non_neg_integer()

Returns the size of the sketch state in bytes.

Examples

iex> sketch = ExDataSketch.KLL.new()
iex> ExDataSketch.KLL.size_bytes(sketch) > 0
true

update(sketch, value)

@spec update(t(), number()) :: t()

Updates the sketch with a single numeric value.

The value is converted to float64. Unlike HLL/CMS/Theta, KLL does not hash the input -- it stores the actual numeric value for quantile estimation.

Examples

iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update(42.0)
iex> ExDataSketch.KLL.count(sketch)
1

update_many(sketch, items)

@spec update_many(t(), Enumerable.t()) :: t()

Updates the sketch with multiple numeric values in a single pass.

More efficient than calling update/2 repeatedly because it minimizes intermediate binary allocations.

Examples

iex> sketch = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many([1.0, 2.0, 3.0])
iex> ExDataSketch.KLL.count(sketch)
3