kmv v0.1.0 KMV

K-minimum value sketch to estimate cardinality of sets.

KMV can also be used to perform set operations:

  • union
  • intersection
  • difference

Implementation is based on the paper: “On synopses for distinct-value estimation under multiset operations” by Beyer et al, 2007.

Summary

Functions

Estimate the cardinality of the set in the KMV

Insert a value into a KMV

Intersection of two KMV sketches

Create a new KMV of max size k

Union a list of KMV sketches

Types

t :: %KMV{heap: Heap.t, max_size: integer, set: Set.t, size: integer}

Functions

cardinality(kmv)

Specs

cardinality(KMV.t) :: float

Estimate the cardinality of the set in the KMV.

Returns: float

Parameters

Example

Insert 10,000 unique values into a new KMV of size 500, and estimate its cardinality which should be around 10,000.

iex> 1..10_000 |> Enum.into(KMV.new(500)) |> KMV.cardinality |> round
9_693
insert(kmv, value)

Specs

insert(KMV.t, any) :: KMV.t

Insert a value into a KMV.

Returns: KMV

Parameters

  • kmv: KMV sketch.

Example

Create a KMV of size 3, insert 6 elements, and estimate cardinality.

iex> a = KMV.new(3)
iex> a = KMV.insert(a, "test")
iex> a = KMV.insert(a, 1)
iex> a = KMV.insert(a, 2)
iex> a = KMV.insert(a, 3)
iex> a = KMV.insert(a, :thisAlsoWorks)
iex> a = KMV.insert(a, [:a, :list, :becomes, :a, :single, :value])
iex> KMV.cardinality(a) |> round
7
intersection(kmv_list)

Specs

intersection([KMV.t, ...]) :: float

Intersection of two KMV sketches.

Returns: float

Parameters

  • kmv_list: A list of KMV

Example

The intersection of the three KMV sketches should be around 25,000.

iex> a =      0..100_000 |> Enum.into(KMV.new(500))
iex> b = 50_000..150_000 |> Enum.into(KMV.new(500))
iex> c = 25_000..75_000 |> Enum.into(KMV.new(500))
iex> KMV.intersection([a, b, c]) |> round
25_074
new(k \\ 1024)

Specs

new(integer) :: KMV.t

Create a new KMV of max size k.

Increasing the size of k increases the accuracy.

Returns: KMV

Parameters

  • k: Max size of the KMV. Must be greater than 0.
union(kmv_list)

Specs

union([KMV.t, ...]) :: KMV.t

Union a list of KMV sketches.

Returns: KMV

Parameters

  • kmv_list: A list of KMV

Example

The union the three KMV sketches around 12,000.

iex> a = 10_000..15_000 |> Enum.into(KMV.new(500))
iex> b = 13_000..18_000 |> Enum.into(KMV.new(500))
iex> c = 15_000..22_000 |> Enum.into(KMV.new(500))
iex> KMV.union([a, b, c]) |> KMV.cardinality |> round
12_840