kmv v0.1.0 KMV
K-minimum value sketch to estimate cardinality of sets.
KMV can also be used to perform set operations:
- union
- intersection
- difference
Implementation is based on the paper: “On synopses for distinct-value estimation under multiset operations” by Beyer et al, 2007.
Summary
Types
Functions
Specs
cardinality(KMV.t) :: float
Estimate the cardinality of the set in the KMV.
Returns: float
Parameters
- kmv:
KMV
Example
Insert 10,000 unique values into a new KMV of size 500, and estimate its cardinality which should be around 10,000.
iex> 1..10_000 |> Enum.into(KMV.new(500)) |> KMV.cardinality |> round
9_693
Insert a value into a KMV.
Returns: KMV
Parameters
- kmv:
KMVsketch.
Example
Create a KMV of size 3, insert 6 elements, and estimate cardinality.
iex> a = KMV.new(3)
iex> a = KMV.insert(a, "test")
iex> a = KMV.insert(a, 1)
iex> a = KMV.insert(a, 2)
iex> a = KMV.insert(a, 3)
iex> a = KMV.insert(a, :thisAlsoWorks)
iex> a = KMV.insert(a, [:a, :list, :becomes, :a, :single, :value])
iex> KMV.cardinality(a) |> round
7
Specs
intersection([KMV.t, ...]) :: float
Intersection of two KMV sketches.
Returns: float
Parameters
- kmv_list: A list of
KMV
Example
The intersection of the three KMV sketches should be around 25,000.
iex> a = 0..100_000 |> Enum.into(KMV.new(500))
iex> b = 50_000..150_000 |> Enum.into(KMV.new(500))
iex> c = 25_000..75_000 |> Enum.into(KMV.new(500))
iex> KMV.intersection([a, b, c]) |> round
25_074
Specs
new(integer) :: KMV.t
Create a new KMV of max size k.
Increasing the size of k increases the accuracy.
Returns: KMV
Parameters
- k: Max size of the KMV. Must be greater than 0.
Union a list of KMV sketches.
Returns: KMV
Parameters
- kmv_list: A list of
KMV
Example
The union the three KMV sketches around 12,000.
iex> a = 10_000..15_000 |> Enum.into(KMV.new(500))
iex> b = 13_000..18_000 |> Enum.into(KMV.new(500))
iex> c = 15_000..22_000 |> Enum.into(KMV.new(500))
iex> KMV.union([a, b, c]) |> KMV.cardinality |> round
12_840