ExDataSketch.IBLT (ExDataSketch v0.7.1)

Copy Markdown View Source

Invertible Bloom Lookup Table (IBLT) for set reconciliation.

An IBLT is a probabilistic data structure that extends Bloom filters with the ability to list its entries and subtract two IBLTs to find set differences. Unlike standard Bloom filters which only answer membership queries, IBLT supports set reconciliation where two parties each build an IBLT of their set, exchange and subtract -- the result contains only items that differ, recoverable via list_entries/1.

Modes

  • Set mode: put/2 / delete/2 for items. Value hash is 0.
  • Key-value mode: put/3 / delete/3 for key-value pairs. Both key and value are hashed to 64-bit integers.

Parameters

  • :cell_count -- number of cells (default: 1000). More cells reduce decode failure probability but increase memory.
  • :hash_count -- number of hash functions (default: 3). Typically 3-5.
  • :seed -- hash seed (default: 0).

Set Reconciliation

Two parties A and B each build an IBLT of their set. Party A sends its IBLT to B. B computes subtract(iblt_b, iblt_a) and calls list_entries/1 on the result. The positive entries are items in B but not A; negative entries are items in A but not B. Communication cost scales with the difference size, not the full set size.

Binary State Layout (IBL1)

24-byte header followed by a cell array. Each cell is 24 bytes containing: count (i32), key_sum (u64), value_sum (u64), check_sum (u32).

Summary

Functions

Returns the set of capabilities supported by IBLT.

Returns true if two IBLTs have compatible parameters.

Returns the number of items inserted.

Deletes an item in set mode (value_hash = 0).

Deletes a key-value pair in KV mode.

Deserializes an EXSK binary into an IBLT.

Creates an IBLT from an enumerable of items.

Lists entries by peeling the IBLT.

Tests whether an item may be a member.

Merges two IBLTs cell-wise (set union).

Merges a non-empty enumerable of IBLTs into one.

Returns a 2-arity merge function for combining filters.

Creates a new empty IBLT.

Inserts an item in set mode (value_hash = 0).

Inserts a key-value pair in KV mode.

Inserts multiple items in set mode in a single pass.

Returns a 2-arity reducer function for use with Enum.reduce/3.

Serializes the IBLT to the EXSK binary format.

Returns the byte size of the state binary.

Subtracts IBLT b from IBLT a cell-wise.

Types

t()

@type t() :: %ExDataSketch.IBLT{backend: module(), opts: keyword(), state: binary()}

Functions

capabilities()

Returns the set of capabilities supported by IBLT.

compatible_with?(iblt1, iblt2)

@spec compatible_with?(t(), t()) :: boolean()

Returns true if two IBLTs have compatible parameters.

Examples

iex> a = ExDataSketch.IBLT.new()
iex> b = ExDataSketch.IBLT.new()
iex> ExDataSketch.IBLT.compatible_with?(a, b)
true

count(iblt)

@spec count(t()) :: non_neg_integer()

Returns the number of items inserted.

Examples

iex> ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.count()
0

delete(iblt, item)

@spec delete(t(), term()) :: t()

Deletes an item in set mode (value_hash = 0).

Examples

iex> iblt = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("hello")
iex> iblt = ExDataSketch.IBLT.delete(iblt, "hello")
iex> ExDataSketch.IBLT.member?(iblt, "hello")
false

delete(iblt, key, value)

@spec delete(t(), term(), term()) :: t()

Deletes a key-value pair in KV mode.

Examples

iex> iblt = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("key", "value")
iex> iblt = ExDataSketch.IBLT.delete(iblt, "key", "value")
iex> ExDataSketch.IBLT.member?(iblt, "key")
false

deserialize(binary)

@spec deserialize(binary()) :: {:ok, t()} | {:error, Exception.t()}

Deserializes an EXSK binary into an IBLT.

Returns {:ok, iblt} on success or {:error, reason} on failure.

Examples

iex> iblt = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("test")
iex> {:ok, recovered} = ExDataSketch.IBLT.deserialize(ExDataSketch.IBLT.serialize(iblt))
iex> ExDataSketch.IBLT.member?(recovered, "test")
true

from_enumerable(enumerable, opts \\ [])

@spec from_enumerable(
  Enumerable.t(),
  keyword()
) :: t()

Creates an IBLT from an enumerable of items.

Equivalent to new(opts) |> put_many(enumerable).

Examples

iex> iblt = ExDataSketch.IBLT.from_enumerable(["a", "b", "c"])
iex> ExDataSketch.IBLT.member?(iblt, "a")
true

list_entries(iblt)

@spec list_entries(t()) ::
  {:ok,
   %{
     positive: [{non_neg_integer(), non_neg_integer()}],
     negative: [{non_neg_integer(), non_neg_integer()}]
   }}
  | {:error, :decode_failed}

Lists entries by peeling the IBLT.

Returns {:ok, %{positive: entries, negative: entries}} on success where positive entries have count +1 and negative entries have count -1. Returns {:error, :decode_failed} if the IBLT cannot be fully decoded.

Examples

iex> iblt = ExDataSketch.IBLT.new(cell_count: 100) |> ExDataSketch.IBLT.put("hello")
iex> {:ok, entries} = ExDataSketch.IBLT.list_entries(iblt)
iex> length(entries.positive)
1

member?(iblt, item)

@spec member?(t(), term()) :: boolean()

Tests whether an item may be a member.

Returns true if all k cells for the item have non-zero count (probabilistic, may return false positives). Returns false if the item is definitely not present.

Examples

iex> iblt = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("hello")
iex> ExDataSketch.IBLT.member?(iblt, "hello")
true

iex> iblt = ExDataSketch.IBLT.new()
iex> ExDataSketch.IBLT.member?(iblt, "hello")
false

merge(iblt, iblt)

@spec merge(t(), t()) :: t()

Merges two IBLTs cell-wise (set union).

Both IBLTs must have compatible parameters.

Examples

iex> a = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("x")
iex> b = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("y")
iex> merged = ExDataSketch.IBLT.merge(a, b)
iex> ExDataSketch.IBLT.member?(merged, "x") and ExDataSketch.IBLT.member?(merged, "y")
true

merge_many(iblts)

@spec merge_many(Enumerable.t()) :: t()

Merges a non-empty enumerable of IBLTs into one.

Examples

iex> iblts = Enum.map(1..3, fn i ->
...>   ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("item_#{i}")
...> end)
iex> merged = ExDataSketch.IBLT.merge_many(iblts)
iex> ExDataSketch.IBLT.member?(merged, "item_1")
true

merger(opts \\ [])

@spec merger(keyword()) :: (t(), t() -> t())

Returns a 2-arity merge function for combining filters.

Examples

iex> is_function(ExDataSketch.IBLT.merger(), 2)
true

new(opts \\ [])

@spec new(keyword()) :: t()

Creates a new empty IBLT.

Options

  • :cell_count -- number of cells (default: 1000). Range: 1..16_777_216.
  • :hash_count -- number of hash functions (default: 3). Range: 1..10.
  • :seed -- hash seed (default: 0).
  • :backend -- backend module (default: ExDataSketch.Backend.Pure).
  • :hash_fn -- custom hash function (term -> non_neg_integer).

Examples

iex> iblt = ExDataSketch.IBLT.new()
iex> iblt.opts[:cell_count]
1000

iex> iblt = ExDataSketch.IBLT.new(cell_count: 500, hash_count: 4)
iex> iblt.opts[:hash_count]
4

put(iblt, item)

@spec put(t(), term()) :: t()

Inserts an item in set mode (value_hash = 0).

Examples

iex> iblt = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("hello")
iex> ExDataSketch.IBLT.member?(iblt, "hello")
true

put(iblt, key, value)

@spec put(t(), term(), term()) :: t()

Inserts a key-value pair in KV mode.

Examples

iex> iblt = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("key", "value")
iex> ExDataSketch.IBLT.member?(iblt, "key")
true

put_many(iblt, items)

@spec put_many(t(), Enumerable.t()) :: t()

Inserts multiple items in set mode in a single pass.

Examples

iex> iblt = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put_many(["a", "b", "c"])
iex> ExDataSketch.IBLT.member?(iblt, "b")
true

reducer()

@spec reducer() :: (term(), t() -> t())

Returns a 2-arity reducer function for use with Enum.reduce/3.

Examples

iex> is_function(ExDataSketch.IBLT.reducer(), 2)
true

serialize(iblt)

@spec serialize(t()) :: binary()

Serializes the IBLT to the EXSK binary format.

Examples

iex> iblt = ExDataSketch.IBLT.new()
iex> binary = ExDataSketch.IBLT.serialize(iblt)
iex> <<"EXSK", _rest::binary>> = binary
iex> byte_size(binary) > 0
true

size_bytes(iblt)

@spec size_bytes(t()) :: non_neg_integer()

Returns the byte size of the state binary.

Examples

iex> iblt = ExDataSketch.IBLT.new()
iex> ExDataSketch.IBLT.size_bytes(iblt) > 0
true

subtract(iblt, iblt)

@spec subtract(t(), t()) :: t()

Subtracts IBLT b from IBLT a cell-wise.

The result contains the symmetric difference of the two sets. Use list_entries/1 on the result to recover the differing items.

Both IBLTs must have compatible parameters (same cell_count, hash_count, seed).

Examples

iex> a = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("x")
iex> b = ExDataSketch.IBLT.new() |> ExDataSketch.IBLT.put("y")
iex> diff = ExDataSketch.IBLT.subtract(a, b)
iex> {:ok, entries} = ExDataSketch.IBLT.list_entries(diff)
iex> length(entries.positive) + length(entries.negative) > 0
true