Bloomy.Learned (Bloomy v0.1.0)

View Source

Learned Bloom Filter using machine learning to reduce false positives.

A learned bloom filter combines a machine learning model with a traditional bloom filter to achieve lower false positive rates. The ML model learns to predict set membership, and the backup bloom filter handles uncertain cases.

Concept

Based on research from "The Case for Learned Index Structures":

  1. Model: Neural network predicts if item is in the set
  2. Backup Filter: Standard bloom filter for uncertain predictions
  3. Query Process:
    • Model predicts membership with confidence score
    • If confident "not present" -> return false (skip filter)
    • If present or uncertain -> check backup filter

Benefits

  • Lower false positive rate than standard bloom filters
  • Adapts to data patterns
  • Can be more space-efficient for structured data

Trade-offs

  • Requires training data
  • Higher query latency (model inference)
  • More complex implementation
  • Best for static or slowly-changing sets

Examples

iex> # Create and train a learned bloom filter
iex> training_data = %{
iex>   positive: ["user_123", "user_456", "user_789"],
iex>   negative: ["user_000", "user_111", "user_222"]
iex> }
iex>
iex> filter = Bloomy.Learned.new(1000)
iex>   |> Bloomy.Learned.train(training_data)
iex>   |> Bloomy.Learned.add("user_123")
iex>
iex> Bloomy.Learned.member?(filter, "user_123")
true

Summary

Functions

Add an item to the learned bloom filter.

Clear the learned bloom filter.

Get information about the learned bloom filter.

Check if an item might be in the learned bloom filter.

Create a new learned bloom filter.

Train the learned bloom filter on labeled data.

Types

model()

@type model() :: %{
  weights: Nx.Tensor.t(),
  bias: Nx.Tensor.t(),
  feature_size: pos_integer()
}

t()

@type t() :: %Bloomy.Learned{
  backend: atom(),
  backup_filter: Bloomy.Standard.t(),
  confidence_threshold: float(),
  items_count: non_neg_integer(),
  model: model() | nil,
  trained: boolean()
}

Functions

add(filter, item)

Add an item to the learned bloom filter.

Adds the item to the backup bloom filter.

Parameters

  • filter - The learned bloom filter struct
  • item - Item to add

Returns

Updated learned bloom filter struct.

Examples

iex> filter = Bloomy.Learned.new(1000)
iex> filter = Bloomy.Learned.add(filter, "hello")
iex> filter.items_count
1

clear(filter)

Clear the learned bloom filter.

Clears the backup filter but keeps the trained model.

Parameters

  • filter - The learned bloom filter struct

Returns

Cleared learned bloom filter struct.

info(filter)

Get information about the learned bloom filter.

Parameters

  • filter - The learned bloom filter struct

Returns

Map with filter statistics.

Examples

iex> filter = Bloomy.Learned.new(1000)
iex> info = Bloomy.Learned.info(filter)
iex> info.type
:learned

member?(filter, item)

Check if an item might be in the learned bloom filter.

Uses the ML model to make an initial prediction, then checks backup filter if needed.

Parameters

  • filter - The learned bloom filter struct
  • item - Item to check

Returns

Boolean indicating possible membership.

Examples

iex> filter = Bloomy.Learned.new(1000)
iex> filter = Bloomy.Learned.add(filter, "hello")
iex> Bloomy.Learned.member?(filter, "hello")
true

new(capacity, opts \\ [])

Create a new learned bloom filter.

Parameters

  • capacity - Expected number of items
  • opts - Keyword list of options:
    • :false_positive_rate - Desired false positive rate (default: 0.01)
    • :confidence_threshold - Model confidence threshold (default: 0.7)
    • :feature_size - Size of feature vector (default: 128)
    • :backend - Nx backend to use (default: Nx.default_backend())

Returns

A new Bloomy.Learned struct (untrained).

Examples

iex> filter = Bloomy.Learned.new(1000)
iex> filter.trained
false

train(filter, training_data, opts \\ [])

Train the learned bloom filter on labeled data.

Parameters

  • filter - The learned bloom filter struct
  • training_data - Map with :positive and :negative example lists
  • opts - Training options:
    • :epochs - Number of training epochs (default: 10)
    • :learning_rate - Learning rate (default: 0.01)

Returns

Updated filter with trained model.

Examples

iex> training_data = %{
iex>   positive: ["item1", "item2", "item3"],
iex>   negative: ["other1", "other2", "other3"]
iex> }
iex> filter = Bloomy.Learned.new(1000)
iex> filter = Bloomy.Learned.train(filter, training_data)
iex> filter.trained
true