Bloomy.Learned (Bloomy v0.1.0)
View SourceLearned Bloom Filter using machine learning to reduce false positives.
A learned bloom filter combines a machine learning model with a traditional bloom filter to achieve lower false positive rates. The ML model learns to predict set membership, and the backup bloom filter handles uncertain cases.
Concept
Based on research from "The Case for Learned Index Structures":
- Model: Neural network predicts if item is in the set
- Backup Filter: Standard bloom filter for uncertain predictions
- Query Process:
- Model predicts membership with confidence score
- If confident "not present" -> return false (skip filter)
- If present or uncertain -> check backup filter
Benefits
- Lower false positive rate than standard bloom filters
- Adapts to data patterns
- Can be more space-efficient for structured data
Trade-offs
- Requires training data
- Higher query latency (model inference)
- More complex implementation
- Best for static or slowly-changing sets
Examples
iex> # Create and train a learned bloom filter
iex> training_data = %{
iex> positive: ["user_123", "user_456", "user_789"],
iex> negative: ["user_000", "user_111", "user_222"]
iex> }
iex>
iex> filter = Bloomy.Learned.new(1000)
iex> |> Bloomy.Learned.train(training_data)
iex> |> Bloomy.Learned.add("user_123")
iex>
iex> Bloomy.Learned.member?(filter, "user_123")
true
Summary
Functions
Add an item to the learned bloom filter.
Clear the learned bloom filter.
Get information about the learned bloom filter.
Check if an item might be in the learned bloom filter.
Create a new learned bloom filter.
Train the learned bloom filter on labeled data.
Types
@type model() :: %{ weights: Nx.Tensor.t(), bias: Nx.Tensor.t(), feature_size: pos_integer() }
@type t() :: %Bloomy.Learned{ backend: atom(), backup_filter: Bloomy.Standard.t(), confidence_threshold: float(), items_count: non_neg_integer(), model: model() | nil, trained: boolean() }
Functions
Add an item to the learned bloom filter.
Adds the item to the backup bloom filter.
Parameters
filter- The learned bloom filter structitem- Item to add
Returns
Updated learned bloom filter struct.
Examples
iex> filter = Bloomy.Learned.new(1000)
iex> filter = Bloomy.Learned.add(filter, "hello")
iex> filter.items_count
1
Clear the learned bloom filter.
Clears the backup filter but keeps the trained model.
Parameters
filter- The learned bloom filter struct
Returns
Cleared learned bloom filter struct.
Get information about the learned bloom filter.
Parameters
filter- The learned bloom filter struct
Returns
Map with filter statistics.
Examples
iex> filter = Bloomy.Learned.new(1000)
iex> info = Bloomy.Learned.info(filter)
iex> info.type
:learned
Check if an item might be in the learned bloom filter.
Uses the ML model to make an initial prediction, then checks backup filter if needed.
Parameters
filter- The learned bloom filter structitem- Item to check
Returns
Boolean indicating possible membership.
Examples
iex> filter = Bloomy.Learned.new(1000)
iex> filter = Bloomy.Learned.add(filter, "hello")
iex> Bloomy.Learned.member?(filter, "hello")
true
Create a new learned bloom filter.
Parameters
capacity- Expected number of itemsopts- Keyword list of options::false_positive_rate- Desired false positive rate (default: 0.01):confidence_threshold- Model confidence threshold (default: 0.7):feature_size- Size of feature vector (default: 128):backend- Nx backend to use (default: Nx.default_backend())
Returns
A new Bloomy.Learned struct (untrained).
Examples
iex> filter = Bloomy.Learned.new(1000)
iex> filter.trained
false
Train the learned bloom filter on labeled data.
Parameters
filter- The learned bloom filter structtraining_data- Map with:positiveand:negativeexample listsopts- Training options::epochs- Number of training epochs (default: 10):learning_rate- Learning rate (default: 0.01)
Returns
Updated filter with trained model.
Examples
iex> training_data = %{
iex> positive: ["item1", "item2", "item3"],
iex> negative: ["other1", "other2", "other3"]
iex> }
iex> filter = Bloomy.Learned.new(1000)
iex> filter = Bloomy.Learned.train(filter, training_data)
iex> filter.trained
true