gauzy - Probabilistic Set Membership Filters for Gleam

Package Version Hex Docs

Ever need to quickly check if you’ve seen an item before, without storing every single item? That’s where gauzy comes in!

gauzy provides probabilistic set membership filters for Gleam. Think of them as super memory-efficient ways to ask

Currently includes: Bloom filters (more filter types planned!)


Installation

gleam add gauzy@1

Probabilistic data structures are specialized data structures that use randomization to achieve compact representation with controlled error rates. They’re particularly useful when:

Available Data Structures

Bloom Filter

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.

Key properties:

import gauzy/bloom_filter
import murmur3a
import mumu

pub fn main() {
  // Use your own hash functions here. They:
  // - must output an integer hash digest.
  // - should ideally be independent and uniformly distributed
  // - are not required to be cryptographic
  let hash_fn_1 = fn(string_item) {
    murmur3a.hash_string(string_item, 0) |> murmur3a.int_digest
  }
  let hash_fn_2 = fn(string_item) {
    mumu.hash(string_item)
  }
  let assert Ok(hash_fn_pair) =
    bloom_filter.new_hash_fn_pair(hash_fn_1, hash_fn_2)

  // Expect to store ~10,000 items, allow 1% error rate
  let assert Ok(filter) = bloom_filter.new(
    capacity: 10_000,
    target_error_rate: 0.01,
    with_hashes: hash_fn_pair,
  )

  // Insert items
  let assert Ok(filter) = bloom_filter.try_insert(filter, "hello")
  let assert Ok(filter) = bloom_filter.try_insert(filter, "world")

  // Check if items might be in the set
  let in_filter = bloom_filter.might_contain(filter, "hello")  // True
  let also_in_filter = bloom_filter.might_contain(filter, "world")  // True
  let not_in_filter = bloom_filter.might_contain(filter, "goodbye")  // False

  // Get filter properties
  let size = bloom_filter.bit_size(filter)
  let error_rate = bloom_filter.false_positive_rate(filter)
  let hash_count = bloom_filter.hash_fn_count(filter)

  // Estimate how many unique items were inserted
  let estimate = bloom_filter.estimate_cardinality(filter)

  // Returns an equivalent empty filter
  let empty_filter = bloom_filter.reset(filter)
}

Further documentation can be found at https://hexdocs.pm/gauzy.

Development

gleam test  # Run the tests

Error Handling

All creation and insertion operations return a Result. Possible errors:

Check/handle errors using Gleam’s Result type.


Contributing


Further Information


Search Document