Stephen.Compression (Stephen v1.0.0)

ColBERTv2-style residual compression for token embeddings.

Compresses embeddings using a centroid-based representation:

  1. Learn K centroids from the embeddings using K-means clustering
  2. For each embedding, store the ID of its nearest centroid plus a quantized residual (the difference between the embedding and that centroid)

This shrinks the index while aiming to preserve retrieval quality.
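The centroid-plus-residual encoding described above can be sketched in plain Elixir. This is a minimal illustration on lists of floats, not the library's Nx-based implementation; the module name `ResidualSketch` and its functions are hypothetical, and the residual is left unquantized here.

```elixir
defmodule ResidualSketch do
  # Hypothetical helper: squared Euclidean distance between two vectors.
  def squared_distance(a, b) do
    a |> Enum.zip_with(b, fn x, y -> (x - y) * (x - y) end) |> Enum.sum()
  end

  # Returns {centroid_id, residual}, where the residual is the difference
  # between the embedding and its nearest centroid.
  def encode(embedding, centroids) do
    {centroid, id} =
      centroids
      |> Enum.with_index()
      |> Enum.min_by(fn {c, _id} -> squared_distance(embedding, c) end)

    residual = Enum.zip_with(embedding, centroid, fn e, c -> e - c end)
    {id, residual}
  end
end
```

With centroids `[[0.0, 0.0], [1.0, 1.0]]`, the embedding `[0.75, 1.25]` is assigned to centroid 1 with residual `[-0.25, 0.25]`.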

Compression Levels

Supports multiple quantization bit depths via the :residual_bits option:

  • residual_bits: 8 (default) - 8-bit quantization, ~4x compression
  • residual_bits: 2 - 2-bit quantization, ~15x compression
  • residual_bits: 1 - Binary/1-bit quantization, ~28x compression

Ratios are relative to float32 storage of 128-dim embeddings and include the 2-byte centroid ID. Lower bit depths trade retrieval quality for smaller index size.

Storage Format

For 128-dim embeddings:

  • 8-bit: 2 bytes (centroid) + 128 bytes (residuals) = 130 bytes
  • 2-bit: 2 bytes (centroid) + 32 bytes (packed) = 34 bytes
  • 1-bit: 2 bytes (centroid) + 16 bytes (packed) = 18 bytes
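The packed sizes above follow from storing each residual code in exactly `bits` bits. A minimal sketch of such packing with Elixir bitstrings is shown below; `PackSketch` is a hypothetical module, and the library's actual bit layout is not documented here, so treat the ordering as an assumption.

```elixir
defmodule PackSketch do
  # Packs integer codes (each in 0..2^bits - 1) into a bitstring,
  # using `bits` bits per code. Larger values are truncated to `bits` bits.
  def pack(codes, bits) do
    Enum.reduce(codes, <<>>, fn code, acc ->
      <<acc::bitstring, code::size(bits)>>
    end)
  end

  # Reads the codes back, `bits` bits at a time.
  def unpack(packed, bits) do
    for <<code::size(bits) <- packed>>, do: code
  end
end

codes = Enum.map(0..127, &rem(&1, 4))
packed = PackSketch.pack(codes, 2)
byte_size(packed) # 128 codes * 2 bits = 256 bits = 32 bytes
```

Adding the 2-byte centroid ID to those 32 bytes gives the 34 bytes listed for the 2-bit format.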

How it works

Instead of storing full float32 embeddings (512 bytes for 128 dimensions), we store:

  • Centroid ID (2 bytes, enough to address up to 65,536 centroids)
  • Quantized residual (packed bits)

To reconstruct: embedding ≈ centroid[id] + dequantize(residual)
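The reconstruction formula can be illustrated with a simple uniform quantizer. This is a sketch under stated assumptions: residual values are clamped to a fixed range `[-range, range]` and mapped to evenly spaced levels. The module `QuantSketch`, its functions, and the `range` parameter are hypothetical; the quantization scheme the library actually uses is not specified in this page.

```elixir
defmodule QuantSketch do
  # Maps each residual value to an integer code in 0..(2^bits - 1).
  def quantize(residual, bits, range) do
    levels = Integer.pow(2, bits) - 1

    Enum.map(residual, fn x ->
      clamped = x |> max(-range) |> min(range)
      round((clamped + range) / (2 * range) * levels)
    end)
  end

  # Maps integer codes back to approximate residual values.
  def dequantize(codes, bits, range) do
    levels = Integer.pow(2, bits) - 1
    Enum.map(codes, fn q -> q / levels * 2 * range - range end)
  end

  # embedding ≈ centroid + dequantize(residual)
  def reconstruct(centroid, codes, bits, range) do
    codes
    |> dequantize(bits, range)
    |> Enum.zip_with(centroid, fn r, c -> c + r end)
  end
end
```

For in-range values, the per-dimension reconstruction error of this scheme is at most half a quantization step, which is why fewer bits (larger steps) cost more accuracy.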

Summary

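Functions

approximate_similarity(compression, query_embeddings, compressed_doc)

Computes approximate similarity using compressed representations.

compress(compression, embeddings)

Compresses embeddings using the trained codebook.

compression_ratio(embedding_dim, residual_bits)

Returns the compression ratio for given settings.

decompress(compression, compressed)

Decompresses embeddings from compressed representation.

load(path)

Loads compression codebook from disk.

save(compression, path)

Saves compression codebook to disk.

train(embeddings, opts \\ [])

Trains a compression codebook from a collection of embeddings.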

Types

compressed_embedding()

@type compressed_embedding() :: %{
  centroid_ids: Nx.Tensor.t(),
  residuals: Nx.Tensor.t()
}

t()

@type t() :: %Stephen.Compression{
  centroids: Nx.Tensor.t(),
  embedding_dim: pos_integer(),
  num_centroids: pos_integer(),
  residual_bits: pos_integer()
}

Functions

approximate_similarity(compression, query_embeddings, compressed_doc)

@spec approximate_similarity(t(), Nx.Tensor.t(), compressed_embedding()) ::
  Nx.Tensor.t()

Computes approximate similarity using compressed representations.

Uses centroid lookup + residual correction for efficient scoring.
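One way to read "centroid lookup + residual correction" is through the linearity of the dot product: since the document embedding is approximately centroid + residual, the score splits into a centroid term and a residual term. The sketch below shows this for a single query vector and a single document token using plain lists. `ApproxScoreSketch` is a hypothetical module, and how the real function batches or aggregates scores is not described here.

```elixir
defmodule ApproxScoreSketch do
  # Dot product of two equal-length vectors.
  def dot(a, b) do
    a |> Enum.zip_with(b, fn x, y -> x * y end) |> Enum.sum()
  end

  # q . (c + r) = q . c + q . r
  # The centroid term can be shared by every token assigned to the same
  # centroid; the residual term corrects it per token.
  def score(query, centroid, residual) do
    dot(query, centroid) + dot(query, residual)
  end
end
```

For query `[1.0, 2.0]`, centroid `[0.5, 0.5]`, and dequantized residual `[0.25, -0.25]`, the centroid term is 1.5 and the correction is -0.25, matching the dot product of the query with the reconstructed embedding `[0.75, 0.25]`.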

compress(compression, embeddings)

@spec compress(t(), Nx.Tensor.t()) :: compressed_embedding()

Compresses embeddings using the trained codebook.

Arguments

  • compression - Trained compression codebook
  • embeddings - Tensor of shape {n, dim} to compress

Returns

A compressed_embedding() map containing centroid IDs and quantized residuals.

compression_ratio(embedding_dim, residual_bits)

@spec compression_ratio(pos_integer(), pos_integer()) :: float()

Returns the compression ratio for given settings.

Examples

iex> Stephen.Compression.compression_ratio(128, 8)
3.94
iex> Stephen.Compression.compression_ratio(128, 1)
28.44
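The values in these examples are consistent with dividing the float32 size by the compressed size (2-byte centroid ID plus packed residual bits) and rounding to two decimals. The sketch below encodes that reading; `RatioSketch` is a hypothetical module and the formula is an assumption inferred from the documented outputs, not taken from the library source.

```elixir
defmodule RatioSketch do
  # Compressed size: 2 bytes for the centroid ID plus the residual bits,
  # rounded up to whole bytes.
  def bytes_per_embedding(dim, bits) do
    2 + div(dim * bits + 7, 8)
  end

  # Uncompressed float32 size (4 bytes per dimension) over compressed size.
  def ratio(dim, bits) do
    Float.round(dim * 4 / bytes_per_embedding(dim, bits), 2)
  end
end
```

For 128 dimensions this gives 512 / 130 ≈ 3.94 at 8 bits, 512 / 34 ≈ 15.06 at 2 bits, and 512 / 18 ≈ 28.44 at 1 bit.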

decompress(compression, compressed)

@spec decompress(t(), compressed_embedding()) :: Nx.Tensor.t()

Decompresses embeddings from compressed representation.

Arguments

  • compression - Trained compression codebook
  • compressed - Compressed embedding map, as returned by compress/2

Returns

Reconstructed embeddings tensor of shape {n, dim}.

load(path)

@spec load(Path.t()) :: {:ok, t()} | {:error, term()}

Loads compression codebook from disk.

save(compression, path)

@spec save(t(), Path.t()) :: :ok

Saves compression codebook to disk.

train(embeddings, opts \\ [])

@spec train(
  [Nx.Tensor.t()] | Nx.Tensor.t(),
  keyword()
) :: t()

Trains a compression codebook from a collection of embeddings.

Arguments

  • embeddings - List of embedding tensors or single tensor of shape {n, dim}
  • opts - Options

Options

  • :num_centroids - Number of centroids (default: 2048)
  • :residual_bits - Bits for residual quantization (default: 8)
  • :iterations - K-means iterations (default: 20)

Returns

A trained compression codebook struct.
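For intuition about what the :iterations option controls, here is a minimal sketch of Lloyd-style K-means on plain lists. It is illustrative only: `KMeansSketch` is a hypothetical module, initial centroids are passed in explicitly, and the library's own initialization and Nx-based implementation may differ.

```elixir
defmodule KMeansSketch do
  # Runs `iterations` (>= 1) rounds of assign-then-update.
  def fit(points, centroids, iterations) do
    Enum.reduce(1..iterations, centroids, fn _i, current ->
      step(points, current)
    end)
  end

  # One round: assign each point to its nearest centroid, then move each
  # centroid to the mean of its members. Empty clusters keep their centroid.
  def step(points, centroids) do
    groups = Enum.group_by(points, &nearest_index(&1, centroids))

    centroids
    |> Enum.with_index()
    |> Enum.map(fn {old, idx} ->
      case Map.get(groups, idx) do
        nil -> old
        members -> mean(members)
      end
    end)
  end

  defp nearest_index(point, centroids) do
    centroids
    |> Enum.with_index()
    |> Enum.min_by(fn {c, _idx} -> squared_distance(point, c) end)
    |> elem(1)
  end

  defp squared_distance(a, b) do
    a |> Enum.zip_with(b, fn x, y -> (x - y) * (x - y) end) |> Enum.sum()
  end

  # Element-wise mean of a non-empty list of vectors.
  defp mean(vectors) do
    n = length(vectors)
    Enum.zip_with(vectors, fn column -> Enum.sum(column) / n end)
  end
end
```

On well-separated data the centroids settle after a few rounds; more iterations mainly help when clusters overlap or the starting centroids are poor.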