Stephen.Compression (Stephen v1.0.0)
ColBERTv2-style residual compression for token embeddings.
Compresses embeddings using centroid-based representation:
- Learn K centroids using K-means clustering
- For each embedding, store centroid ID + quantized residual
- Achieves compression while maintaining retrieval quality
Compression Levels
Supports multiple quantization bit depths via :residual_bits (ratios are for 128-dim float32 embeddings):
- residual_bits: 8 (default) - 8-bit quantization, ~4x compression
- residual_bits: 2 - 2-bit quantization, ~15x compression
- residual_bits: 1 - binary (1-bit) quantization, ~28x compression
Lower bit depths trade retrieval quality for smaller index size.
Storage Format
For 128-dim embeddings:
- 8-bit: 2 bytes (centroid) + 128 bytes (residuals) = 130 bytes
- 2-bit: 2 bytes (centroid) + 32 bytes (packed) = 34 bytes
- 1-bit: 2 bytes (centroid) + 16 bytes (packed) = 18 bytes
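The byte counts above follow from packing each residual code into `bits` bits behind a 2-byte centroid ID. The sketch below shows one way such a layout could be produced with Elixir bitstrings. `PackingSketch` and its byte order are assumptions for illustration, not Stephen's actual on-disk format.

```elixir
defmodule PackingSketch do
  # Hypothetical helper, not part of Stephen: packs a 16-bit centroid ID
  # followed by one `bits`-wide code per dimension into a single binary.
  def pack(centroid_id, codes, bits) when bits in [1, 2, 8] do
    packed = for code <- codes, into: <<>>, do: <<code::size(bits)>>
    <<centroid_id::16, packed::bitstring>>
  end
end

# 128 dimensions, every code set to 1 (valid for all three bit depths).
codes = List.duplicate(1, 128)

IO.inspect(byte_size(PackingSketch.pack(42, codes, 8)))  # 130
IO.inspect(byte_size(PackingSketch.pack(42, codes, 2)))  # 34
IO.inspect(byte_size(PackingSketch.pack(42, codes, 1)))  # 18
```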
How it works
Instead of storing full float32 embeddings (512 bytes at 128 dimensions), we store:
- Centroid ID (2 bytes, enough to address up to 65,536 centroids)
- Quantized residual (packed bits)
To reconstruct: embedding ≈ centroid[id] + dequantize(residual)
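The scheme above can be sketched for a single vector in plain Elixir lists. This is a minimal illustration, assuming uniform quantization of each residual over a fixed symmetric interval `[-range, range]`. The module name, the `range` parameter, and the quantizer itself are assumptions, and Stephen's actual quantizer may differ.

```elixir
defmodule ResidualSketch do
  # Illustrative only: operates on lists of floats rather than Nx tensors.

  # Squared Euclidean distance between two equal-length lists.
  defp sq_dist(a, b) do
    Enum.zip(a, b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + (x - y) * (x - y) end)
  end

  # Returns {centroid_id, codes}: the index of the nearest centroid plus
  # one integer code in 0..(2^bits - 1) per dimension.
  def compress(vec, centroids, bits, range) do
    {centroid, id} =
      centroids
      |> Enum.with_index()
      |> Enum.min_by(fn {c, _i} -> sq_dist(vec, c) end)

    levels = Integer.pow(2, bits) - 1

    codes =
      Enum.zip(vec, centroid)
      |> Enum.map(fn {x, c} ->
        # Clamp the residual into [-range, range], then map it onto 0..levels.
        clamped = (x - c) |> max(-range) |> min(range)
        round((clamped + range) / (2 * range) * levels)
      end)

    {id, codes}
  end

  # embedding ≈ centroid[id] + dequantize(residual)
  def decompress({id, codes}, centroids, bits, range) do
    levels = Integer.pow(2, bits) - 1
    centroid = Enum.at(centroids, id)

    Enum.zip(centroid, codes)
    |> Enum.map(fn {c, code} -> c + (code / levels * 2 * range - range) end)
  end
end

centroids = [[0.0, 0.0], [1.0, 1.0]]
compressed = ResidualSketch.compress([0.9, 1.2], centroids, 8, 0.5)
IO.inspect(ResidualSketch.decompress(compressed, centroids, 8, 0.5))
```

With 8 bits the reconstruction lands within half a quantization step of the input. Lowering `bits` widens the step, which is the quality trade-off described under Compression Levels.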
Types
@type compressed_embedding() :: %{
  centroid_ids: Nx.Tensor.t(),
  residuals: Nx.Tensor.t()
}
@type t() :: %Stephen.Compression{
  centroids: Nx.Tensor.t(),
  embedding_dim: pos_integer(),
  num_centroids: pos_integer(),
  residual_bits: pos_integer()
}
Functions
@spec approximate_similarity(t(), Nx.Tensor.t(), compressed_embedding()) :: Nx.Tensor.t()
Computes approximate similarity using compressed representations.
Uses centroid lookup + residual correction for efficient scoring.
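One way to read "centroid lookup + residual correction": because the dot product is linear, `q · (c + r) = q · c + q · r`, so the query-to-centroid scores can be computed once per query and reused for every token assigned to that centroid. The sketch below illustrates that identity with plain lists. `ScoreSketch` and its arguments are hypothetical, and whether Stephen scores exactly this way is an assumption.

```elixir
defmodule ScoreSketch do
  # Dot product of two equal-length lists of numbers.
  def dot(a, b) do
    Enum.zip(a, b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
  end

  # centroid_scores holds dot(query, centroid), precomputed for every centroid.
  # Scoring one token is then a lookup plus a correction from its
  # (dequantized) residual.
  def approx_score(query, centroid_scores, centroid_id, residual) do
    Enum.at(centroid_scores, centroid_id) + dot(query, residual)
  end
end

query = [1.0, 2.0]
centroids = [[0.5, 0.5], [1.0, 0.0]]
centroid_scores = Enum.map(centroids, &ScoreSketch.dot(query, &1))

# Token stored as centroid 0 plus a dequantized residual of [0.1, -0.1].
IO.inspect(ScoreSketch.approx_score(query, centroid_scores, 0, [0.1, -0.1]))
```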
@spec compress(t(), Nx.Tensor.t()) :: compressed_embedding()
Compresses embeddings using the trained codebook.
Arguments
- compression - Trained compression codebook
- embeddings - Tensor of shape {n, dim} to compress
Returns
Compressed embedding map with centroid IDs and quantized residuals.
@spec compression_ratio(pos_integer(), pos_integer()) :: float()
Returns the compression ratio for given settings.
Examples
iex> Stephen.Compression.compression_ratio(128, 8)
3.94
iex> Stephen.Compression.compression_ratio(128, 1)
28.44
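The figures in these examples are consistent with dividing the float32 size (4 bytes per dimension) by the compressed size (a 2-byte centroid ID plus the packed residual bits). The following is a sketch of that arithmetic under those assumptions. `RatioSketch` is a hypothetical module, not the library's implementation.

```elixir
defmodule RatioSketch do
  # Assumes float32 input (4 bytes per dimension) and a 2-byte centroid ID.
  def ratio(dim, bits) do
    original_bytes = dim * 4
    # Residual bits rounded up to whole bytes.
    compressed_bytes = 2 + div(dim * bits + 7, 8)
    Float.round(original_bytes / compressed_bytes, 2)
  end
end

IO.inspect(RatioSketch.ratio(128, 8))  # 3.94
IO.inspect(RatioSketch.ratio(128, 2))  # 15.06
IO.inspect(RatioSketch.ratio(128, 1))  # 28.44
```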
@spec decompress(t(), compressed_embedding()) :: Nx.Tensor.t()
Decompresses embeddings from compressed representation.
Arguments
- compression - Trained compression codebook
- compressed - Compressed embedding map
Returns
Reconstructed embeddings tensor of shape {n, dim}.
Loads compression codebook from disk.
Saves compression codebook to disk.
@spec train([Nx.Tensor.t()] | Nx.Tensor.t(), keyword()) :: t()
Trains a compression codebook from a collection of embeddings.
Arguments
- embeddings - List of embedding tensors or single tensor of shape {n, dim}
- opts - Options
Options
- :num_centroids - Number of centroids (default: 2048)
- :residual_bits - Bits for residual quantization (default: 8)
- :iterations - K-means iterations (default: 20)
Returns
A trained compression codebook struct.
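A round trip through train, compress, and decompress might look like the sketch below. It assumes Stephen and Nx are already dependencies of the current project, uses only the signatures and options documented on this page, and substitutes random data for real token embeddings. The option values are arbitrary choices for illustration.

```elixir
# Assumes Stephen and Nx are available in the running project.
alias Stephen.Compression

# Random stand-in data: 1,000 embeddings of dimension 128.
key = Nx.Random.key(42)
{embeddings, _key} = Nx.Random.normal(key, 0.0, 1.0, shape: {1_000, 128})

codebook =
  Compression.train(embeddings, num_centroids: 64, residual_bits: 2, iterations: 10)

compressed = Compression.compress(codebook, embeddings)
restored = Compression.decompress(codebook, compressed)

# Mean absolute reconstruction error; expected to shrink as residual_bits grows.
error =
  embeddings
  |> Nx.subtract(restored)
  |> Nx.abs()
  |> Nx.mean()
  |> Nx.to_number()

IO.puts("mean absolute reconstruction error: #{error}")
```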