Index Types
Stephen provides three index implementations for different scale and memory requirements.
Standard Index
Stephen.Index uses HNSWLib for approximate nearest neighbor search. Best for small to medium collections with frequent updates.
index = Stephen.Index.new(
  embedding_dim: 128,
  space: :cosine,        # or :l2
  max_tokens: 100_000,   # maximum token embeddings
  m: 16,                 # HNSW M parameter
  ef_construction: 200   # HNSW build quality
)
# Add documents
index = Stephen.Index.add(index, "doc1", embeddings)
# Search
candidates = Stephen.Index.search_tokens(index, query_embeddings, 50)
When to Use
- Collections under ~10K documents
- Need fast add/delete/update operations
- Memory is not a primary concern
Parameters
| Parameter | Default | Description |
|---|---|---|
| :embedding_dim | required | Embedding dimension |
| :space | :cosine | Distance metric (:cosine or :l2) |
| :max_tokens | 100,000 | Maximum token embeddings to store |
| :m | 16 | HNSW graph connectivity |
| :ef_construction | 200 | Index build quality |
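Because :max_tokens counts token embeddings rather than documents, it helps to estimate it from the collection size before building the index. The sketch below is a hypothetical helper, not part of Stephen; it assumes every document contributes one slot per token embedding and that vectors are stored as 32-bit floats, and it ignores the HNSW graph overhead.

```elixir
defmodule IndexSizing do
  # Hypothetical sizing helper for illustration; not part of Stephen.

  # Token slots needed for a collection: one per token embedding.
  def required_tokens(num_docs, avg_tokens_per_doc) do
    num_docs * avg_tokens_per_doc
  end

  # Raw vector storage in bytes, assuming float32 (4 bytes per dimension).
  # The HNSW graph itself adds further overhead that is not counted here.
  def raw_vector_bytes(num_tokens, embedding_dim) do
    num_tokens * embedding_dim * 4
  end
end

# 10,000 documents averaging 64 token embeddings each
tokens = IndexSizing.required_tokens(10_000, 64)
bytes = IndexSizing.raw_vector_bytes(tokens, 128)
IO.puts("max_tokens >= #{tokens}, about #{div(bytes, 1_000_000)} MB of raw vectors")
```

Under these assumptions, a collection of that size needs :max_tokens well above the 100,000 default.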
PLAID Index
Stephen.Plaid uses centroid-based inverted lists for sub-linear search time. Best for larger collections.
plaid = Stephen.Plaid.new(
  embedding_dim: 128,
  num_centroids: 1024
)
# Index documents (trains centroids on first call)
plaid = Stephen.Plaid.index_documents(plaid, [
  {"doc1", embeddings1},
  {"doc2", embeddings2}
])
# Search
results = Stephen.Plaid.search(plaid, query_embeddings,
  top_k: 10,
  nprobe: 32
)
How It Works
- Cluster all document token embeddings into K centroids
- Build inverted lists: centroid → [doc_ids with tokens near that centroid]
- At query time, find nearest centroids for query tokens
- Retrieve candidate docs from inverted lists
- Rerank candidates with full MaxSim
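The steps above can be sketched in plain Elixir. This is a toy illustration, not Stephen's implementation: the module and function names are hypothetical, embeddings are plain lists of floats, similarity is assumed to be a dot product over normalized vectors, and the centroids are passed in directly instead of being trained by clustering.

```elixir
defmodule PlaidSketch do
  # Toy illustration of centroid-based retrieval; not Stephen's code.

  # Dot product of two equal-length float lists.
  def dot(a, b) do
    Enum.zip(a, b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
  end

  # Indices of the n centroids most similar to vec.
  def nearest_centroids(vec, centroids, n) do
    centroids
    |> Enum.with_index()
    |> Enum.sort_by(fn {c, _i} -> dot(vec, c) end, :desc)
    |> Enum.take(n)
    |> Enum.map(fn {_c, i} -> i end)
  end

  # Inverted lists: centroid index => set of doc ids with a token near it.
  def build_inverted_lists(docs, centroids) do
    for {doc_id, tokens} <- docs, tok <- tokens, reduce: %{} do
      acc ->
        [cid] = nearest_centroids(tok, centroids, 1)
        Map.update(acc, cid, MapSet.new([doc_id]), &MapSet.put(&1, doc_id))
    end
  end

  # Candidate docs: union of the lists for the nprobe nearest centroids
  # of each query token.
  def candidates(query_tokens, centroids, lists, nprobe) do
    for q <- query_tokens,
        cid <- nearest_centroids(q, centroids, nprobe),
        reduce: MapSet.new() do
      acc -> MapSet.union(acc, Map.get(lists, cid, MapSet.new()))
    end
  end

  # MaxSim: for each query token take its best-matching document token,
  # then sum those maxima.
  def max_sim(query_tokens, doc_tokens) do
    query_tokens
    |> Enum.map(fn q -> doc_tokens |> Enum.map(&dot(q, &1)) |> Enum.max() end)
    |> Enum.sum()
  end

  # Full pipeline: gather candidates, rerank with MaxSim, keep top_k.
  def search(query_tokens, docs, centroids, lists, opts \\ []) do
    nprobe = Keyword.get(opts, :nprobe, 1)
    top_k = Keyword.get(opts, :top_k, 10)
    doc_map = Map.new(docs)

    candidates(query_tokens, centroids, lists, nprobe)
    |> Enum.map(fn id -> {id, max_sim(query_tokens, Map.fetch!(doc_map, id))} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
    |> Enum.take(top_k)
  end
end

centroids = [[1.0, 0.0], [0.0, 1.0]]
docs = [{"doc1", [[1.0, 0.0], [0.9, 0.1]]}, {"doc2", [[0.0, 1.0]]}]
lists = PlaidSketch.build_inverted_lists(docs, centroids)

# With nprobe: 1 only the list for the closest centroid is scanned.
PlaidSketch.search([[1.0, 0.0]], docs, centroids, lists, nprobe: 1, top_k: 1)
```

Raising nprobe in this sketch widens the candidate set (here, nprobe: 2 would also score "doc2"), which is the recall-versus-speed trade-off in miniature.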
When to Use
- Collections over ~10K documents
- Search speed is critical
- Can tolerate slightly lower recall
Parameters
| Parameter | Default | Description |
|---|---|---|
| :embedding_dim | required | Embedding dimension |
| :num_centroids | 1024 | Number of clusters |
| :nprobe | 32 | Centroids to probe per search (option to Stephen.Plaid.search) |
Both settings trade accuracy for speed: a higher num_centroids improves precision and a higher nprobe improves recall, but each slows search.
Compressed Index
Stephen.Index.Compressed combines PLAID candidate generation with residual compression for memory efficiency.
index = Stephen.Index.Compressed.new(
  embedding_dim: 128,
  num_centroids: 1024,
  compression_centroids: 2048,
  residual_bits: 8
)
# Train compression codebook (requires sample embeddings)
index = Stephen.Index.Compressed.train(index, training_embeddings)
# Add documents (stores compressed)
index = Stephen.Index.Compressed.add(index, "doc1", embeddings)
# Search (decompresses on-the-fly)
results = Stephen.Index.Compressed.search(index, query_embeddings, top_k: 10)
Compression Levels
| Bits | Compression | Quality Impact |
|---|---|---|
| 8 | ~4x | Minimal |
| 2 | ~16x | Moderate |
| 1 | ~32x | Noticeable |
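The ratios in the table match a simple calculation if the uncompressed embeddings are assumed to be 32-bit floats: each dimension shrinks from 32 bits to residual_bits bits. The sketch below is a hypothetical helper, not part of Stephen, and it ignores the per-token centroid id and the codebook itself.

```elixir
defmodule CompressionRatio do
  # Hypothetical helper for illustration; not part of Stephen.
  # Assumes uncompressed embeddings use 32 bits per dimension.
  @float_bits 32

  # Approximate compression ratio for a given number of residual bits.
  def ratio(residual_bits), do: @float_bits / residual_bits

  # Bytes needed to store one token's quantized residual.
  def bytes_per_token(embedding_dim, residual_bits) do
    div(embedding_dim * residual_bits, 8)
  end
end

for bits <- [8, 2, 1] do
  ratio = CompressionRatio.ratio(bits)
  bytes = CompressionRatio.bytes_per_token(128, bits)
  IO.puts("#{bits} bits: ~#{ratio}x, #{bytes} bytes per 128-dim token")
end
```

For comparison, an uncompressed 128-dimensional float32 token takes 512 bytes.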
When to Use
- Large collections with memory constraints
- Willing to trade some quality for memory
- Can train codebook on representative sample
Parameters
| Parameter | Default | Description |
|---|---|---|
| :embedding_dim | required | Embedding dimension |
| :num_centroids | 1024 | PLAID centroids |
| :compression_centroids | 2048 | Compression codebook size |
| :residual_bits | 8 | Quantization depth |
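To make :residual_bits concrete, here is a minimal sketch of residual quantization: store the difference between an embedding and its nearest codebook centroid as small integer codes, then add the dequantized difference back at search time. The module is hypothetical and the uniform bucketing over a fixed range is an assumption chosen for simplicity; Stephen's actual scheme may differ.

```elixir
defmodule ResidualSketch do
  # Illustrative uniform quantizer; not Stephen's implementation.

  # Encode each residual component (vec - centroid) as an integer code in
  # 0..(2^bits - 1), using uniform buckets over [-limit, limit].
  def compress(vec, centroid, bits, limit \\ 1.0) do
    levels = Integer.pow(2, bits) - 1

    Enum.zip(vec, centroid)
    |> Enum.map(fn {v, c} ->
      r = max(-limit, min(limit, v - c))
      round((r + limit) / (2 * limit) * levels)
    end)
  end

  # Rebuild an approximate vector: centroid plus the dequantized residual.
  def decompress(codes, centroid, bits, limit \\ 1.0) do
    levels = Integer.pow(2, bits) - 1

    Enum.zip(codes, centroid)
    |> Enum.map(fn {code, c} -> c + (code / levels * 2 * limit - limit) end)
  end
end

vec = [0.5, -0.2]
centroid = [0.4, 0.0]

codes = ResidualSketch.compress(vec, centroid, 8)
approx = ResidualSketch.decompress(codes, centroid, 8)
IO.inspect({codes, approx})
```

With fewer bits the buckets get wider, so the reconstruction error grows; at 1 bit each component can only be restored to -limit or +limit around the centroid.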
Dynamic Updates
All index types support add, delete, and update:
alias Stephen.Index
# Add
index = Index.add(index, "doc1", embeddings)
# Delete
index = Index.delete(index, "doc1")
# Update (delete + add)
index = Index.update(index, "doc1", new_embeddings)
# Batch operations
index = Index.add_all(index, [{"doc2", emb2}, {"doc3", emb3}])
index = Index.delete_all(index, ["doc2", "doc3"])
Persistence
All indexes can be saved and loaded:
:ok = Stephen.Index.save(index, "/path/to/index")
{:ok, index} = Stephen.Index.load("/path/to/index")
The save format includes all metadata needed to reconstruct the index.