Configuration

Encoder Options

{:ok, encoder} = Stephen.Encoder.load(
  model: "colbert-ir/colbertv2.0",
  max_query_length: 32,
  max_doc_length: 180,
  projection_dim: 128
)

Model Selection

Model                     Description
colbert-ir/colbertv2.0    Official ColBERT v2 with trained projection (recommended)
colbert-ir/colbertv1.0    Original ColBERT model
bert-base-uncased         Standard BERT (requires random projection)
Any HuggingFace BERT      Custom models work with random projection

Parameters

Parameter           Default                   Description
:model              colbert-ir/colbertv2.0    HuggingFace model name
:max_query_length   32                        Query padding length
:max_doc_length     180                       Maximum document tokens
:projection_dim     128                       Output dimension (nil to disable)
:base_module        auto-detected             Override the Bumblebee module

ColBERT vs Standard Models

When loading official ColBERT models, Stephen:

  1. Downloads the model from HuggingFace
  2. Loads BERT backbone via Bumblebee
  3. Extracts trained projection weights from SafeTensors

For other models, Stephen initializes a random projection matrix.
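
For example, loading a base BERT checkpoint works the same way (a sketch using the options documented above; since the model ships no trained projection weights, the projection is randomly initialized):

{:ok, encoder} = Stephen.Encoder.load(
  # Any HuggingFace BERT model; no ColBERT projection weights are available,
  # so Stephen initializes a random projection of size :projection_dim.
  model: "bert-base-uncased",
  projection_dim: 128
)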

Document Encoding Options

embeddings = Stephen.Encoder.encode_document(encoder, text,
  skip_punctuation?: true,
  deduplicate?: true
)

Option               Default    Description
:skip_punctuation?   false      Filter out punctuation token embeddings
:deduplicate?        false      Remove near-duplicate embeddings

Index Options

Standard Index

index = Stephen.Index.new(
  embedding_dim: 128,
  space: :cosine,
  max_tokens: 100_000,
  m: 16,
  ef_construction: 200
)

Parameter          Default    Description
:embedding_dim     required   Must match encoder output
:space             :cosine    Distance metric (:cosine or :l2)
:max_tokens        100,000    Maximum token embeddings
:m                 16         HNSW connectivity
:ef_construction   200        Build quality

PLAID Index

plaid = Stephen.Plaid.new(
  embedding_dim: 128,
  num_centroids: 1024
)

Parameter         Default    Description
:embedding_dim    required   Must match encoder output
:num_centroids    1024       Number of clusters

Compressed Index

index = Stephen.Index.Compressed.new(
  embedding_dim: 128,
  num_centroids: 1024,
  compression_centroids: 2048,
  residual_bits: 8
)

Parameter                 Default    Description
:embedding_dim            required   Must match encoder output
:num_centroids            1024       PLAID centroids
:compression_centroids    2048       Compression codebook size
:residual_bits            8          Quantization depth (1, 2, 4, or 8)

Search Options

results = Stephen.search(encoder, index, query,
  top_k: 10,
  rerank?: true,
  candidates_per_token: 50
)

Option                   Default    Description
:top_k                   10         Number of results
:rerank?                 true       Full MaxSim reranking
:candidates_per_token    50         ANN candidates per query token (standard Index)
:nprobe                  32         Centroids to probe (PLAID)
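
The :nprobe option applies when searching a PLAID index. A minimal sketch, assuming Stephen.search accepts the PLAID index built above in place of the standard index:

results = Stephen.search(encoder, plaid, query,
  top_k: 10,
  # Probing more centroids increases recall at the cost of latency.
  nprobe: 64
)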

PRF Options

Pseudo-relevance feedback expands queries using top-ranked documents:

results = Stephen.search_with_prf(encoder, index, query,
  top_k: 10,
  feedback_docs: 3,
  expansion_tokens: 10,
  expansion_weight: 0.5
)

Option               Default    Description
:top_k               10         Final results to return
:feedback_docs       3          Documents used for feedback
:expansion_tokens    10         Tokens to add from feedback
:expansion_weight    0.5        Weight for expansion vs original query

Tuning tips (see the example after this list):

  • More feedback_docs adds diversity but may introduce noise
  • A higher expansion_weight emphasizes the expansion terms over the original query
  • More expansion_tokens broadens matching but may reduce precision
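
For instance, a recall-oriented configuration might lean more heavily on feedback (a sketch using the options documented above; the specific values are illustrative):

results = Stephen.search_with_prf(encoder, index, query,
  top_k: 10,
  # More feedback documents and a higher weight favor recall over precision.
  feedback_docs: 5,
  expansion_tokens: 15,
  expansion_weight: 0.7
)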

GPU Acceleration

Enable EXLA for GPU acceleration:

# mix.exs
{:exla, "~> 0.9"}

# config/config.exs
config :nx, default_backend: EXLA.Backend

For CPU-only execution with EXLA's compiler optimizations:

config :nx, default_backend: {EXLA.Backend, client: :host}
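
To confirm which backend is active at runtime (Nx.default_backend/0 is part of Nx; the exact return shape depends on your configuration):

# Returns the currently configured default backend, e.g. EXLA.Backend.
Nx.default_backend()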

Batch Processing

For large document sets, process in batches to manage memory:

documents
|> Stream.chunk_every(100)
|> Enum.reduce(index, fn batch, acc ->
  Stephen.index(encoder, acc, batch)
end)

For batch queries:

results = Stephen.Retriever.batch_search(encoder, index, queries, top_k: 10)