Stephen.Scorer (Stephen v1.0.0)
Implements ColBERT's late interaction scoring mechanism (MaxSim).
MaxSim computes the relevance score between a query and document by:
- Computing cosine similarity between all query-document token pairs
- For each query token, taking the maximum similarity to any document token
- Summing these maximum similarities
This "late interaction" approach captures fine-grained token-level matching while remaining efficient for retrieval.
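Assuming L2-normalized embeddings (so a dot product equals cosine similarity), the three steps above can be sketched with Nx. This is a conceptual sketch, not the module's implementation:

```elixir
# query_embeddings: tensor of shape {query_len, dim}
# doc_embeddings:   tensor of shape {doc_len, dim}
sim = Nx.dot(query_embeddings, Nx.transpose(doc_embeddings))  # {query_len, doc_len}

score =
  sim
  |> Nx.reduce_max(axes: [1])  # best document token per query token
  |> Nx.sum()                  # sum over query tokens
  |> Nx.to_number()
```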
Summary
Functions
- `explain/4` - Explains the MaxSim scoring between query and document.
- `format_explanation` - Formats an explanation for display.
- `fuse_and_rank/3` - Fuses and ranks documents using multiple queries.
- `fuse_queries/3` - Fuses scores from multiple queries using the specified strategy.
- `max_sim/2` - Computes the MaxSim score between query and document embeddings.
- `max_sim_batch/2` - Computes MaxSim scores for a query against multiple documents.
- `multi_max_sim/2` - Computes MaxSim scores for multiple queries against multiple documents.
- `normalize/2` - Normalizes a MaxSim score to the [0, 1] range.
- `normalize_minmax` - Normalizes results using min-max scaling within the result set.
- `normalize_results/2` - Normalizes search results to the [0, 1] range.
- `rank/2` - Ranks documents by their MaxSim scores against a query.
- `reciprocal_rank_fusion/2` - Reciprocal Rank Fusion (RRF) for combining multiple ranked lists.
- `similarity_matrix/2` - Computes the similarity matrix between query and document tokens.
Types
@type score() :: float()
Functions
@spec explain(Nx.Tensor.t(), Nx.Tensor.t(), [String.t()], [String.t()]) :: map()
Explains the MaxSim scoring between query and document.
Returns detailed information about which query tokens matched which document tokens, useful for debugging and understanding retrieval results.
Arguments
- `query_embeddings` - Query token embeddings
- `doc_embeddings` - Document token embeddings
- `query_tokens` - List of query token strings
- `doc_tokens` - List of document token strings
Returns
Map containing:
- `:score` - Total MaxSim score
- `:matches` - List of match details for each query token, including:
  - `:query_token` - Query token string
  - `:query_index` - Query token index
  - `:doc_token` - Best matching document token string
  - `:doc_index` - Best matching document token index
  - `:similarity` - Cosine similarity (contribution to score)
Examples
query_emb = Encoder.encode_query(encoder, "satirical comedy")
doc_emb = Encoder.encode_document(encoder, "Colbert is satirical")
query_tokens = Encoder.tokenize(encoder, "satirical comedy", type: :query)
doc_tokens = Encoder.tokenize(encoder, "Colbert is satirical")
explanation = Scorer.explain(query_emb, doc_emb, query_tokens, doc_tokens)
# => %{
# score: 15.2,
# matches: [
# %{query_token: "satirical", doc_token: "satirical", similarity: 0.95, ...},
# %{query_token: "comedy", doc_token: "Colbert", similarity: 0.42, ...},
# ...
# ]
# }
Formats an explanation for display.
Takes the output of explain/4 and returns a human-readable string.
Options
- `:top_k` - Only show the top-k matches by similarity (default: all)
- `:skip_special` - Skip special tokens like [CLS], [SEP], and [MASK] (default: true)
- `:min_similarity` - Only show matches above this threshold (default: 0.0)
Examples
explanation = Scorer.explain(query_emb, doc_emb, query_tokens, doc_tokens)
IO.puts(Scorer.format_explanation(explanation))
# Score: 15.20
#
# Query Token -> Doc Token Similarity
# --------------------------------------------------------
# satirical -> satirical 0.95
# comedy -> host 0.72
# ...
@spec fuse_and_rank( [Nx.Tensor.t()], [{term(), Nx.Tensor.t()}], :max | :avg | {:weighted, [float()]} ) :: [map()]
Fuses and ranks documents using multiple queries.
Scores each document against all queries and combines using the specified fusion strategy, returning ranked results.
Arguments
- `query_embeddings_list` - List of query embedding tensors
- `doc_embeddings_list` - List of `{doc_id, embeddings}` tuples
- `strategy` - Fusion strategy: `:max`, `:avg`, or `{:weighted, weights}`
Returns
List of %{doc_id: term(), score: float()} maps sorted by score descending.
Examples
queries = [query1_emb, query2_emb]
docs = [{"doc1", emb1}, {"doc2", emb2}]
results = Scorer.fuse_and_rank(queries, docs, :avg)
@spec fuse_queries( [Nx.Tensor.t()], Nx.Tensor.t(), :max | :avg | {:weighted, [float()]} ) :: score()
Fuses scores from multiple queries using the specified strategy.
Combines scores from multiple query variants (e.g., query expansions, reformulations) into a single ranking.
Arguments
- `query_embeddings_list` - List of query embedding tensors
- `doc_embeddings` - Document embedding tensor
- `strategy` - Fusion strategy: `:max`, `:avg`, or `{:weighted, weights}`
Strategies
- `:max` - Takes the maximum score across all queries (good for OR semantics)
- `:avg` - Averages scores across queries (good for ensembles)
- `{:weighted, weights}` - Weighted average with custom weights per query
Examples
# Query expansion: original + synonyms
queries = [
Encoder.encode_query(encoder, "late night host"),
Encoder.encode_query(encoder, "talk show comedian"),
Encoder.encode_query(encoder, "comedy television")
]
score = Scorer.fuse_queries(queries, doc_emb, :max)
# Weighted fusion: prioritize original query
score = Scorer.fuse_queries(queries, doc_emb, {:weighted, [0.6, 0.2, 0.2]})
@spec max_sim(Nx.Tensor.t(), Nx.Tensor.t()) :: score()
Computes the MaxSim score between query and document embeddings.
Arguments
- `query_embeddings` - Tensor of shape `{query_len, dim}`
- `doc_embeddings` - Tensor of shape `{doc_len, dim}`
Returns
A scalar float representing the relevance score.
Examples
score = Stephen.Scorer.max_sim(query_emb, doc_emb)
@spec max_sim_batch(Nx.Tensor.t(), [Nx.Tensor.t()]) :: [score()]
Computes MaxSim scores for a query against multiple documents.
Arguments
- `query_embeddings` - Tensor of shape `{query_len, dim}`
- `doc_embeddings_list` - List of tensors, each of shape `{doc_len, dim}`
Returns
List of scores in the same order as the input documents.
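A usage sketch (the `doc*_emb` tensors are placeholders for previously encoded documents):

```elixir
scores = Stephen.Scorer.max_sim_batch(query_emb, [doc1_emb, doc2_emb, doc3_emb])

# One float per document, in the same order as the input list
Enum.zip(["doc1", "doc2", "doc3"], scores)
```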
@spec multi_max_sim([Nx.Tensor.t()], [Nx.Tensor.t()]) :: [[score()]]
Computes MaxSim scores for multiple queries against multiple documents.
Each query is scored against each document, returning a matrix of scores.
Arguments
- `query_embeddings_list` - List of query tensors, each of shape `{query_len, dim}`
- `doc_embeddings_list` - List of document tensors, each of shape `{doc_len, dim}`
Returns
List of lists where `result[i][j]` is the score of query `i` against document `j`.
Examples
scores = Stephen.Scorer.multi_max_sim(queries, docs)
# scores[0][1] is score of first query against second doc
@spec normalize(score(), pos_integer()) :: float()
Normalizes a MaxSim score to [0, 1] range.
Since embeddings are L2-normalized, the maximum per-token similarity is 1.0.
The theoretical maximum score is therefore query_length.
Arguments
- `score` - Raw MaxSim score
- `query_length` - Number of query tokens used in scoring
Returns
Normalized score in [0, 1] range.
Examples
raw_score = Stephen.Scorer.max_sim(query_emb, doc_emb)
normalized = Stephen.Scorer.normalize(raw_score, 32)
# => 0.73
Normalizes results using min-max scaling within the result set.
Scales scores so the highest is 1.0 and lowest is 0.0. Useful when you want relative ranking within results rather than absolute scores.
Arguments
- `results` - List of `%{doc_id: term(), score: float()}` maps
Returns
Results with scores scaled to [0, 1] range.
Examples
results = Stephen.search(encoder, index, query)
normalized = Stephen.Scorer.normalize_minmax(results)
@spec normalize_results([map()], pos_integer()) :: [map()]
Normalizes search results to [0, 1] range.
Takes a list of search results and normalizes their scores based on the query length. Useful for setting thresholds or comparing results across different queries.
Arguments
- `results` - List of `%{doc_id: term(), score: float()}` maps
- `query_length` - Number of query tokens used in scoring
Returns
Results with normalized scores.
Examples
results = Stephen.search(encoder, index, "late night comedy")
normalized = Stephen.Scorer.normalize_results(results, 32)
high_quality = Enum.filter(normalized, & &1.score > 0.7)
@spec rank(Nx.Tensor.t(), [{term(), Nx.Tensor.t()}]) :: [{term(), score()}]
Ranks documents by their MaxSim scores against a query.
Arguments
- `query_embeddings` - Tensor of shape `{query_len, dim}`
- `doc_embeddings_list` - List of `{doc_id, embeddings}` tuples
Returns
List of {doc_id, score} tuples sorted by score descending.
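A usage sketch (the embedding variables are placeholders):

```elixir
ranked =
  Stephen.Scorer.rank(query_emb, [
    {"doc1", doc1_emb},
    {"doc2", doc2_emb}
  ])

# Highest-scoring document first
[{best_id, _best_score} | _rest] = ranked
```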
@spec reciprocal_rank_fusion([[map()]], pos_integer()) :: [map()]
Reciprocal Rank Fusion (RRF) for combining multiple ranked lists.
RRF is a robust fusion method that combines rankings rather than raw scores, making it effective when score distributions differ across queries.
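The underlying formula sums reciprocal ranks per document: `score(d) = sum over lists of 1 / (k + rank_of_d)`. A minimal sketch, assuming 1-based rank positions (the `rrf_score` helper is illustrative, not part of the module):

```elixir
# RRF contribution of one document given its rank in each list
rrf_score = fn ranks, k ->
  Enum.reduce(ranks, 0.0, fn rank, acc -> acc + 1 / (k + rank) end)
end

# A document ranked 1st in one list and 3rd in another, with k = 60
rrf_score.([1, 3], 60)
```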
Arguments
- `ranked_lists` - List of ranked result lists, each `[%{doc_id: term(), score: float()}, ...]`
- `k` - Smoothing constant (default: 60). Higher values reduce the impact of top ranks.
Returns
Fused results sorted by RRF score descending.
Examples
results1 = Retriever.search_with_embeddings(query1, index)
results2 = Retriever.search_with_embeddings(query2, index)
fused = Scorer.reciprocal_rank_fusion([results1, results2])
References
Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In Proceedings of SIGIR '09.
@spec similarity_matrix(Nx.Tensor.t(), Nx.Tensor.t()) :: Nx.Tensor.t()
Computes the similarity matrix between query and document tokens.
Useful for visualization and debugging.
Returns
Tensor of shape {query_len, doc_len} with cosine similarities.
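A usage sketch for inspecting the matrix (variable names are illustrative):

```elixir
matrix = Stephen.Scorer.similarity_matrix(query_emb, doc_emb)
Nx.shape(matrix)
# => {query_len, doc_len}

# The row-wise maxima are the per-query-token contributions to MaxSim
Nx.reduce_max(matrix, axes: [1])
```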