Stephen.Scorer (Stephen v1.0.0)


Implements ColBERT's late interaction scoring mechanism (MaxSim).

MaxSim computes the relevance score between a query and document by:

  1. Computing cosine similarity between all query-document token pairs
  2. For each query token, taking the maximum similarity to any document token
  3. Summing these maximum similarities

This "late interaction" approach captures fine-grained token-level matching while remaining efficient for retrieval.
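The three steps above can be sketched directly with Nx (illustrative only; `max_sim/2` below is the actual API). Since embeddings are L2-normalized, a plain dot product gives the cosine similarity:

```elixir
# Toy 2-dim embeddings; real ColBERT embeddings are e.g. 128-dim.
q = Nx.tensor([[1.0, 0.0], [0.0, 1.0]])               # {query_len, dim}
d = Nx.tensor([[0.8, 0.6], [0.0, 1.0], [1.0, 0.0]])   # {doc_len, dim}

sim = Nx.dot(q, [1], d, [1])                  # step 1: {query_len, doc_len} similarities
per_token_max = Nx.reduce_max(sim, axes: [1]) # step 2: best doc token per query token
score = per_token_max |> Nx.sum() |> Nx.to_number()
# => 2.0 (each query token finds a perfect match)
```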

Summary

Functions

explain/4 - Explains the MaxSim scoring between query and document.

format_explanation/2 - Formats an explanation for display.

fuse_and_rank/3 - Fuses and ranks documents using multiple queries.

fuse_queries/3 - Fuses scores from multiple queries using the specified strategy.

max_sim/2 - Computes the MaxSim score between query and document embeddings.

max_sim_batch/2 - Computes MaxSim scores for a query against multiple documents.

multi_max_sim/2 - Computes MaxSim scores for multiple queries against multiple documents.

normalize/2 - Normalizes a MaxSim score to [0, 1] range.

normalize_minmax/1 - Normalizes results using min-max scaling within the result set.

normalize_results/2 - Normalizes search results to [0, 1] range.

rank/2 - Ranks documents by their MaxSim scores against a query.

reciprocal_rank_fusion/2 - Reciprocal Rank Fusion (RRF) for combining multiple ranked lists.

similarity_matrix/2 - Computes the similarity matrix between query and document tokens.

Types

score()

@type score() :: float()

Functions

explain(query_embeddings, doc_embeddings, query_tokens, doc_tokens)

@spec explain(Nx.Tensor.t(), Nx.Tensor.t(), [String.t()], [String.t()]) :: map()

Explains the MaxSim scoring between query and document.

Returns detailed information about which query tokens matched which document tokens, useful for debugging and understanding retrieval results.

Arguments

  • query_embeddings - Query token embeddings
  • doc_embeddings - Document token embeddings
  • query_tokens - List of query token strings
  • doc_tokens - List of document token strings

Returns

Map containing:

  • :score - Total MaxSim score
  • :matches - List of match details for each query token, including:
    • :query_token - Query token string
    • :query_index - Query token index
    • :doc_token - Best matching document token string
    • :doc_index - Best matching document token index
    • :similarity - Cosine similarity (contribution to score)

Examples

query_emb = Encoder.encode_query(encoder, "satirical comedy")
doc_emb = Encoder.encode_document(encoder, "Colbert is satirical")
query_tokens = Encoder.tokenize(encoder, "satirical comedy", type: :query)
doc_tokens = Encoder.tokenize(encoder, "Colbert is satirical")

explanation = Scorer.explain(query_emb, doc_emb, query_tokens, doc_tokens)
# => %{
#   score: 15.2,
#   matches: [
#     %{query_token: "satirical", doc_token: "satirical", similarity: 0.95, ...},
#     %{query_token: "comedy", doc_token: "Colbert", similarity: 0.42, ...},
#     ...
#   ]
# }

format_explanation(explanation, opts \\ [])

@spec format_explanation(
  map(),
  keyword()
) :: String.t()

Formats an explanation for display.

Takes the output of explain/4 and returns a human-readable string.

Options

  • :top_k - Only show top-k matches by similarity (default: all)
  • :skip_special - Skip special tokens like [CLS], [SEP], [MASK] (default: true)
  • :min_similarity - Only show matches above threshold (default: 0.0)

Examples

explanation = Scorer.explain(query_emb, doc_emb, query_tokens, doc_tokens)
IO.puts(Scorer.format_explanation(explanation))
# Score: 15.20
#
# Query Token          -> Doc Token            Similarity
# --------------------------------------------------------
# satirical            -> satirical            0.95
# comedy               -> Colbert              0.42
# ...

fuse_and_rank(query_embeddings_list, doc_embeddings_list, strategy)

@spec fuse_and_rank(
  [Nx.Tensor.t()],
  [{term(), Nx.Tensor.t()}],
  :max | :avg | {:weighted, [float()]}
) ::
  [map()]

Fuses and ranks documents using multiple queries.

Scores each document against all queries and combines using the specified fusion strategy, returning ranked results.

Arguments

  • query_embeddings_list - List of query embedding tensors
  • doc_embeddings_list - List of {doc_id, embeddings} tuples
  • strategy - Fusion strategy: :max, :avg, or {:weighted, weights}

Returns

List of %{doc_id: term(), score: float()} maps sorted by score descending.

Examples

queries = [query1_emb, query2_emb]
docs = [{"doc1", emb1}, {"doc2", emb2}]
results = Scorer.fuse_and_rank(queries, docs, :avg)

fuse_queries(query_embeddings_list, doc_embeddings, strategy)

@spec fuse_queries(
  [Nx.Tensor.t()],
  Nx.Tensor.t(),
  :max | :avg | {:weighted, [float()]}
) :: score()

Fuses scores from multiple queries using the specified strategy.

Combines scores from multiple query variants (e.g., query expansions, reformulations) into a single ranking.

Arguments

  • query_embeddings_list - List of query embedding tensors
  • doc_embeddings - Document embedding tensor
  • strategy - Fusion strategy: :max, :avg, or {:weighted, weights}

Strategies

  • :max - Takes the maximum score across all queries (good for OR semantics)
  • :avg - Averages scores across queries (good for ensemble)
  • {:weighted, weights} - Weighted average with custom weights per query

Examples

# Query expansion: original + synonyms
queries = [
  Encoder.encode_query(encoder, "late night host"),
  Encoder.encode_query(encoder, "talk show comedian"),
  Encoder.encode_query(encoder, "comedy television")
]
score = Scorer.fuse_queries(queries, doc_emb, :max)

# Weighted fusion: prioritize original query
score = Scorer.fuse_queries(queries, doc_emb, {:weighted, [0.6, 0.2, 0.2]})

max_sim(query_embeddings, doc_embeddings)

@spec max_sim(Nx.Tensor.t(), Nx.Tensor.t()) :: score()

Computes the MaxSim score between query and document embeddings.

Arguments

  • query_embeddings - Tensor of shape {query_len, dim}
  • doc_embeddings - Tensor of shape {doc_len, dim}

Returns

A scalar float representing the relevance score.

Examples

score = Stephen.Scorer.max_sim(query_emb, doc_emb)

max_sim_batch(query_embeddings, doc_embeddings_list)

@spec max_sim_batch(Nx.Tensor.t(), [Nx.Tensor.t()]) :: [score()]

Computes MaxSim scores for a query against multiple documents.

Arguments

  • query_embeddings - Tensor of shape {query_len, dim}
  • doc_embeddings_list - List of tensors, each of shape {doc_len, dim}

Returns

List of scores in the same order as the input documents.
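Examples

For instance, scoring one encoded query against several documents (the embedding variables are hypothetical, obtained from `Encoder` as in the examples above):

```elixir
doc_embs = [doc1_emb, doc2_emb, doc3_emb]
scores = Stephen.Scorer.max_sim_batch(query_emb, doc_embs)
# One float per document, in the same order as doc_embs
```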

max_sim_nx(query_embeddings, doc_embeddings)

multi_max_sim(query_embeddings_list, doc_embeddings_list)

@spec multi_max_sim([Nx.Tensor.t()], [Nx.Tensor.t()]) :: [[score()]]

Computes MaxSim scores for multiple queries against multiple documents.

Each query is scored against each document, returning a matrix of scores.

Arguments

  • query_embeddings_list - List of query tensors, each of shape {query_len, dim}
  • doc_embeddings_list - List of document tensors, each of shape {doc_len, dim}

Returns

List of lists where result[i][j] is the score of query i against doc j.

Examples

scores = Stephen.Scorer.multi_max_sim(queries, docs)
# scores[0][1] is score of first query against second doc

normalize(score, query_length)

@spec normalize(score(), pos_integer()) :: float()

Normalizes a MaxSim score to [0, 1] range.

Since embeddings are L2-normalized, the maximum per-token similarity is 1.0. The theoretical maximum score is therefore query_length.

Arguments

  • score - Raw MaxSim score
  • query_length - Number of query tokens used in scoring

Returns

Normalized score in [0, 1] range.

Examples

raw_score = Stephen.Scorer.max_sim(query_emb, doc_emb)
normalized = Stephen.Scorer.normalize(raw_score, 32)
# => 0.73

normalize_minmax(results)

@spec normalize_minmax([map()]) :: [map()]

Normalizes results using min-max scaling within the result set.

Scales scores so the highest is 1.0 and lowest is 0.0. Useful when you want relative ranking within results rather than absolute scores.

Arguments

  • results - List of %{doc_id: term(), score: float()} maps

Returns

Results with scores scaled to [0, 1] range.

Examples

results = Stephen.search(encoder, index, query)
normalized = Stephen.Scorer.normalize_minmax(results)

normalize_results(results, query_length)

@spec normalize_results([map()], pos_integer()) :: [map()]

Normalizes search results to [0, 1] range.

Takes a list of search results and normalizes their scores based on the query length. Useful for setting thresholds or comparing results across different queries.

Arguments

  • results - List of %{doc_id: term(), score: float()} maps
  • query_length - Number of query tokens used in scoring

Returns

Results with normalized scores.

Examples

results = Stephen.search(encoder, index, "late night comedy")
normalized = Stephen.Scorer.normalize_results(results, 32)
high_quality = Enum.filter(normalized, & &1.score > 0.7)

rank(query_embeddings, doc_embeddings_list)

@spec rank(Nx.Tensor.t(), [{term(), Nx.Tensor.t()}]) :: [{term(), score()}]

Ranks documents by their MaxSim scores against a query.

Arguments

  • query_embeddings - Tensor of shape {query_len, dim}
  • doc_embeddings_list - List of {doc_id, embeddings} tuples

Returns

List of {doc_id, score} tuples sorted by score descending.
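Examples

A minimal sketch (embedding variables assumed encoded as in the earlier examples):

```elixir
docs = [{"doc1", doc1_emb}, {"doc2", doc2_emb}]
ranked = Stephen.Scorer.rank(query_emb, docs)
# [{doc_id, score}, ...] with the highest-scoring document first
```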

reciprocal_rank_fusion(ranked_lists, k \\ 60)

@spec reciprocal_rank_fusion([[map()]], pos_integer()) :: [map()]

Reciprocal Rank Fusion (RRF) for combining multiple ranked lists.

RRF is a robust fusion method that combines rankings rather than raw scores, making it effective when score distributions differ across queries.

Arguments

  • ranked_lists - List of ranked result lists, each [%{doc_id: term(), score: float()}, ...]
  • k - Smoothing constant (default: 60). Higher values reduce the impact of top ranks.

Returns

Fused results sorted by RRF score descending.
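Concretely, each list contributes 1/(k + rank) for every document it contains, and the fused score is the sum of those contributions. A hand-computed sketch with the default k = 60 (hypothetical ranks, not a library call):

```elixir
# A document ranked 1st in one list and 3rd in another:
k = 60
rrf_score = 1 / (k + 1) + 1 / (k + 3)
# ≈ 0.0323
```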

Examples

results1 = Retriever.search_with_embeddings(query1, index)
results2 = Retriever.search_with_embeddings(query2, index)
fused = Scorer.reciprocal_rank_fusion([results1, results2])

References

Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In Proceedings of SIGIR '09.

similarity_matrix(query_embeddings, doc_embeddings)

@spec similarity_matrix(Nx.Tensor.t(), Nx.Tensor.t()) :: Nx.Tensor.t()

Computes the similarity matrix between query and document tokens.

Useful for visualization and debugging.

Returns

Tensor of shape {query_len, doc_len} with cosine similarities.
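Examples

A minimal sketch (query_emb and doc_emb as in the earlier examples); the resulting matrix is what `max_sim/2` reduces over and what `explain/4` inspects:

```elixir
sim = Stephen.Scorer.similarity_matrix(query_emb, doc_emb)
Nx.shape(sim)
# => {query_len, doc_len}
```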