Nasty.Semantic.Coreference.Clusterer (Nasty v0.3.0)

Generic clustering module for coreference resolution.

Builds coreference chains from mentions using agglomerative clustering:

  1. Start with each mention in its own cluster
  2. Iteratively merge the best-scoring cluster pair
  3. Continue until no pair scores above the threshold

Supports different merge strategies (average, best, worst linkage).
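
As a rough illustration of what the linkage strategies mean, the sketch below combines pairwise mention scores into one cluster-pair score. It is not Nasty's implementation: the module name `LinkageSketch`, the `score_fun` argument, and the `:best` and `:worst` atoms are assumptions (the docs only confirm `:average` as an option value).

```elixir
defmodule LinkageSketch do
  # Hypothetical helper, not part of Nasty.
  # `score_fun` is an assumed 2-arity function that scores two mentions.
  def cluster_score(c1, c2, score_fun, strategy \\ :average) do
    # Score every cross-cluster mention pair.
    scores = for a <- c1, b <- c2, do: score_fun.(a, b)

    case strategy do
      # Average linkage: mean of all pairwise scores.
      :average -> Enum.sum(scores) / length(scores)
      # Best linkage: the single strongest pair decides (atom name assumed).
      :best -> Enum.max(scores)
      # Worst linkage: the single weakest pair decides (atom name assumed).
      :worst -> Enum.min(scores)
    end
  end
end
```

Average linkage tends to be the middle ground: best linkage merges as soon as any two mentions agree, while worst linkage requires every pair to agree.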

Functions

build_chains(mentions, opts \\ [])

Builds coreference chains from mentions.

Uses agglomerative clustering to group mentions that likely refer to the same entity.

Parameters

  • mentions - List of all mentions from the document
  • opts - Clustering options
    • :min_score - Minimum score threshold for merging (default: 0.3)
    • :max_distance - Maximum distance, in sentences, between mentions (default: 3)
    • :merge_strategy - Linkage type (default: :average)
    • :weights - Custom scoring weights

Returns

List of CorefChain structs, each containing the mentions that refer to the same entity. Chains with only one mention are filtered out.
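
The singleton filtering described above can be pictured as follows. This is a minimal sketch on plain lists, assuming clusters are lists of mentions; `ChainFilterSketch` and `drop_singletons/1` are hypothetical names, not functions exported by Nasty.

```elixir
defmodule ChainFilterSketch do
  # Hypothetical helper: keeps only clusters with at least two mentions.
  # Any term can stand in for a mention here.
  def drop_singletons(clusters) do
    Enum.filter(clusters, fn cluster -> match?([_, _ | _], cluster) end)
  end
end
```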

Examples

iex> mentions = [m1, m2, m3, m4]
iex> chains = Clusterer.build_chains(mentions, min_score: 0.3)
[
  %CorefChain{mentions: [m1, m2], representative: "John"},
  %CorefChain{mentions: [m3, m4], representative: "the cat"}
]

find_best_merge(clusters, opts, min_score)

@spec find_best_merge([[Nasty.AST.Semantic.Mention.t()]], keyword(), float()) ::
  {:ok, {non_neg_integer(), non_neg_integer()}} | :none

Finds the best pair of clusters to merge.

Scores all cluster pairs and returns the indices of the highest-scoring pair, provided its score is above the minimum threshold.

Returns {:ok, {idx1, idx2}} or :none if no valid merge exists.
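
The search can be sketched as below. This is an illustration under stated assumptions rather than the library's code: `BestMergeSketch` is a hypothetical module, `score_fun` stands in for the real cluster scorer (which the actual function derives from `opts`), and the strict `>` comparison is a guess at what "above" means.

```elixir
defmodule BestMergeSketch do
  # `score_fun` is a hypothetical 2-arity function scoring two clusters.
  def find_best_merge(clusters, score_fun, min_score) do
    indexed = Enum.with_index(clusters)

    # Score each unordered pair exactly once (i < j).
    candidates =
      for {c1, i} <- indexed, {c2, j} <- indexed, i < j do
        {score_fun.(c1, c2), {i, j}}
      end

    # Keep only pairs above the threshold (strict comparison is assumed).
    case Enum.filter(candidates, fn {score, _pair} -> score > min_score end) do
      [] ->
        :none

      above ->
        {_score, pair} = Enum.max_by(above, fn {score, _pair} -> score end)
        {:ok, pair}
    end
  end
end
```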

merge_clusters(clusters, opts, min_score)

@spec merge_clusters([[Nasty.AST.Semantic.Mention.t()]], keyword(), float()) :: [
  [Nasty.AST.Semantic.Mention.t()]
]

Merges clusters iteratively until no more merges are possible.

Finds the best-scoring cluster pair at each iteration and merges it into a single cluster. Stops when no pair scores above the min_score threshold.
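
A minimal sketch of such a merge loop, written as a recursive function, is shown here. It is not taken from Nasty's source: `MergeLoopSketch`, `best_pair/3`, and the `score_fun` argument are assumptions introduced for illustration, and the order of clusters in the result may differ from the library's.

```elixir
defmodule MergeLoopSketch do
  # `score_fun` is a hypothetical 2-arity function scoring two clusters.
  def merge_clusters(clusters, score_fun, min_score) do
    case best_pair(clusters, score_fun, min_score) do
      # No pair clears the threshold: clustering is finished.
      :none ->
        clusters

      {:ok, {i, j}} ->
        merged = Enum.at(clusters, i) ++ Enum.at(clusters, j)

        # Drop the two merged clusters, keep everything else.
        rest =
          clusters
          |> Enum.with_index()
          |> Enum.reject(fn {_cluster, idx} -> idx in [i, j] end)
          |> Enum.map(fn {cluster, _idx} -> cluster end)

        merge_clusters([merged | rest], score_fun, min_score)
    end
  end

  # Returns the indices of the highest-scoring pair above `min_score`.
  defp best_pair(clusters, score_fun, min_score) do
    indexed = Enum.with_index(clusters)

    scored =
      for {c1, i} <- indexed, {c2, j} <- indexed, i < j do
        {score_fun.(c1, c2), {i, j}}
      end

    case Enum.filter(scored, fn {score, _pair} -> score > min_score end) do
      [] -> :none
      above -> {:ok, above |> Enum.max_by(fn {score, _pair} -> score end) |> elem(1)}
    end
  end
end
```

Each merge reduces the number of clusters by one, so the recursion ends after at most one merge fewer than the number of starting clusters.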

select_representative(cluster)

@spec select_representative([Nasty.AST.Semantic.Mention.t()]) :: String.t()

Selects the representative mention for a cluster.

Uses the following priority:

  1. First proper name (most specific)
  2. First definite NP (next most specific)
  3. First mention (fallback)

Examples

iex> cluster = [pronoun_mention, name_mention, np_mention]
iex> Clusterer.select_representative(cluster)
"John"  # The proper name
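
The priority order can be sketched with plain maps. The `:type` values (`:proper_name`, `:definite_np`, `:pronoun`) and the `:text` key are assumptions made for this illustration and may not match the fields of Nasty's Mention struct; `RepresentativeSketch` is a hypothetical module.

```elixir
defmodule RepresentativeSketch do
  # Mentions are modelled as maps with assumed `:type` and `:text` keys.
  def select_representative([first | _] = cluster) do
    chosen =
      # 1. First proper name, if any.
      Enum.find(cluster, &(&1.type == :proper_name)) ||
        # 2. Otherwise the first definite NP.
        Enum.find(cluster, &(&1.type == :definite_np)) ||
        # 3. Otherwise fall back to the first mention.
        first

    chosen.text
  end
end
```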