Nasty.Semantic.Coreference.Clusterer (Nasty v0.3.0)
Generic clustering module for coreference resolution.
Builds coreference chains from mentions using agglomerative clustering:
- Start with each mention in its own cluster
- Iteratively merge the best-scoring cluster pair
- Continue until no pair scores above the threshold
Supports different merge strategies (average, best, worst linkage).
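The three linkage types differ only in how pairwise mention scores are combined into one cluster-pair score. The following is a minimal sketch of that idea, not Nasty's implementation: the module name LinkageSketch, the score_fun argument, and the :best and :worst atoms are assumptions made for illustration (only :average appears in the documented defaults). Clusters are assumed to be non-empty.

```elixir
defmodule LinkageSketch do
  # Hypothetical helper, not part of Nasty.
  # Scores a pair of non-empty clusters by combining every pairwise
  # mention score according to the chosen linkage strategy.
  # `score_fun` is an assumed 2-arity function returning a float.
  def cluster_score(c1, c2, score_fun, strategy) do
    scores = for a <- c1, b <- c2, do: score_fun.(a, b)

    case strategy do
      # Average linkage: mean of all pairwise scores
      :average -> Enum.sum(scores) / length(scores)
      # Best linkage: the single highest pairwise score
      :best -> Enum.max(scores)
      # Worst linkage: the single lowest pairwise score
      :worst -> Enum.min(scores)
    end
  end
end

# Toy data: mentions are integers, and closer numbers score higher.
score = fn a, b -> 1.0 / (1 + abs(a - b)) end

LinkageSketch.cluster_score([1, 2], [3], score, :best)    # 0.5
LinkageSketch.cluster_score([1, 2], [3], score, :worst)   # roughly 0.333
LinkageSketch.cluster_score([1, 2], [3], score, :average) # roughly 0.417
```

Worst linkage is the most conservative choice, since every pair of mentions across the two clusters must score well before they merge; best linkage needs only one strong pair.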
Summary
Functions
- build_chains/2: Builds coreference chains from mentions.
- find_best_merge/3: Finds the best pair of clusters to merge.
- merge_clusters/3: Merges clusters iteratively until no more merges are possible.
- select_representative/1: Selects the representative mention for a cluster.
Functions
@spec build_chains( [Nasty.AST.Semantic.Mention.t()], keyword() ) :: [Nasty.AST.Semantic.CorefChain.t()]
Builds coreference chains from mentions.
Uses agglomerative clustering to group mentions that likely refer to the same entity.
Parameters
- mentions - List of all mentions from document
- opts - Clustering options:
  - :min_score - Minimum score threshold for merging (default: 0.3)
  - :max_distance - Maximum sentence distance (default: 3)
  - :merge_strategy - Linkage type (default: :average)
  - :weights - Custom scoring weights
Returns
List of CorefChain structs, each containing the mentions that refer to the same entity. Chains with only one mention (singletons) are filtered out.
Examples
iex> mentions = [m1, m2, m3, m4]
iex> chains = Clusterer.build_chains(mentions, min_score: 0.3)
[
%CorefChain{mentions: [m1, m2], representative: "John"},
%CorefChain{mentions: [m3, m4], representative: "the cat"}
]
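The singleton filtering described under Returns can be pictured as a post-processing step on the final clusters. This is only a sketch under assumptions: clusters are plain lists of atoms, and chains are plain maps with a :mentions key rather than real CorefChain structs.

```elixir
# Toy clusters after merging; :m3 ended up alone.
clusters = [[:m1, :m2], [:m3], [:m4, :m5, :m6]]

chains =
  clusters
  # Drop clusters with fewer than two mentions (singletons)
  |> Enum.reject(fn cluster -> length(cluster) < 2 end)
  # Wrap each remaining cluster in a chain-like map (stand-in for CorefChain)
  |> Enum.map(fn cluster -> %{mentions: cluster} end)

# chains is [%{mentions: [:m1, :m2]}, %{mentions: [:m4, :m5, :m6]}]
```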
@spec find_best_merge([[Nasty.AST.Semantic.Mention.t()]], keyword(), float()) :: {:ok, {non_neg_integer(), non_neg_integer()}} | :none
Finds the best pair of clusters to merge.
Scores all cluster pairs and returns indices of the pair with highest score above the minimum threshold.
Returns {:ok, {idx1, idx2}} or :none if no valid merge exists.
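The pair search can be sketched as follows. This is an illustrative stand-in, not the library's code: BestMergeSketch and its cluster_score argument are hypothetical, and the real find_best_merge/3 takes a keyword list of options as its second argument rather than a scoring function. Treating the threshold as strict (score must exceed min_score) is also an assumption.

```elixir
defmodule BestMergeSketch do
  # Hypothetical stand-in for find_best_merge/3.
  # `cluster_score` is an assumed 2-arity function scoring two clusters.
  def find_best_merge(clusters, cluster_score, min_score) do
    indexed = Enum.with_index(clusters)

    # Score every unordered pair of clusters exactly once (i < j).
    candidates =
      for {c1, i} <- indexed, {c2, j} <- indexed, i < j do
        {cluster_score.(c1, c2), {i, j}}
      end

    # Keep only pairs scoring above the threshold, then take the best one.
    case Enum.filter(candidates, fn {s, _pair} -> s > min_score end) do
      [] ->
        :none

      valid ->
        {_score, pair} = Enum.max_by(valid, fn {s, _pair} -> s end)
        {:ok, pair}
    end
  end
end

# Toy scorer: compares the first element of each cluster.
score = fn [a | _], [b | _] -> 1.0 / (1 + abs(a - b)) end

BestMergeSketch.find_best_merge([[1], [2], [10]], score, 0.3) # {:ok, {0, 1}}
BestMergeSketch.find_best_merge([[1], [2], [10]], score, 0.9) # :none
```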
@spec merge_clusters([[Nasty.AST.Semantic.Mention.t()]], keyword(), float()) :: [ [Nasty.AST.Semantic.Mention.t()] ]
Merges clusters iteratively until no more merges are possible.
Finds the best-scoring cluster pair at each iteration and merges it into a single cluster. Stops when no pair scores above the min_score threshold.
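The loop itself is a short recursion: ask for a pair to merge, combine the two clusters, and repeat until the finder reports :none. The sketch below assumes a hypothetical find_fun callback in place of the library's options and threshold arguments, and assumes the returned indices satisfy i < j. The toy finder used here picks the first sufficiently close pair rather than the best-scoring one; it exists only to exercise the loop.

```elixir
defmodule MergeLoopSketch do
  # Hypothetical stand-in for merge_clusters/3.
  # `find_fun` is an assumed callback returning {:ok, {i, j}} with i < j,
  # or :none when no pair qualifies.
  def merge_clusters(clusters, find_fun) do
    case find_fun.(clusters) do
      :none ->
        clusters

      {:ok, {i, j}} ->
        merged = Enum.at(clusters, i) ++ Enum.at(clusters, j)

        clusters
        # Put the merged cluster at position i ...
        |> List.replace_at(i, merged)
        # ... and remove the now-absorbed cluster at position j.
        |> List.delete_at(j)
        |> merge_clusters(find_fun)
    end
  end
end

# Toy finder: two clusters qualify if any two members differ by at most 1.
near = fn c1, c2 -> Enum.any?(for a <- c1, b <- c2, do: abs(a - b) <= 1) end

find = fn clusters ->
  indexed = Enum.with_index(clusters)
  pairs = for {c1, i} <- indexed, {c2, j} <- indexed, i < j, near.(c1, c2), do: {i, j}

  case pairs do
    [] -> :none
    [pair | _] -> {:ok, pair}
  end
end

MergeLoopSketch.merge_clusters([[1], [2], [10], [3]], find)
# [[1, 2, 3], [10]]
```

Each merge reduces the number of clusters by one, so the recursion always terminates after at most one pass per initial cluster.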
@spec select_representative([Nasty.AST.Semantic.Mention.t()]) :: String.t()
Selects the representative mention for a cluster.
Uses the following priority:
- First proper name (most specific)
- First definite NP (next most specific)
- First mention (fallback)
Examples
iex> cluster = [pronoun_mention, name_mention, np_mention]
iex> Clusterer.select_representative(cluster)
"John" # The proper name
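The priority order can be expressed as a chain of fallbacks. This sketch does not use the real Mention struct: mentions are plain maps, and the :type and :text keys along with the :proper_name, :definite_np, and :pronoun values are assumptions made for illustration.

```elixir
defmodule RepresentativeSketch do
  # Hypothetical stand-in for select_representative/1.
  # Mentions are plain maps with assumed :type and :text keys.
  def select_representative([first | _] = cluster) do
    chosen =
      # 1. First proper name, if any
      Enum.find(cluster, fn m -> m.type == :proper_name end) ||
        # 2. Otherwise the first definite NP
        Enum.find(cluster, fn m -> m.type == :definite_np end) ||
        # 3. Otherwise the first mention in the cluster
        first

    chosen.text
  end
end

cluster = [
  %{type: :pronoun, text: "he"},
  %{type: :definite_np, text: "the man"},
  %{type: :proper_name, text: "John"}
]

RepresentativeSketch.select_representative(cluster)
# "John"
```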