Nasty.Semantic.Coreference.Clusterer (Nasty v0.3.0)

Generic clustering module for coreference resolution.

Builds coreference chains from mentions using agglomerative clustering:

1. Start with each mention in its own cluster
2. Iteratively merge the best-scoring cluster pair
3. Continue until no pairs score above threshold

Supports different merge strategies (average, best, worst linkage).
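The three linkage strategies differ only in how pairwise mention scores are combined into one cluster-level score. The following is a minimal sketch, not the module's actual code. `LinkageSketch`, `cluster_score/4`, the `score_fun` argument, and the `:best` and `:worst` atoms are assumptions made for illustration; only `:average` is documented as an option value. It also assumes "best" means the highest pairwise score and "worst" the lowest.

```elixir
defmodule LinkageSketch do
  # Hypothetical helper, not part of Nasty: combines the pairwise scores
  # between two clusters according to a linkage strategy.
  def cluster_score(cluster_a, cluster_b, score_fun, strategy) do
    scores = for a <- cluster_a, b <- cluster_b, do: score_fun.(a, b)

    case strategy do
      # Mean of all pairwise scores
      :average -> Enum.sum(scores) / length(scores)
      # Assumed meaning of "best" linkage: the highest pairwise score
      :best -> Enum.max(scores)
      # Assumed meaning of "worst" linkage: the lowest pairwise score
      :worst -> Enum.min(scores)
    end
  end
end

# Toy scoring function over integers: closer values score higher.
score = fn a, b -> 1.0 / (1 + abs(a - b)) end

best = LinkageSketch.cluster_score([1, 2], [3], score, :best)
worst = LinkageSketch.cluster_score([1, 2], [3], score, :worst)
average = LinkageSketch.cluster_score([1, 2], [3], score, :average)
```

Under these assumptions, `:worst` is the most conservative choice (every pair must score well before two clusters merge) and `:best` the most permissive.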

Summary

Functions

build_chains(mentions, opts \\ [])

Builds coreference chains from mentions.

find_best_merge(clusters, opts, min_score)

Finds the best pair of clusters to merge.

merge_clusters(clusters, opts, min_score)

Merges clusters iteratively until no more merges are possible.

select_representative(cluster)

Selects the representative mention for a cluster.

Functions

build_chains(mentions, opts \\ [])

@spec build_chains(
  [Nasty.AST.Semantic.Mention.t()],
  keyword()
) :: [Nasty.AST.Semantic.CorefChain.t()]

Builds coreference chains from mentions.

Uses agglomerative clustering to group mentions that likely refer to the same entity.

Parameters

- mentions - List of all mentions from the document
- opts - Clustering options:
  - :min_score - Minimum score threshold for merging (default: 0.3)
  - :max_distance - Maximum sentence distance (default: 3)
  - :merge_strategy - Linkage type (default: :average)
  - :weights - Custom scoring weights

Returns

A list of CorefChain structs, each containing the mentions that refer to the same entity. Chains with only one mention are filtered out.

Examples

iex> mentions = [m1, m2, m3, m4]
iex> chains = Clusterer.build_chains(mentions, min_score: 0.3)
[
  %CorefChain{mentions: [m1, m2], representative: "John"},
  %CorefChain{mentions: [m3, m4], representative: "the cat"}
]
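The documented defaults and the singleton filter can be illustrated without the rest of the pipeline. This is a sketch under stated assumptions: `ChainFilterSketch`, `resolve_opts/1`, and `drop_singletons/1` are hypothetical names, and the real module may resolve options and filter chains differently.

```elixir
defmodule ChainFilterSketch do
  # Defaults as documented above. :weights has no documented default,
  # so it is left out here.
  @defaults [min_score: 0.3, max_distance: 3, merge_strategy: :average]

  # Caller-supplied options override the defaults.
  def resolve_opts(opts), do: Keyword.merge(@defaults, opts)

  # Keep only clusters with at least two mentions.
  def drop_singletons(clusters) do
    Enum.filter(clusters, &match?([_, _ | _], &1))
  end
end

opts = ChainFilterSketch.resolve_opts(min_score: 0.5)
kept = ChainFilterSketch.drop_singletons([[:m1, :m2], [:m3], [:m4, :m5, :m6]])
```

Here atoms stand in for mentions; the singleton cluster `[:m3]` is dropped, which mirrors the documented rule that one-mention chains are not returned.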

find_best_merge(clusters, opts, min_score)

@spec find_best_merge([[Nasty.AST.Semantic.Mention.t()]], keyword(), float()) ::
  {:ok, {non_neg_integer(), non_neg_integer()}} | :none

Finds the best pair of clusters to merge.

Scores all cluster pairs and returns the indices of the highest-scoring pair, provided that its score is above the minimum threshold.

Returns {:ok, {idx1, idx2}} or :none if no valid merge exists.
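The pair search and its return shape can be sketched as follows. This is an illustration, not the library's implementation: `BestMergeSketch` is a hypothetical module, and it takes an assumed `score_fun` in place of the real `opts` argument. Treating "above" as a strict comparison is also an assumption.

```elixir
defmodule BestMergeSketch do
  # Hypothetical stand-in for the pair search. `score_fun` is an assumed
  # two-argument function that scores a pair of clusters.
  def find_best_merge(clusters, score_fun, min_score) do
    indexed = Enum.with_index(clusters)

    # Score every unordered pair (i < j) and keep those above the threshold.
    candidates =
      for {a, i} <- indexed,
          {b, j} <- indexed,
          i < j,
          score = score_fun.(a, b),
          score > min_score,
          do: {score, {i, j}}

    case candidates do
      [] -> :none
      _ -> {:ok, candidates |> Enum.max_by(&elem(&1, 0)) |> elem(1)}
    end
  end
end

# Toy clusters of integers; clusters whose first elements are closer score higher.
score = fn a, b -> 1.0 / (1 + abs(hd(a) - hd(b))) end
clusters = [[1], [10], [2]]

found = BestMergeSketch.find_best_merge(clusters, score, 0.3)
not_found = BestMergeSketch.find_best_merge(clusters, score, 0.9)
```

In this toy run only the pair at indices 0 and 2 scores above 0.3, and no pair scores above 0.9.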

merge_clusters(clusters, opts, min_score)

@spec merge_clusters([[Nasty.AST.Semantic.Mention.t()]], keyword(), float()) :: [
  [Nasty.AST.Semantic.Mention.t()]
]

Merges clusters iteratively until no more merges are possible.

At each iteration, finds the best-scoring cluster pair and merges it. Stops when no pair scores above the min_score threshold.
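The iteration can be pictured as a recursive loop around a pair finder. The sketch below is hypothetical: `MergeLoopSketch` and the `find_fun` argument are assumptions that replace the real `opts` and `min_score` parameters, and the toy finder simply picks the first qualifying pair instead of the best-scoring one to keep the example short.

```elixir
defmodule MergeLoopSketch do
  # `find_fun` is an assumed function that returns {:ok, {i, j}} with i < j,
  # or :none, for the current list of clusters.
  def merge_clusters(clusters, find_fun) do
    case find_fun.(clusters) do
      :none ->
        clusters

      {:ok, {i, j}} ->
        merged = Enum.at(clusters, i) ++ Enum.at(clusters, j)

        clusters
        # Delete the higher index first so that index i stays valid.
        |> List.delete_at(j)
        |> List.replace_at(i, merged)
        |> merge_clusters(find_fun)
    end
  end
end

# Toy finder: the first pair of clusters that contain numbers at most 1 apart.
close_pair = fn clusters ->
  indexed = Enum.with_index(clusters)

  pairs =
    for {a, i} <- indexed,
        {b, j} <- indexed,
        i < j,
        Enum.any?(a, fn x -> Enum.any?(b, &(abs(x - &1) <= 1)) end),
        do: {i, j}

  case pairs do
    [] -> :none
    [pair | _] -> {:ok, pair}
  end
end

result = MergeLoopSketch.merge_clusters([[1], [5], [2], [6], [9]], close_pair)
```

Each merge shortens the cluster list by one, so the loop always terminates once the finder returns `:none`.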

select_representative(cluster)

@spec select_representative([Nasty.AST.Semantic.Mention.t()]) :: String.t()

Selects the representative mention for a cluster.

Uses the following priority:

1. First proper name (most specific)
2. First definite NP (next most specific)
3. First mention (fallback)
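The priority order above can be sketched as a chain of fallbacks. The mention shape used here (a plain map with `:text` and `:type` keys, and the `:proper_name`, `:definite_np`, and `:pronoun` atoms) is an assumption for illustration; the real `Nasty.AST.Semantic.Mention` struct may use different fields. `RepresentativeSketch` is a hypothetical module.

```elixir
defmodule RepresentativeSketch do
  # Assumes each mention is a map with :text and :type keys (hypothetical shape).
  def select_representative([first | _] = cluster) do
    chosen =
      Enum.find(cluster, &(&1.type == :proper_name)) ||
        Enum.find(cluster, &(&1.type == :definite_np)) ||
        first

    chosen.text
  end
end

cluster = [
  %{text: "he", type: :pronoun},
  %{text: "the engineer", type: :definite_np},
  %{text: "John", type: :proper_name}
]

with_name = RepresentativeSketch.select_representative(cluster)
without_name = RepresentativeSketch.select_representative(Enum.take(cluster, 2))
pronoun_only = RepresentativeSketch.select_representative(Enum.take(cluster, 1))
```

Because `Enum.find/2` returns `nil` when nothing matches, the `||` chain falls through to the next rule, ending with the first mention in the cluster.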

Examples

iex> cluster = [pronoun_mention, name_mention, np_mention]
iex> Clusterer.select_representative(cluster)
"John"  # The proper name