ExNlp.Cooccurrence (ex_nlp v0.1.0)
Term co-occurrence analysis.
This module provides functions for analyzing how terms co-occur in documents, which is useful for building related-term networks and understanding term associations.
Examples
# Build co-occurrence matrix
iex> corpus = [["cat", "dog"], ["cat", "bird", "dog"], ["bird"]]
iex> matrix = ExNlp.Cooccurrence.cooccurrence_matrix(corpus)
iex> matrix["cat"]["dog"]
2
# Find co-occurring terms
iex> corpus = [["cat", "dog"], ["cat", "bird", "dog"]]
iex> ExNlp.Cooccurrence.cooccurring_terms("cat", corpus, 2)
[{"dog", 2}, {"bird", 1}]
Summary
Functions
cooccurrence_matrix/1: Builds a co-occurrence matrix from a corpus.
cooccurring_terms/3: Finds terms that co-occur with a given term, sorted by frequency.
mutual_information/3: Calculates mutual information (MI) score for a term pair.
window_cooccurrence/2: Calculates co-occurrence within a sliding window.
Types
Functions
@spec cooccurrence_matrix(corpus()) :: %{required(String.t()) => %{required(String.t()) => non_neg_integer()}}
Builds a co-occurrence matrix from a corpus.
Returns a nested map where matrix[word1][word2] is the number of times
word1 and word2 co-occur in the same document. Counts are symmetric, so
matrix[word1][word2] equals matrix[word2][word1].
Examples
iex> corpus = [["cat", "dog"], ["cat", "bird", "dog"]]
iex> matrix = ExNlp.Cooccurrence.cooccurrence_matrix(corpus)
iex> matrix["cat"]["dog"]
2
iex> matrix["dog"]["cat"]
2
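The counting described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the module `CooccurrenceSketch` and its `matrix/1` function are hypothetical names, and the sketch assumes that a term repeated within one document is counted only once for that document.

```elixir
defmodule CooccurrenceSketch do
  # Hypothetical helper, not part of ExNlp.
  # For every ordered pair of distinct terms, counts the number of documents
  # that contain both terms (assumption: duplicates within a document are
  # collapsed before counting).
  def matrix(corpus) do
    Enum.reduce(corpus, %{}, fn doc, acc ->
      terms = Enum.uniq(doc)

      # All ordered pairs of distinct terms in this document.
      pairs = for a <- terms, b <- terms, a != b, do: {a, b}

      Enum.reduce(pairs, acc, fn {a, b}, inner ->
        Map.update(inner, a, %{b => 1}, fn row ->
          Map.update(row, b, 1, &(&1 + 1))
        end)
      end)
    end)
  end
end

corpus = [["cat", "dog"], ["cat", "bird", "dog"], ["bird"]]
matrix = CooccurrenceSketch.matrix(corpus)
IO.inspect(matrix["cat"]["dog"])
# prints 2
```

Storing both {a, b} and {b, a} doubles the memory used but keeps lookups symmetric, which matches the behaviour shown in the example above.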
@spec cooccurring_terms(String.t(), corpus(), non_neg_integer()) :: [{String.t(), non_neg_integer()}]
Finds terms that co-occur with a given term, sorted by frequency.
Returns a list of {term, count} tuples, ordered from most to least frequent,
where count is the number of documents in which both terms appear.
Examples
iex> corpus = [["cat", "dog"], ["cat", "bird", "dog"], ["bird"]]
iex> ExNlp.Cooccurrence.cooccurring_terms("cat", corpus, 10)
[{"dog", 2}, {"bird", 1}]
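A rough sketch of this lookup, under stated assumptions: the third argument is treated here as a cap on the number of results (the name `limit` is an assumption, since the source does not name it), ties are broken alphabetically, and `CooccurringSketch.top_terms/3` is a hypothetical helper rather than the library's code.

```elixir
defmodule CooccurringSketch do
  # Hypothetical helper, not part of ExNlp.
  # `limit` is an assumed name for the third argument, interpreted as
  # "return at most this many results".
  def top_terms(term, corpus, limit) do
    corpus
    # Keep only documents that contain the target term.
    |> Enum.filter(&(term in &1))
    # Collect the other terms, counting each at most once per document.
    |> Enum.flat_map(fn doc -> doc |> Enum.uniq() |> List.delete(term) end)
    |> Enum.frequencies()
    # Highest count first; alphabetical order for ties (an assumption).
    |> Enum.sort_by(fn {other, count} -> {-count, other} end)
    |> Enum.take(limit)
  end
end

corpus = [["cat", "dog"], ["cat", "bird", "dog"], ["bird"]]
IO.inspect(CooccurringSketch.top_terms("cat", corpus, 10))
# prints [{"dog", 2}, {"bird", 1}]
```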
Calculates mutual information (MI) score for a term pair.
Mutual information measures how much information about one term is provided by knowing the other term. Higher values indicate stronger association.
Examples
iex> corpus = [["cat", "dog"], ["cat", "dog"], ["cat"], ["dog"]]
iex> mi = ExNlp.Cooccurrence.mutual_information("cat", "dog", corpus)
iex> mi > 0.0 and mi < 1.0
true
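The source does not state which formula is used. One definition consistent with the example above is the mutual information, in bits, between the presence or absence of each term across documents. The sketch below assumes that definition; `MutualInformationSketch.score/3` is a hypothetical name and may differ from what the library computes.

```elixir
defmodule MutualInformationSketch do
  # Hypothetical helper, not part of ExNlp.
  # Mutual information (base 2) between the binary events
  # "document contains term_a" and "document contains term_b".
  def score(term_a, term_b, corpus) do
    n = length(corpus)

    # Number of documents for each observed presence/absence combination.
    counts =
      Enum.frequencies_by(corpus, fn doc -> {term_a in doc, term_b in doc} end)

    count_a = Map.get(counts, {true, true}, 0) + Map.get(counts, {true, false}, 0)
    count_b = Map.get(counts, {true, true}, 0) + Map.get(counts, {false, true}, 0)

    # Marginal probability of a term being present (true) or absent (false).
    marginal = fn
      true, count -> count / n
      false, count -> (n - count) / n
    end

    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over the observed cells;
    # cells with zero count contribute nothing and are never visited.
    Enum.reduce(counts, 0.0, fn {{has_a, has_b}, count}, acc ->
      p_joint = count / n
      p_a = marginal.(has_a, count_a)
      p_b = marginal.(has_b, count_b)
      acc + p_joint * :math.log2(p_joint / (p_a * p_b))
    end)
  end
end

corpus = [["cat", "dog"], ["cat", "dog"], ["cat"], ["dog"]]
mi = MutualInformationSketch.score("cat", "dog", corpus)
IO.inspect(mi > 0.0 and mi < 1.0)
# prints true
```

Under this definition, terms that occur independently of each other score 0.0, and the score grows as knowing one term's presence tells you more about the other's.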
@spec window_cooccurrence(corpus(), keyword()) :: %{required(String.t()) => %{required(String.t()) => non_neg_integer()}}
Calculates co-occurrence within a sliding window.
Returns a nested map where matrix[term1][term2] is the number of times the two terms occur together within the window.
Options
:window_size - Size of the sliding window in tokens (default: 5)
Examples
iex> corpus = [["the", "quick", "brown", "fox", "jumps"]]
iex> matrix = ExNlp.Cooccurrence.window_cooccurrence(corpus, window_size: 2)
iex> matrix["quick"]["brown"]
1
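One way to picture the sliding window is sketched below. It rests on assumptions the source does not confirm: each token is paired with the tokens that follow it inside a window of `window_size` tokens, each such pair is counted once per position and stored in both directions, and a token is never paired with itself. `WindowSketch.count/2` is a hypothetical name.

```elixir
defmodule WindowSketch do
  # Hypothetical helper, not part of ExNlp.
  # Pairs each token with the tokens that follow it within a window of
  # `window_size` tokens (the token itself plus window_size - 1 successors).
  def count(corpus, opts \\ []) do
    window_size = Keyword.get(opts, :window_size, 5)
    lookahead = max(window_size - 1, 0)

    pairs =
      for doc <- corpus,
          {left, i} <- Enum.with_index(doc),
          right <- Enum.slice(doc, i + 1, lookahead),
          left != right,
          # Record both directions so lookups are symmetric.
          pair <- [{left, right}, {right, left}],
          do: pair

    Enum.reduce(pairs, %{}, fn {a, b}, acc ->
      Map.update(acc, a, %{b => 1}, fn row ->
        Map.update(row, b, 1, &(&1 + 1))
      end)
    end)
  end
end

corpus = [["the", "quick", "brown", "fox", "jumps"]]
matrix = WindowSketch.count(corpus, window_size: 2)
IO.inspect(matrix["quick"]["brown"])
# prints 1
```

With `window_size: 2` only adjacent tokens are paired, so in this sketch `matrix["the"]["brown"]` is `nil`; widening the window to 3 would pair them once.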