ExNlp.Cooccurrence (ex_nlp v0.1.0)

Term co-occurrence analysis.

This module provides functions for analyzing how terms co-occur in documents, which is useful for building related-term networks and measuring term associations.

Examples

# Build co-occurrence matrix
iex> corpus = [["cat", "dog"], ["cat", "bird", "dog"], ["bird"]]
iex> matrix = ExNlp.Cooccurrence.cooccurrence_matrix(corpus)
iex> matrix["cat"]["dog"]
2

# Find co-occurring terms
iex> corpus = [["cat", "dog"], ["cat", "bird", "dog"]]
iex> ExNlp.Cooccurrence.cooccurring_terms("cat", corpus, 2)
[{"dog", 2}, {"bird", 1}]

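Summary

Types

corpus()
A corpus is a list of documents.

document()
A document is a list of tokens.

Functions

cooccurrence_matrix(corpus)
Builds a co-occurrence matrix from a corpus.

cooccurring_terms(term, corpus, limit \\ 10)
Finds terms that co-occur with a given term, sorted by frequency.

mutual_information(term1, term2, corpus)
Calculates the mutual information (MI) score for a term pair.

window_cooccurrence(corpus, opts \\ [])
Calculates co-occurrence within a sliding window.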
Types

corpus()

@type corpus() :: [document()]

A corpus is a list of documents.

document()

@type document() :: [String.t()]

A document is a list of tokens, each token being a string.
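As an illustration of these two types, the sketch below turns raw strings into a corpus() value. The module name CorpusSketch and the function from_strings/1 are hypothetical and not part of ExNlp; lowercasing and whitespace splitting are assumptions about how tokens might be produced.

```elixir
defmodule CorpusSketch do
  # Hypothetical helper, not part of ExNlp: lowercases each string and
  # splits it on whitespace, yielding one token list per input string.
  @spec from_strings([String.t()]) :: [[String.t()]]
  def from_strings(texts) do
    Enum.map(texts, fn text ->
      text
      |> String.downcase()
      |> String.split()
    end)
  end
end

corpus = CorpusSketch.from_strings(["Cat dog", "cat bird dog"])
# corpus is [["cat", "dog"], ["cat", "bird", "dog"]]
```

The resulting list has the shape expected by the functions in this module: an outer list of documents, each an inner list of string tokens.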

Functions

cooccurrence_matrix(corpus)

@spec cooccurrence_matrix(corpus()) :: %{
  required(String.t()) => %{required(String.t()) => non_neg_integer()}
}

Builds a co-occurrence matrix from a corpus.

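Returns a nested map in which matrix[word1][word2] is the number of times word1 and word2 co-occur in the same document. The example below reads the same count in both directions, matrix["cat"]["dog"] and matrix["dog"]["cat"].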

Examples

iex> corpus = [["cat", "dog"], ["cat", "bird", "dog"]]
iex> matrix = ExNlp.Cooccurrence.cooccurrence_matrix(corpus)
iex> matrix["cat"]["dog"]
2
iex> matrix["dog"]["cat"]
2
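To make the nested-map shape concrete, here is a minimal sketch of how a document-level co-occurrence matrix could be built. MatrixSketch and build/1 are hypothetical names, and this is not ExNlp's source; in particular, counting a repeated term once per document is an assumption.

```elixir
defmodule MatrixSketch do
  # Hypothetical re-implementation for illustration; not ExNlp's source.
  # For every ordered pair of distinct terms, counts the documents that
  # contain both.
  @spec build([[String.t()]]) :: %{String.t() => %{String.t() => non_neg_integer()}}
  def build(corpus) do
    Enum.reduce(corpus, %{}, fn document, matrix ->
      # Assumption: a term repeated inside one document counts once.
      terms = Enum.uniq(document)

      # Emit both {a, b} and {b, a} so lookups work in either order.
      pairs = for a <- terms, b <- terms, a != b, do: {a, b}

      Enum.reduce(pairs, matrix, fn {a, b}, acc ->
        row = Map.update(Map.get(acc, a, %{}), b, 1, &(&1 + 1))
        Map.put(acc, a, row)
      end)
    end)
  end
end

matrix = MatrixSketch.build([["cat", "dog"], ["cat", "bird", "dog"], ["bird"]])
matrix["cat"]["dog"]
# 2
matrix["dog"]["cat"]
# 2
matrix["cat"]["bird"]
# 1
```

Storing both directions doubles the memory used but keeps lookups simple, which matches the symmetric reads in the example above.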

cooccurring_terms(term, corpus, limit \\ 10)

@spec cooccurring_terms(String.t(), corpus(), non_neg_integer()) :: [
  {String.t(), non_neg_integer()}
]

Finds terms that co-occur with a given term, sorted by frequency.

Returns a list of at most limit {term, count} tuples, where count is the number of documents in which both terms appear. In the example below, the most frequent term comes first.

Examples

iex> corpus = [["cat", "dog"], ["cat", "bird", "dog"], ["bird"]]
iex> ExNlp.Cooccurrence.cooccurring_terms("cat", corpus, 10)
[{"dog", 2}, {"bird", 1}]
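The ranking described above can be sketched as a filter, count, sort, and take pipeline. TermsSketch and cooccurring/3 are hypothetical names rather than ExNlp's implementation, and breaking ties alphabetically is an assumption made here only to keep the output deterministic.

```elixir
defmodule TermsSketch do
  # Hypothetical re-implementation for illustration; not ExNlp's source.
  @spec cooccurring(String.t(), [[String.t()]], non_neg_integer()) ::
          [{String.t(), non_neg_integer()}]
  def cooccurring(term, corpus, limit \\ 10) do
    corpus
    # Keep only the documents that contain the query term.
    |> Enum.filter(&(term in &1))
    # Collect each other term once per document.
    |> Enum.flat_map(fn document ->
      document |> Enum.uniq() |> List.delete(term)
    end)
    # Count the documents each other term appeared in.
    |> Enum.frequencies()
    # Highest count first; ties broken alphabetically (an assumption).
    |> Enum.sort_by(fn {other, count} -> {-count, other} end)
    |> Enum.take(limit)
  end
end

corpus = [["cat", "dog"], ["cat", "bird", "dog"], ["bird"]]
TermsSketch.cooccurring("cat", corpus, 10)
# [{"dog", 2}, {"bird", 1}]
TermsSketch.cooccurring("cat", corpus, 1)
# [{"dog", 2}]
```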

mutual_information(term1, term2, corpus)

@spec mutual_information(String.t(), String.t(), corpus()) :: float()

Calculates the mutual information (MI) score for a term pair.

Mutual information measures how much information about one term is provided by knowing the other term. Higher values indicate stronger association.

Examples

iex> corpus = [["cat", "dog"], ["cat", "dog"], ["cat"], ["dog"]]
iex> mi = ExNlp.Cooccurrence.mutual_information("cat", "dog", corpus)
iex> mi > 0.0 and mi < 1.0
true
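The documentation does not state which formula is used. One reading that is consistent with the example above treats each term as a present/absent indicator per document and sums p(x, y) * log2(p(x, y) / (p(x) * p(y))) over the observed combinations. The sketch below implements that reading; MiSketch is a hypothetical module, and the choice of base-2 logarithms and document-level indicators is an assumption, not a confirmed detail of ExNlp.

```elixir
defmodule MiSketch do
  # Hypothetical; assumes MI over binary presence indicators, in bits.
  @spec mutual_information(String.t(), String.t(), [[String.t()]]) :: number()
  def mutual_information(term1, term2, corpus) do
    n = length(corpus)

    # Joint counts of {term1 present?, term2 present?} across documents.
    joint =
      Enum.frequencies_by(corpus, fn document ->
        {term1 in document, term2 in document}
      end)

    # Marginal count for one indicator (position 0 or 1 of the key).
    marginal = fn position, value ->
      joint
      |> Enum.filter(fn {key, _count} -> elem(key, position) == value end)
      |> Enum.map(fn {_key, count} -> count end)
      |> Enum.sum()
    end

    # Only observed combinations appear in `joint`, so zero-probability
    # cells are skipped and contribute nothing to the sum.
    joint
    |> Enum.map(fn {{x, y}, count} ->
      p_xy = count / n
      p_x = marginal.(0, x) / n
      p_y = marginal.(1, y) / n
      p_xy * :math.log2(p_xy / (p_x * p_y))
    end)
    |> Enum.sum()
  end
end

corpus = [["cat", "dog"], ["cat", "dog"], ["cat"], ["dog"]]
mi = MiSketch.mutual_information("cat", "dog", corpus)
mi > 0.0 and mi < 1.0
# true
```

Under these assumptions the example corpus gives roughly 0.12 bits. A pointwise variant that looks only at the both-present case would give a negative value for the same corpus, so it would not satisfy the range shown in the example.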

window_cooccurrence(corpus, opts \\ [])

@spec window_cooccurrence(
  corpus(),
  keyword()
) :: %{required(String.t()) => %{required(String.t()) => non_neg_integer()}}

Calculates co-occurrence within a sliding window.

Returns a nested map in which matrix[word1][word2] is the number of times word1 and word2 appear together within the window, matching the return type in the spec above.

Options

  • :window_size - Size of the sliding window in tokens (default: 5)

Examples

iex> corpus = [["the", "quick", "brown", "fox", "jumps"]]
iex> matrix = ExNlp.Cooccurrence.window_cooccurrence(corpus, window_size: 2)
iex> matrix["quick"]["brown"]
1
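The exact window semantics are not spelled out here. The sketch below assumes that two tokens co-occur when they are fewer than window_size positions apart in the same document, so window_size: 2 pairs only adjacent tokens. WindowSketch and count/2 are hypothetical names, not ExNlp's implementation.

```elixir
defmodule WindowSketch do
  # Hypothetical; assumes two tokens co-occur when they are fewer than
  # `window_size` positions apart within the same document.
  @spec count([[String.t()]], keyword()) ::
          %{String.t() => %{String.t() => non_neg_integer()}}
  def count(corpus, opts \\ []) do
    window_size = Keyword.get(opts, :window_size, 5)

    Enum.reduce(corpus, %{}, fn document, matrix ->
      indexed = Enum.with_index(document)

      # Every ordered pair of distinct positions that are close enough.
      pairs =
        for {a, i} <- indexed,
            {b, j} <- indexed,
            i != j,
            abs(i - j) < window_size,
            do: {a, b}

      Enum.reduce(pairs, matrix, fn {a, b}, acc ->
        row = Map.update(Map.get(acc, a, %{}), b, 1, &(&1 + 1))
        Map.put(acc, a, row)
      end)
    end)
  end
end

matrix = WindowSketch.count([["the", "quick", "brown", "fox", "jumps"]], window_size: 2)
matrix["quick"]["brown"]
# 1
matrix["quick"]["fox"]
# nil
```

The pairwise comprehension is quadratic in document length, which is fine for a sketch but would be replaced by a bounded look-ahead for long documents.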