Nasty.Statistics.Parsing.Grammar (Nasty v0.3.0)

View Source

Grammar rule representation and manipulation for PCFGs.

Provides data structures and operations for working with probabilistic context-free grammar rules.

Grammar Rules

A rule represents a production: A → α where:

  • A is a non-terminal (left-hand side)
  • α is a sequence of terminals and/or non-terminals (right-hand side)
  • Each rule has an associated probability

Examples

iex> rule = Grammar.Rule.new(:np, [:det, :noun], 0.35)
%Grammar.Rule{lhs: :np, rhs: [:det, :noun], probability: 0.35}

iex> Grammar.lexical_rule?(rule)
false  # Not a lexical rule (no terminal symbols)

Summary

Functions

Applies add-k smoothing to grammar rules.

Checks if a rule is binary (A → B C).

Builds an index of rules by their left-hand side for fast lookup.

Checks if a rule is lexical (produces a terminal/word).

Extracts all non-terminals used in the grammar.

Normalizes rule probabilities for rules with the same LHS.

Extracts all terminals (words) from lexical rules.

Converts the grammar to Chomsky Normal Form (CNF).

Checks if a rule is unary (A → B).

Functions

apply_smoothing(rules, k \\ 0.001)

Applies add-k smoothing to grammar rules.

Adds a small constant k to each rule count before normalization to handle unseen productions.

Examples

iex> rules = [Rule.new(:np, [:det, :noun], 0.7), Rule.new(:np, [:pron], 0.3)]
iex> smoothed = Grammar.apply_smoothing(rules, 0.001)
iex> Enum.all?(smoothed, fn r -> r.probability > 0 end)
true

binary_rule?(arg1)

Checks if a rule is binary (A → B C).

Examples

iex> Grammar.binary_rule?(Rule.new(:np, [:det, :noun], 0.35))
true

iex> Grammar.binary_rule?(Rule.new(:np, [:pron], 0.25))
false

index_by_lhs(rules)

@spec index_by_lhs([Nasty.Statistics.Parsing.Grammar.Rule.t()]) :: %{
  required(atom()) => [Nasty.Statistics.Parsing.Grammar.Rule.t()]
}

Builds an index of rules by their left-hand side for fast lookup.

Examples

iex> rules = [
...>   Rule.new(:np, [:det, :noun], 0.35),
...>   Rule.new(:np, [:pron], 0.25),
...>   Rule.new(:vp, [:verb, :np], 0.45)
...> ]
iex> index = Grammar.index_by_lhs(rules)
iex> length(index[:np])
2

lexical_rule?(arg1)

@spec lexical_rule?(Nasty.Statistics.Parsing.Grammar.Rule.t()) :: boolean()

Checks if a rule is lexical (produces a terminal/word).

A rule is lexical if its RHS contains exactly one element that is a string.

Examples

iex> Grammar.lexical_rule?(Rule.new(:noun, ["cat"], 0.01))
true

iex> Grammar.lexical_rule?(Rule.new(:np, [:det, :noun], 0.35))
false

non_terminals(rules)

@spec non_terminals([Nasty.Statistics.Parsing.Grammar.Rule.t()]) :: MapSet.t()

Extracts all non-terminals used in the grammar.

Examples

iex> rules = [
...>   Rule.new(:np, [:det, :noun], 0.35),
...>   Rule.new(:vp, [:verb, :np], 0.45)
...> ]
iex> Grammar.non_terminals(rules)
MapSet.new([:np, :vp, :det, :noun, :verb])

normalize_probabilities(rules)

Normalizes rule probabilities for rules with the same LHS.

Ensures that P(A → α) for all rules with LHS = A sums to 1.0.

Examples

iex> rules = [
...>   Rule.new(:np, [:det, :noun], 35),  # Raw counts
...>   Rule.new(:np, [:pron], 25),
...>   Rule.new(:np, [:propn], 40)
...> ]
iex> normalized = Grammar.normalize_probabilities(rules)
iex> Enum.reduce(normalized, 0.0, fn r, acc -> acc + r.probability end)
1.0

terminals(rules)

Extracts all terminals (words) from lexical rules.

Examples

iex> rules = [
...>   Rule.new(:det, ["the"], 0.5),
...>   Rule.new(:noun, ["cat"], 0.3)
...> ]
iex> Grammar.terminals(rules)
MapSet.new(["the", "cat"])

to_cnf(rules)

Converts the grammar to Chomsky Normal Form (CNF).

CNF requires:

  • All rules are either binary (A → B C) or lexical (A → word)
  • No unary rules (except to terminals)
  • No epsilon productions

This is required for CYK parsing.

Implementation

  1. Eliminate epsilon productions
  2. Eliminate unary rules (A → B) by substitution
  3. Convert long rules (A → B C D) into binary: A → B X, X → C D
  4. Convert mixed terminal/non-terminal rules

Examples

iex> rules = [Rule.new(:s, [:np, :vp, :pp], 0.8)]
iex> cnf_rules = Grammar.to_cnf(rules)
iex> Enum.all?(cnf_rules, fn r -> binary_rule?(r) or lexical_rule?(r) end)
true

unary_rule?(arg1)

Checks if a rule is unary (A → B).

Examples

iex> Grammar.unary_rule?(Rule.new(:np, [:pron], 0.25))
true

iex> Grammar.unary_rule?(Rule.new(:np, [:det, :noun], 0.35))
false