Nasty.Statistics.Parsing.Grammar (Nasty v0.3.0)
View SourceGrammar rule representation and manipulation for PCFGs.
Provides data structures and operations for working with probabilistic context-free grammar rules.
Grammar Rules
A rule represents a production: A → α where:
Ais a non-terminal (left-hand side)αis a sequence of terminals and/or non-terminals (right-hand side)- Each rule has an associated probability
Examples
iex> rule = Grammar.Rule.new(:np, [:det, :noun], 0.35)
%Grammar.Rule{lhs: :np, rhs: [:det, :noun], probability: 0.35}
iex> Grammar.lexical_rule?(rule)
false # Not a lexical rule (no terminal symbols)
Summary
Functions
Applies add-k smoothing to grammar rules.
Checks if a rule is binary (A → B C).
Builds an index of rules by their left-hand side for fast lookup.
Checks if a rule is lexical (produces a terminal/word).
Extracts all non-terminals used in the grammar.
Normalizes rule probabilities for rules with the same LHS.
Extracts all terminals (words) from lexical rules.
Converts the grammar to Chomsky Normal Form (CNF).
Checks if a rule is unary (A → B).
Functions
@spec apply_smoothing([Nasty.Statistics.Parsing.Grammar.Rule.t()], float()) :: [ Nasty.Statistics.Parsing.Grammar.Rule.t() ]
Applies add-k smoothing to grammar rules.
Adds a small constant k to each rule count before normalization
to handle unseen productions.
Examples
iex> rules = [Rule.new(:np, [:det, :noun], 0.7), Rule.new(:np, [:pron], 0.3)]
iex> smoothed = Grammar.apply_smoothing(rules, 0.001)
iex> Enum.all?(smoothed, fn r -> r.probability > 0 end)
true
@spec binary_rule?(Nasty.Statistics.Parsing.Grammar.Rule.t()) :: boolean()
Checks if a rule is binary (A → B C).
Examples
iex> Grammar.binary_rule?(Rule.new(:np, [:det, :noun], 0.35))
true
iex> Grammar.binary_rule?(Rule.new(:np, [:pron], 0.25))
false
@spec index_by_lhs([Nasty.Statistics.Parsing.Grammar.Rule.t()]) :: %{ required(atom()) => [Nasty.Statistics.Parsing.Grammar.Rule.t()] }
Builds an index of rules by their left-hand side for fast lookup.
Examples
iex> rules = [
...> Rule.new(:np, [:det, :noun], 0.35),
...> Rule.new(:np, [:pron], 0.25),
...> Rule.new(:vp, [:verb, :np], 0.45)
...> ]
iex> index = Grammar.index_by_lhs(rules)
iex> length(index[:np])
2
@spec lexical_rule?(Nasty.Statistics.Parsing.Grammar.Rule.t()) :: boolean()
Checks if a rule is lexical (produces a terminal/word).
A rule is lexical if its RHS contains exactly one element that is a string.
Examples
iex> Grammar.lexical_rule?(Rule.new(:noun, ["cat"], 0.01))
true
iex> Grammar.lexical_rule?(Rule.new(:np, [:det, :noun], 0.35))
false
@spec non_terminals([Nasty.Statistics.Parsing.Grammar.Rule.t()]) :: MapSet.t()
Extracts all non-terminals used in the grammar.
Examples
iex> rules = [
...> Rule.new(:np, [:det, :noun], 0.35),
...> Rule.new(:vp, [:verb, :np], 0.45)
...> ]
iex> Grammar.non_terminals(rules)
MapSet.new([:np, :vp, :det, :noun, :verb])
@spec normalize_probabilities([Nasty.Statistics.Parsing.Grammar.Rule.t()]) :: [ Nasty.Statistics.Parsing.Grammar.Rule.t() ]
Normalizes rule probabilities for rules with the same LHS.
Ensures that P(A → α) for all rules with LHS = A sums to 1.0.
Examples
iex> rules = [
...> Rule.new(:np, [:det, :noun], 35), # Raw counts
...> Rule.new(:np, [:pron], 25),
...> Rule.new(:np, [:propn], 40)
...> ]
iex> normalized = Grammar.normalize_probabilities(rules)
iex> Enum.reduce(normalized, 0.0, fn r, acc -> acc + r.probability end)
1.0
@spec terminals([Nasty.Statistics.Parsing.Grammar.Rule.t()]) :: MapSet.t()
Extracts all terminals (words) from lexical rules.
Examples
iex> rules = [
...> Rule.new(:det, ["the"], 0.5),
...> Rule.new(:noun, ["cat"], 0.3)
...> ]
iex> Grammar.terminals(rules)
MapSet.new(["the", "cat"])
@spec to_cnf([Nasty.Statistics.Parsing.Grammar.Rule.t()]) :: [ Nasty.Statistics.Parsing.Grammar.Rule.t() ]
Converts the grammar to Chomsky Normal Form (CNF).
CNF requires:
- All rules are either binary (A → B C) or lexical (A → word)
- No unary rules (except to terminals)
- No epsilon productions
This is required for CYK parsing.
Implementation
- Eliminate epsilon productions
- Eliminate unary rules (A → B) by substitution
- Convert long rules (A → B C D) into binary: A → B X, X → C D
- Convert mixed terminal/non-terminal rules
Examples
iex> rules = [Rule.new(:s, [:np, :vp, :pp], 0.8)]
iex> cnf_rules = Grammar.to_cnf(rules)
iex> Enum.all?(cnf_rules, fn r -> binary_rule?(r) or lexical_rule?(r) end)
true
@spec unary_rule?(Nasty.Statistics.Parsing.Grammar.Rule.t()) :: boolean()
Checks if a rule is unary (A → B).
Examples
iex> Grammar.unary_rule?(Rule.new(:np, [:pron], 0.25))
true
iex> Grammar.unary_rule?(Rule.new(:np, [:det, :noun], 0.35))
false