Nasty.Statistics.Parsing.CYKParser (Nasty v0.3.0)
View SourceCYK (Cocke-Younger-Kasami) parsing algorithm for PCFGs.
Implements bottom-up chart parsing with dynamic programming to find the highest probability parse tree for a given sentence.
Algorithm
The CYK algorithm uses a chart (2D table) where chart[i][j] stores
all possible parse trees for the span from word i to word j.
- Initialize bottom row with lexical rules (words)
- For each span length (2 to n):
- For each possible split point k:
- Try combining trees from [i,k] and [k+1,j]
- If a binary rule A → B C exists, create new tree for A
- For each possible split point k:
- Return highest probability tree spanning entire sentence
Complexity
Time: O(n³ × |G|) where n = sentence length, |G| = grammar size Space: O(n² × |G|)
Examples
iex> grammar = PCFG.new(...)
iex> tokens = [%Token{text: "the"}, %Token{text: "cat"}]
iex> {:ok, parse_tree} = CYKParser.parse(grammar, tokens)
Summary
Functions
Builds the CYK parsing chart using dynamic programming.
Extracts bracket pairs from a parse tree for evaluation.
Extracts all possible parse trees from the chart.
Computes log probability of a parse tree.
Parses a sequence of tokens using the CYK algorithm.
Converts a parse tree to a bracketed string representation.
Types
@type chart() :: %{ required({non_neg_integer(), non_neg_integer()}) => [chart_entry()] }
@type chart_entry() :: {atom(), parse_tree()}
@type parse_tree() :: %{ label: atom(), probability: float(), children: [parse_tree() | Nasty.AST.Token.t()], span: {non_neg_integer(), non_neg_integer()}, rule: Nasty.Statistics.Parsing.Grammar.Rule.t() | nil }
Functions
@spec build_chart(map(), [Nasty.AST.Token.t()], non_neg_integer()) :: chart()
Builds the CYK parsing chart using dynamic programming.
Implementation
- Fill diagonal with lexical rules (single words)
- For increasing span lengths:
- Try all split points
- Apply binary rules to combine smaller spans
- Keep only highest probability parses (beam search)
@spec extract_brackets(parse_tree()) :: [ {atom(), non_neg_integer(), non_neg_integer()} ]
Extracts bracket pairs from a parse tree for evaluation.
Returns a list of {label, start_pos, end_pos} tuples representing
all constituents in the tree.
Examples
iex> tree = parse_tree_for("the cat sat")
iex> CYKParser.extract_brackets(tree)
[{:s, 0, 2}, {:np, 0, 1}, {:vp, 2, 2}, ...]
@spec get_n_best_parses( chart(), atom(), non_neg_integer(), non_neg_integer(), pos_integer() ) :: [ parse_tree() ]
Extracts all possible parse trees from the chart.
Returns all trees (not just the best one) for the given span and label. Useful for n-best parsing.
Parameters
chart- Completed CYK chartlabel- Non-terminal to extracti- Start positionj- End positionn- Number of best parses to return (default: 1)
@spec log_probability(parse_tree()) :: float()
Computes log probability of a parse tree.
Sum of log probabilities of all rules used in the tree. More numerically stable than multiplying probabilities.
Examples
iex> tree = %{probability: 0.001, children: [...]}
iex> CYKParser.log_probability(tree)
-6.907755278982137 # log(0.001)
@spec parse(map(), [Nasty.AST.Token.t()], keyword()) :: {:ok, parse_tree()} | {:error, term()}
Parses a sequence of tokens using the CYK algorithm.
Returns the highest probability parse tree, or an error if no parse exists.
Parameters
grammar- PCFG model with rules in CNFtokens- List of tokens to parseopts- Options::start_symbol- Root non-terminal (default::s):beam_width- Max entries per chart cell (default: 10, 0 = unlimited)
Returns
{:ok, parse_tree}- Successful parse{:error, reason}- No valid parse found
@spec to_brackets(parse_tree()) :: String.t()
Converts a parse tree to a bracketed string representation.
Useful for evaluation and debugging.
Examples
iex> tree = %{label: :np, children: [...]}
iex> CYKParser.to_brackets(tree)
"(NP (DET the) (NOUN cat))"