Nasty.Language.English.SentenceParser (Nasty v0.3.0)

Sentence and clause parser for English.

Builds Clause and Sentence structures from phrases.

Approaches

  • Rule-based parsing (default): Subject (NP) + Predicate (VP)
  • PCFG parsing: Statistical phrase structure parsing

Examples

# Rule-based (default)
iex> alias Nasty.Language.English.SentenceParser
iex> tokens = [...]  # "The cat sat."
iex> SentenceParser.parse_sentences(tokens)
{:ok, [sentence]}

# PCFG-based
iex> SentenceParser.parse_sentences(tokens, model: :pcfg)
{:ok, [sentence]}

Summary

Functions

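parse_clause(tokens)

Parses a clause from tokens, detecting coordination and subordination.

parse_sentence(tokens)

Parses a single sentence from tokens.

parse_sentences(tokens, opts \\ [])

Parses tokens into a list of sentences.

parse_sentences_pcfg(tokens, opts)

PCFG-based sentence parsing using statistical phrase structure grammar.

parse_sentences_rule_based(tokens)

Rule-based sentence parsing (original implementation).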

Functions

parse_clause(tokens)

@spec parse_clause([Nasty.AST.Token.t()]) ::
  {:ok, Nasty.AST.Clause.t() | [Nasty.AST.Clause.t()]} | :error

Parses a clause from tokens, detecting coordination and subordination.

Grammar:

  • Simple: (NP) VP
  • Coordinated: Clause CoordConj Clause
  • Subordinate: SubordConj Clause
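
The three clause shapes in this grammar can be told apart by where a conjunction appears. The following is a minimal sketch of that idea, not the parser's actual implementation. It assumes a hypothetical token shape `{text, pos_tag}` with the tags `:sconj` and `:cconj` rather than `Nasty.AST.Token`, and the module name `ClauseShapeSketch` is made up for illustration. It also does not distinguish clause-level coordination from coordination inside a phrase ("cats and dogs"), which a real parser would need to do.

```elixir
defmodule ClauseShapeSketch do
  # Hypothetical token shape: {text, pos_tag}. This is NOT Nasty.AST.Token.
  @type token :: {String.t(), atom()}

  @spec classify([token()]) :: :subordinate | :coordinated | :simple
  # Subordinate: SubordConj Clause (the conjunction leads the clause).
  def classify([{_text, :sconj} | _rest]), do: :subordinate

  def classify(tokens) do
    # Coordinated: Clause CoordConj Clause. Require non-empty material on
    # both sides of the first coordinating conjunction.
    case Enum.split_while(tokens, fn {_text, tag} -> tag != :cconj end) do
      {[_ | _], [_conj, _ | _]} -> :coordinated
      # Anything else is treated as Simple: (NP) VP.
      _ -> :simple
    end
  end
end
```

Under these assumptions, a token list starting with "because" tagged `:sconj` classifies as `:subordinate`, and one with "and" tagged `:cconj` between two non-empty spans classifies as `:coordinated`.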

parse_sentence(tokens)

@spec parse_sentence([Nasty.AST.Token.t()]) :: Nasty.AST.Sentence.t() | nil

Parses a single sentence from tokens.

Grammar: NP VP (simplified for Phase 3)
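
As a rough illustration of the NP VP grammar, the sketch below splits a token list at the first verb-like token, treating everything before it as the subject and the rest as the predicate. This is an assumption-laden sketch rather than the library's algorithm: the `{text, pos_tag}` token shape, the `:verb` and `:aux` tags, the module name `NpVpSketch`, and the choice to return `nil` when no verb is found are all hypothetical.

```elixir
defmodule NpVpSketch do
  # Hypothetical token shape: {text, pos_tag}. This is NOT Nasty.AST.Token.
  def split(tokens) do
    # Take tokens up to (not including) the first verb-like token.
    case Enum.split_while(tokens, fn {_text, tag} -> tag not in [:verb, :aux] end) do
      # No verb found: no predicate, so no sentence (assumed behaviour).
      {_subject, []} -> nil
      {subject, predicate} -> %{subject: subject, predicate: predicate}
    end
  end
end
```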

parse_sentences(tokens, opts \\ [])

@spec parse_sentences(
  [Nasty.AST.Token.t()],
  keyword()
) :: {:ok, [Nasty.AST.Sentence.t()]} | {:error, term()}

Parses tokens into a list of sentences.

Identifies sentence boundaries and parses each sentence separately.
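
One simple way to identify sentence boundaries is to close a sentence at each terminating punctuation token. The sketch below shows that approach on a hypothetical `{text, pos_tag}` token shape. The module name `BoundarySketch` and the terminator set are assumptions, and the library's own boundary detection may differ.

```elixir
defmodule BoundarySketch do
  # Assumed set of sentence-ending punctuation.
  @terminators [".", "!", "?"]

  # Hypothetical token shape: {text, pos_tag}. This is NOT Nasty.AST.Token.
  def split(tokens) do
    {sentences, current} =
      Enum.reduce(tokens, {[], []}, fn {text, _tag} = token, {done, current} ->
        current = [token | current]

        if text in @terminators do
          # Close the current sentence, keeping the terminator with it.
          {[Enum.reverse(current) | done], []}
        else
          {done, current}
        end
      end)

    # Keep a trailing group that lacks a terminator.
    trailing = if current == [], do: [], else: [Enum.reverse(current)]
    Enum.reverse(sentences) ++ trailing
  end
end
```

Each resulting group could then be handed to a single-sentence parser.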

Options

  • :model - Model type: :rule_based (default) or :pcfg
  • :pcfg_model - Trained PCFG model (optional; loaded from the registry when not provided)

Returns

  • {:ok, sentences} - List of parsed sentences
  • {:error, reason} - Parsing failed
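
The option handling described above can be pictured as a dispatch on `:model` with `:rule_based` as the default. The sketch below uses stand-in return values in place of real parsers. The module name `DispatchSketch` and the `{:error, {:unknown_model, other}}` branch are assumptions for illustration, since the source does not say how an unrecognised `:model` value is handled.

```elixir
defmodule DispatchSketch do
  # Stand-in for option dispatch; the tagged tuples replace real parser output.
  def parse(tokens, opts \\ []) do
    case Keyword.get(opts, :model, :rule_based) do
      :rule_based -> {:ok, {:rule_based, tokens}}
      # :pcfg_model is optional, so this may be nil.
      :pcfg -> {:ok, {:pcfg, tokens, Keyword.get(opts, :pcfg_model)}}
      # Assumed error shape for an unknown model (hypothetical).
      other -> {:error, {:unknown_model, other}}
    end
  end
end
```

Callers can then pattern match on `{:ok, sentences}` and `{:error, reason}` in a `case` expression, as with any tagged-tuple result.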

parse_sentences_pcfg(tokens, opts)

@spec parse_sentences_pcfg(
  [Nasty.AST.Token.t()],
  keyword()
) :: {:ok, [Nasty.AST.Sentence.t()]} | {:error, term()}

PCFG-based sentence parsing using statistical phrase structure grammar.

If no model is provided via the :pcfg_model option, the parser attempts to load the latest PCFG model from the registry. If no model is available, it falls back to rule-based parsing.
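
The model resolution order described here (explicit option, then registry, then rule-based fallback) can be sketched as follows. The registry lookup and both parsers are injected as plain functions so the example is self-contained. `FallbackSketch`, `load_model`, `pcfg_parse`, and `rule_parse` are hypothetical names, not part of the Nasty API, and the `{:ok, model}` return shape of the loader is an assumption.

```elixir
defmodule FallbackSketch do
  # load_model, pcfg_parse, and rule_parse are injected stand-ins (hypothetical).
  def parse(tokens, opts, load_model, pcfg_parse, rule_parse) do
    resolved =
      case Keyword.fetch(opts, :pcfg_model) do
        # An explicitly supplied model wins.
        {:ok, model} -> {:ok, model}
        # Otherwise ask the (stand-in) registry for the latest model.
        :error -> load_model.()
      end

    case resolved do
      {:ok, model} -> pcfg_parse.(tokens, model)
      # No model available: fall back to rule-based parsing.
      _ -> rule_parse.(tokens)
    end
  end
end
```

Injecting the loader keeps the fallback decision in one place and makes it easy to exercise each path without a trained model.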

parse_sentences_rule_based(tokens)

@spec parse_sentences_rule_based([Nasty.AST.Token.t()]) ::
  {:ok, [Nasty.AST.Sentence.t()]} | {:error, term()}

Rule-based sentence parsing (original implementation).