mix nasty.train.pcfg (Nasty v0.3.0)


Trains a PCFG (Probabilistic Context-Free Grammar) model from treebank data.

Usage

mix nasty.train.pcfg --corpus data/train.conllu --output priv/models/en/pcfg.model

Options

  • --corpus - Path to training corpus in CoNLL-U format (required)
  • --test - Path to test corpus for evaluation (optional)
  • --output - Path to save the trained model (required)
  • --smoothing - Smoothing constant (default: 0.001)
  • --cnf - Convert the grammar to Chomsky Normal Form (CNF) (default: true)
  • --language - Language code (default: en)
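This page does not say how the smoothing constant is applied. One common scheme is additive (add-k) smoothing of rule counts, but that is an assumption and is not confirmed here. Under that assumption, with k set to the --smoothing value, the probability of a rule A → β would be estimated as:

```latex
P(A \to \beta) = \frac{C(A \to \beta) + k}{C(A) + k\,\lvert R_A \rvert},
\qquad C(A) = \sum_{\beta' \in R_A} C(A \to \beta')
```

Here C(A → β) is the number of times the rule occurs in the training treebank, and R_A is the set of right-hand sides over which the probabilities for A are normalized. Under this scheme, a smaller k such as the 0.0001 in the second example keeps estimates closer to the raw relative frequencies C(A → β) / C(A).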

Examples

# Train basic PCFG
mix nasty.train.pcfg \
  --corpus data/en_ewt-ud-train.conllu \
  --output priv/models/en/pcfg.model

# Train with evaluation
mix nasty.train.pcfg \
  --corpus data/en_ewt-ud-train.conllu \
  --test data/en_ewt-ud-test.conllu \
  --output priv/models/en/pcfg.model \
  --smoothing 0.0001
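Both --corpus and --test expect CoNLL-U files, so you may want to check a corpus before starting a long training run. The helper below is a minimal sketch and is not part of Nasty; the function name `conllu_stats` is hypothetical. It relies only on standard CoNLL-U conventions: comment lines start with `#`, sentences are separated by blank lines, and word lines have an integer ID in the first tab-separated column. Multiword-token ranges such as `1-2` and empty nodes such as `8.1` are not counted as words.

```shell
# Hypothetical helper (not part of Nasty): prints "<sentences> <words>" for a CoNLL-U file.
conllu_stats() {
  awk -F '\t' '
    /^#/ { next }                                    # skip sentence-level comment lines
    NF == 0 { if (in_sent) { sents++; in_sent = 0 }  # a blank line closes the current sentence
              next }
    $1 ~ /^[0-9]+$/ { words++; in_sent = 1; next }   # ordinary word line with an integer ID
    { in_sent = 1 }                                  # multiword range or empty node: not a word
    END { if (in_sent) sents++                       # handle a missing final blank line
          printf "%d %d\n", sents, words }
  ' "$1"
}
```

For example, `conllu_stats data/en_ewt-ud-train.conllu` prints the sentence and word counts for the training file. A count of zero usually means the path is wrong or the file is not tab-separated CoNLL-U.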