mix nasty.train.pos (Nasty v0.3.0)

Trains a Hidden Markov Model for part-of-speech tagging.

Usage

mix nasty.train.pos --corpus TRAIN_FILE [options]

Options

--corpus PATH       Path to CoNLL-U training file (required)
--dev PATH          Path to development/validation file (optional)
--test PATH         Path to test file (optional)
--output PATH       Output path for trained model (default: priv/models/en/pos_hmm_v1.model)
--smoothing FLOAT   Smoothing constant for unknown words (default: 0.001)
--quiet             Suppress progress output
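
Before training, it can help to check that the file passed to --corpus really is CoNLL-U and that its tag distribution looks plausible. The sketch below is not part of the task itself. It relies only on the CoNLL-U layout (10 tab-separated columns, with the universal POS tag in column 4) and uses a small inline sample so that it runs on its own. The helper name count_upos is hypothetical.

```shell
# Illustrative sanity check; not part of the Nasty task itself.
# Write a two-sentence sample in CoNLL-U layout (10 tab-separated columns).
sample="$(mktemp)"
{
  printf '# text = The dog barks.\n'
  printf '1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_\n'
  printf '2\tdog\tdog\tNOUN\tNN\t_\t3\tnsubj\t_\t_\n'
  printf '3\tbarks\tbark\tVERB\tVBZ\t_\t0\troot\t_\t_\n'
  printf '4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n'
  printf '\n'
  printf '# text = Dogs bark.\n'
  printf '1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n'
  printf '2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n'
  printf '3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n'
  printf '\n'
} > "$sample"

# count_upos FILE: print "COUNT TAG" per UPOS tag (column 4), most frequent first.
# Comment lines, blank lines, multiword-token ranges (ID "1-2"), and
# empty nodes (ID "1.1") are skipped by requiring a plain integer ID.
count_upos() {
  awk -F'\t' 'NF == 10 && $1 ~ /^[0-9]+$/ { n[$4]++ }
              END { for (t in n) print n[t], t }' "$1" |
    sort -k1,1nr -k2,2
}

tag_counts="$(count_upos "$sample")"
printf '%s\n' "$tag_counts"
rm -f "$sample"
```

On a real corpus the same helper can be pointed at the training file, for example count_upos data/UD_English-EWT/en_ewt-ud-train.conllu. An empty result usually means the file is not tab-separated CoNLL-U.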

Examples

# Basic training
mix nasty.train.pos --corpus data/UD_English-EWT/en_ewt-ud-train.conllu

# Training with evaluation
mix nasty.train.pos \
  --corpus data/UD_English-EWT/en_ewt-ud-train.conllu \
  --dev data/UD_English-EWT/en_ewt-ud-dev.conllu \
  --test data/UD_English-EWT/en_ewt-ud-test.conllu

# Custom output location and smoothing constant
mix nasty.train.pos \
  --corpus train.conllu \
  --output my_model.model \
  --smoothing 0.0005
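
To compare several smoothing constants, the documented flags can be combined in a loop that gives each run its own output path. This is a sketch rather than a recommended procedure: the candidate values, the file names train.conllu and dev.conllu, and the output naming scheme are all assumptions chosen for illustration. The loop only prints the commands, so it is safe to run as written.

```shell
# Candidate smoothing constants; arbitrary values for illustration.
for s in 0.01 0.001 0.0001; do
  # Hypothetical naming scheme so that runs do not overwrite each other.
  out="pos_hmm_smoothing_${s}.model"
  # Dry run: remove the leading "echo" to actually train.
  echo mix nasty.train.pos \
    --corpus train.conllu \
    --dev dev.conllu \
    --output "$out" \
    --smoothing "$s"
done
```

Because each run uses a distinct --output path, each one also gets its own metadata file, which makes it possible to compare the runs afterwards.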

Output

The task creates two files:

  • {output_path} - The trained model (binary format)
  • {output_path}.meta.json - Model metadata (JSON format)

The metadata file includes:

  • Model type, version, and training parameters
  • Training corpus information
  • Evaluation metrics (accuracy, F1 score)
  • Vocabulary and tag statistics
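
Since the metadata file is plain JSON, it can be inspected with any JSON tool. The sketch below pretty-prints it with Python's built-in json.tool module, assuming python3 is installed. The model path is a hypothetical value taken from the third example above, and the exact field names inside the file are not listed here, so the snippet makes no assumptions about them.

```shell
# Hypothetical path; substitute the value you passed to --output.
model_path="my_model.model"
# The metadata file sits next to the model, with ".meta.json" appended.
meta_path="${model_path}.meta.json"

if [ -f "$meta_path" ]; then
  # Pretty-print the whole file; no specific keys are assumed.
  python3 -m json.tool "$meta_path"
else
  echo "No metadata file at $meta_path; train a model first." >&2
fi
```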