mix nasty.train.pos (Nasty v0.3.0)
Trains a Hidden Markov Model for part-of-speech tagging.
Usage
mix nasty.train.pos --corpus TRAIN_FILE [options]
Options
--corpus PATH Path to CoNLL-U training file (required)
--dev PATH Path to development/validation file (optional)
--test PATH Path to test file (optional)
--output PATH Output path for trained model (default: priv/models/en/pos_hmm_v1.model)
--smoothing FLOAT Smoothing constant for unknown words (default: 0.001)
--quiet Suppress progress output
Examples
# Basic training
mix nasty.train.pos --corpus data/UD_English-EWT/en_ewt-ud-train.conllu
# Training with evaluation
mix nasty.train.pos \
--corpus data/UD_English-EWT/en_ewt-ud-train.conllu \
--dev data/UD_English-EWT/en_ewt-ud-dev.conllu \
--test data/UD_English-EWT/en_ewt-ud-test.conllu
# Custom output location and hyperparameters
mix nasty.train.pos \
--corpus train.conllu \
--output my_model.model \
--smoothing 0.0005
Output
The task creates two files:
- {output_path} - The trained model (binary format)
- {output_path}.meta.json - Model metadata (JSON format)
The metadata file includes:
- Model type, version, and training parameters
- Training corpus information
- Evaluation metrics (accuracy, F1 score)
- Vocabulary and tag statistics
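
Because the metadata file is written next to the model, a script can confirm that both files exist before the model is used. The following is a minimal sketch and is not part of Nasty: the function name check_model_outputs is hypothetical, and the only fact it relies on is the {output_path}.meta.json naming described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nasty): check that the model file and its
# metadata file both exist and are non-empty for a given --output path.
check_model_outputs() {
  model_path="$1"
  meta_path="${model_path}.meta.json"

  for f in "$model_path" "$meta_path"; do
    # -s is true only when the file exists and has a size greater than zero
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  echo "ok: $model_path and $meta_path"
}
```

After a training run with --output my_model.model, calling check_model_outputs my_model.model would return a non-zero status if either file is absent.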