mix nasty.quantize (Nasty v0.3.0)
Quantize neural models for faster inference and smaller file size.
Usage
```shell
mix nasty.quantize \
  --model models/pos_tagger.axon \
  --calibration data/calibration.conllu \
  --output models/pos_tagger_int8.axon
```

Options
- `--model` - Path to model to quantize (required)
- `--calibration` - Path to calibration data (required for INT8)
- `--output` - Output path for quantized model (required)
- `--method` - Quantization method: int8, dynamic, qat (default: int8)
- `--calibration-method` - Calibration method: minmax, percentile, entropy (default: percentile)
- `--percentile` - Percentile for calibration (default: 99.99)
- `--symmetric` - Use symmetric quantization (default: true)
- `--per-channel` - Per-channel quantization (default: true)
- `--target-accuracy-loss` - Max acceptable accuracy loss (default: 0.01)
- `--calibration-limit` - Max calibration samples (default: 500)
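To give a feel for what `--calibration-method` and `--symmetric` control, here is a minimal Python sketch of the standard techniques those options name: percentile-based range calibration (which discards outliers that would otherwise inflate the scale) and symmetric INT8 mapping (zero-point fixed at 0). This is illustrative only; function names are hypothetical and this is not Nasty's implementation.

```python
import numpy as np

def calibration_range(activations, method="percentile", percentile=99.99):
    """Pick a clipping range from calibration activations.

    minmax uses the raw extremes; percentile ignores rare outliers,
    which usually yields a tighter scale and less rounding error.
    """
    a = np.abs(np.asarray(activations, dtype=np.float64))
    if method == "minmax":
        return float(a.max())
    return float(np.percentile(a, percentile))

def quantize_int8_symmetric(x, max_abs):
    """Symmetric INT8: map [-max_abs, max_abs] linearly onto [-127, 127]."""
    scale = max_abs / 127.0
    q = np.clip(np.round(np.asarray(x) / scale), -127, 127).astype(np.int8)
    return q, scale
```

With one large outlier in the calibration data, `minmax` stretches the range to cover it while `percentile` keeps the range tight, so most values retain more precision after rounding.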
Examples
```shell
# Quick INT8 quantization
mix nasty.quantize \
  --model models/pos_tagger.axon \
  --calibration data/dev.conllu \
  --output models/pos_tagger_int8.axon
```

```shell
# Production quantization with validation
mix nasty.quantize \
  --model models/pos_tagger.axon \
  --calibration data/calibration.conllu \
  --output models/pos_tagger_int8.axon \
  --method int8 \
  --calibration-method percentile \
  --percentile 99.99 \
  --target-accuracy-loss 0.01
```

```shell
# Dynamic quantization (no calibration needed)
mix nasty.quantize \
  --model models/pos_tagger.axon \
  --output models/pos_tagger_dynamic.axon \
  --method dynamic
```
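The dynamic method needs no calibration data because, in the standard technique of that name, activation scales are derived from each batch's own range at inference time rather than from a pre-collected dataset. A rough Python sketch of that idea, under stated assumptions (this is the generic technique, not Nasty's code):

```python
import numpy as np

def dynamic_quantize(x):
    """Dynamic quantization: the scale comes from this batch's own
    max-abs value at runtime, so no calibration pass is required."""
    x = np.asarray(x, dtype=np.float64)
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros(x.shape, dtype=np.int8), 1.0
    scale = max_abs / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original values."""
    return q.astype(np.float64) * scale
```

The trade-off: per-batch scales adapt to the data, but computing them adds a small runtime cost, which is why static INT8 with a calibration set is usually preferred for latency-critical paths.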