mix nasty.quantize (Nasty v0.3.0)

Quantize neural models for faster inference and smaller file size.

Usage

mix nasty.quantize \
  --model models/pos_tagger.axon \
  --calibration data/calibration.conllu \
  --output models/pos_tagger_int8.axon

Options

  • --model - Path to model to quantize (required)
  • --calibration - Path to calibration data (required for INT8)
  • --output - Output path for quantized model (required)
  • --method - Quantization method: int8, dynamic, or qat (quantization-aware training) (default: int8)
  • --calibration-method - Calibration method: minmax, percentile, entropy (default: percentile)
  • --percentile - Percentile used when --calibration-method is percentile (default: 99.99)
  • --symmetric - Use symmetric quantization (default: true)
  • --per-channel - Per-channel quantization (default: true)
  • --target-accuracy-loss - Max acceptable accuracy loss (default: 0.01)
  • --calibration-limit - Max calibration samples (default: 500)
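The --calibration-method, --percentile, --symmetric, and --per-channel options together determine how floating-point ranges map onto INT8. As a rough sketch of the underlying arithmetic (illustrative only, not Nasty's actual implementation; all function names here are hypothetical), symmetric per-channel quantization with percentile calibration looks like:

```python
import numpy as np

def symmetric_scales(w, percentile=99.99):
    # Per-channel calibration: one scale per output channel (row),
    # taken from a high percentile of |w| so rare outliers do not
    # inflate the scale and waste INT8 resolution.
    amax = np.percentile(np.abs(w), percentile, axis=1)
    return np.maximum(amax, 1e-12) / 127.0  # guard against all-zero channels

def quantize_int8(w, scales):
    # Symmetric mapping: q = clip(round(w / scale), -127, 127).
    q = np.round(w / scales[:, None])
    return np.clip(q, -127, 127).astype(np.int8)

def dequantize(q, scales):
    return q.astype(np.float32) * scales[:, None]

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 256)).astype(np.float32)  # stand-in weight matrix
s = symmetric_scales(w)                           # one scale per channel
q = quantize_int8(w, s)
err = float(np.abs(dequantize(q, s) - w).max())   # worst-case reconstruction error
```

This is why percentile calibration (the default) typically beats plain minmax: clipping a handful of extreme values costs little, while the tighter scale gives every remaining value finer INT8 resolution.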

Examples

# Quick INT8 quantization
mix nasty.quantize \
  --model models/pos_tagger.axon \
  --calibration data/dev.conllu \
  --output models/pos_tagger_int8.axon

# Production quantization with validation
mix nasty.quantize \
  --model models/pos_tagger.axon \
  --calibration data/calibration.conllu \
  --output models/pos_tagger_int8.axon \
  --method int8 \
  --calibration-method percentile \
  --percentile 99.99 \
  --target-accuracy-loss 0.01

# Dynamic quantization (no calibration needed)
mix nasty.quantize \
  --model models/pos_tagger.axon \
  --output models/pos_tagger_dynamic.axon \
  --method dynamic
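Dynamic quantization needs no --calibration file because only the weights are quantized ahead of time; activation scales are derived on the fly from each incoming batch. A minimal Python sketch of the idea (again illustrative, not the task's implementation):

```python
import numpy as np

def dynamic_quant_matmul(x, w_q, w_scale):
    # Dynamic quantization: w_q was quantized offline with a fixed
    # w_scale, while the activation scale comes from the live batch,
    # so no calibration dataset is required.
    x_scale = max(float(np.abs(x).max()), 1e-12) / 127.0
    x_q = np.clip(np.round(x / x_scale), -127, 127).astype(np.int8)
    # Accumulate in INT32, then rescale the result back to float.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T
    return acc.astype(np.float32) * (x_scale * w_scale)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 64)).astype(np.float32)   # stand-in layer weights
x = rng.normal(size=(4, 64)).astype(np.float32)   # a live input batch
w_scale = float(np.abs(w).max()) / 127.0          # computed offline, per-tensor
w_q = np.clip(np.round(w / w_scale), -127, 127).astype(np.int8)
out = dynamic_quant_matmul(x, w_q, w_scale)
ref = x @ w.T                                     # full-precision reference
```

Because the activation scale adapts to each batch, accuracy is often close to the calibrated int8 path, at a small extra runtime cost for computing scales per batch.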