Nasty.Statistics.Neural.Quantization.INT8 (Nasty v0.3.0)
INT8 post-training quantization for neural models.
Converts Float32 model weights to INT8 representation for:
- 4x smaller model files
- 2-3x faster inference on CPU
- 40-60% lower memory usage
- <1% accuracy degradation (with proper calibration)
Process
- Calibration: Run representative data through model to collect activation statistics
- Quantization: Convert Float32 weights to INT8 using calibration data
- Validation: Verify accuracy degradation is within acceptable bounds
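The core of the quantization step can be sketched with plain Elixir lists. `QuantSketch` below is an illustrative name, not part of this module, and a real implementation would operate on Nx tensors; it shows symmetric quantization, where a single scale derived from the calibration maximum maps values into the INT8 range [-127, 127]:

```elixir
defmodule QuantSketch do
  # Derive the scale from the largest absolute value observed during
  # calibration; symmetric INT8 uses the range [-127, 127].
  def scale(weights) do
    max_abs = weights |> Enum.map(&abs/1) |> Enum.max()
    max_abs / 127.0
  end

  # Quantize: round(x / scale), clamped to the INT8 range.
  def quantize(weights, scale) do
    Enum.map(weights, fn x ->
      (x / scale) |> round() |> max(-127) |> min(127)
    end)
  end

  # Dequantize: multiply back by the scale. The round trip loses at
  # most scale / 2 per weight, which is the accuracy cost being
  # bounded by :target_accuracy_loss.
  def dequantize(ints, scale), do: Enum.map(ints, &(&1 * scale))
end

weights = [0.5, -1.2, 0.03]
s = QuantSketch.scale(weights)
QuantSketch.quantize(weights, s)  # => [53, -127, 3]
```

Per-channel quantization (the `:per_channel` option below) applies this same math with one scale per output channel instead of one per tensor, which typically recovers accuracy for layers whose channels have very different value ranges.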
Example
alias Nasty.Statistics.Neural.Quantization.INT8
# Load a trained model
{:ok, model} = NeuralTagger.load("pos_tagger.axon")
# Quantize with calibration data
{:ok, quantized} = INT8.quantize(model,
  calibration_data: calibration_samples,
  target_accuracy_loss: 0.01  # Max 1% accuracy loss
)
# Save quantized model
INT8.save(quantized, "pos_tagger_int8.axon")
Functions
Estimates size reduction from quantization.
Examples
INT8.estimate_size_reduction(model)
# => %{original_mb: 400, quantized_mb: 100, reduction: 4.0}
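The 4.0x figure follows directly from storage widths: a Float32 weight takes 4 bytes and an INT8 weight takes 1. A back-of-envelope version of the estimate (the parameter count is hypothetical, and this is arithmetic illustration, not the function's internals):

```elixir
# ~100M Float32 parameters at 4 bytes each versus 1 byte each after
# INT8 quantization. Per-tensor scale metadata is negligible and
# ignored here. 104_857_600 = 100 * 1_048_576 for round numbers.
param_count = 104_857_600
original_mb = param_count * 4 / 1_048_576   # => 400.0
quantized_mb = param_count / 1_048_576      # => 100.0
reduction = original_mb / quantized_mb      # => 4.0
```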
Loads a quantized model from disk.
Examples
{:ok, model} = INT8.load("model_int8.axon")
Quantizes a model to INT8 precision using post-training quantization.
Parameters
- model - Trained model to quantize
- opts - Quantization options
Options
- :calibration_data - Representative data for calibration (required)
- :calibration_method - Method for determining quantization ranges:
  - :minmax - Use min/max values (default)
  - :percentile - Use percentile ranges (more robust to outliers)
  - :entropy - Minimize KL divergence
- :per_channel - Quantize per-channel vs per-tensor (default: true)
- :symmetric - Use symmetric quantization (default: true)
- :target_accuracy_loss - Max acceptable accuracy loss (default: 0.01)
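The difference between :minmax and :percentile range selection is easiest to see on activations containing an outlier. `RangeSketch` below is a hypothetical illustration of the idea, not the library's implementation:

```elixir
defmodule RangeSketch do
  # :minmax - take the extreme observed values. Simple, but a single
  # outlier stretches the range and wastes INT8 resolution on values
  # that almost never occur.
  def minmax(activations), do: {Enum.min(activations), Enum.max(activations)}

  # :percentile - clip to the (1-p)th / pth percentiles so rare
  # outliers do not dominate the quantization scale.
  def percentile(activations, p \\ 0.999) do
    sorted = Enum.sort(activations)
    last = length(sorted) - 1
    lo = Enum.at(sorted, max(0, trunc((1 - p) * last)))
    hi = Enum.at(sorted, min(last, trunc(p * last)))
    {lo, hi}
  end
end

# One large outlier among 100 small activations:
acts = Enum.to_list(1..100) ++ [1000]
RangeSketch.minmax(acts)      # => {1, 1000}
RangeSketch.percentile(acts)  # => {1, 100}
```

With :minmax the scale must cover 1000, so the real data occupies only a tenth of the INT8 range; :percentile discards the outlier and keeps full resolution for the bulk of the distribution. :entropy goes further by choosing the clipping threshold that minimizes KL divergence between the original and quantized distributions.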
Returns
- {:ok, quantized_model} - Successfully quantized model
- {:error, reason} - Quantization failed
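Handling the return shape is a plain case match. The stubbed error term below is an invented placeholder; the docs do not specify which error reasons quantize/2 produces:

```elixir
# A stubbed result stands in for a real INT8.quantize/2 call so the
# match itself can be shown; :accuracy_loss_exceeded is hypothetical.
result = {:error, {:accuracy_loss_exceeded, 0.023}}

message =
  case result do
    {:ok, _quantized} -> "quantized model ready to save"
    {:error, reason} -> "quantization failed: #{inspect(reason)}"
  end
```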
Saves a quantized model to disk.
Examples
INT8.save(quantized_model, "model_int8.axon")