Axon.Quantization (Axon v0.7.0)

Model quantization.

Model quantization is a technique for reducing the memory footprint of a model by converting portions of a model to use quantized representations. Typically, these quantized representations are low-precision integers.

This is an experimental API which implements weight-only quantization. The implementation in this module converts dense layers in a model to quantized variants. The only supported quantization type is {:s, 8}. Axon quantization is inference-only; training is not currently supported.

Summary

Functions

quantize(model, model_state)
Quantizes a model and a model state.

quantize_model(model)
Replaces standard operations with quantized variants.

quantize_model_state(model, model_state)
Returns a quantized model state.

weight_only_quantized_dense(x, units, opts \\ [])
Adds a weight-only quantized dense layer to the network.

Functions

quantize(model, model_state)

Quantizes a model and a model state.

Given a model and model state, this function rewrites all of the dense layers in the model to perform weight-only 8-bit integer versions of the same operation. It also replaces the values of all dense kernels in the given model state with quantized tensors.
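
A minimal end-to-end sketch, assuming a small feed-forward model; the input name, shapes, and data here are illustrative:

```elixir
# Build a small model with two dense layers
model =
  Axon.input("features", shape: {nil, 128})
  |> Axon.dense(256, activation: :relu)
  |> Axon.dense(10)

# Initialize the model state (or load pre-trained parameters instead)
{init_fn, _predict_fn} = Axon.build(model)
model_state = init_fn.(Nx.template({1, 128}, :f32), Axon.ModelState.empty())

# Rewrite the dense layers and quantize their kernels in one step
{quantized_model, quantized_state} =
  Axon.Quantization.quantize(model, model_state)

# Quantization is inference-only, so use the result for prediction
Axon.predict(quantized_model, quantized_state, Nx.broadcast(0.0, {1, 128}))
```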

quantize_model(model)

Replaces standard operations with quantized variants.

The only supported conversion is from a regular dense layer to a weight-only 8-bit integer variant. Note that this only rewrites the model graph; it does not touch the model state. If you have a pre-trained model state that you wish to quantize, refer to Axon.Quantization.quantize_model_state/2.

All :dense layers in the model are replaced with Axon.Quantization.weight_only_quantized_dense/3.
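
A minimal sketch, assuming model is an existing Axon model containing :dense layers:

```elixir
# Rewrite the graph only: every :dense layer becomes a weight-only
# quantized dense layer; the model state is not modified by this call
quantized_model = Axon.Quantization.quantize_model(model)
```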

quantize_model_state(model, model_state)

Returns a quantized model state.

Given a model and a model state, this function replaces the kernel of every dense layer with a quantized version of its weights.

Training is not currently supported, so all quantized layers are automatically frozen.
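
A minimal sketch, assuming model and model_state come from training or a checkpoint; the "dense_0" layer name is an illustrative assumption:

```elixir
quantized_state = Axon.Quantization.quantize_model_state(model, model_state)

# Dense kernels in the returned state are quantized tensors, and the
# corresponding parameters are frozen since training is unsupported
quantized_state.data["dense_0"]["kernel"]
```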

weight_only_quantized_dense(x, units, opts \\ [])

Adds a weight-only quantized dense layer to the network.

This is equivalent to a dense layer, but it operates on quantized weights, reducing the model's memory footprint.

Compiles to Axon.Quantization.Layers.weight_only_quantized_dense/3.

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.
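
A minimal sketch wiring the layer directly into a network, as an alternative to quantizing an existing model; the input name and shapes are illustrative:

```elixir
model =
  Axon.input("features", shape: {nil, 784})
  |> Axon.Quantization.weight_only_quantized_dense(128, name: "quantized_dense_0")
  |> Axon.relu()
  |> Axon.Quantization.weight_only_quantized_dense(10, name: "quantized_dense_1")
```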