Axon.Quantization (Axon v0.7.0)

Model quantization.

Model quantization is a technique for reducing the memory footprint of a model by converting portions of a model to use quantized representations. Typically, these quantized representations are low-precision integers.

This is an experimental API which implements weight-only quantization. The implementation in this module converts dense layers in a model to quantized variants. The only supported quantization type is {:s, 8}. Axon quantization is inference-only; training is not currently supported.

Summary

Functions

quantize(model, model_state)
Quantizes a model and a model state.

quantize_model(model)
Replaces standard operations with quantized variants.

quantize_model_state(model, model_state)
Returns a quantized model state.

weight_only_quantized_dense(x, units, opts \\ [])
Adds a weight-only quantized dense layer to the network.

Functions

quantize(model, model_state)

Quantizes a model and a model state.

Given a model and model state, this function rewrites all of the dense layers in the model to perform weight-only 8-bit integer versions of the same operation. It also replaces the values of all dense kernels in the given model state with quantized tensors.
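
A minimal end-to-end sketch, assuming a small feed-forward model; the input name, shapes, and data here are illustrative:

```elixir
# Build a small model with two dense layers
model =
  Axon.input("features", shape: {nil, 128})
  |> Axon.dense(256, activation: :relu)
  |> Axon.dense(10)

# Initialize the model state (or load pre-trained parameters instead)
{init_fn, _predict_fn} = Axon.build(model)
model_state = init_fn.(Nx.template({1, 128}, :f32), Axon.ModelState.empty())

# Rewrite the dense layers and quantize their kernels in one step
{quantized_model, quantized_state} =
  Axon.Quantization.quantize(model, model_state)

# Quantization is inference-only, so use the result for prediction
Axon.predict(quantized_model, quantized_state, Nx.broadcast(0.0, {1, 128}))
```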

quantize_model(model)

Replaces standard operations with quantized variants.

The only supported conversion is from a regular dense layer to a weight-only 8-bit integer variant. Note that this only rewrites the model graph; it does not touch the model state. If you have a pre-trained model state that you wish to quantize, refer to Axon.Quantization.quantize_model_state/2.

All :dense layers in the model are replaced with Axon.Quantization.weight_only_quantized_dense/3.
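
A minimal sketch, assuming model is an existing Axon model containing :dense layers:

```elixir
# Rewrite the graph only: every :dense layer becomes a weight-only
# quantized dense layer; the model state is not modified by this call
quantized_model = Axon.Quantization.quantize_model(model)
```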

quantize_model_state(model, model_state)

Returns a quantized model state.

Given a model and a model state, this function replaces the kernel of every dense layer with a quantized version of its weights.

Training is not currently supported, so all quantized layers are automatically frozen.
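
A minimal sketch, assuming model and model_state come from training or a checkpoint; the "dense_0" layer name is an illustrative assumption:

```elixir
quantized_state = Axon.Quantization.quantize_model_state(model, model_state)

# Dense kernels in the returned state are quantized tensors, and the
# corresponding parameters are frozen since training is unsupported
quantized_state.data["dense_0"]["kernel"]
```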

weight_only_quantized_dense(x, units, opts \\ [])

Adds a weight-only quantized dense layer to the network.

This is equivalent to a dense layer, but it operates on quantized weights, reducing the model's memory footprint.

Compiles to Axon.Quantization.Layers.weight_only_quantized_dense/3.

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.
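
A minimal sketch wiring the layer directly into a network, as an alternative to quantizing an existing model; the input name and shapes are illustrative:

```elixir
model =
  Axon.input("features", shape: {nil, 784})
  |> Axon.Quantization.weight_only_quantized_dense(128, name: "quantized_dense_0")
  |> Axon.relu()
  |> Axon.Quantization.weight_only_quantized_dense(10, name: "quantized_dense_1")
```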