Quantization-Aware Training (QAT) — transformer with quantized linear layers.
Extends BitNet's quantization-aware training to support multiple bit widths beyond binary and ternary. All dense layers in the attention and FFN sub-layers use quantized forward passes (with straight-through gradient estimation for backpropagation).
## Quantization Modes

| Mode | Weight Values | Levels | Bits/Weight |
|---|---|---|---|
| `:binary` | {-1, +1} | 2 | 1 |
| `:ternary` | {-1, 0, +1} | 3 | 1.58 |
| `:int4` | 16 absmax-scaled values | 16 | 4 |
| `:int8` | 256 absmax-scaled values | 256 | 8 |
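The `:ternary` and `:int8` rows can be sketched as follows. This is a minimal illustration using Nx; the module and function names are hypothetical and not part of this module's API. `:ternary` uses BitNet b1.58-style absmean scaling, while the integer modes use absmax scaling as the table states:

```elixir
defmodule QuantSketch do
  import Nx.Defn

  # :ternary — scale by the mean absolute weight, round to {-1, 0, +1}
  defn ternary(w) do
    gamma = Nx.mean(Nx.abs(w)) + 1.0e-6
    Nx.clip(Nx.round(w / gamma), -1, 1) * gamma
  end

  # :int8 — absmax scaling onto 256 integer levels, then dequantize
  defn int8(w) do
    scale = 127.0 / (Nx.reduce_max(Nx.abs(w)) + 1.0e-6)
    Nx.clip(Nx.round(w * scale), -128, 127) / scale
  end
end
```

`:int4` follows the same absmax pattern with a range of [-8, 7] and 16 levels.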
## Architecture

```
Input [batch, seq_len, embed_dim]
  |
Quantized blocks: Pre-norm -> QuantLinear(QKV) -> Attention -> Residual
                  Pre-norm -> QuantLinear(FFN) -> Residual
  |
Final norm -> last timestep -> [batch, hidden_size]
```

## Usage
```elixir
model = QAT.build(
  embed_dim: 256,
  hidden_size: 256,
  num_heads: 4,
  num_layers: 4,
  quantize: :int4
)
```

## References
- Wang et al., "BitNet: Scaling 1-Bit Transformers" (2023)
- Jacob et al., "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference" (2018)
## Types

```elixir
@type build_opt() ::
        {:embed_dim, pos_integer()}
        | {:hidden_size, pos_integer()}
        | {:num_heads, pos_integer()}
        | {:num_layers, pos_integer()}
        | {:quantize, :binary | :ternary | :int4 | :int8}
        | {:dropout, float()}
        | {:window_size, pos_integer()}
```

Options for `build/1`.
## Functions

Build a QAT model for sequence processing.

### Options

- `:embed_dim` - Size of input embedding per frame (required)
- `:hidden_size` - Internal hidden dimension (default: 256)
- `:num_heads` - Number of attention heads (default: 4)
- `:num_layers` - Number of blocks (default: 4)
- `:quantize` - Quantization mode: `:binary`, `:ternary`, `:int4`, or `:int8` (default: `:ternary`)
- `:dropout` - Dropout rate (default: 0.1)
- `:window_size` - Expected sequence length (default: 60)

### Returns

An Axon model outputting `[batch, hidden_size]` from the last position.
Build a single QAT transformer block with quantized linear layers.
```elixir
@spec output_size(keyword()) :: pos_integer()
```

Get the output size of a QAT model.
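A hypothetical call; it assumes `output_size/1` reads `:hidden_size` (default 256) from the same options passed to `build/1`, matching the `[batch, hidden_size]` output shape described above:

```elixir
opts = [embed_dim: 256, hidden_size: 512, num_heads: 4, num_layers: 4]
size = QAT.output_size(opts)
# Assumed: size is 512, the configured :hidden_size.
```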
```elixir
@spec quant_linear(Axon.t(), pos_integer(), keyword()) :: Axon.t()
```

Build a quantized linear layer with the given quantization mode.

Stores full-precision weights for gradient updates; quantizes them in the forward pass, using straight-through estimation for the backward pass.
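The straight-through estimator can be sketched with Nx's `custom_grad`. This is a hedged illustration, not this module's implementation: `custom_grad`'s exact signature varies across Nx versions, and `ternary/1` stands in for whichever quantizer the layer's mode selects:

```elixir
import Nx.Defn

# Forward: use the quantized weights.
# Backward: treat quantization as the identity, so the gradient
# passes straight through to the full-precision weights.
defn ste_quantize(w) do
  q = ternary(w)  # hypothetical quantizer; see the mode table above
  custom_grad(q, [w], fn g -> [g] end)
end
```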
```elixir
@spec recommended_defaults() :: keyword()
```

Get recommended defaults.
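A hypothetical way to combine `recommended_defaults/0` with `build/1`, assuming the returned keyword list is compatible with `build/1`'s options:

```elixir
# Start from the recommended defaults, override what the input requires.
opts = Keyword.merge(QAT.recommended_defaults(), embed_dim: 128)
model = QAT.build(opts)
```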