BitNet: 1-bit/1.58-bit transformer with binary or ternary weight quantization.
Implements the BitNet architecture from "BitNet: Scaling 1-bit Transformers for Large Language Models" (Wang et al., 2023) and "The Era of 1-bit LLMs" (Ma et al., 2024). BitNet quantizes weights to {-1, 0, +1} in the forward pass while maintaining full-precision weights for gradient updates.
Key Innovation: Quantization-Aware Forward Pass
BitNet uses "BitLinear" layers that replace standard dense layers:
- Full-precision weights are stored for training
- In the forward pass, weights are quantized to binary ({-1, +1}) or ternary ({-1, 0, +1}) values
- Activations are quantized to 8-bit using absmax quantization
- Gradients flow through the quantization via a straight-through estimator
BitLinear(x):
W_quant = quantize_weights(W) # Binary: sign(W), Ternary: round(W/mean(|W|)) clipped to [-1, 1]
x_quant = quantize_activations(x) # absmax to [-128, 127]
output = x_quant @ W_quant^T * scale
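As a concrete sketch, the two quantizers can be written as Nx defn functions with the straight-through estimator implemented via stop_grad. The fake-quantization formulation below (rescaling folded back into each quantizer) and the module name are assumptions about one reasonable implementation, not this module's actual internals:

defmodule BitQuant do
  import Nx.Defn

  # Ternary (absmean) weight quantization. The forward pass sees the
  # quantized values; stop_grad makes the gradient see the identity (STE).
  defn quantize_weights(w) do
    scale = Nx.mean(Nx.abs(w))

    w_q =
      w
      |> Nx.divide(scale)
      |> Nx.round()
      |> Nx.clip(-1, 1)
      |> Nx.multiply(scale)

    w + stop_grad(w_q - w)
  end

  # Absmax activation quantization to the 8-bit range [-128, 127],
  # rescaled back to float. Assumes a nonzero input tensor.
  defn quantize_activations(x) do
    gamma = Nx.reduce_max(Nx.abs(x))

    x_q =
      x
      |> Nx.divide(gamma)
      |> Nx.multiply(127.0)
      |> Nx.round()
      |> Nx.clip(-128, 127)
      |> Nx.multiply(gamma)
      |> Nx.divide(127.0)

    x + stop_grad(x_q - x)
  end
end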
Architecture

Input [batch, seq_len, embed_dim]
|
v
+-----------------------+
| Input Projection |
+-----------------------+
|
v
+-----------------------+
| BitNet Block x N |
| Norm -> BitAttn |
| Norm -> BitFFN |
| (all dense layers |
| use BitLinear) |
+-----------------------+
|
v
[batch, hidden_size] (last timestep)

Quantization Modes
| Mode | Weight Values | Bits per Weight |
|---|---|---|
| Binary | {-1, +1} | 1 bit |
| Ternary | {-1, 0, +1} | 1.58 bits |
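The 1.58 figure is log2(3), the information content of one of three states. For a feel of the difference, a toy comparison in Nx (values arbitrary):

w = Nx.tensor([0.4, -1.2, 0.05, 0.9])

# Binary: just the sign of each weight
Nx.sign(w)
#=> [1.0, -1.0, 1.0, 1.0]

# Ternary (absmean): round(w / mean(|w|)), clipped to [-1, 1].
# Near-zero weights snap to 0, a state binary mode cannot express.
scale = Nx.mean(Nx.abs(w))   # 0.6375
w |> Nx.divide(scale) |> Nx.round() |> Nx.clip(-1, 1)
#=> [1.0, -1.0, 0.0, 1.0]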
Usage
model = BitNet.build(
embed_dim: 287,
hidden_size: 256,
num_heads: 4,
num_layers: 4,
quantize: :ternary
)
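build/1 is expected to return an ordinary Axon graph, so a standard Axon.build/1 round trip works as a smoke test. The snippet below is a sketch under that assumption; the template shape follows the options above plus the default window_size of 60:

{init_fn, predict_fn} = Axon.build(model)

# [batch=1, seq_len=60 (default :window_size), embed_dim=287]
params = init_fn.(Nx.template({1, 60, 287}, :f32), %{})

output = predict_fn.(params, Nx.broadcast(0.0, {1, 60, 287}))
# output shape: {1, 256}, i.e. [batch, hidden_size] from the last timestep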
- "BitNet: Scaling 1-Bit Transformers for Large Language Models"
- "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"
- arXiv: https://arxiv.org/abs/2310.11453
Summary
Functions
Build a BitLinear layer: dense with quantized weights in the forward pass.
Build a BitNet model for sequence processing.
Build a single BitNet transformer block with quantized linear layers.
Get the output size of a BitNet model.
Get recommended defaults.
Types
@type build_opt() ::
  {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:quantize, :ternary | :binary}
  | {:seq_len, pos_integer()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
@spec bitlinear(Axon.t(), pos_integer(), keyword()) :: Axon.t()
Build a BitLinear layer: dense with quantized weights in the forward pass.
Uses Axon.param for full-precision weights and quantizes them during
the forward pass via Axon.layer.
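A minimal sketch of what such a layer can look like, reusing the quantizers sketched above; the module name, the :in_features option, and the initializer are assumptions for illustration, not the library's actual interface:

defmodule MyBitNet do
  # Hypothetical sketch: the input feature size is passed explicitly
  # rather than inferred from the graph.
  def bitlinear(input, units, opts \\ []) do
    in_features = Keyword.fetch!(opts, :in_features)

    # Full-precision master weight, trained as an ordinary Axon parameter.
    weight = Axon.param("weight", {in_features, units}, initializer: :glorot_uniform)

    Axon.layer(
      fn x, w, _opts ->
        # Quantize on the fly each forward pass; the STE helpers are the
        # quantize_weights/1 and quantize_activations/1 sketched earlier.
        Nx.dot(BitQuant.quantize_activations(x), BitQuant.quantize_weights(w))
      end,
      [input, weight],
      name: opts[:name] || "bitlinear",
      op_name: :bitlinear
    )
  end
end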
Build a BitNet model for sequence processing.
Options
- :embed_dim - Size of input embedding per frame (required)
- :hidden_size - Internal hidden dimension (default: 256)
- :num_heads - Number of attention heads (default: 4)
- :num_layers - Number of BitNet blocks (default: 4)
- :dropout - Dropout rate (default: 0.1)
- :window_size - Expected sequence length (default: 60)
- :quantize - Quantization mode: :ternary or :binary (default: :ternary)
Returns
An Axon model that outputs [batch, hidden_size] from the last position.
Build a single BitNet transformer block with quantized linear layers.
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of a BitNet model.
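Presumably this mirrors the configured hidden size, since the model output is [batch, hidden_size]; the call below is illustrative, not taken from the module's docs:

BitNet.output_size(hidden_size: 256)
#=> 256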
@spec recommended_defaults() :: keyword()
Get recommended defaults.
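A plausible way to combine the defaults with the one required option (the Keyword.merge usage is illustrative):

opts = Keyword.merge(BitNet.recommended_defaults(), embed_dim: 287)
model = BitNet.build(opts)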