Edifice.Blocks.RMSNorm (Edifice v0.2.0)

Root Mean Square Layer Normalization.

Simpler and faster than standard LayerNorm: it normalizes by the root mean square of the activations without centering (no mean subtraction). Used by LLaMA, Mamba-2, Mistral, and most modern transformer variants.

Formula

RMSNorm(x) = x / sqrt(mean(x^2) + eps) * gamma

Compared to LayerNorm, which computes both the mean and the variance, RMSNorm computes only the RMS, saving roughly half of the normalization statistics compute.
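The formula above can be checked with a small worked example. This is a plain-Python sketch of the math only, not the module's implementation (which operates on Nx tensors); the function name `rms_norm` is illustrative.

```python
import math

def rms_norm(x, gamma, eps=1.0e-6):
    """RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * gamma."""
    mean_sq = sum(v * v for v in x) / len(x)   # mean of squares, no mean subtraction
    inv = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv * g for v, g in zip(x, gamma)]

# x = [3, 4]: mean(x^2) = (9 + 16) / 2 = 12.5, so RMS ~= 3.5355
out = rms_norm([3.0, 4.0], [1.0, 1.0])
# out ~= [0.8485, 1.1314]
```

Note that unlike LayerNorm, the mean of `x` is never subtracted, so only one pass of statistics is needed.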

Usage

# As an Axon layer
normalized = RMSNorm.layer(input, hidden_size: 256)

Summary

Functions

Compute RMSNorm on a raw tensor.

Build an RMSNorm Axon layer.

Functions

apply(x, gamma, opts \\ [epsilon: 1.0e-6])

Compute RMSNorm on a raw tensor.

Parameters

  • x - Input tensor [..., hidden_size]
  • gamma - Learnable scale [hidden_size]
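For inputs with leading batch dimensions, normalization is applied independently along the trailing `hidden_size` axis while `gamma` is shared across rows. A plain-Python sketch of that semantics (illustrative only, assuming a 2-D `[batch, hidden_size]` input; the actual function works on Nx tensors of any rank):

```python
import math

def rms_norm_rows(batch, gamma, eps=1.0e-6):
    """Normalize each row of a [batch, hidden_size] input by its own RMS."""
    out = []
    for row in batch:
        mean_sq = sum(v * v for v in row) / len(row)
        inv = 1.0 / math.sqrt(mean_sq + eps)
        out.append([v * inv * g for v, g in zip(row, gamma)])
    return out

# Each row uses its own RMS, so rescaling a row leaves its output unchanged:
# [1, 1] and [10, 10] both normalize to approximately [1, 1].
print(rms_norm_rows([[1.0, 1.0], [10.0, 10.0]], [1.0, 1.0]))
```

This per-row scale invariance is the key property RMSNorm retains from LayerNorm.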

layer(input, opts \\ [])

@spec layer(Axon.t(), keyword()) :: Axon.t()

Build an RMSNorm Axon layer.

Options

  • :hidden_size - Feature dimension for the learnable scale (required)
  • :epsilon - Numerical stability constant (default: 1.0e-6)
  • :name - Layer name prefix (default: "rms_norm")