Adaptive Layer Normalization (AdaLN / AdaLN-Zero).
Conditional normalization where scale and shift parameters are predicted from a conditioning signal (e.g., timestep embedding, class label). Used in Diffusion Transformers (DiT) and class-conditional generation.
Variants
- AdaLN: Replace fixed gamma/beta with condition-predicted parameters
- AdaLN-Zero: Additionally predict a gating factor alpha, initialized to zero so the conditioned branch contributes nothing at initialization (the surrounding residual block starts as the identity)
Formula
AdaLN(x, c) = gamma(c) * LayerNorm(x) + beta(c)
AdaLN-Zero(x, c) = alpha(c) * (gamma(c) * LayerNorm(x) + beta(c))
Usage
# AdaLN conditioning on timestep embedding
output =
  AdaptiveNorm.layer(input, condition,
    hidden_size: 256,
    mode: :adaln_zero
  )
References
- "Scalable Diffusion Models with Transformers" (Peebles & Xie, 2023)
- https://arxiv.org/abs/2212.09748
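The formulas above can be checked numerically. The sketch below is plain NumPy, not this library's Axon API; the single linear projections for gamma, beta, and alpha are hypothetical stand-ins for the condition network. It shows that AdaLN-Zero's zero-initialized gate makes the layer output exactly zero at initialization:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis with no learned gamma/beta:
    # those are supplied by the conditioning network instead.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
batch, hidden, cond_dim = 2, 8, 4
x = rng.normal(size=(batch, hidden))      # input features
c = rng.normal(size=(batch, cond_dim))    # conditioning signal

# Hypothetical condition projections (one linear map each).
W_gamma = rng.normal(size=(cond_dim, hidden)); b_gamma = np.ones(hidden)
W_beta  = rng.normal(size=(cond_dim, hidden)); b_beta  = np.zeros(hidden)
W_alpha = np.zeros((cond_dim, hidden));        b_alpha = np.zeros(hidden)  # zero init

gamma = c @ W_gamma + b_gamma
beta  = c @ W_beta  + b_beta
alpha = c @ W_alpha + b_alpha

adaln      = gamma * layer_norm(x) + beta   # AdaLN(x, c)
adaln_zero = alpha * adaln                  # AdaLN-Zero(x, c)

# With alpha(c) == 0 at initialization, AdaLN-Zero outputs all zeros,
# so a residual block wrapping it starts as the identity.
print(np.allclose(adaln_zero, 0.0))  # True
```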
Functions
Build an AdaLN / AdaLN-Zero layer.
Parameters
- input - Input tensor (Axon node), shape [batch, ..., hidden_size]
- condition - Conditioning signal (Axon node), shape [batch, cond_dim]
Options
- :hidden_size - Feature dimension (required)
- :mode - :adaln or :adaln_zero (default: :adaln_zero)
- :name - Layer name prefix (default: "adaptive_norm")