DoRA: Weight-Decomposed Low-Rank Adaptation.
Implements DoRA from "DoRA: Weight-Decomposed Low-Rank Adaptation of Large Language Models" (Liu et al., 2024). DoRA decomposes pretrained weights into magnitude and direction components, then applies LoRA only to the direction.
Key Innovation: Magnitude-Direction Decomposition
Standard LoRA modifies the full weight: W' = W + BA
DoRA decomposes W into magnitude m and direction V:
W = m * (V / ||V||)

Then applies LoRA only to the direction component:

W' = m * ((V + BA) / ||V + BA||)

Where:

- m is a learnable magnitude vector [output_size]
- V is the original weight direction
- BA is the standard LoRA low-rank update
- ||.|| is column-wise L2 normalization
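To make the decomposition concrete, here is a minimal Nx sketch of the weight reparameterization above. It is illustrative only, not this module's internals, and it assumes V and the product BA are laid out as [input_size, output_size] so that the column-wise norm reduces over the input axis:

defmodule DoRAMath do
  import Nx.Defn

  # v: [input_size, output_size], a: [input_size, rank],
  # b: [rank, output_size], m: [output_size] (assumed layout)
  defn adapted_weight(m, v, a, b) do
    v_new = v + Nx.dot(a, b)            # V + BA
    # column-wise L2 norm over the input axis -> [1, output_size]
    norms = Nx.sqrt(Nx.sum(v_new * v_new, axes: [0], keep_axes: true))
    m * (v_new / norms)                 # W' = m * (V + BA) / ||V + BA||
  end
end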
Why This Works
Separating magnitude from direction gives several benefits:
- Direction captures "what" features are important (adapted by LoRA)
- Magnitude captures "how much" each feature matters (learned separately)
- This mirrors weight normalization, which is known to improve optimization
Architecture
Input x [batch, input_size]
  |
  +---> W * x                    (frozen base)
  |        |
  +---> A * x -> B * (A * x)     (LoRA delta)
  |        |
  |     V + BA                   (direction update)
  |        |
  |     normalize(V + BA)        (unit direction)
  |        |
  |     m * normalized           (apply magnitude)
  |
  v
Output [batch, output_size]

LoRA+ Note
LoRA+ (Hayou et al., 2024) proposes different learning rates for the A and B matrices. This is a training-configuration choice rather than an architectural one: use a higher learning rate for B (e.g., 5-10x) than for A. We document this recommendation but do not enforce it in the graph structure.
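As a hedged illustration of that recommendation, the sketch below applies a hand-rolled SGD step with a larger learning rate for B than for A by partitioning a flat parameter map on name. The "lora_a"/"lora_b" key naming is hypothetical; a real setup would do the same partitioning inside whatever optimizer you use:

defmodule LoRAPlusStep do
  @lr_a 1.0e-4
  @lr_b 5.0e-4   # ~5x higher for B, per the LoRA+ recommendation

  # params/grads: maps of name => Nx tensor (hypothetical "lora_b" keys)
  def step(params, grads) do
    Map.new(params, fn {name, value} ->
      lr = if String.contains?(name, "lora_b"), do: @lr_b, else: @lr_a
      {name, Nx.subtract(value, Nx.multiply(lr, grads[name]))}
    end)
  end
end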
Usage
# Standalone DoRA layer
dora = DoRA.build(input_size: 768, output_size: 768, rank: 8)

# Wrap an existing dense layer with DoRA (input/layer names are illustrative)
input = Axon.input("features", shape: {nil, 768})
original = Axon.dense(input, 768, name: "attn_proj")
adapted = DoRA.wrap(input, original, output_size: 768, rank: 8, name: "dora_attn")

References
- Liu et al., "DoRA: Weight-Decomposed Low-Rank Adaptation of Large Language Models" (2024), https://arxiv.org/abs/2402.09353
- Hayou et al., "LoRA+: Efficient Low Rank Adaptation of Large Models" (2024), https://arxiv.org/abs/2402.12354
Summary
Functions
build(opts) - Build a standalone DoRA adapter layer.
dora_layer(input, input_size, output_size, opts) - Build a DoRA layer inline (for use in custom architectures).
output_size(opts) - Get the output size of a DoRA layer.
recommended_defaults() - Get recommended defaults.
wrap(input, original, opts) - Wrap an existing dense layer with DoRA adaptation.
Types
@type build_opt() :: {:alpha, float()} | {:input_size, pos_integer()} | {:output_size, pos_integer()} | {:rank, pos_integer()}
Options for build/1.
Functions
Build a standalone DoRA adapter layer.
Computes weight-decomposed adaptation: m * normalize(V*x + (alpha/rank)*B(A(x))).
Options
- :input_size - Input dimension (required)
- :output_size - Output dimension (required)
- :rank - Low-rank dimension (default: 8)
- :alpha - LoRA scaling factor (default: 16.0)
- :name - Layer name prefix (default: "dora")
Returns
An Axon model: [batch, input_size] -> [batch, output_size]
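A quick sketch of exercising the returned model with Axon's standard build/init/predict flow (shapes are arbitrary):

model = DoRA.build(input_size: 16, output_size: 16, rank: 4)
{init_fn, predict_fn} = Axon.build(model)
# second argument is the initial state: %{} on older Axon,
# Axon.ModelState.empty() on newer releases
params = init_fn.(Nx.template({1, 16}, :f32), %{})
out = predict_fn.(params, Nx.iota({1, 16}, type: :f32))
# out is shaped {1, 16}, i.e. [batch, output_size]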
@spec dora_layer(Axon.t(), pos_integer(), pos_integer(), keyword()) :: Axon.t()
Build a DoRA layer inline (for use in custom architectures).
Parameters
- input - Axon input node
- input_size - Input dimension
- output_size - Output dimension
Options
- :rank - Low-rank dimension (default: 8)
- :alpha - LoRA scaling factor (default: 16.0)
- :name - Layer name prefix (default: "dora")
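For example, a hedged sketch of inline use inside a custom Axon graph (sizes and names are arbitrary):

input = Axon.input("x", shape: {nil, 256})
hidden = DoRA.dora_layer(input, 256, 512, rank: 4, name: "dora_ff")
model = Axon.relu(hidden)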
@spec output_size(keyword()) :: pos_integer()
Get the output size of a DoRA layer.
@spec recommended_defaults() :: keyword()
Get recommended defaults.
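A small sketch combining these helpers; it assumes recommended_defaults/0 returns a keyword list compatible with build/1 and that output_size/1 reads :output_size from the options:

opts = Keyword.merge(DoRA.recommended_defaults(), input_size: 768, output_size: 768)
model = DoRA.build(opts)
768 = DoRA.output_size(opts)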
Wrap an existing dense layer with DoRA adaptation.
Parameters
- input - The Axon input node
- original - The original Axon dense layer output
Options
- :output_size - Output dimension (required)
- :rank - Low-rank dimension (default: 8)
- :alpha - Scaling factor (default: 16.0)
- :name - Layer name prefix (default: "dora")