DoRA: Weight-Decomposed Low-Rank Adaptation.
Implements DoRA from "DoRA: Weight-Decomposed Low-Rank Adaptation of Large Language Models" (Liu et al., 2024). DoRA decomposes pretrained weights into magnitude and direction components, then applies LoRA only to the direction.
Key Innovation: Magnitude-Direction Decomposition
Standard LoRA modifies the full weight: W' = W + BA
DoRA decomposes W into magnitude m and direction V:
W = m * (V / ||V||)

Then applies LoRA only to the direction component:

W' = m * ((V + BA) / ||V + BA||)

Where:

- m is a learnable magnitude vector [output_size]
- V is the original weight direction
- BA is the standard LoRA low-rank update
- ||.|| is column-wise L2 normalization
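To make the decomposition concrete, here is a minimal Nx sketch of the weight reparameterization above. It is illustrative only, not this module's internals, and it assumes V and the product BA are laid out as [input_size, output_size] so that the column-wise norm reduces over the input axis:

defmodule DoRAMath do
  import Nx.Defn

  # v: [input_size, output_size], a: [input_size, rank],
  # b: [rank, output_size], m: [output_size] (assumed layout)
  defn adapted_weight(m, v, a, b) do
    v_new = v + Nx.dot(a, b)            # V + BA
    # column-wise L2 norm over the input axis -> [1, output_size]
    norms = Nx.sqrt(Nx.sum(v_new * v_new, axes: [0], keep_axes: true))
    m * (v_new / norms)                 # W' = m * (V + BA) / ||V + BA||
  end
end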
Why This Works
Separating magnitude from direction gives several benefits:
- Direction captures "what" features are important (adapted by LoRA)
- Magnitude captures "how much" each feature matters (learned separately)
- This mirrors weight normalization, which is known to improve optimization
Architecture
Input x [batch, input_size]
  |
  +---> W * x                    (frozen base)
  |        |
  +---> A * x -> B * (A * x)     (LoRA delta)
  |        |
  |     V + BA                   (direction update)
  |        |
  |     normalize(V + BA)        (unit direction)
  |        |
  |     m * normalized           (apply magnitude)
  |
  v
Output [batch, output_size]

LoRA+ Note
LoRA+ (Hayou et al., 2024) proposes different learning rates for the A and B matrices. This is a training-configuration choice rather than an architectural one: use a higher learning rate for B (e.g., 5-10x) than for A. We document this recommendation but do not enforce it in the graph structure.
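As a hedged illustration of that recommendation, the sketch below applies a hand-rolled SGD step with a larger learning rate for B than for A by partitioning a flat parameter map on name. The "lora_a"/"lora_b" key naming is hypothetical; a real setup would do the same partitioning inside whatever optimizer you use:

defmodule LoRAPlusStep do
  @lr_a 1.0e-4
  @lr_b 5.0e-4   # ~5x higher for B, per the LoRA+ recommendation

  # params/grads: maps of name => Nx tensor (hypothetical "lora_b" keys)
  def step(params, grads) do
    Map.new(params, fn {name, value} ->
      lr = if String.contains?(name, "lora_b"), do: @lr_b, else: @lr_a
      {name, Nx.subtract(value, Nx.multiply(lr, grads[name]))}
    end)
  end
end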
Usage
# Standalone DoRA layer
dora = DoRA.build(input_size: 768, output_size: 768, rank: 8)

# Wrap an existing dense layer with DoRA (input/layer names are illustrative)
input = Axon.input("features", shape: {nil, 768})
original = Axon.dense(input, 768, name: "attn_proj")
adapted = DoRA.wrap(input, original, output_size: 768, rank: 8, name: "dora_attn")

References
- Liu et al., "DoRA: Weight-Decomposed Low-Rank Adaptation of Large Language Models" (2024), https://arxiv.org/abs/2402.09353
- Hayou et al., "LoRA+: Efficient Low Rank Adaptation of Large Models" (2024), https://arxiv.org/abs/2402.12354
Summary
Functions
build(opts) - Build a standalone DoRA adapter layer.
dora_layer(input, input_size, output_size, opts) - Build a DoRA layer inline (for use in custom architectures).
output_size(opts) - Get the output size of a DoRA layer.
recommended_defaults() - Get recommended defaults.
wrap(input, original, opts) - Wrap an existing dense layer with DoRA adaptation.
Types
@type build_opt() :: {:alpha, float()} | {:input_size, pos_integer()} | {:output_size, pos_integer()} | {:rank, pos_integer()}
Options for build/1.
Functions
Build a standalone DoRA adapter layer.
Computes weight-decomposed adaptation: m * normalize(V*x + (alpha/rank)*B(A(x))).
Options
- :input_size - Input dimension (required)
- :output_size - Output dimension (required)
- :rank - Low-rank dimension (default: 8)
- :alpha - LoRA scaling factor (default: 16.0)
- :name - Layer name prefix (default: "dora")
Returns
An Axon model: [batch, input_size] -> [batch, output_size]
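A quick sketch of exercising the returned model with Axon's standard build/init/predict flow (shapes are arbitrary):

model = DoRA.build(input_size: 16, output_size: 16, rank: 4)
{init_fn, predict_fn} = Axon.build(model)
# second argument is the initial state: %{} on older Axon,
# Axon.ModelState.empty() on newer releases
params = init_fn.(Nx.template({1, 16}, :f32), %{})
out = predict_fn.(params, Nx.iota({1, 16}, type: :f32))
# out is shaped {1, 16}, i.e. [batch, output_size]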
@spec dora_layer(Axon.t(), pos_integer(), pos_integer(), keyword()) :: Axon.t()
Build a DoRA layer inline (for use in custom architectures).
Parameters
- input - Axon input node
- input_size - Input dimension
- output_size - Output dimension
Options
- :rank - Low-rank dimension (default: 8)
- :alpha - LoRA scaling factor (default: 16.0)
- :name - Layer name prefix (default: "dora")
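For example, a hedged sketch of inline use inside a custom Axon graph (sizes and names are arbitrary):

input = Axon.input("x", shape: {nil, 256})
hidden = DoRA.dora_layer(input, 256, 512, rank: 4, name: "dora_ff")
model = Axon.relu(hidden)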
@spec output_size(keyword()) :: pos_integer()
Get the output size of a DoRA layer.
@spec recommended_defaults() :: keyword()
Get recommended defaults.
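A small sketch combining these helpers; it assumes recommended_defaults/0 returns a keyword list compatible with build/1 and that output_size/1 reads :output_size from the options:

opts = Keyword.merge(DoRA.recommended_defaults(), input_size: 768, output_size: 768)
model = DoRA.build(opts)
768 = DoRA.output_size(opts)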
Wrap an existing dense layer with DoRA adaptation.
Parameters
- input - The Axon input node
- original - The original Axon dense layer output
Options
- :output_size - Output dimension (required)
- :rank - Low-rank dimension (default: 8)
- :alpha - Scaling factor (default: 16.0)
- :name - Layer name prefix (default: "dora")