Edifice.Meta.DoRA (Edifice v0.2.0)

DoRA: Weight-Decomposed Low-Rank Adaptation.

Implements DoRA from "DoRA: Weight-Decomposed Low-Rank Adaptation" (Liu et al., 2024). DoRA decomposes pretrained weights into magnitude and direction components, then applies LoRA only to the direction.

Key Innovation: Magnitude-Direction Decomposition

Standard LoRA modifies the full weight: W' = W + BA

DoRA decomposes W into magnitude m and direction V:

W = m * (V / ||V||)

Then applies LoRA only to the direction component:

W' = m * ((V + BA) / ||V + BA||)

Where:

  • m is a learnable magnitude vector [output_size]
  • V is the original weight direction
  • BA is the standard LoRA low-rank update
  • ||.|| is the column-wise L2 norm
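
To make the split concrete, here is a minimal Nx sketch (illustrative only, not the module's internals) showing that the magnitude/direction decomposition reconstructs the original weight:

w = Nx.tensor([[3.0, 0.0], [4.0, 1.0]])   # weight matrix, columns = output features
m = Nx.LinAlg.norm(w, axes: [0])          # column-wise L2 norms: [5.0, 1.0]
v_unit = Nx.divide(w, m)                  # unit-norm direction columns, V / ||V||
w_again = Nx.multiply(v_unit, m)          # m * (V / ||V||) recovers w exactly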

Why This Works

Separating magnitude from direction gives three benefits:

  1. Direction captures "what" features are important (adapted by LoRA)
  2. Magnitude captures "how much" each feature matters (learned separately)
  3. The split mirrors weight normalization, which is known to improve optimization

Architecture

Input x [batch, input_size]
      |
      +---> W * x (frozen base)
      |        |
      +---> A * x -> B * (A * x)     (LoRA delta)
      |        |
      |     V + BA                    (direction update)
      |        |
      |     normalize(V + BA)         (unit direction)
      |        |
      |     m * normalized            (apply magnitude)
      |
      v
Output [batch, output_size]
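
The same pipeline can be sketched in weight space with Nx (hypothetical small shapes; the actual layer fuses these steps into its forward pass, and initializing m to ||V|| follows the DoRA paper):

key = Nx.Random.key(0)
{v, key} = Nx.Random.normal(key, shape: {4, 3})    # frozen base weight V: [input, output]
{a, key} = Nx.Random.normal(key, shape: {4, 2})    # LoRA A: [input, rank]
{b, _key} = Nx.Random.normal(key, shape: {2, 3})   # LoRA B: [rank, output]
direction = Nx.add(v, Nx.dot(a, b))                # V + BA (direction update)
unit = Nx.divide(direction, Nx.LinAlg.norm(direction, axes: [0]))   # normalize(V + BA)
m = Nx.LinAlg.norm(v, axes: [0])                   # magnitude, initialized to ||V||
w_adapted = Nx.multiply(unit, m)                   # m * normalized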

LoRA+ Note

LoRA+ (Hayou et al., 2024) proposes different learning rates for the A and B matrices. This is a training-time configuration choice rather than an architectural one: use a higher learning rate for B (e.g., 5-10x) than for A. We document this recommendation but don't enforce it in the graph structure.
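
As a rough illustration of that recommendation, a hand-rolled SGD step with two rates might look like the following (the flat parameter map and the "lora_a"/"lora_b" key names are assumptions made for this sketch; real Axon parameter maps are nested per layer):

lr_a = 1.0e-4
lr_b = 8 * lr_a   # LoRA+: give B roughly 5-10x the rate of A

sgd_step = fn params, grads ->
  Map.new(params, fn {name, value} ->
    lr = if String.contains?(name, "lora_b"), do: lr_b, else: lr_a
    {name, Nx.subtract(value, Nx.multiply(lr, grads[name]))}
  end)
end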

Usage

# Standalone DoRA layer
dora = DoRA.build(input_size: 768, output_size: 768, rank: 8)

# Wrap an existing layer with DoRA (:output_size is required by wrap/3)
input = Axon.input("hidden", shape: {nil, 768})
original = Axon.dense(input, 768, name: "attn_proj")
adapted = DoRA.wrap(input, original, output_size: 768, rank: 8, name: "dora_attn")

References

  • Liu et al., "DoRA: Weight-Decomposed Low-Rank Adaptation" (2024). https://arxiv.org/abs/2402.09353
  • Hayou et al., "LoRA+: Efficient Low Rank Adaptation of Large Models" (2024)

Summary

Types

build_opt()
Options for build/1.

Functions

build(opts \\ [])
Build a standalone DoRA adapter layer.

dora_layer(input, input_size, output_size, opts \\ [])
Build a DoRA layer inline (for use in custom architectures).

output_size(opts \\ [])
Get the output size of a DoRA layer.

Get recommended defaults.

wrap(input, original, opts \\ [])
Wrap an existing dense layer with DoRA adaptation.

Types

build_opt()

@type build_opt() ::
  {:alpha, float()}
  | {:input_size, pos_integer()}
  | {:output_size, pos_integer()}
  | {:rank, pos_integer()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a standalone DoRA adapter layer.

Computes weight-decomposed adaptation: m * normalize(V*x + (alpha/rank)*B(A(x))).

Options

  • :input_size - Input dimension (required)
  • :output_size - Output dimension (required)
  • :rank - Low-rank dimension (default: 8)
  • :alpha - LoRA scaling factor (default: 16.0)
  • :name - Layer name prefix (default: "dora")

Returns

An Axon model: [batch, input_size] -> [batch, output_size]
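
Examples

A sketch of the standard Axon build-and-run flow for the returned model (batch size and input values here are arbitrary):

model = DoRA.build(input_size: 768, output_size: 768, rank: 8)
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 768}, :f32), %{})
output = predict_fn.(params, Nx.iota({2, 768}, type: :f32))
# output has shape {2, 768}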

dora_layer(input, input_size, output_size, opts \\ [])

@spec dora_layer(Axon.t(), pos_integer(), pos_integer(), keyword()) :: Axon.t()

Build a DoRA layer inline (for use in custom architectures).

Parameters

  • input - Axon input node
  • input_size - Input dimension
  • output_size - Output dimension

Options

  • :rank - Low-rank dimension (default: 8)
  • :alpha - LoRA scaling factor (default: 16.0)
  • :name - Layer name prefix (default: "dora")
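
Examples

A sketch of composing inline DoRA layers in a custom block (layer names and sizes are arbitrary):

input = Axon.input("features", shape: {nil, 512})

output =
  input
  |> DoRA.dora_layer(512, 1024, rank: 4, name: "dora_up")
  |> Axon.relu()
  |> DoRA.dora_layer(1024, 512, rank: 4, name: "dora_down")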

output_size(opts \\ [])

@spec output_size(keyword()) :: pos_integer()

Get the output size of a DoRA layer.
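
Examples

Assuming this simply reads the configured dimension from the options (a guess from the signature, not verified against the implementation):

DoRA.output_size(output_size: 768)
#=> 768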

wrap(input, original, opts \\ [])

@spec wrap(Axon.t(), Axon.t(), keyword()) :: Axon.t()

Wrap an existing dense layer with DoRA adaptation.

Parameters

  • input - The Axon input node
  • original - The original Axon dense layer output

Options

  • :output_size - Output dimension (required)
  • :rank - Low-rank dimension (default: 8)
  • :alpha - Scaling factor (default: 16.0)
  • :name - Layer name prefix (default: "dora")
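
Examples

A sketch of a typical adaptation flow that freezes the base projection so only the DoRA parameters train (Axon.freeze/1 is standard Axon; the layer names are arbitrary):

input = Axon.input("hidden", shape: {nil, 768})

base =
  input
  |> Axon.dense(768, name: "proj")
  |> Axon.freeze()

adapted = DoRA.wrap(input, base, output_size: 768, rank: 8, name: "dora_proj")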