Edifice.Attention.LightningAttention (Edifice v0.2.0)


Lightning Attention — hybrid linear/softmax block attention.

Splits the sequence into fixed-size blocks and uses two complementary attention mechanisms:

  • Intra-block: Standard softmax attention within each block (O(B²·d) per block)
  • Inter-block: Linear attention against a cumulative KV state carried across blocks (O(B·d²) per block)

This achieves near-linear overall complexity while retaining the expressivity of softmax attention at the local level.
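To make the two pathways concrete, here is a minimal Nx sketch of the per-head computation for a single sequence. It is illustrative only, not the library's internal implementation: scaling by sqrt(d), the causal mask within blocks, dropout, and the decay term from Lightning Attention-2 are all omitted, and q, k, v are assumed to be :f32 tensors of shape [blocks, B, d].

defmodule LightningSketch do
  # q, k, v: [blocks, B, d] for one head of one sequence (illustrative sketch).
  def block_attention(q, k, v) do
    # Intra-block: softmax(Q_b @ K_b^T) @ V_b within each block.
    scores = Nx.dot(q, [2], [0], k, [2], [0])                      # [blocks, B, B]
    weights = Nx.exp(Nx.subtract(scores, Nx.reduce_max(scores, axes: [2], keep_axes: true)))
    weights = Nx.divide(weights, Nx.sum(weights, axes: [2], keep_axes: true))
    intra = Nx.dot(weights, [2], [0], v, [1], [0])                 # [blocks, B, d]

    # Inter-block: Q_b times the cumulative K^T V state of earlier blocks.
    kv = Nx.dot(k, [1], [0], v, [1], [0])                          # [blocks, d, d]
    kv_prev =
      kv
      |> Nx.cumulative_sum(axis: 0)
      |> Nx.pad(0.0, [{1, -1, 0}, {0, 0, 0}, {0, 0, 0}])           # shift so block b sees only blocks < b
    inter = Nx.dot(q, [2], [0], kv_prev, [1], [0])                 # [blocks, B, d]

    Nx.add(intra, inter)
  end
end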

Architecture

Input [batch, seq_len, embed_dim]
      |
Input Projection to hidden_size
      |
+--------------------------------------------+
|  Lightning Attention Block (x num_layers)  |
|                                            |
|  LayerNorm -> Q,K,V projections            |
|  Reshape to [batch, heads, blocks, B, d]   |
|                                            |
|  Intra-block: softmax(Q_b @ K_b^T) @ V_b   |
|  Inter-block: Q_b @ cumsum(K_j^T V_j)      |
|  Output = intra + inter                    |
|                                            |
|  -> Residual                               |
|  LayerNorm -> FFN -> Residual              |
+--------------------------------------------+
      |
Final LayerNorm
      |
Last timestep -> [batch, hidden_size]

Constraints

seq_len must be divisible by block_size. For example, seq_len: 128 or seq_len: 192 is valid with the default block_size: 64.

Usage

alias Edifice.Attention.LightningAttention

model = LightningAttention.build(
  embed_dim: 287,
  hidden_size: 256,
  num_heads: 8,
  num_layers: 4,
  block_size: 64
)
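The returned value is a plain Axon graph, so it can be initialized and run as usual. A minimal sketch, assuming the model exposes a single input that accepts a tensor of shape [batch, seq_len, embed_dim]; seq_len: 128 is chosen here so that it is divisible by block_size: 64:

model = LightningAttention.build(
  embed_dim: 287,
  hidden_size: 256,
  num_heads: 8,
  num_layers: 4,
  block_size: 64,
  seq_len: 128
)

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 128, 287}, :f32), %{})

x = Nx.iota({2, 128, 287}, type: :f32)
# A bare tensor works for single-input models; otherwise pass a map keyed by input name.
out = predict_fn.(params, x)
# out should have shape {2, 256}, i.e. [batch, hidden_size]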

References

  • Qin et al., "Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models" (2024)

Summary

Types

Options for build/1.

Functions

Build a Lightning Attention model.

Build the lightning attention sublayer.

Get the output size of the model.

Types

build_opt()

@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:block_size, pos_integer()}
  | {:dropout, float()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a Lightning Attention model.

Options

  • :embed_dim - Input embedding dimension (required)
  • :hidden_size - Internal hidden dimension (default: 256)
  • :num_heads - Number of attention heads (default: 8)
  • :num_layers - Number of Lightning Attention blocks (default: 4)
  • :block_size - Block size B for chunked attention (default: 64). seq_len must be divisible by this value.
  • :dropout - Dropout rate (default: 0.1)
  • :seq_len / :window_size - Expected sequence length (default: 60)

Returns

An Axon model outputting [batch, hidden_size].
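Since :seq_len and :window_size name the same setting, either key can be used. A sketch of the assumed equivalence, based on the option list above rather than the library's tests:

model = LightningAttention.build(embed_dim: 287, seq_len: 128, block_size: 64)
# equivalent, per the option list:
model = LightningAttention.build(embed_dim: 287, window_size: 128, block_size: 64)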

build_lightning_attention(input, opts)

@spec build_lightning_attention(
  Axon.t(),
  keyword()
) :: Axon.t()

Build the lightning attention sublayer.

This creates the core attention mechanism with both intra-block (softmax) and inter-block (linear) attention pathways.
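Because the sublayer takes and returns an Axon node, it can be composed into a custom graph. A minimal sketch; the option keys passed here are assumed to mirror build/1 and may differ from what the function actually accepts:

# Hypothetical composition onto an existing Axon graph.
input = Axon.input("embeddings", shape: {nil, 128, 256})

attn =
  LightningAttention.build_lightning_attention(input,
    hidden_size: 256,
    num_heads: 8,
    block_size: 64
  )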

output_size(opts \\ [])

@spec output_size(keyword()) :: pos_integer()

Get the output size of the model.