Edifice.Attention.Mega (Edifice v0.2.0)


Mega: Moving Average Equipped Gated Attention.

Implements the Mega architecture from "Mega: Moving Average Equipped Gated Attention" (Ma et al., ICLR 2023). Mega combines exponential moving averages (EMA) for local context with single-head gated attention for global context, achieving strong performance at lower cost than standard multi-head attention.

Key Innovation: EMA + Gated Attention

Each Mega block has three sub-layers:

  1. EMA sub-layer: Multi-dimensional exponential moving average captures local temporal patterns with learnable decay rates per dimension
  2. Gated attention: Single-head attention with sigmoid gating provides selective global context aggregation
  3. FFN: Standard feed-forward network for feature transformation
Mega Block:
  input -> LayerNorm -> EMA -> residual
        -> LayerNorm -> GatedAttn -> residual
        -> LayerNorm -> FFN -> residual
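
A minimal sketch of this residual wiring in Axon might look like the following; ema/2 and gated_attention/2 are hypothetical stand-ins for the first two sub-layers, not functions exported by this module:

def mega_block(input, opts) do
  hidden = Keyword.fetch!(opts, :hidden_size)

  # EMA sub-layer: pre-norm, then residual
  ema_out =
    input
    |> Axon.layer_norm()
    |> ema(opts)
    |> Axon.add(input)

  # Gated attention sub-layer: pre-norm, then residual
  attn_out =
    ema_out
    |> Axon.layer_norm()
    |> gated_attention(opts)
    |> Axon.add(ema_out)

  # FFN sub-layer: pre-norm, then residual
  attn_out
  |> Axon.layer_norm()
  |> Axon.dense(hidden * 2, activation: :silu)
  |> Axon.dense(hidden)
  |> Axon.add(attn_out)
end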

Architecture

Input [batch, seq_len, embed_dim]
      |
      v
+-----------------------+
| Input Projection      |
+-----------------------+
      |
      v
+-----------------------+
| Mega Block x N        |
|  EMA Sub-Layer        |
|    alpha = sigmoid(a) |
|    h_t = alpha*h_{t-1}|
|        + (1-alpha)*x_t|
|  Gated Attention      |
|    Q, K, V projections|
|    gate * attn_output |
|  FFN                  |
+-----------------------+
      |
      v
[batch, hidden_size]    (last timestep)
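
As a toy illustration of the EMA recurrence above, here is an eager-mode Nx sketch for a single unbatched sequence; x (shape {seq_len, ema_dim}) and the pre-sigmoid decay a (shape {ema_dim}) are assumed inputs, not this module's internals:

# h_t = alpha * h_{t-1} + (1 - alpha) * x_t, with alpha = sigmoid(a)
alpha = Nx.sigmoid(a)
h0 = Nx.broadcast(0.0, {Nx.axis_size(x, 1)})

h_last =
  Enum.reduce(0..(Nx.axis_size(x, 0) - 1), h0, fn t, h ->
    Nx.add(Nx.multiply(alpha, h), Nx.multiply(Nx.subtract(1.0, alpha), x[t]))
  end)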

Complexity

Operation | Standard Attention | Mega
----------|--------------------|----------------------
Local     | O(L^2)             | O(L * D_ema) via EMA
Global    | O(L^2 * H)         | O(L^2), single head

Usage

alias Edifice.Attention.Mega

model =
  Mega.build(
    embed_dim: 287,
    hidden_size: 256,
    ema_dim: 16,
    num_layers: 4
  )
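
Once built, the model initializes and runs like any other Axon graph. The shapes below are illustrative (batch of 8, a 60-step window, 287-dim embeddings) and assume a single-input model:

{init_fn, predict_fn} = Axon.build(model)

params = init_fn.(Nx.template({8, 60, 287}, :f32), %{})
{input, _key} = Nx.Random.uniform(Nx.Random.key(0), shape: {8, 60, 287})

output = predict_fn.(params, input)
# => shape {8, 256}: [batch, hidden_size] taken from the last position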

Reference

Ma, X., et al. "Mega: Moving Average Equipped Gated Attention." ICLR 2023. https://arxiv.org/abs/2209.10655

Summary

Types

Options for build/1.

Functions

Build a Mega model for sequence processing.

Build a single Mega block with EMA + gated attention + FFN.

Get the output size of a Mega model.

Get recommended defaults.

Types

build_opt()

@type build_opt() ::
  {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:ema_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:laplace_attention, boolean()}
  | {:num_layers, pos_integer()}
  | {:seq_len, pos_integer()}
  | {:window_size, pos_integer()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a Mega model for sequence processing.

Options

  • :embed_dim - Size of input embedding per frame (required)
  • :hidden_size - Internal hidden dimension (default: 256)
  • :ema_dim - Dimensionality of EMA expansion (default: 16)
  • :num_layers - Number of Mega blocks (default: 4)
  • :dropout - Dropout rate (default: 0.1)
  • :window_size - Expected sequence length (default: 60)
  • :laplace_attention - Use Laplace attention instead of softmax (default: false)

Returns

An Axon model that outputs [batch, hidden_size] from the last position.
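
For example, a deeper model with Laplace attention enabled (all values illustrative, using only the documented options):

model =
  Edifice.Attention.Mega.build(
    embed_dim: 287,
    hidden_size: 512,
    ema_dim: 32,
    num_layers: 6,
    dropout: 0.2,
    laplace_attention: true
  )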

build_mega_block(input, opts)

@spec build_mega_block(Axon.t(), keyword()) :: Axon.t()

Build a single Mega block with EMA + gated attention + FFN.
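
Blocks compose like ordinary Axon layers, so a custom stack can be wired by hand. The option names passed to build_mega_block/2 below are assumed to mirror build/1; treat this as a sketch:

input = Axon.input("sequence", shape: {nil, 60, 287})

model =
  input
  |> Axon.dense(256)
  |> then(fn projected ->
    Enum.reduce(1..4, projected, fn _layer, acc ->
      Edifice.Attention.Mega.build_mega_block(acc, hidden_size: 256, ema_dim: 16)
    end)
  end)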

output_size(opts \\ [])

@spec output_size(keyword()) :: non_neg_integer()

Get the output size of a Mega model.
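
Since build/1 outputs [batch, hidden_size], output_size/1 presumably reports the configured hidden size; a hedged example:

Edifice.Attention.Mega.output_size(hidden_size: 256)
# => 256 (assumed: mirrors the :hidden_size option, default 256)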