Edifice.Attention.HGRN (Edifice v0.2.0)

HGRN-2: Hierarchically Gated Linear RNN with State Expansion.

HGRN-2 is a linear RNN architecture that uses hierarchical gating and state expansion to achieve strong performance on sequence modeling tasks while keeping time complexity linear, O(L), in the sequence length L.

Key Innovation: State Expansion

HGRN-2 expands the hidden state dimension during recurrence, then contracts it back. This lets the model maintain a richer internal representation without increasing the output dimension:

h_expanded = expand(h)  # D -> D*expansion
h_new = gate * h_expanded + (1 - gate) * input
output = contract(h_new)  # D*expansion -> D
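
The three pseudocode steps can be traced by hand on plain lists. The sketch below is illustrative only and is not Edifice's implementation: `expand/2` (repeat each element) and `contract/2` (average each group) are hypothetical stand-ins for the learned projections, the gate is a fixed scalar, and the input is assumed to be expanded the same way as the state so that the shapes match.

```elixir
defmodule StateExpansionSketch do
  # Hypothetical stand-in for the learned D -> D*E projection:
  # repeat every element `e` times.
  def expand(h, e), do: Enum.flat_map(h, &List.duplicate(&1, e))

  # Hypothetical stand-in for the learned D*E -> D projection:
  # average each group of `e` consecutive elements.
  def contract(h, e) do
    h
    |> Enum.chunk_every(e)
    |> Enum.map(&(Enum.sum(&1) / e))
  end

  # One recurrence step: expand, blend with the expanded input, contract.
  def step(h, input, gate, e) do
    h_expanded = expand(h, e)
    x_expanded = expand(input, e)

    h_new =
      Enum.zip_with(h_expanded, x_expanded, fn h_j, x_j ->
        gate * h_j + (1 - gate) * x_j
      end)

    contract(h_new, e)
  end
end

h = [1.0, 0.0]
x = [0.0, 2.0]

# The expanded state has D * E = 4 elements; the output is back to D = 2.
IO.inspect(StateExpansionSketch.expand(h, 2), label: "expanded")
IO.inspect(StateExpansionSketch.step(h, x, 0.5, 2), label: "output")
```

With these fixed stand-ins the expansion carries no extra information; in a trained model the projections are learned, which is where the added capacity would come from.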

Architecture

Input [batch, seq_len, embed_dim]
      |
      v
+---------------------------------------+
|  HGRN-2 Block                         |
|                                       |
|  +- State Expansion ---------------+  |
|  |                                 |  |
|  |  h_expanded = Linear(h, D*E)    |  |
|  |                                 |  |
|  +---------------------------------+  |
|                                       |
|  +- Hierarchical Gating -----------+  |
|  |                                 |  |
|  |  forget_gate = sigmoid(Wf*x)    |  |
|  |  input_gate = sigmoid(Wi*x)     |  |
|  |  h = f*h + i*input              |  |
|  |                                 |  |
|  +---------------------------------+  |
|                                       |
|  +- State Contraction -------------+  |
|  |                                 |  |
|  |  output = Linear(h, D)          |  |
|  |                                 |  |
|  +---------------------------------+  |
+---------------------------------------+
      | (repeat for num_layers)
      v
[batch, hidden_size]
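
The Hierarchical Gating box in the diagram can be read as an element-wise scan over the sequence. The following is a minimal sketch under simplifying assumptions, not the library's code: `wf` and `wi` are hypothetical scalar weights standing in for the `Wf` and `Wi` matrices, and the hidden state has the same width as each input vector.

```elixir
defmodule GatedScanSketch do
  def sigmoid(z), do: 1.0 / (1.0 + :math.exp(-z))

  # One element-wise update, h = f*h + i*x, with both gates computed from x.
  # `wf` and `wi` are hypothetical scalar weights (assumptions for this sketch).
  def step(h, x, wf, wi) do
    Enum.zip_with(h, x, fn h_j, x_j ->
      f = sigmoid(wf * x_j)
      i = sigmoid(wi * x_j)
      f * h_j + i * x_j
    end)
  end

  # Walk the sequence once, returning the hidden state after every time step.
  def scan(xs, h0, wf, wi) do
    Enum.scan(xs, h0, fn x, h -> step(h, x, wf, wi) end)
  end
end

# Two time steps of a width-2 input, starting from h0 = [4.0, 0.0].
xs = [[0.0, 1.0], [0.0, -1.0]]
IO.inspect(GatedScanSketch.scan(xs, [4.0, 0.0], 1.0, 1.0), label: "hidden states")
```

Because each update touches only the previous state and the current input, the scan does a fixed amount of work per time step.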

Complexity

Aspect	Value
Training Time	O(L)
Training Space	O(L)
Inference Time	O(1) per step
Inference Space	O(1)
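
The O(1) inference figures follow from the recurrence needing only the previous hidden state. The sketch below illustrates this with a hypothetical scalar recurrence and a fixed gate `g` (both assumptions chosen for brevity): carrying the last state forward as a cache gives the same result as re-running the whole sequence.

```elixir
defmodule IncrementalSketch do
  # Hypothetical fixed-gate recurrence: h_t = g * h_{t-1} + (1 - g) * x_t.
  def step(h, x, g), do: g * h + (1 - g) * x

  # Full pass over the sequence: one step per token, O(L) work in total.
  def full(xs, g), do: Enum.reduce(xs, 0.0, fn x, h -> step(h, x, g) end)
end

xs = [1.0, 2.0, 3.0, 4.0]
g = 0.5

# Process the first three tokens once and keep only the final state as the cache.
cache = IncrementalSketch.full(Enum.take(xs, 3), g)

# A new token needs a single step on the cached state, not a re-scan of the prefix.
incremental = IncrementalSketch.step(cache, 4.0, g)

IO.inspect(incremental == IncrementalSketch.full(xs, g), label: "matches full pass")
```

The cache here is a single number; its size does not grow with the number of tokens already processed.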

Usage

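alias Edifice.Attention.HGRN

model = HGRN.build(
  embed_dim: 287,       # size of the last input axis
  hidden_size: 256,     # size of the model output
  num_layers: 6,        # number of stacked HGRN-2 blocks
  state_expansion: 2    # expansion factor E (D -> D*E)
)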

Reference

Paper: "HGRN2: Gated Linear RNNs with State Expansion" (arXiv:2404.07904)

Summary

Types

build_opt()

Options for build/1.

Functions

build(opts \\ [])

Build an HGRN-2 model for sequence processing.

build_hgrn_block(input, opts)

Build a single HGRN-2 block.

build_hgrn_layer(input, opts)

Build the Hierarchically Gated RNN layer with state expansion.

init_cache(opts \\ [])

Initialize hidden state for O(1) incremental inference.

output_size(opts \\ [])

Get the output size of an HGRN model.

param_count(opts)

Calculate approximate parameter count for an HGRN model.

recommended_defaults()

Recommended default configuration for sequence processing.