Edifice.Recurrent.XLSTMv2 (Edifice v0.2.0)


xLSTM v2: Improved Extended Long Short-Term Memory.

Implements improvements from the xLSTM 7B scaling paper, building on the original xLSTM architecture with enhanced matrix memory and normalization.

Key Improvements over xLSTM v1

  1. Block-diagonal matrix memory: Reduces mLSTM parameters by partitioning the memory matrix into independent blocks. Each block operates on a subset of dimensions, reducing per-head memory from O(d^2) to O(d^2/B) where B is the number of blocks.

  2. Improved normalizer with learnable bias: The normalizer n_t = f_t * n_{t-1} + i_t gains a learnable bias term for better gradient flow: h_t = o_t * (c_t / max(|n_t + bias|, 1))

  3. Pre-norm + post-norm hybrid: Combines pre-LayerNorm for stable training with post-LayerNorm for better representation quality.
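
As a rough illustration of items 1 and 2, the plain-Elixir sketch below counts memory entries for a dense versus a block-diagonal matrix and evaluates the biased normalizer readout on scalars. The module `XLSTMv2Sketch` and its functions are hypothetical and not part of Edifice. The sketch also assumes that `d` is the per-head dimension and that `B` divides `d` evenly.

```elixir
defmodule XLSTMv2Sketch do
  @moduledoc false
  # Hypothetical helper module for illustration only; not part of Edifice.

  # Entries in a dense d x d memory matrix.
  def dense_entries(d), do: d * d

  # Entries in a block-diagonal matrix made of b independent blocks,
  # each of size (d / b) x (d / b). Assumes b divides d evenly.
  def block_diag_entries(d, b) when rem(d, b) == 0 do
    block = div(d, b)
    b * block * block
  end

  # Scalar normalizer recurrence: n_t = f_t * n_{t-1} + i_t
  def normalizer_step(n_prev, f, i), do: f * n_prev + i

  # Scalar readout: h_t = o_t * (c_t / max(|n_t + bias|, 1))
  def readout(c, n, o, bias), do: o * (c / max(abs(n + bias), 1.0))
end

IO.inspect(XLSTMv2Sketch.dense_entries(64))         # 4096
IO.inspect(XLSTMv2Sketch.block_diag_entries(64, 2)) # 2048

n = XLSTMv2Sketch.normalizer_step(0.5, 0.5, 0.25)   # 0.5
IO.inspect(XLSTMv2Sketch.readout(2.0, n, 1.0, 1.5)) # 1.0
```

With d = 64 and B = 2 the block-diagonal form stores 2048 entries instead of 4096, which matches d^2/B. In the readout, the clamp at 1 keeps the division stable whenever |n_t + bias| is small.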

Architecture

Input [batch, seq_len, embed_dim]
      |
      v
+-------------------------------------+
|     xLSTM v2 Block                  |
|  PreNorm -> mLSTM v2 -> PostNorm    |
|       + Residual                    |
|  PreNorm -> FFN -> PostNorm         |
|       + Residual                    |
+-------------------------------------+
      | (repeat for num_layers)
      v
Output [batch, hidden_size]
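
The residual pattern in the diagram can be sketched with plain Axon layers. This is an illustration only and not Edifice's implementation: the module `HybridNormSketch`, the input name `"sequence"`, and the GELU activation are assumptions, and a dense layer stands in for the mLSTM v2 sublayer. It assumes Axon is available as a dependency.

```elixir
defmodule HybridNormSketch do
  @moduledoc false
  # Hypothetical illustration of the pre-norm + post-norm residual pattern;
  # this is not how Edifice builds its blocks internally.

  # Wraps any sublayer as: x + PostNorm(sublayer(PreNorm(x)))
  def residual(%Axon{} = x, sublayer) when is_function(sublayer, 1) do
    branch =
      x
      |> Axon.layer_norm()
      |> sublayer.()
      |> Axon.layer_norm()

    Axon.add(x, branch)
  end

  # Position-wise feedforward sublayer with an expansion factor.
  # The GELU activation is an assumption; the source does not name one.
  def ffn(%Axon{} = x, hidden_size, expand_factor) do
    x
    |> Axon.dense(hidden_size * expand_factor, activation: :gelu)
    |> Axon.dense(hidden_size)
  end
end

hidden_size = 256
input = Axon.input("sequence", shape: {nil, 60, hidden_size})

# A dense layer stands in for the mLSTM v2 sublayer, which is not reproduced here.
block =
  input
  |> HybridNormSketch.residual(&Axon.dense(&1, hidden_size))
  |> HybridNormSketch.residual(&HybridNormSketch.ffn(&1, hidden_size, 2))
```

Each sublayer must return the same feature width as its input so that the residual addition is shape-compatible.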

Usage

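alias Edifice.Recurrent.XLSTMv2

# :embed_dim is required; the remaining options match the documented defaults.
model = XLSTMv2.build(
  embed_dim: 287,
  hidden_size: 256,
  num_layers: 4,
  num_heads: 4,
  num_blocks: 2
)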

References

  • Beck et al., "xLSTM: Extended Long Short-Term Memory" (NeurIPS 2024)
  • xLSTM 7B scaling paper improvements

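Summary

Types

  • build_opt() - Options for build/1.

Functions

  • build(opts \\ []) - Build an xLSTM v2 model for sequence processing.
  • default_dropout() - Default dropout rate
  • default_expand_factor() - Default feedforward expansion factor
  • default_head_dim() - Default head dimension for mLSTM
  • default_hidden_size() - Default hidden dimension
  • default_num_blocks() - Default number of memory blocks (block-diagonal)
  • default_num_heads() - Default number of heads
  • default_num_layers() - Default number of layers
  • output_size(opts \\ []) - Get the output size of an xLSTM v2 model.
  • Get recommended defaults.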

Types

build_opt()

@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:head_dim, pos_integer()}
  | {:num_blocks, pos_integer()}
  | {:expand_factor, pos_integer()}
  | {:dropout, float()}
  | {:window_size, pos_integer()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build an xLSTM v2 model for sequence processing.

Options

  • :embed_dim - Size of input embedding per frame (required)
  • :hidden_size - Internal hidden dimension (default: 256)
  • :num_layers - Number of stacked xLSTM v2 blocks (default: 4)
  • :num_heads - Number of heads for mLSTM (default: 4)
  • :head_dim - Dimension per head (default: 64)
  • :num_blocks - Number of block-diagonal memory blocks (default: 2)
  • :expand_factor - FFN expansion factor (default: 2)
  • :dropout - Dropout rate (default: 0.0)
  • :window_size - Expected sequence length (default: 60)

Returns

An Axon model that processes sequences and outputs the last hidden state.
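
A minimal sketch of running the returned model through Axon's standard build step follows. It assumes Edifice, Axon, and Nx are available in the project, that the model has a single input laid out as {batch, seq_len, embed_dim} (as in the Architecture diagram), and that small values for :embed_dim, :num_layers, and :window_size are accepted. None of these assumptions are confirmed by the text above.

```elixir
alias Edifice.Recurrent.XLSTMv2

# Small settings to keep the example quick; other options stay at their defaults.
model = XLSTMv2.build(embed_dim: 16, num_layers: 1, window_size: 8)

# Axon.build/2 returns an {init_fn, predict_fn} pair.
{init_fn, predict_fn} = Axon.build(model)

# Assumed input layout: {batch, seq_len, embed_dim}.
template = Nx.template({1, 8, 16}, :f32)

# Newer Axon releases may prefer Axon.ModelState.empty() over %{} here.
params = init_fn.(template, %{})

# Assumes a single model input, so a bare tensor can be passed directly.
input = Nx.broadcast(0.0, {1, 8, 16})
output = predict_fn.(params, input)

IO.inspect(Nx.shape(output), label: "output shape")
```

Because the model returns only the last hidden state, the printed shape should have no sequence axis.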

default_dropout()

@spec default_dropout() :: float()

Default dropout rate

default_expand_factor()

@spec default_expand_factor() :: pos_integer()

Default feedforward expansion factor

default_head_dim()

@spec default_head_dim() :: pos_integer()

Default head dimension for mLSTM

default_hidden_size()

@spec default_hidden_size() :: pos_integer()

Default hidden dimension

default_num_blocks()

@spec default_num_blocks() :: pos_integer()

Default number of memory blocks (block-diagonal)

default_num_heads()

@spec default_num_heads() :: pos_integer()

Default number of heads

default_num_layers()

@spec default_num_layers() :: pos_integer()

Default number of layers

output_size(opts \\ [])

@spec output_size(keyword()) :: non_neg_integer()

Get the output size of an xLSTM v2 model.
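
The helper functions can be combined to spell out a configuration and to size a layer placed after the model. The sketch below assumes that output_size/1 accepts the same options as build/1 (its spec only says keyword()) and that Edifice and Axon are available. The projection head is hypothetical and not part of the library.

```elixir
alias Edifice.Recurrent.XLSTMv2

# Spell out the documented defaults explicitly, overriding only the layer count.
opts = [
  embed_dim: 287,
  hidden_size: XLSTMv2.default_hidden_size(),
  num_heads: XLSTMv2.default_num_heads(),
  head_dim: XLSTMv2.default_head_dim(),
  num_blocks: XLSTMv2.default_num_blocks(),
  num_layers: 2
]

model = XLSTMv2.build(opts)

# Width of the model's output, as reported by the library.
feature_size = XLSTMv2.output_size(opts)

# Hypothetical same-width projection head on top of the last hidden state.
head = Axon.dense(model, feature_size)

IO.inspect(feature_size, label: "output size")
```

Reading the width from output_size/1 avoids hard-coding a number that would silently go stale if the options change.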