xLSTM v2: Improved Extended Long Short-Term Memory.
Implements improvements from the xLSTM 7B scaling paper, building on the original xLSTM architecture with enhanced matrix memory and normalization.
Key Improvements over xLSTM v1
Block-diagonal matrix memory: Reduces mLSTM parameters by partitioning the memory matrix into independent blocks. Each block operates on a subset of dimensions, reducing per-head memory from O(d^2) to O(d^2/B) where B is the number of blocks.
Improved normalizer with learnable bias: The normalizer n_t = f_t * n_{t-1} + i_t gains a learnable bias term for better gradient flow:
    h_t = o_t * (c_t / max(|n_t + bias|, 1))
Pre-norm + post-norm hybrid: Combines pre-LayerNorm for stable training with post-LayerNorm for better representation quality.
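The three improvements above can be illustrated on plain Elixir lists. This is a minimal, dependency-free sketch, not the library's Axon implementation. The module name XLSTMv2Sketch and all of its functions are hypothetical. The normalizer and output are treated as scalars, matching the simplified formula above. The hybrid sublayer reads the block layout "PreNorm -> sublayer -> PostNorm, + Residual" as x + PostNorm(sublayer(PreNorm(x))), and it uses a layer norm without learnable gain or shift.

```elixir
defmodule XLSTMv2Sketch do
  @moduledoc """
  Hypothetical sketch of the xLSTM v2 ideas on plain lists.
  Not the library's implementation.
  """

  # Entries in one head's memory: a dense d x d matrix would hold d^2,
  # while B independent (d/B) x (d/B) diagonal blocks hold d^2 / B.
  def memory_entries(d, num_blocks) when rem(d, num_blocks) == 0 do
    block = div(d, num_blocks)
    num_blocks * block * block
  end

  # Block-diagonal matrix-vector product: block k only reads and writes
  # its own slice of the vector. `blocks` is a list of square matrices,
  # each given as a list of rows.
  def block_diag_mul(blocks, vector) do
    slices = Enum.chunk_every(vector, length(hd(blocks)))

    blocks
    |> Enum.zip(slices)
    |> Enum.flat_map(fn {matrix, slice} ->
      Enum.map(matrix, fn row -> dot(row, slice) end)
    end)
  end

  # One step of the normalizer recurrence n_t = f_t * n_{t-1} + i_t.
  def normalizer_step(n_prev, f, i), do: f * n_prev + i

  # h_t = o_t * (c_t / max(|n_t + bias|, 1)); the floor of 1 keeps the
  # division bounded when n_t + bias is close to zero.
  def hidden(o, c, n, bias), do: o * (c / max(abs(n + bias), 1.0))

  # Layer norm over a flat vector (no learnable gain or shift).
  def layer_norm(xs, eps \\ 1.0e-5) do
    mean = Enum.sum(xs) / length(xs)
    var = Enum.sum(Enum.map(xs, fn x -> (x - mean) * (x - mean) end)) / length(xs)
    Enum.map(xs, fn x -> (x - mean) / :math.sqrt(var + eps) end)
  end

  # Pre-norm + post-norm residual: x + PostNorm(sublayer(PreNorm(x))).
  def hybrid_residual(xs, sublayer) do
    normed_out = xs |> layer_norm() |> sublayer.() |> layer_norm()
    Enum.zip_with(xs, normed_out, &+/2)
  end

  defp dot(a, b), do: Enum.sum(Enum.zip_with(a, b, &*/2))
end
```

If d is the per-head dimension, the defaults head_dim: 64 and num_blocks: 2 give memory_entries(64, 2) = 2048 entries per head instead of 4096 for a dense matrix.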
Architecture
Input [batch, seq_len, embed_dim]
                   |
                   v
+-------------------------------------+
|           xLSTM v2 Block            |
|  PreNorm -> mLSTM v2 -> PostNorm    |
|  + Residual                         |
|  PreNorm -> FFN -> PostNorm         |
|  + Residual                         |
+-------------------------------------+
                   |  (repeat for num_layers)
                   v
Output [batch, hidden_size]
Usage
model = XLSTMv2.build(
  embed_dim: 287,
  hidden_size: 256,
  num_layers: 4,
  num_heads: 4,
  num_blocks: 2
)
References
- Beck et al., "xLSTM: Extended Long Short-Term Memory" (NeurIPS 2024)
- xLSTM 7B scaling paper improvements
Summary
Functions
Build an xLSTM v2 model for sequence processing.
Default dropout rate
Default feedforward expansion factor
Default head dimension for mLSTM
Default hidden dimension
Default number of memory blocks (block-diagonal)
Default number of heads
Default number of layers
Get the output size of an xLSTM v2 model.
Get recommended defaults.
Types
@type build_opt() ::
        {:embed_dim, pos_integer()}
        | {:hidden_size, pos_integer()}
        | {:num_layers, pos_integer()}
        | {:num_heads, pos_integer()}
        | {:head_dim, pos_integer()}
        | {:num_blocks, pos_integer()}
        | {:expand_factor, pos_integer()}
        | {:dropout, float()}
        | {:window_size, pos_integer()}
Options for build/1.
Functions
Build an xLSTM v2 model for sequence processing.
Options
- :embed_dim - Size of the input embedding per frame (required)
- :hidden_size - Internal hidden dimension (default: 256)
- :num_layers - Number of stacked xLSTM v2 blocks (default: 4)
- :num_heads - Number of mLSTM heads (default: 4)
- :head_dim - Dimension per head (default: 64)
- :num_blocks - Number of block-diagonal memory blocks (default: 2)
- :expand_factor - FFN expansion factor (default: 2)
- :dropout - Dropout rate (default: 0.0)
- :window_size - Expected sequence length (default: 60)
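The options above combine as "required :embed_dim plus documented defaults". The sketch below shows that merge with the standard Keyword module. XLSTMv2OptsSketch, resolve/1 and block_dim/1 are hypothetical names, not part of the library, and block_dim/1 assumes that each head's head_dim is split evenly across num_blocks, which the docs imply but do not state outright.

```elixir
defmodule XLSTMv2OptsSketch do
  @moduledoc """
  Hypothetical helper, not part of the library: merges caller options
  over the defaults documented for build/1.
  """

  # Defaults copied from the Options list; :embed_dim has no default.
  @defaults [
    hidden_size: 256,
    num_layers: 4,
    num_heads: 4,
    head_dim: 64,
    num_blocks: 2,
    expand_factor: 2,
    dropout: 0.0,
    window_size: 60
  ]

  # Raises KeyError when the required :embed_dim is missing; otherwise
  # returns the defaults with the caller's options taking precedence.
  def resolve(opts) do
    _embed_dim = Keyword.fetch!(opts, :embed_dim)
    Keyword.merge(@defaults, opts)
  end

  # Width of one memory block, ASSUMING head_dim is split evenly
  # across num_blocks.
  def block_dim(opts) do
    head_dim = Keyword.fetch!(opts, :head_dim)
    num_blocks = Keyword.fetch!(opts, :num_blocks)
    div(head_dim, num_blocks)
  end
end
```

Every value in the Usage example other than :embed_dim matches a default, so resolve(embed_dim: 287) describes the same configuration.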
Returns
An Axon model that processes sequences and outputs the last hidden state.
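The shape contract is [batch, seq_len, embed_dim] in and [batch, hidden_size] out, so only the final timestep of each sequence is kept. The sketch below shows that reduction on nested lists. LastHiddenSketch is a hypothetical name, and the real model presumably performs the equivalent slice on tensors inside the Axon graph.

```elixir
defmodule LastHiddenSketch do
  # Hypothetical illustration: states are nested lists shaped
  # [batch][seq_len][hidden_size]; keep the last timestep of each
  # sequence, giving [batch][hidden_size].
  def last_hidden(states), do: Enum.map(states, &List.last/1)
end
```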
@spec default_dropout() :: float()
Default dropout rate
@spec default_expand_factor() :: pos_integer()
Default feedforward expansion factor
@spec default_head_dim() :: pos_integer()
Default head dimension for mLSTM
@spec default_num_blocks() :: pos_integer()
Default number of memory blocks (block-diagonal)
@spec default_num_heads() :: pos_integer()
Default number of heads
@spec default_num_layers() :: pos_integer()
Default number of layers
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of an xLSTM v2 model.
@spec recommended_defaults() :: keyword()
Get recommended defaults.