sLSTM: Scalar LSTM with Exponential Gating.
Standalone extraction of the sLSTM variant from xLSTM. The sLSTM extends the traditional LSTM with exponential gating and log-domain stabilization, enabling stable training even when gate pre-activations become very large.
Key Innovation: Exponential Gating with Log-Domain Stabilization
Standard LSTM gates are bounded by sigmoid [0, 1]. sLSTM uses exponential gates that can take any positive value, with a stabilization trick to prevent overflow:
Standard LSTM: i_t = sigmoid(...) ∈ [0, 1]
sLSTM:         i_t = exp(log_i_t - m_t) ∈ [0, ∞)

The stabilizer m_t = max(log_f_t + m_{t-1}, log_i_t) keeps values
numerically tractable while preserving the relative magnitudes of the gates.
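To see why the stabilizer matters, here is a minimal NumPy sketch (the pre-activation values are hypothetical, and this is an illustration, not the library's code): naively exponentiating a large gate pre-activation overflows float64, while subtracting m_t keeps both gates finite and preserves their ratio.

```python
import numpy as np

# Hypothetical large gate pre-activations at one time step.
log_i, log_f, m_prev = 800.0, 795.0, 0.0

with np.errstate(over="ignore"):
    naive_i = np.exp(log_i)           # exp(800) overflows float64 -> inf

# Stabilized form: subtract the running max m_t before exponentiating.
m_t = max(log_f + m_prev, log_i)      # 800.0
i_t = np.exp(log_i - m_t)             # exp(0) = 1.0
f_t = np.exp(log_f + m_prev - m_t)    # exp(-5), small but nonzero

# The ratio i_t / f_t = exp(log_i - log_f) is unchanged by the shift,
# so the relative gate magnitudes survive the stabilization.
print(naive_i, i_t, f_t)
```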
Equations
Gate pre-activations (with recurrent connections):
log_i_t = W_i x_t + R_i h_{t-1} + b_i
log_f_t = W_f x_t + R_f h_{t-1} + b_f
z_t = tanh(W_z x_t + R_z h_{t-1} + b_z)
o_t = sigmoid(W_o x_t + R_o h_{t-1} + b_o)
Log-domain stabilization:
m_t = max(log_f_t + m_{t-1}, log_i_t)
i_t' = exp(log_i_t - m_t)
f_t' = exp(log_f_t + m_{t-1} - m_t)
State updates:
c_t = f_t' * c_{t-1} + i_t' * z_t
n_t = f_t' * n_{t-1} + i_t'
h_t = o_t * (c_t / max(|n_t|, 1))
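Putting the equations together, the following NumPy sketch runs one sLSTM time step on a scalar hidden state (an illustration under the equations above; the actual Axon layer, weight handling, and batching are omitted). Even with huge gate pre-activations, the stabilized step stays finite:

```python
import numpy as np

def slstm_step(pre_acts, state):
    """One sLSTM step. pre_acts holds the four gate pre-activations
    (W x_t + R h_{t-1} + b, assumed already computed); state = (c, n, m)."""
    log_i, log_f, z_pre, o_pre = pre_acts
    c, n, m = state

    z = np.tanh(z_pre)                    # cell input
    o = 1.0 / (1.0 + np.exp(-o_pre))      # sigmoid output gate

    # Log-domain stabilization.
    m_t = np.maximum(log_f + m, log_i)
    i_t = np.exp(log_i - m_t)
    f_t = np.exp(log_f + m - m_t)

    # State updates with normalizer n_t.
    c_t = f_t * c + i_t * z
    n_t = f_t * n + i_t
    h_t = o * (c_t / np.maximum(np.abs(n_t), 1.0))
    return h_t, (c_t, n_t, m_t)

# Hypothetical pre-activations of 500 would overflow a naive exp().
h, state = slstm_step((500.0, 490.0, 0.3, 0.1), (0.0, 0.0, 0.0))
print(h)  # finite, bounded by the normalizer
```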
Architecture
Input [batch, seq_len, embed_dim]
|
v
+-------------------------------------+
| sLSTM Block |
| LayerNorm -> sLSTM recurrence |
| LayerNorm -> Feedforward |
| Residual connections |
+-------------------------------------+
| (repeat for num_layers)
v
Output [batch, hidden_size]

Usage
model = SLSTM.build(
embed_dim: 287,
hidden_size: 256,
num_layers: 4
)

References
- Beck et al., "xLSTM: Extended Long Short-Term Memory" (NeurIPS 2024)
- https://arxiv.org/abs/2405.04517
Summary
Functions
Build a standalone sLSTM model for sequence processing.
Build a standalone sLSTM layer for use in custom architectures.
Default dropout rate
Default feedforward expansion factor
Default hidden dimension
Default number of layers
Get the output size of an sLSTM model.
Get recommended defaults.
Types
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:expand_factor, pos_integer()}
  | {:dropout, float()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
Build a standalone sLSTM model for sequence processing.
Options
:embed_dim - Size of input embedding per frame (required)
:hidden_size - Internal hidden dimension (default: 256)
:num_layers - Number of sLSTM blocks (default: 4)
:expand_factor - Feedforward expansion factor (default: 2)
:dropout - Dropout rate (default: 0.0)
:window_size - Expected sequence length (default: 60)
Returns
An Axon model that processes sequences and outputs the last hidden state.
@spec build_slstm_layer(Axon.t(), pos_integer(), String.t()) :: Axon.t()
Build a standalone sLSTM layer for use in custom architectures.
Returns an Axon node that applies sLSTM recurrence to the input sequence.
@spec default_dropout() :: float()
Default dropout rate
@spec default_expand_factor() :: pos_integer()
Default feedforward expansion factor
@spec default_num_layers() :: pos_integer()
Default number of layers
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of an sLSTM model.
@spec recommended_defaults() :: keyword()
Get recommended defaults.