sLSTM: Scalar LSTM with Exponential Gating.
Standalone extraction of the sLSTM variant from xLSTM. The sLSTM extends the traditional LSTM with exponential gating and log-domain stabilization, enabling stable training even when gate pre-activations become very large.
Key Innovation: Exponential Gating with Log-Domain Stabilization
Standard LSTM gates are bounded by sigmoid [0, 1]. sLSTM uses exponential gates that can take any positive value, with a stabilization trick to prevent overflow:
Standard LSTM: i_t = sigmoid(...) ∈ [0, 1]
sLSTM:         i_t = exp(log_i_t - m_t) ∈ [0, ∞)

The stabilizer m_t = max(log_f_t + m_{t-1}, log_i_t) keeps values
numerically tractable while preserving the relative magnitudes of the gates.
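To see why the stabilizer matters, here is a minimal NumPy sketch (the pre-activation values are hypothetical, and this is an illustration, not the library's code): naively exponentiating a large gate pre-activation overflows float64, while subtracting m_t keeps both gates finite and preserves their ratio.

```python
import numpy as np

# Hypothetical large gate pre-activations at one time step.
log_i, log_f, m_prev = 800.0, 795.0, 0.0

with np.errstate(over="ignore"):
    naive_i = np.exp(log_i)           # exp(800) overflows float64 -> inf

# Stabilized form: subtract the running max m_t before exponentiating.
m_t = max(log_f + m_prev, log_i)      # 800.0
i_t = np.exp(log_i - m_t)             # exp(0) = 1.0
f_t = np.exp(log_f + m_prev - m_t)    # exp(-5), small but nonzero

# The ratio i_t / f_t = exp(log_i - log_f) is unchanged by the shift,
# so the relative gate magnitudes survive the stabilization.
print(naive_i, i_t, f_t)
```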
Equations
Gate pre-activations (with recurrent connections):
log_i_t = W_i x_t + R_i h_{t-1} + b_i
log_f_t = W_f x_t + R_f h_{t-1} + b_f
z_t = tanh(W_z x_t + R_z h_{t-1} + b_z)
o_t = sigmoid(W_o x_t + R_o h_{t-1} + b_o)
Log-domain stabilization:
m_t = max(log_f_t + m_{t-1}, log_i_t)
i_t' = exp(log_i_t - m_t)
f_t' = exp(log_f_t + m_{t-1} - m_t)
State updates:
c_t = f_t' * c_{t-1} + i_t' * z_t
n_t = f_t' * n_{t-1} + i_t'
h_t = o_t * (c_t / max(|n_t|, 1))
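Putting the equations together, the following NumPy sketch runs one sLSTM time step on a scalar hidden state (an illustration under the equations above; the actual Axon layer, weight handling, and batching are omitted). Even with huge gate pre-activations, the stabilized step stays finite:

```python
import numpy as np

def slstm_step(pre_acts, state):
    """One sLSTM step. pre_acts holds the four gate pre-activations
    (W x_t + R h_{t-1} + b, assumed already computed); state = (c, n, m)."""
    log_i, log_f, z_pre, o_pre = pre_acts
    c, n, m = state

    z = np.tanh(z_pre)                    # cell input
    o = 1.0 / (1.0 + np.exp(-o_pre))      # sigmoid output gate

    # Log-domain stabilization.
    m_t = np.maximum(log_f + m, log_i)
    i_t = np.exp(log_i - m_t)
    f_t = np.exp(log_f + m - m_t)

    # State updates with normalizer n_t.
    c_t = f_t * c + i_t * z
    n_t = f_t * n + i_t
    h_t = o * (c_t / np.maximum(np.abs(n_t), 1.0))
    return h_t, (c_t, n_t, m_t)

# Hypothetical pre-activations of 500 would overflow a naive exp().
h, state = slstm_step((500.0, 490.0, 0.3, 0.1), (0.0, 0.0, 0.0))
print(h)  # finite, bounded by the normalizer
```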
Architecture
Input [batch, seq_len, embed_dim]
|
v
+-------------------------------------+
| sLSTM Block |
| LayerNorm -> sLSTM recurrence |
| LayerNorm -> Feedforward |
| Residual connections |
+-------------------------------------+
| (repeat for num_layers)
v
Output [batch, hidden_size]

Usage
model = SLSTM.build(
embed_dim: 287,
hidden_size: 256,
num_layers: 4
)

References
- Beck et al., "xLSTM: Extended Long Short-Term Memory" (NeurIPS 2024)
- https://arxiv.org/abs/2405.04517
Summary
Functions
Build a standalone sLSTM model for sequence processing.
Build a standalone sLSTM layer for use in custom architectures.
Default dropout rate
Default feedforward expansion factor
Default hidden dimension
Default number of layers
Get the output size of an sLSTM model.
Get recommended defaults.
Types
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:expand_factor, pos_integer()}
  | {:dropout, float()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
Build a standalone sLSTM model for sequence processing.
Options
:embed_dim - Size of input embedding per frame (required)
:hidden_size - Internal hidden dimension (default: 256)
:num_layers - Number of sLSTM blocks (default: 4)
:expand_factor - Feedforward expansion factor (default: 2)
:dropout - Dropout rate (default: 0.0)
:window_size - Expected sequence length (default: 60)
Returns
An Axon model that processes sequences and outputs the last hidden state.
@spec build_slstm_layer(Axon.t(), pos_integer(), String.t()) :: Axon.t()
Build a standalone sLSTM layer for use in custom architectures.
Returns an Axon node that applies sLSTM recurrence to the input sequence.
@spec default_dropout() :: float()
Default dropout rate
@spec default_expand_factor() :: pos_integer()
Default feedforward expansion factor
@spec default_num_layers() :: pos_integer()
Default number of layers
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of an sLSTM model.
@spec recommended_defaults() :: keyword()
Get recommended defaults.