Edifice.Recurrent.SLSTM (Edifice v0.2.0)


sLSTM: Scalar LSTM with Exponential Gating.

Standalone extraction of the sLSTM variant from xLSTM. The sLSTM extends traditional LSTM with exponential gating and log-domain stabilization, enabling stable training with very large gate values.

Key Innovation: Exponential Gating with Log-Domain Stabilization

Standard LSTM gates are bounded to [0, 1] by the sigmoid. sLSTM instead uses exponential gates that can take any positive value, with a stabilization trick to prevent overflow:

Standard LSTM:  i_t = sigmoid(...)         range [0, 1]
sLSTM:          i_t = exp(log_i_t - m_t)   range [0, ∞)

The stabilizer m_t = max(log_f_t + m_{t-1}, log_i_t) keeps the exponentials in a numerically safe range while preserving the relative magnitudes of the input and forget gates; because c_t and n_t are rescaled by the same factor, the ratio c_t / n_t (and hence h_t) is unaffected.
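A minimal numeric sketch of why the stabilizer matters (Python, with hypothetical scalar pre-activations; the library itself operates on tensors):

```python
import math

# Hypothetical scalar pre-activations, chosen large enough that a naive
# exp() overflows a 64-bit float (exp overflows past ~709.8).
log_i, log_f, m_prev = 800.0, 750.0, 700.0

try:
    math.exp(log_i)                    # naive exponential gate
except OverflowError:
    print("naive exp overflows")

# The stabilized gates stay finite:
m = max(log_f + m_prev, log_i)         # m_t = max(log_f_t + m_{t-1}, log_i_t)
i_gate = math.exp(log_i - m)           # exp(800 - 1450): tiny but nonzero
f_gate = math.exp(log_f + m_prev - m)  # exp(0) = 1.0
print(m, i_gate, f_gate)
```

Both gates are shifted by the same m_t, so their log-difference (here 650) is preserved exactly; only the common scale is removed.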

Equations

Gate pre-activations (with recurrent connections):

  • log_i_t = W_i x_t + R_i h_{t-1} + b_i
  • log_f_t = W_f x_t + R_f h_{t-1} + b_f
  • z_t = tanh(W_z x_t + R_z h_{t-1} + b_z)
  • o_t = sigmoid(W_o x_t + R_o h_{t-1} + b_o)

Log-domain stabilization:

  • m_t = max(log_f_t + m_{t-1}, log_i_t)
  • i_t' = exp(log_i_t - m_t)
  • f_t' = exp(log_f_t + m_{t-1} - m_t)

State updates:

  • c_t = f_t' * c_{t-1} + i_t' * z_t
  • n_t = f_t' * n_{t-1} + i_t'
  • h_t = o_t * (c_t / max(|n_t|, 1))
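Putting the equations together, a scalar single-step sketch in Python (illustrative only; the parameter layout and names here are hypothetical, and the actual module runs this recurrence over batched tensors via Axon):

```python
import math

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, w):
    """One sLSTM step on scalars; w maps each gate to a (W, R, b) triple."""
    log_i = w["i"][0] * x + w["i"][1] * h_prev + w["i"][2]
    log_f = w["f"][0] * x + w["f"][1] * h_prev + w["f"][2]
    z = math.tanh(w["z"][0] * x + w["z"][1] * h_prev + w["z"][2])
    o = 1.0 / (1.0 + math.exp(-(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])))

    m = max(log_f + m_prev, log_i)        # stabilizer m_t
    i_g = math.exp(log_i - m)             # i_t'
    f_g = math.exp(log_f + m_prev - m)    # f_t'

    c = f_g * c_prev + i_g * z            # cell state
    n = f_g * n_prev + i_g                # normalizer state
    h = o * (c / max(abs(n), 1.0))        # normalized hidden state
    return h, c, n, m

# Run a few steps from a zero state with arbitrary weights.
w = {g: (0.5, 0.1, 0.0) for g in ("i", "f", "z", "o")}
h = c = n = m = 0.0
for x in (1.0, -0.5, 2.0):
    h, c, n, m = slstm_step(x, h, c, n, m, w)
print(h)
```

Note that |c_t| <= n_t by induction (|z_t| <= 1), so the division by max(|n_t|, 1) keeps |h_t| <= 1 regardless of how large the unstabilized gates get.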

Architecture

Input [batch, seq_len, embed_dim]
      |
      v
+------------------------------------+
|            sLSTM Block             |
|  LayerNorm -> sLSTM recurrence     |
|  LayerNorm -> Feedforward          |
|  Residual connections              |
+------------------------------------+
      | (repeat for num_layers)
      v
Output [batch, hidden_size]
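The residual wiring of one block and the num_layers stack can be sketched as follows (the sub-layers are identity stand-ins here; in the real model they are LayerNorm, the sLSTM recurrence, and the feedforward network):

```python
def block(x, recurrence, feedforward, norm1, norm2):
    x = x + recurrence(norm1(x))    # LayerNorm -> sLSTM recurrence, + residual
    x = x + feedforward(norm2(x))   # LayerNorm -> Feedforward, + residual
    return x

def stack(x, blocks):
    for b in blocks:                # repeat for num_layers
        x = b(x)
    return x

identity = lambda v: v
num_layers = 4
out = stack(1.0, [lambda x: block(x, identity, identity, identity, identity)] * num_layers)
print(out)   # with identity sub-layers each block quadruples x: 4**4 = 256.0
```

The pre-norm residual layout means each block only has to learn a correction to its input, which is what lets the stack stay trainable as num_layers grows.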

Usage

model = Edifice.Recurrent.SLSTM.build(
  embed_dim: 287,
  hidden_size: 256,
  num_layers: 4
)

References

  • Beck et al., "xLSTM: Extended Long Short-Term Memory" (2024), arXiv:2405.04517 — source of the sLSTM variant.

Summary

Types

  • build_opt() - Options for build/1.

Functions

  • build/1 - Build a standalone sLSTM model for sequence processing.
  • build_slstm_layer/3 - Build a standalone sLSTM layer for use in custom architectures.
  • default_dropout/0 - Default dropout rate.
  • default_expand_factor/0 - Default feedforward expansion factor.
  • default_hidden_size/0 - Default hidden dimension.
  • default_num_layers/0 - Default number of layers.
  • output_size/1 - Get the output size of an sLSTM model.
  • Get recommended defaults.

Types

build_opt()

@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:expand_factor, pos_integer()}
  | {:dropout, float()}
  | {:window_size, pos_integer()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a standalone sLSTM model for sequence processing.

Options

  • :embed_dim - Size of input embedding per frame (required)
  • :hidden_size - Internal hidden dimension (default: 256)
  • :num_layers - Number of sLSTM blocks (default: 4)
  • :expand_factor - Feedforward expansion factor (default: 2)
  • :dropout - Dropout rate (default: 0.0)
  • :window_size - Expected sequence length (default: 60)

Returns

An Axon model that processes sequences and outputs the last hidden state.

build_slstm_layer(input, hidden_size, name \\ "slstm")

@spec build_slstm_layer(Axon.t(), pos_integer(), String.t()) :: Axon.t()

Build a standalone sLSTM layer for use in custom architectures.

Returns an Axon node that applies sLSTM recurrence to the input sequence.

default_dropout()

@spec default_dropout() :: float()

Default dropout rate

default_expand_factor()

@spec default_expand_factor() :: pos_integer()

Default feedforward expansion factor

default_hidden_size()

@spec default_hidden_size() :: pos_integer()

Default hidden dimension

default_num_layers()

@spec default_num_layers() :: pos_integer()

Default number of layers

output_size(opts \\ [])

@spec output_size(keyword()) :: non_neg_integer()

Get the output size of an sLSTM model.