Edifice.Recurrent.MinLSTM (Edifice v0.2.0)

Minimal LSTM (MinLSTM) - A simplified LSTM that is parallel-scannable.

Implements the MinLSTM from "Were RNNs All We Needed?" (Feng et al., 2024). MinLSTM simplifies the LSTM by removing the output gate and the hidden-state nonlinearity, keeping only the forget and input gates. The two gates are normalized so that f' + i' = 1.

Key Innovations

Normalized gates: f'_t + i'_t = 1 (the forget and input gates are rescaled so that they sum to 1)
No output gate: Cell state IS the hidden state
No hidden-to-hidden connections in gates: Gates depend only on the input x_t
Parallel scannable: With input-only gates the update is linear in c_{t-1}, so it can be computed with a parallel prefix scan
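
The parallel-scan property can be illustrated in plain Elixir. The sketch below assumes the update has the form c_t = a_t * c_{t-1} + b_t, with a_t standing for the normalized forget gate and b_t for the normalized input gate times the candidate. Each step is then an affine map, and composing affine maps is associative, which is what a prefix scan needs. The module and function names are hypothetical, and nothing here claims to mirror how Edifice implements its scan.

```elixir
defmodule AffineScanSketch do
  # Hypothetical module for illustration only; not part of Edifice.

  # Compose two affine maps c -> a * c + b, applying the first and then the second.
  def combine({a1, b1}, {a2, b2}), do: {a2 * a1, a2 * b1 + b2}

  # Inclusive prefix scan in O(log n) rounds (Hillis-Steele style).
  # Within a round every element is updated independently, so a round
  # could run in parallel; here each round is a plain Enum.map.
  def scan(pairs), do: do_scan(pairs, 1, length(pairs))

  defp do_scan(pairs, offset, n) when offset >= n, do: pairs

  defp do_scan(pairs, offset, n) do
    # Shift right by `offset`, padding with the identity map {1.0, 0.0}.
    shifted = List.duplicate({1.0, 0.0}, offset) ++ Enum.take(pairs, n - offset)

    next =
      shifted
      |> Enum.zip(pairs)
      |> Enum.map(fn {earlier, later} -> combine(earlier, later) end)

    do_scan(next, offset * 2, n)
  end

  # Apply every prefix map to the initial state c0 to recover c_1..c_T.
  def states(pairs, c0 \\ 0.0) do
    pairs |> scan() |> Enum.map(fn {a, b} -> a * c0 + b end)
  end
end
```

For three identical steps {0.5, 1.0} starting from c0 = 0.0, AffineScanSketch.states/1 returns [1.0, 1.5, 1.75], the same values the step-by-step recurrence produces.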

Equations

f_t = sigmoid(linear_f(x_t))           # Forget gate
i_t = sigmoid(linear_i(x_t))           # Input gate
f'_t = f_t / (f_t + i_t)               # Normalized forget
i'_t = i_t / (f_t + i_t)               # Normalized input
candidate_t = linear_h(x_t)            # Candidate value
c_t = f'_t * c_{t-1} + i'_t * candidate_t  # Cell state = hidden state
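
As a minimal sketch, the equations above can be traced for a single hidden unit in plain Elixir. The arguments f_pre, i_pre, and candidate stand in for linear_f(x_t), linear_i(x_t), and linear_h(x_t), which are assumed to be computed elsewhere. The module name and its functions are hypothetical and not part of Edifice.

```elixir
defmodule MinLSTMStepSketch do
  # Hypothetical helper module for illustration only; not part of Edifice.

  # Logistic sigmoid on a scalar.
  def sigmoid(z), do: 1.0 / (1.0 + :math.exp(-z))

  # One MinLSTM update for a single hidden unit.
  def step(c_prev, f_pre, i_pre, candidate) do
    f = sigmoid(f_pre)
    i = sigmoid(i_pre)
    # Rescale the gates so that f_norm + i_norm = 1.
    f_norm = f / (f + i)
    i_norm = i / (f + i)
    # The new state is a convex combination of the old state and the candidate.
    f_norm * c_prev + i_norm * candidate
  end

  # Run the recurrence over a list of {f_pre, i_pre, candidate} tuples,
  # starting from c0 and returning [c_1, ..., c_T].
  def run(inputs, c0 \\ 0.0) do
    Enum.scan(inputs, c0, fn {f_pre, i_pre, cand}, c_prev ->
      step(c_prev, f_pre, i_pre, cand)
    end)
  end
end
```

With equal gate pre-activations both normalized gates are 0.5, so MinLSTMStepSketch.step(0.0, 0.0, 0.0, 2.0) lands halfway between the previous state and the candidate, at 1.0.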

Architecture

Input [batch, seq_len, embed_dim]
      |
      v
[Input Projection] -> hidden_size
      |
      v
+---------------------------+
|     MinLSTM Layer         |
|  f = sigmoid(W_f * x)     |
|  i = sigmoid(W_i * x)     |
|  f', i' = normalize(f, i) |
|  cand = W_h * x           |
|  c = f'*c_prev + i'*cand  |
+---------------------------+
      | (repeat num_layers)
      v
[Layer Norm] -> [Last Timestep]
      |
      v
Output [batch, hidden_size]
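
The final two stages of the diagram reduce [batch, seq_len, hidden_size] to [batch, hidden_size]. The following sketch shows that shape change on nested lists. It is only an illustration under stated assumptions: the module is hypothetical, the epsilon value is assumed, and the learned scale and shift that layer normalization usually carries are left out.

```elixir
defmodule HeadSketch do
  # Hypothetical module for illustration only; not part of Edifice.
  # Omits learned scale and shift parameters; eps is an assumed value.

  # Normalize one feature vector to zero mean and (near) unit variance.
  def layer_norm(vec, eps \\ 1.0e-6) do
    n = length(vec)
    mean = Enum.sum(vec) / n
    var = Enum.sum(Enum.map(vec, fn v -> (v - mean) * (v - mean) end)) / n
    Enum.map(vec, fn v -> (v - mean) / :math.sqrt(var + eps) end)
  end

  # seq has shape [seq_len][hidden_size] for one batch element.
  # Normalizes every timestep, then keeps only the last one.
  def head(seq, eps \\ 1.0e-6) do
    seq |> Enum.map(&layer_norm(&1, eps)) |> List.last()
  end

  # batch has shape [batch][seq_len][hidden_size]; the result is [batch][hidden_size].
  def head_batch(batch), do: Enum.map(batch, &head/1)
end
```

Because the normalization acts on each timestep separately, only the last timestep influences the output of this sketch.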

Usage

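alias Edifice.Recurrent.MinLSTM

model = MinLSTM.build(
  embed_dim: 287,     # size of each input timestep's feature vector
  hidden_size: 256,   # width of the recurrent state and of the output
  num_layers: 4,      # number of stacked MinLSTM layers
  dropout: 0.1
)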

References

Paper: https://arxiv.org/abs/2410.01201

Summary

Types

build_opt()

Options for build/1.

Functions

build(opts \\ [])

Build a MinLSTM model for sequence processing.

default_dropout()

Default dropout rate

default_hidden_size()

Default hidden dimension

default_num_layers()

Default number of layers

norm_eps()

Normalization epsilon

output_size(opts \\ [])

Get the output size of a MinLSTM model.