S5: Simplified State Space Sequence model.
S5 uses a single multi-input, multi-output (MIMO) state space model instead of the many independent SISO systems used in Mamba. This results in a simpler architecture while maintaining strong performance.
Key Innovation: MIMO SSM
Instead of having many parallel single-input single-output SSMs (like Mamba), S5 uses one large MIMO SSM:
Mamba: D separate SSMs, each with state size N
S5: 1 large SSM with D*N combined stateArchitecture
Input [batch, seq_len, embed_dim]
│
▼
┌─────────────────────────────────────┐
│ S5 Block │
│ │
│ Linear projection → Encoder │
│ │
│ ┌─ MIMO SSM ────────────────────┐ │
│ │ │ │
│ │ x'(t) = Ax(t) + Bu(t) │ │
│ │ y(t) = Cx(t) + Du(t) │ │
│ │ │ │
│ │ (Diagonal A for efficiency) │ │
│ │ │ │
│ └───────────────────────────────┘ │
│ │
│ Decoder → Linear projection │
│ │
└─────────────────────────────────────┘
│ (repeat for num_layers)
▼
[batch, hidden_size]Complexity
| Aspect | Value |
|---|---|
| Training | O(L log L) via FFT or O(L) via scan |
| Inference | O(1) per step |
| Parameters | Fewer than Mamba |
Key Difference from Mamba
| Aspect | S5 | Mamba |
|---|---|---|
| SSM structure | MIMO | Many SISOs |
| Input-dependence | Fixed A, B, C | Selective (input-dependent) |
| Complexity | Simpler | More complex |
| Gating | Optional | SiLU gating |
Usage
model = S5.build(
embed_dim: 287,
hidden_size: 256,
state_size: 64,
num_layers: 4
)Use Case
S5 is useful for ablation studies to understand what Mamba's added complexity (selective mechanism, gating) contributes.
Reference
- Paper: "Simplified State Space Layers for Sequence Modeling" (ICLR 2023)
- arXiv: 2208.04933
Summary
Functions
Build an S5 model for sequence processing.
Build the Feed-Forward Network layer.
Build the MIMO SSM layer.
Build a single S5 block.
Initialize hidden state for O(1) incremental inference.
Get the output size of an S5 model.
Calculate approximate parameter count for an S5 model.
Get recommended defaults for real-time sequence processing (60fps).
Types
@type build_opt() :: {:dropout, float()} | {:embed_dim, pos_integer()} | {:hidden_size, pos_integer()} | {:num_layers, pos_integer()} | {:seq_len, pos_integer()} | {:state_size, pos_integer()} | {:window_size, pos_integer()}
Options for build/1.
Functions
Build an S5 model for sequence processing.
Options
:embed_dim- Size of input embedding per frame (required):hidden_size- Internal hidden dimension D (default: 256):state_size- SSM state dimension N (default: 64):num_layers- Number of S5 blocks (default: 4):dropout- Dropout rate (default: 0.1):window_size- Expected sequence length for JIT optimization (default: 60)
Returns
An Axon model that outputs [batch, hidden_size] from the last position.
Build the Feed-Forward Network layer.
Build the MIMO SSM layer.
Key components:
- Encoder projection
- State space model (diagonal A for efficiency)
- Decoder projection
Build a single S5 block.
Each block has:
- MIMO SSM layer
- Feed-forward network
Initialize hidden state for O(1) incremental inference.
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of an S5 model.
@spec param_count(keyword()) :: non_neg_integer()
Calculate approximate parameter count for an S5 model.
@spec recommended_defaults() :: keyword()
Get recommended defaults for real-time sequence processing (60fps).