S4D: S4 with Diagonal State Matrix.
A simplified variant of S4 where the state matrix A is purely diagonal, removing the need for the DPLR (Diagonal Plus Low-Rank) decomposition. S4D serves as the bridge between the original S4 (complex HiPPO matrices) and modern SSMs like S5 and Mamba.
Key Simplification
Original S4 uses a DPLR decomposition of the HiPPO matrix:

    A = V * diag(Lambda) * V^{-1} + P * Q^T

S4D uses a diagonal A directly:

    A = diag(a_1, a_2, ..., a_N)    (real or complex)

This dramatically simplifies implementation while maintaining strong performance on most benchmarks.
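To make the simplification concrete, here is a sketch of what a diagonal A buys per state dimension n. It assumes zero-order-hold discretization with step size \Delta and per-dimension input and output weights b_n and c_n; these symbols are illustrative and the module itself may use a different discretization rule.

```latex
% Continuous-time SSM with diagonal A = diag(a_1, ..., a_N):
%   x'(t) = A x(t) + B u(t),   y(t) = C x(t)
\begin{aligned}
\bar{a}_n &= e^{\Delta a_n}, \qquad
\bar{b}_n = \frac{e^{\Delta a_n} - 1}{a_n}\, b_n \\
x_{k,n} &= \bar{a}_n\, x_{k-1,n} + \bar{b}_n\, u_k \\
y_k &= \sum_{n=1}^{N} c_n\, x_{k,n} \\
K_\ell &= \sum_{n=1}^{N} c_n\, \bar{a}_n^{\,\ell}\, \bar{b}_n,
\qquad y_k = \sum_{\ell=0}^{k} K_\ell\, u_{k-\ell}
\end{aligned}
```

Every state dimension evolves independently, so both the recurrence and the convolution kernel K reduce to elementwise operations and a sum over n, with no low-rank correction term to handle.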
Architecture
Identical to S4 but with simpler diagonal-only A parameterization. Each block: LayerNorm -> Diagonal SSM -> Dropout -> Residual -> FFN.
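The "Diagonal SSM" step in each block can be illustrated with a minimal, dependency-free sketch for a single channel with a real diagonal A. The module name `DiagonalSSMSketch`, its functions, and the choice of zero-order-hold discretization are assumptions made for illustration; this is not the library's implementation, which operates on Axon/Nx tensors.

```elixir
defmodule DiagonalSSMSketch do
  @moduledoc """
  Illustrative single-channel diagonal SSM with real diagonal entries.
  Hypothetical helper, not part of the S4D module.
  """

  # Zero-order-hold discretization of one diagonal entry (assumes a != 0):
  #   a_bar = exp(dt * a),  b_bar = (a_bar - 1) / a * b
  def discretize(a, b, dt) do
    a_bar = :math.exp(dt * a)
    {a_bar, (a_bar - 1.0) / a * b}
  end

  # Recurrent view: x_k = a_bar .* x_{k-1} + b_bar * u_k, y_k = sum(c .* x_k).
  # `a`, `b`, `c` are lists of length N; `inputs` is the scalar input sequence.
  def scan(a, b, c, dt, inputs) do
    params = Enum.zip(a, b) |> Enum.map(fn {an, bn} -> discretize(an, bn, dt) end)
    init = List.duplicate(0.0, length(a))

    {outputs, _final_state} =
      Enum.map_reduce(inputs, init, fn u, state ->
        # Each state dimension updates independently because A is diagonal.
        next =
          Enum.zip(params, state)
          |> Enum.map(fn {{a_bar, b_bar}, x} -> a_bar * x + b_bar * u end)

        y = Enum.zip(c, next) |> Enum.map(fn {cn, xn} -> cn * xn end) |> Enum.sum()
        {y, next}
      end)

    outputs
  end

  # Convolutional view: K_l = sum_n c_n * a_bar_n^l * b_bar_n for l = 0..len-1.
  def kernel(a, b, c, dt, len) do
    params =
      Enum.zip([a, b, c])
      |> Enum.map(fn {an, bn, cn} ->
        {a_bar, b_bar} = discretize(an, bn, dt)
        {a_bar, b_bar, cn}
      end)

    for l <- 0..(len - 1) do
      params
      |> Enum.map(fn {a_bar, b_bar, cn} -> cn * :math.pow(a_bar, l) * b_bar end)
      |> Enum.sum()
    end
  end
end
```

Feeding a unit impulse such as `[1.0, 0.0, 0.0, 0.0]` through `scan/5` yields the same values as `kernel/5` up to floating-point rounding, which is the recurrent/convolutional equivalence that lets SSM layers train with a convolution and run step by step at inference. Negative entries in `a` keep the state from growing without bound.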
Comparison
| Aspect | S4 | S4D |
|---|---|---|
| A matrix | DPLR (HiPPO) | Pure diagonal |
| Implementation | Complex | Simple |
| Performance | Strong | Nearly identical |
| Parameters | More | Fewer |
Usage
    model = S4D.build(
      embed_dim: 287,
      hidden_size: 256,
      state_size: 64,
      num_layers: 4
    )

Reference
- Paper: "On the Parameterization and Initialization of Diagonal State Space Models"
- arXiv: https://arxiv.org/abs/2206.11893
Summary
Functions
Build an S4D model for sequence processing.
Build a single S4D block.
Get the output size of an S4D model.
Calculate approximate parameter count for an S4D model.
Get recommended defaults.
Types
@type build_opt() ::
  {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:seq_len, pos_integer()}
  | {:state_size, pos_integer()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
Build an S4D model for sequence processing.
Options
- :embed_dim - Size of input embedding per frame (required)
- :hidden_size - Internal hidden dimension (default: 256)
- :state_size - SSM state dimension N (default: 64)
- :num_layers - Number of S4D blocks (default: 4)
- :dropout - Dropout rate (default: 0.1)
- :window_size - Expected sequence length (default: 60)
Returns
An Axon model that outputs [batch, hidden_size] from the last position.
Build a single S4D block.
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of an S4D model.
@spec param_count(keyword()) :: non_neg_integer()
Calculate approximate parameter count for an S4D model.
@spec recommended_defaults() :: keyword()
Get recommended defaults.