Edifice.SSM.SSTransformer (Edifice v0.2.0)

State Space Transformer — parallel SSM + attention with learned gating per block.

Combines a selective state space model (SSM) path with a multi-head causal attention path in every block, fused via a learned sigmoid gate. This allows the model to dynamically balance local/recurrent processing (SSM) with global attention at each layer.
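
The gated fusion described above can be written out as a worked equation. This is a sketch under stated assumptions, not the module's confirmed formula: the source only says the gate is a learned sigmoid, so the projection W_g, the bias b_g, and the choice of the pre-normed input x̃ as the gate's input are all hypothetical.

```latex
% Assumed notation: \tilde{x} is the pre-normed block input,
% W_g and b_g are hypothetical learned gate parameters.
\begin{aligned}
g_t &= \sigma\left(W_g \tilde{x}_t + b_g\right) \\
y_t &= g_t \odot \mathrm{SSM}(\tilde{x})_t + (1 - g_t) \odot \mathrm{Attn}(\tilde{x})_t
\end{aligned}
```

Because the sigmoid keeps g_t strictly between 0 and 1, each output is a convex combination of the two paths: values near 1 favor the SSM path, values near 0 favor attention.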

Architecture

Input [batch, seq_len, embed_dim]
      |
Per block:
  Pre-norm -> SSM path (selective scan with gating)
           -> Attention path (multi-head causal)
           -> gate * ssm_out + (1-gate) * attn_out
           -> FFN + residual
      |
Final norm -> last timestep -> [batch, hidden_size]
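
The fusion and last-timestep steps in the diagram can be sketched with plain Nx tensors. This is an illustration only: FusionSketch, fuse/3, and last_timestep/1 are hypothetical names that are not part of Edifice, and the sketch takes the gate logits as given because the source does not say how they are computed.

```elixir
# Standalone script; assumes Nx can be fetched via Mix.install.
Mix.install([{:nx, "~> 0.7"}])

defmodule FusionSketch do
  # Hypothetical helper, not part of Edifice.
  # All three inputs have shape {batch, seq_len, hidden}.
  def fuse(gate_logits, ssm_out, attn_out) do
    # Squash the logits into (0, 1) to obtain the gate.
    gate = Nx.sigmoid(gate_logits)

    # gate * ssm_out + (1 - gate) * attn_out, elementwise.
    gate
    |> Nx.multiply(ssm_out)
    |> Nx.add(Nx.multiply(Nx.subtract(1.0, gate), attn_out))
  end

  # Keeps only the final timestep: {batch, seq_len, hidden} -> {batch, hidden}.
  def last_timestep(x) do
    seq_len = Nx.axis_size(x, 1)

    x
    |> Nx.slice_along_axis(seq_len - 1, 1, axis: 1)
    |> Nx.squeeze(axes: [1])
  end
end

# Toy inputs: the SSM path is all ones, the attention path all zeros.
ssm = Nx.broadcast(1.0, {2, 3, 4})
attn = Nx.broadcast(0.0, {2, 3, 4})
# Zero logits give a gate of 0.5, so both paths are weighted equally.
logits = Nx.broadcast(0.0, {2, 3, 4})

fused = FusionSketch.fuse(logits, ssm, attn)
pooled = FusionSketch.last_timestep(fused)
IO.inspect(Nx.shape(pooled))
```

With these toy inputs every fused element is 0.5, and the pooled tensor drops the sequence axis, matching the [batch, hidden_size] output shape shown in the diagram.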

Usage

model = Edifice.SSM.SSTransformer.build(
  embed_dim: 256,
  hidden_size: 256,
  state_size: 16,
  num_layers: 6,
  num_heads: 4
)

References

Dao & Gu, "Transformers are SSMs" (2024) — Mamba-2
NVIDIA, "Hymba: A Hybrid-head Architecture" (2024) — parallel gating

Summary

Types

build_opt()

Options for build/1.

Functions

build(opts \\ [])

Build a State Space Transformer model.

output_size(opts \\ [])

Get the output size of the model.

recommended_defaults()

Get recommended defaults.