Hymba: Hybrid-head Architecture with Parallel Mamba + Attention.
Implements the Hymba architecture from "Hymba: A Hybrid-head Architecture for Small Language Models" (NVIDIA, 2024). Unlike sequential hybrid models (Jamba, Zamba), Hymba runs Mamba and attention in parallel within each block, with learnable gated fusion.
Key Innovations
Parallel Mamba + Attention: Both paths process the same input simultaneously, and outputs are combined via a learnable gate (see the sketch after this list):

output = gate * mamba_out + (1 - gate) * attn_out

Learnable Meta Tokens: K learnable vectors prepended to K/V in the attention path. These serve as "summarizers" that compress global context, reducing the effective attention complexity while maintaining long-range access (see the attention sketch after the architecture diagram).
Cross-layer meta token propagation: Meta token states are updated across layers, accumulating information throughout the network.
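As a concrete illustration, here is a minimal Axon sketch of the gated fusion, assuming `mamba_out` and `attn_out` are Axon nodes with matching shapes. The per-channel sigmoid gate and the `GatedFusion.fuse/3` name are illustrative assumptions, not this module's internals:

defmodule GatedFusion do
  # Combine the Mamba and attention paths with a learnable gate.
  # `hidden_size` must equal the last dimension of both inputs.
  def fuse(%Axon{} = mamba_out, %Axon{} = attn_out, hidden_size) do
    # One learnable gate value per hidden channel; zero-init gives
    # sigmoid(0) = 0.5, i.e. an even mix of both paths at the start.
    gate = Axon.param("gate", fn _ -> {hidden_size} end, initializer: :zeros)

    Axon.layer(
      fn mamba, attn, gate, _opts ->
        g = Nx.sigmoid(gate)
        # output = g * mamba + (1 - g) * attn
        Nx.add(Nx.multiply(g, mamba), Nx.multiply(Nx.subtract(1.0, g), attn))
      end,
      [mamba_out, attn_out, gate],
      name: "gated_fusion",
      op_name: :gated_fusion
    )
  end
end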
Architecture
Input [batch, seq_len, embed_dim]
|
v
+-------------------------------------+
| Hymba Block |
| |
| +--------+ +------------------+ |
| | Mamba | | Attention | |
| | (SSM) | | + Meta Tokens | |
| +----+----+ +--------+--------+ |
| | | |
| v v |
| gate * mamba + (1-gate) * attn |
| | |
| v |
| residual + FFN |
+-------------------------------------+
| (repeat for num_layers)
v
Output [batch, hidden_size]
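As a hedged, single-head rendering of the attention path in the diagram, the sketch below prepends the learnable meta tokens to K and V before scaled dot-product attention. Module and function names are illustrative, and the real implementation is multi-head:

defmodule MetaTokenAttention do
  # Sketch: single-head attention with learnable meta tokens
  # prepended to keys and values. q/k/v: {batch, seq_len, dim};
  # meta_k/meta_v: {num_meta, dim}.
  def attend(q, k, v, meta_k, meta_v) do
    batch = Nx.axis_size(q, 0)
    dim = Nx.axis_size(q, 2)
    num_meta = Nx.axis_size(meta_k, 0)

    # Broadcast meta tokens across the batch and prepend them, so
    # every query can also attend to the "summarizer" slots.
    keys = Nx.concatenate([Nx.broadcast(meta_k, {batch, num_meta, dim}), k], axis: 1)
    values = Nx.concatenate([Nx.broadcast(meta_v, {batch, num_meta, dim}), v], axis: 1)

    # Scaled dot-product attention over seq_len + num_meta positions.
    scores = Nx.divide(Nx.dot(q, [2], [0], keys, [2], [0]), Nx.sqrt(dim))
    maxes = Nx.reduce_max(scores, axes: [-1], keep_axes: true)
    weights = Nx.exp(Nx.subtract(scores, maxes))
    weights = Nx.divide(weights, Nx.sum(weights, axes: [-1], keep_axes: true))

    # Output keeps the original query length: {batch, seq_len, dim}.
    Nx.dot(weights, [2], [0], values, [1], [0])
  end
end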
Compared to Other Hybrids

| Model | Mamba + Attention | Pattern |
|---|---|---|
| Jamba | Alternating | Sequential layers |
| Zamba | Shared attention | Interleaved |
| Hymba | Parallel heads | Within each block |
Usage
model =
  Hymba.build(
    embed_dim: 287,
    hidden_size: 256,
    num_layers: 4,
    num_meta_tokens: 4
  )
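Assuming standard Axon conventions, the built model can then be initialized and run. The template shape below follows the default window_size of 60 and the example embed_dim of 287:

# Compile into init/predict functions (standard Axon workflow).
{init_fn, predict_fn} = Axon.build(model)

# {batch, window_size, embed_dim}
template = Nx.template({1, 60, 287}, :f32)
params = init_fn.(template, %{})

out = predict_fn.(params, Nx.iota({1, 60, 287}, type: :f32))
# out has shape {1, 256}: the last hidden state per sequence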
References

- Dong et al., "Hymba: A Hybrid-head Architecture for Small Language Models" (NVIDIA, 2024)
- https://arxiv.org/abs/2411.13676
Summary

Functions

- build/1 - Build a Hymba model for sequence processing.
- default_dropout/0 - Default dropout rate.
- default_hidden_size/0 - Default hidden dimension.
- default_num_heads/0 - Default number of attention heads.
- default_num_layers/0 - Default number of layers.
- default_num_meta_tokens/0 - Default number of learnable meta tokens.
- default_state_size/0 - Default SSM state dimension.
- output_size/1 - Get the output size of a Hymba model.
- recommended_defaults/0 - Get recommended defaults.
Types
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:state_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_meta_tokens, pos_integer()}
  | {:dropout, float()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
Build a Hymba model for sequence processing.
Options
:embed_dim - Size of input embedding per frame (required)
:hidden_size - Internal hidden dimension (default: 256)
:state_size - SSM state dimension (default: 16)
:num_layers - Number of Hymba blocks (default: 4)
:num_heads - Number of attention heads (default: 4)
:num_meta_tokens - Learnable meta tokens for attention (default: 4)
:dropout - Dropout rate (default: 0.0)
:window_size - Expected sequence length (default: 60)
Returns
An Axon model that processes sequences and outputs the last hidden state.
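For example (hedged, with illustrative values), overriding several defaults at build time:

model =
  Hymba.build(
    embed_dim: 128,        # required: per-frame feature size
    hidden_size: 512,      # wider internal dimension
    state_size: 32,        # larger SSM state
    num_meta_tokens: 8,    # more summarizer slots for attention
    dropout: 0.1
  )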
@spec default_dropout() :: float()
Default dropout rate.
@spec default_num_heads() :: pos_integer()
Default number of attention heads.
@spec default_num_layers() :: pos_integer()
Default number of layers.
@spec default_num_meta_tokens() :: pos_integer()
Default number of learnable meta tokens.
@spec default_state_size() :: pos_integer()
Default SSM state dimension.
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of a Hymba model.
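Since the model outputs [batch, hidden_size], this presumably mirrors the :hidden_size option; a hedged sketch:

Hymba.output_size(hidden_size: 512)
# => 512 (assumed: tracks :hidden_size)

Hymba.output_size([])
# => 256 (assumed: falls back to the default hidden dimension)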
@spec recommended_defaults() :: keyword()
Get recommended defaults.
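A hedged usage sketch: merge the recommended defaults with task-specific options before building. Keyword.merge/2 favors the right-hand list, so explicit options win:

opts = Keyword.merge(Hymba.recommended_defaults(), embed_dim: 287)
model = Hymba.build(opts)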