# `Edifice.SSM.Hymba`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/ssm/hymba.ex#L1)

Hymba: Hybrid-head Architecture with Parallel Mamba + Attention.

Implements the Hymba architecture from "Hymba: A Hybrid-head Architecture
for Small Language Models" (NVIDIA, 2024). Unlike sequential hybrid models
(Jamba, Zamba), Hymba runs Mamba and attention **in parallel** within each
block, with learnable gated fusion.

## Key Innovations

1. **Parallel Mamba + Attention**: Both paths process the same input
   simultaneously, and their outputs are combined via a learnable gate
   (see the sketch after this list):
   `output = gate * mamba_out + (1 - gate) * attn_out`

2. **Learnable Meta Tokens**: K learnable vectors prepended to K/V in
   the attention path. These serve as "summarizers" that compress global
   context, reducing the effective attention complexity while maintaining
   long-range access.

3. **Cross-layer meta token propagation**: Meta token states are updated
   across layers, accumulating information throughout the network.

## Architecture

```
Input [batch, seq_len, embed_dim]
      |
      v
+--------------------------------------+
|             Hymba Block              |
|                                      |
|  +---------+     +----------------+  |
|  |  Mamba  |     |   Attention    |  |
|  |  (SSM)  |     |  + Meta Tokens |  |
|  +----+----+     +--------+-------+  |
|       |                   |          |
|       v                   v          |
|   gate * mamba + (1-gate) * attn     |
|                 |                    |
|                 v                    |
|          residual + FFN              |
+--------------------------------------+
      | (repeat for num_layers)
      v
Output [batch, hidden_size]
```
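
The meta tokens in the attention path can be illustrated with plain Nx. A
rough sketch (shapes, names, and the zero-filled placeholders are
assumptions; in the real model the meta tokens are learned parameters, not
zeros):

```elixir
batch = 2
seq_len = 60
hidden = 256
num_meta_tokens = 4

keys = Nx.broadcast(0.0, {batch, seq_len, hidden})
values = Nx.broadcast(0.0, {batch, seq_len, hidden})
meta = Nx.broadcast(0.0, {num_meta_tokens, hidden})

# Broadcast the meta tokens across the batch and prepend them along the
# sequence axis, so attention always has a few global "summary" slots.
meta_batched =
  meta
  |> Nx.new_axis(0)
  |> Nx.broadcast({batch, num_meta_tokens, hidden})

keys_with_meta = Nx.concatenate([meta_batched, keys], axis: 1)
values_with_meta = Nx.concatenate([meta_batched, values], axis: 1)
# keys_with_meta and values_with_meta now have shape {2, 64, 256}.
```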

## Compared to Other Hybrids

| Model | Mamba + Attention | Pattern |
|-------|-------------------|---------|
| Jamba | Alternating | Sequential layers |
| Zamba | Shared attention | Interleaved |
| Hymba | Parallel heads | Within each block |

## Usage

    model = Hymba.build(
      embed_dim: 287,
      hidden_size: 256,
      num_layers: 4,
      num_meta_tokens: 4
    )
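
After building, the model can be compiled and run like any other Axon graph.
A minimal sketch, assuming a single input of shape
`[batch, window_size, embed_dim]` and dummy data (variable names are
illustrative; if the model's input is named, pass a map keyed by that name
instead of a bare tensor):

    {init_fn, predict_fn} = Axon.build(model)

    # Dummy batch: 2 sequences of 60 frames, 287 features each.
    input = Nx.broadcast(0.0, {2, 60, 287})

    params = init_fn.(Nx.template({2, 60, 287}, :f32), %{})
    output = predict_fn.(params, input)
    # output shape: {2, 256}, i.e. [batch, hidden_size]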

## References

- Dong et al., "Hymba: A Hybrid-head Architecture for Small Language Models"
  (NVIDIA, 2024)
- https://arxiv.org/abs/2411.13676

# `build_opt`

```elixir
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:state_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_meta_tokens, pos_integer()}
  | {:dropout, float()}
  | {:window_size, pos_integer()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a Hymba model for sequence processing.

## Options
  - `:embed_dim` - Size of input embedding per frame (required)
  - `:hidden_size` - Internal hidden dimension (default: 256)
  - `:state_size` - SSM state dimension (default: 16)
  - `:num_layers` - Number of Hymba blocks (default: 4)
  - `:num_heads` - Number of attention heads (default: 4)
  - `:num_meta_tokens` - Learnable meta tokens for attention (default: 4)
  - `:dropout` - Dropout rate (default: 0.0)
  - `:window_size` - Expected sequence length (default: 60)

## Returns
  An Axon model that processes sequences and outputs the last hidden state.

# `default_dropout`

```elixir
@spec default_dropout() :: float()
```

Default dropout rate.

# `default_hidden_size`

```elixir
@spec default_hidden_size() :: pos_integer()
```

Default hidden dimension.

# `default_num_heads`

```elixir
@spec default_num_heads() :: pos_integer()
```

Default number of attention heads.

# `default_num_layers`

```elixir
@spec default_num_layers() :: pos_integer()
```

Default number of layers.

# `default_num_meta_tokens`

```elixir
@spec default_num_meta_tokens() :: pos_integer()
```

Default number of learnable meta tokens.

# `default_state_size`

```elixir
@spec default_state_size() :: pos_integer()
```

Default SSM state dimension.

# `output_size`

```elixir
@spec output_size(keyword()) :: non_neg_integer()
```

Get the output size of a Hymba model.
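
Since the model is documented as producing `[batch, hidden_size]`, this
presumably reports the configured `:hidden_size`. A hedged example (the
exact option handling is an assumption, not verified against the source):

    Hymba.output_size(hidden_size: 256)
    #=> 256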

# `recommended_defaults`

```elixir
@spec recommended_defaults() :: keyword()
```

Get recommended defaults.
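
These can be merged with task-specific options before building. A small
sketch, assuming the return value is a keyword list compatible with
`build/1` (the `:embed_dim` value is illustrative):

    opts = Keyword.merge(Hymba.recommended_defaults(), embed_dim: 287)
    model = Hymba.build(opts)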

---

*Consult [api-reference.md](api-reference.md) for the complete listing*
