# `Edifice.Transformer.NemotronH`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/transformer/nemotron_h.ex#L1)

Nemotron-H: NVIDIA's Hybrid Mamba-Transformer Architecture.

Nemotron-H is a hybrid language model that combines 90% Mamba2 (SSD) layers with
10% full attention layers. This design achieves Transformer-level quality while
the SSM layers keep inference cost linear in sequence length.

## Key Innovation: Hybrid Layer Mixing

Rather than using all-attention or all-SSM, Nemotron-H interleaves them:
- 90% of layers use Mamba2 (State Space Duality) for efficient linear-time processing
- 10% of layers use full multi-head attention for global reasoning
- Attention blocks are placed at regular intervals (every 10th layer by default); see the sketch below
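
The resulting pattern can be sketched directly from the placement rule. A minimal illustration (the `layer_type` helper below is hypothetical, not part of the module's API):

```elixir
# With attention_every_n = 10, attention lands at 0-indexed positions
# 9, 19, 29, ...; every other layer is a Mamba2 (SSD) block.
layer_type = fn block_idx, attention_every_n ->
  if rem(block_idx, attention_every_n) == attention_every_n - 1,
    do: :attention,
    else: :mamba2
end

Enum.map(0..31, &layer_type.(&1, 10))
# => 32 entries, :attention at indices 9, 19, and 29, :mamba2 everywhere else
```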

## Architecture

```
Input [batch, seq_len, embed_dim]
      |
      v
[Shared Embedding Projection]
      |
      v
+========================================+
|            Layer 0 (Mamba2)            |
|  RMSNorm -> Mamba2 SSD -> Residual     |
|  RMSNorm -> SwiGLU FFN -> Residual     |
+========================================+
      |
     ... (Mamba2 layers 1-8)
      |
+========================================+
|           Layer 9 (Attention)          |
|  RMSNorm -> MultiHead Attn -> Residual |
|  RMSNorm -> SwiGLU FFN -> Residual     |
+========================================+
      |
     ... (pattern repeats)
      |
      v
[Final RMSNorm]
      |
      v
[Output Projection (tied weights)]
      |
      v
Output [batch, hidden_dim]
```

## Mamba2 (SSD) Blocks

These blocks use the State Space Duality (SSD) formulation from Mamba-2:
- Chunked matmul for tensor core utilization
- Selective state space with input-dependent parameters
- Depthwise convolution + gating

## Attention Blocks

Standard multi-head attention with:
- Grouped Query Attention (optional)
- RoPE position embeddings (optional)
- Causal masking

## Usage

    model = NemotronH.build(
      embed_dim: 287,
      hidden_dim: 2048,
      num_layers: 32,
      attention_every_n: 10,
      num_heads: 16
    )
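
A minimal sketch of initializing and running the built model with Axon and Nx. Passing a bare tensor assumes the model declares a single input; verify the input name and shape against your version of Edifice:

```elixir
model =
  NemotronH.build(
    embed_dim: 287,
    hidden_dim: 2048,
    num_layers: 32,
    attention_every_n: 10,
    num_heads: 16
  )

# Compile init/predict functions, initialize parameters from a template,
# then run a dummy batch shaped [batch, seq_len, embed_dim].
{init_fn, predict_fn} = Axon.build(model)

input = Nx.broadcast(0.0, {1, 60, 287})
params = init_fn.(Nx.to_template(input), %{})
output = predict_fn.(params, input)
# output has shape {1, 2048}, i.e. [batch, hidden_dim]
```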

## References

- Paper: "Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer
  Language Models" (NVIDIA, 2025)
- Mamba-2: "Transformers are SSMs" (Gu & Dao, 2024)

# `build_opt`

```elixir
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_dim, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:attention_every_n, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_kv_heads, pos_integer()}
  | {:mamba_d_state, pos_integer()}
  | {:mamba_d_conv, pos_integer()}
  | {:mamba_expand, pos_integer()}
  | {:dropout, float()}
  | {:rope, boolean()}
  | {:window_size, pos_integer()}
  | {:seq_len, pos_integer()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a Nemotron-H hybrid model.

## Options

  - `:embed_dim` - Input embedding dimension (required)
  - `:hidden_dim` - Model hidden dimension (default: 2048)
  - `:num_layers` - Total number of layers (default: 32)
  - `:attention_every_n` - Place attention at every Nth layer (default: 10)
  - `:num_heads` - Number of attention heads (default: 16)
  - `:num_kv_heads` - Number of KV heads for GQA (default: 4)
  - `:mamba_d_state` - Mamba SSM state dimension (default: 64)
  - `:mamba_d_conv` - Mamba convolution kernel size (default: 4)
  - `:mamba_expand` - Mamba expansion factor (default: 2)
  - `:dropout` - Dropout rate (default: 0.0)
  - `:rope` - Apply RoPE to attention layers (default: false)
  - `:window_size` / `:seq_len` - Expected sequence length (default: 60)

## Returns

  An Axon model that outputs `[batch, hidden_dim]`.
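
For example, a build that enables grouped-query attention and RoPE on the attention layers might look like this (a sketch using only the options documented above; the dimension values are illustrative):

```elixir
model =
  NemotronH.build(
    embed_dim: 287,
    hidden_dim: 1024,
    num_layers: 20,
    attention_every_n: 10,
    num_heads: 16,
    num_kv_heads: 4,
    rope: true,
    seq_len: 60
  )
```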

# `build_attention_block`

```elixir
@spec build_attention_block(
  Axon.t(),
  keyword()
) :: Axon.t()
```

Build an attention block with RMSNorm and SwiGLU FFN.

Architecture:

- RMSNorm -> MultiHead Attention -> (residual handled by caller)
- RMSNorm -> SwiGLU FFN -> (residual handled by caller)

## Options

  Same as `build/1`, plus `:layer_idx` for naming.
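
A hedged usage sketch, assuming the caller adds a single residual connection around the block output (the input name and dimensions are illustrative, not taken from the module):

```elixir
hidden = Axon.input("hidden", shape: {nil, 60, 2048})

block_out =
  NemotronH.build_attention_block(hidden,
    hidden_dim: 2048,
    num_heads: 16,
    num_kv_heads: 4,
    layer_idx: 9
  )

# Residual connection added by the caller, per the note above.
out = Axon.add(hidden, block_out)
```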

# `build_mamba_block`

```elixir
@spec build_mamba_block(
  Axon.t(),
  keyword()
) :: Axon.t()
```

Build a Mamba2 (SSD) block with RMSNorm and SwiGLU FFN.

Architecture:

- RMSNorm -> Mamba2 (SSD) -> (residual handled by caller)
- RMSNorm -> SwiGLU FFN -> (residual handled by caller)

## Options

  Same as `build/1`, plus `:layer_idx` for naming.
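
The Mamba2 block can be wired the same way (again a sketch; option values are illustrative):

```elixir
hidden = Axon.input("hidden", shape: {nil, 60, 2048})

block_out =
  NemotronH.build_mamba_block(hidden,
    hidden_dim: 2048,
    mamba_d_state: 64,
    mamba_d_conv: 4,
    mamba_expand: 2,
    layer_idx: 0
  )

# Residual connection added by the caller.
out = Axon.add(hidden, block_out)
```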

# `nemotron_block`

```elixir
@spec nemotron_block(Axon.t(), non_neg_integer(), keyword()) :: Axon.t()
```

Build a single Nemotron-H block.

Dispatches to either Mamba2 or attention based on the block index.
Attention blocks are placed at positions where `block_idx % attention_every_n == (attention_every_n - 1)`.

## Parameters

  - `input` - Input Axon node
  - `block_idx` - 0-indexed block position
  - `opts` - Model options

## Returns

  Block output (before residual connection).
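
Because the block output is returned before the residual connection, a full stack can be assembled by folding over block indices and adding the residual at each step. This is a sketch of that pattern, not the module's internal implementation:

```elixir
opts = [hidden_dim: 2048, num_heads: 16, attention_every_n: 10]

hidden = Axon.input("hidden", shape: {nil, 60, 2048})

stacked =
  Enum.reduce(0..31, hidden, fn block_idx, acc ->
    # nemotron_block/3 picks Mamba2 or attention from block_idx.
    Axon.add(acc, NemotronH.nemotron_block(acc, block_idx, opts))
  end)
```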

# `output_size`

```elixir
@spec output_size(keyword()) :: pos_integer()
```

Get the output size of a Nemotron-H model.
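
A hedged usage example, assuming the output size equals `:hidden_dim` (consistent with `build/1` returning `[batch, hidden_dim]`):

```elixir
NemotronH.output_size(hidden_dim: 2048)
# => 2048 (assumed; the model output is [batch, hidden_dim])
```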

# `param_count`

```elixir
@spec param_count(keyword()) :: pos_integer()
```

Calculate approximate parameter count for a Nemotron-H model.
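
A usage sketch; pass the same keyword options you would pass to `build/1`:

```elixir
NemotronH.param_count(
  embed_dim: 287,
  hidden_dim: 2048,
  num_layers: 32,
  attention_every_n: 10
)
# => approximate parameter count as a positive integer
```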

# `recommended_defaults`

```elixir
@spec recommended_defaults() :: keyword()
```

Recommended default configuration for Nemotron-H.

# `small_config`

```elixir
@spec small_config() :: keyword()
```

Get small model configuration (for testing/prototyping).
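
Assuming both helpers return keyword lists compatible with `build/1` (a reasonable but unverified assumption), they can seed a build like this:

```elixir
# Merge the small test configuration with the required :embed_dim.
opts = Keyword.merge(NemotronH.small_config(), embed_dim: 287)
model = NemotronH.build(opts)
```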

---

*Consult [api-reference.md](api-reference.md) for complete listing*
