Configurable Hybrid Builder — flexible hybrid architecture composition.
A meta-module that builds hybrid sequence models with arbitrary layer
schedules, going beyond the fixed `attention_every: N` pattern. Supports
ratio-based specification, explicit schedules, parallel SSM+attention
blocks, and multi-backbone mixing.
## Scheduling Modes
| Mode | Option | Example |
|---|---|---|
| Ratio | `ratio: {9, 1}` | 9 backbone : 1 attention |
| Every-N | `attention_every: 4` | Same as the existing Hybrid |
| Explicit | `schedule: [:mamba, :mamba, :attn, ...]` | Full control |
| Parallel | `mode: :parallel` | SSM + attention in parallel per block |
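For the ratio mode, one way the even spacing could be computed is to tile the `{backbone, attention}` group across the layer count. The sketch below uses an illustrative module and function name (`RatioSketch.ratio_schedule/3`), not the library's actual implementation:

```elixir
# Sketch: scale the {backbone_count, attn_count} ratio to num_layers by
# repeating one group of `b` backbone layers followed by `a` attention layers.
# Placing attention at the end of each group yields even spacing overall.
defmodule RatioSketch do
  def ratio_schedule(num_layers, {b, a}, backbone \\ :mamba) do
    group = List.duplicate(backbone, b) ++ List.duplicate(:attn, a)

    group
    |> List.duplicate(div(num_layers, b + a))
    |> List.flatten()
  end
end

RatioSketch.ratio_schedule(10, {9, 1})
#=> [:mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :attn]
```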
## Architecture (Interleaved Mode)
```text
Input [batch, seq_len, embed_dim]
        │
┌───────┴────────────────────────────┐
│ schedule[0] block                  │  backbone or attention
├────────────────────────────────────┤
│ schedule[1] block                  │  backbone or attention
├────────────────────────────────────┤
│ ...                                │
└────────────────────────────────────┘
        │
[batch, hidden_size]
```

## Architecture (Parallel Mode)
```text
Input [batch, seq_len, embed_dim]
        │
┌───────┴────────────────────────────┐
│ Per block:                         │
│   norm(x) → SSM path               │
│   norm(x) → Attention path         │
│   gate * ssm + (1 - gate) * attn   │
│   + FFN + residual                 │
└────────────────────────────────────┘
        │
[batch, hidden_size]
```

## Usage
```elixir
# Ratio-based: 90% Mamba, 10% attention (10 layers → 9 Mamba + 1 attn)
model = HybridBuilder.build(
  embed_dim: 256,
  hidden_size: 256,
  num_layers: 10,
  ratio: {9, 1}
)
```
```elixir
# Explicit schedule with multi-backbone
model = HybridBuilder.build(
  embed_dim: 256,
  hidden_size: 256,
  schedule: [:mamba, :mamba, :gru, :attn, :mamba, :mamba, :gru, :attn]
)
```
```elixir
# Parallel mode (Hymba-style: all layers have both SSM + attention)
model = HybridBuilder.build(
  embed_dim: 256,
  hidden_size: 256,
  num_layers: 6,
  mode: :parallel
)
```

## References
- Jamba (AI21, 2024) — sequential Mamba+attention interleaving
- Zamba (Zyphra, 2024) — shared attention layer
- Hymba (NVIDIA, 2024) — parallel Mamba+attention per block
- Nemotron-H (NVIDIA, 2025) — 90:10 SSM:attention ratio
## Summary

### Functions

- `build/1` - Build a configurable hybrid model.
- `describe_schedule/1` - Describe the layer pattern for a given configuration.
- `output_size/1` - Get the output size of the model.
- `preset/1` - Get recommended defaults for common hybrid patterns.
- `recommended_defaults/0` - Get recommended defaults.
- `resolve_schedule/1` - Resolve the layer schedule from options.
### Types

```elixir
@type build_opt() ::
        {:embed_dim, pos_integer()}
        | {:hidden_size, pos_integer()}
        | {:num_layers, pos_integer()}
        | {:backbone, atom()}
        | {:mode, :interleaved | :parallel}
        | {:schedule, [atom()]}
        | {:ratio, {pos_integer(), pos_integer()}}
        | {:attention_every, pos_integer()}
        | {:state_size, pos_integer()}
        | {:expand_factor, pos_integer()}
        | {:conv_size, pos_integer()}
        | {:num_heads, pos_integer()}
        | {:head_dim, pos_integer()}
        | {:window_size, pos_integer()}
        | {:dropout, float()}
        | {:seq_len, pos_integer()}
```

Options for `build/1`.
## Functions

### build/1

Build a configurable hybrid model.

#### Options

Scheduling (mutually exclusive — first match wins):

- `:schedule` - Explicit layer schedule as a list of atoms. Valid entries:
  `:attn`, `:mamba`, `:gru`, `:rwkv`, `:delta_net`, `:gated_delta_net`, `:griffin_lru`.
- `:ratio` - `{backbone_count, attn_count}` tuple. Layers are distributed so
  attention is evenly spaced. E.g., `{9, 1}` with 10 layers → 9 backbone + 1 attn.
- `:attention_every` - Insert attention every N layers (fallback, same as Hybrid).
Mode:

- `:mode` - `:interleaved` (default) or `:parallel`. Parallel mode runs SSM +
  attention in every block with learned gating.
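As a rough numeric illustration of the gated combination, `gate * ssm + (1 - gate) * attn`, here is a plain-Elixir sketch over lists (the actual model applies this element-wise to tensors; `GateSketch` and its sigmoid gating are illustrative, not the library's code):

```elixir
# Sketch of the parallel block's fusion: a per-feature gate g in (0, 1)
# mixes the SSM path and the attention path.
defmodule GateSketch do
  def sigmoid(x), do: 1.0 / (1.0 + :math.exp(-x))

  def fuse(ssm, attn, gate_logits) do
    [ssm, attn, gate_logits]
    |> Enum.zip()
    |> Enum.map(fn {s, a, gl} ->
      g = sigmoid(gl)
      g * s + (1.0 - g) * a
    end)
  end
end

# A zero logit gives g = 0.5, i.e. an even blend of the two paths:
GateSketch.fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
#=> [0.5, 0.5]
```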
Architecture:

- `:embed_dim` - Input embedding dimension (required)
- `:hidden_size` - Internal hidden dimension (default: 256)
- `:num_layers` - Total number of layers (default: 6). Ignored if `:schedule` is given.
- `:backbone` - Default backbone type (default: `:mamba`)
SSM-specific (for Mamba backbone):

- `:state_size` - SSM state dimension (default: 16)
- `:expand_factor` - Mamba expansion factor (default: 2)
- `:conv_size` - Causal conv kernel size (default: 4)
Attention-specific:

- `:num_heads` - Number of attention heads (default: 4)
- `:head_dim` - Dimension per attention head (default: 64)
- `:window_size` - Attention window size (default: 60)
General:

- `:dropout` - Dropout rate (default: 0.1)
- `:seq_len` - Fixed sequence length (default: `window_size`)
#### Returns

An Axon model outputting `[batch, hidden_size]`.
### describe_schedule/1

Describe the layer pattern for a given configuration.

Useful for debugging and visualization.

#### Examples
```elixir
iex> HybridBuilder.describe_schedule(num_layers: 10, ratio: {9, 1})
%{
  schedule: [:mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :attn],
  num_backbone: 9,
  num_attention: 1,
  backbone_pct: 90.0,
  mode: :interleaved
}
```
### output_size/1

```elixir
@spec output_size(keyword()) :: pos_integer()
```

Get the output size of the model.
### preset/1

Get recommended defaults for common hybrid patterns.

#### Patterns

- `:nemotron_h` - 90:10 Mamba:attention (Nemotron-H style)
- `:jamba` - 3:1 interleaved (Jamba style)
- `:parallel` - Hymba-style parallel in every block
- `:minimal_attn` - Backbone-heavy with rare attention
#### Examples

```elixir
HybridBuilder.build(HybridBuilder.preset(:nemotron_h) ++ [embed_dim: 256])
```
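A plausible shape for these presets is a keyword list of build options per pattern. The sketch below is hypothetical: the `:nemotron_h` and `:jamba` ratios follow the pattern descriptions above, but the `:minimal_attn` value and the module name are invented for illustration, not read from the library source:

```elixir
# Hypothetical preset/1 sketch: each pattern name maps to build/1 options.
defmodule PresetSketch do
  def preset(:nemotron_h), do: [ratio: {9, 1}, backbone: :mamba]
  def preset(:jamba), do: [ratio: {3, 1}, backbone: :mamba]
  def preset(:parallel), do: [mode: :parallel]
  # "Rare attention" rendered here as an arbitrary every-6 spacing.
  def preset(:minimal_attn), do: [attention_every: 6, backbone: :mamba]
end
```

Because presets are plain keyword lists, callers can append or override options with `++`, as in the example above.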
### recommended_defaults/0

```elixir
@spec recommended_defaults() :: keyword()
```

Get recommended defaults.
### resolve_schedule/1

Resolve the layer schedule from options.

Priority: `:schedule` > `:ratio` > `:attention_every` > default (3:1).

#### Examples
```elixir
iex> HybridBuilder.resolve_schedule(schedule: [:mamba, :mamba, :attn])
[:mamba, :mamba, :attn]

iex> HybridBuilder.resolve_schedule(num_layers: 10, ratio: {9, 1}, backbone: :mamba)
[:mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :attn]

iex> HybridBuilder.resolve_schedule(num_layers: 6, attention_every: 3, backbone: :gru)
[:gru, :gru, :attn, :gru, :gru, :attn]
```
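The `:attention_every` fallback can be sketched in a few lines: every Nth layer becomes attention and the rest use the default backbone. The helper name below is illustrative, not the module's internal function:

```elixir
# Sketch of the :attention_every fallback used when neither :schedule
# nor :ratio is given. Layers are 1-indexed so the Nth slot is attention.
defmodule EveryNSketch do
  def every_n_schedule(num_layers, n, backbone \\ :mamba) do
    Enum.map(1..num_layers, fn i ->
      if rem(i, n) == 0, do: :attn, else: backbone
    end)
  end
end

EveryNSketch.every_n_schedule(6, 3, :gru)
#=> [:gru, :gru, :attn, :gru, :gru, :attn]
```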