Configurable Hybrid Builder — flexible hybrid architecture composition.
A meta-module that builds hybrid sequence models with arbitrary layer
schedules, going beyond the fixed `attention_every: N` pattern. Supports
ratio-based specification, explicit schedules, parallel SSM+attention
blocks, and multi-backbone mixing.
## Scheduling Modes
| Mode | Option | Example |
|---|---|---|
| Ratio | `ratio: {9, 1}` | 9 backbone : 1 attention |
| Every-N | `attention_every: 4` | Same as the existing Hybrid |
| Explicit | `schedule: [:mamba, :mamba, :attn, ...]` | Full control |
| Parallel | `mode: :parallel` | SSM + attention in parallel per block |
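For the ratio mode, one way the even spacing could be computed is to tile the `{backbone, attention}` group across the layer count. The sketch below uses an illustrative module and function name (`RatioSketch.ratio_schedule/3`), not the library's actual implementation:

```elixir
# Sketch: scale the {backbone_count, attn_count} ratio to num_layers by
# repeating one group of `b` backbone layers followed by `a` attention layers.
# Placing attention at the end of each group yields even spacing overall.
defmodule RatioSketch do
  def ratio_schedule(num_layers, {b, a}, backbone \\ :mamba) do
    group = List.duplicate(backbone, b) ++ List.duplicate(:attn, a)

    group
    |> List.duplicate(div(num_layers, b + a))
    |> List.flatten()
  end
end

RatioSketch.ratio_schedule(10, {9, 1})
#=> [:mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :attn]
```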
## Architecture (Interleaved Mode)
```text
Input [batch, seq_len, embed_dim]
        │
┌───────┴────────────────────────────┐
│ schedule[0] block                  │  backbone or attention
├────────────────────────────────────┤
│ schedule[1] block                  │  backbone or attention
├────────────────────────────────────┤
│ ...                                │
└────────────────────────────────────┘
        │
[batch, hidden_size]
```

## Architecture (Parallel Mode)
```text
Input [batch, seq_len, embed_dim]
        │
┌───────┴────────────────────────────┐
│ Per block:                         │
│   norm(x) → SSM path               │
│   norm(x) → Attention path         │
│   gate * ssm + (1 - gate) * attn   │
│   + FFN + residual                 │
└────────────────────────────────────┘
        │
[batch, hidden_size]
```

## Usage
```elixir
# Ratio-based: 90% Mamba, 10% attention (10 layers → 9 Mamba + 1 attn)
model = HybridBuilder.build(
  embed_dim: 256,
  hidden_size: 256,
  num_layers: 10,
  ratio: {9, 1}
)
```
```elixir
# Explicit schedule with multi-backbone
model = HybridBuilder.build(
  embed_dim: 256,
  hidden_size: 256,
  schedule: [:mamba, :mamba, :gru, :attn, :mamba, :mamba, :gru, :attn]
)
```
```elixir
# Parallel mode (Hymba-style: all layers have both SSM + attention)
model = HybridBuilder.build(
  embed_dim: 256,
  hidden_size: 256,
  num_layers: 6,
  mode: :parallel
)
```

## References
- Jamba (AI21, 2024) — sequential Mamba+attention interleaving
- Zamba (Zyphra, 2024) — shared attention layer
- Hymba (NVIDIA, 2024) — parallel Mamba+attention per block
- Nemotron-H (NVIDIA, 2025) — 90:10 SSM:attention ratio
## Summary

### Functions

- `build/1` - Build a configurable hybrid model.
- `describe_schedule/1` - Describe the layer pattern for a given configuration.
- `output_size/1` - Get the output size of the model.
- `preset/1` - Get recommended defaults for common hybrid patterns.
- `recommended_defaults/0` - Get recommended defaults.
- `resolve_schedule/1` - Resolve the layer schedule from options.
### Types

```elixir
@type build_opt() ::
        {:embed_dim, pos_integer()}
        | {:hidden_size, pos_integer()}
        | {:num_layers, pos_integer()}
        | {:backbone, atom()}
        | {:mode, :interleaved | :parallel}
        | {:schedule, [atom()]}
        | {:ratio, {pos_integer(), pos_integer()}}
        | {:attention_every, pos_integer()}
        | {:state_size, pos_integer()}
        | {:expand_factor, pos_integer()}
        | {:conv_size, pos_integer()}
        | {:num_heads, pos_integer()}
        | {:head_dim, pos_integer()}
        | {:window_size, pos_integer()}
        | {:dropout, float()}
        | {:seq_len, pos_integer()}
```

Options for `build/1`.
## Functions

### build/1

Build a configurable hybrid model.

#### Options

Scheduling (mutually exclusive — first match wins):

- `:schedule` - Explicit layer schedule as a list of atoms. Valid entries:
  `:attn`, `:mamba`, `:gru`, `:rwkv`, `:delta_net`, `:gated_delta_net`, `:griffin_lru`.
- `:ratio` - `{backbone_count, attn_count}` tuple. Layers are distributed so
  attention is evenly spaced. E.g., `{9, 1}` with 10 layers → 9 backbone + 1 attn.
- `:attention_every` - Insert attention every N layers (fallback, same as Hybrid).
Mode:

- `:mode` - `:interleaved` (default) or `:parallel`. Parallel mode runs SSM +
  attention in every block with learned gating.
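As a rough numeric illustration of the gated combination, `gate * ssm + (1 - gate) * attn`, here is a plain-Elixir sketch over lists (the actual model applies this element-wise to tensors; `GateSketch` and its sigmoid gating are illustrative, not the library's code):

```elixir
# Sketch of the parallel block's fusion: a per-feature gate g in (0, 1)
# mixes the SSM path and the attention path.
defmodule GateSketch do
  def sigmoid(x), do: 1.0 / (1.0 + :math.exp(-x))

  def fuse(ssm, attn, gate_logits) do
    [ssm, attn, gate_logits]
    |> Enum.zip()
    |> Enum.map(fn {s, a, gl} ->
      g = sigmoid(gl)
      g * s + (1.0 - g) * a
    end)
  end
end

# A zero logit gives g = 0.5, i.e. an even blend of the two paths:
GateSketch.fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
#=> [0.5, 0.5]
```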
Architecture:

- `:embed_dim` - Input embedding dimension (required)
- `:hidden_size` - Internal hidden dimension (default: 256)
- `:num_layers` - Total number of layers (default: 6). Ignored if `:schedule` is given.
- `:backbone` - Default backbone type (default: `:mamba`)
SSM-specific (for Mamba backbone):

- `:state_size` - SSM state dimension (default: 16)
- `:expand_factor` - Mamba expansion factor (default: 2)
- `:conv_size` - Causal conv kernel size (default: 4)
Attention-specific:

- `:num_heads` - Number of attention heads (default: 4)
- `:head_dim` - Dimension per attention head (default: 64)
- `:window_size` - Attention window size (default: 60)
General:

- `:dropout` - Dropout rate (default: 0.1)
- `:seq_len` - Fixed sequence length (default: `window_size`)
#### Returns

An Axon model outputting `[batch, hidden_size]`.
### describe_schedule/1

Describe the layer pattern for a given configuration.

Useful for debugging and visualization.

#### Examples
```elixir
iex> HybridBuilder.describe_schedule(num_layers: 10, ratio: {9, 1})
%{
  schedule: [:mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :attn],
  num_backbone: 9,
  num_attention: 1,
  backbone_pct: 90.0,
  mode: :interleaved
}
```
### output_size/1

```elixir
@spec output_size(keyword()) :: pos_integer()
```

Get the output size of the model.
### preset/1

Get recommended defaults for common hybrid patterns.

#### Patterns

- `:nemotron_h` - 90:10 Mamba:attention (Nemotron-H style)
- `:jamba` - 3:1 interleaved (Jamba style)
- `:parallel` - Hymba-style parallel in every block
- `:minimal_attn` - Backbone-heavy with rare attention
#### Examples

```elixir
HybridBuilder.build(HybridBuilder.preset(:nemotron_h) ++ [embed_dim: 256])
```
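A plausible shape for these presets is a keyword list of build options per pattern. The sketch below is hypothetical: the `:nemotron_h` and `:jamba` ratios follow the pattern descriptions above, but the `:minimal_attn` value and the module name are invented for illustration, not read from the library source:

```elixir
# Hypothetical preset/1 sketch: each pattern name maps to build/1 options.
defmodule PresetSketch do
  def preset(:nemotron_h), do: [ratio: {9, 1}, backbone: :mamba]
  def preset(:jamba), do: [ratio: {3, 1}, backbone: :mamba]
  def preset(:parallel), do: [mode: :parallel]
  # "Rare attention" rendered here as an arbitrary every-6 spacing.
  def preset(:minimal_attn), do: [attention_every: 6, backbone: :mamba]
end
```

Because presets are plain keyword lists, callers can append or override options with `++`, as in the example above.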
### recommended_defaults/0

```elixir
@spec recommended_defaults() :: keyword()
```

Get recommended defaults.
### resolve_schedule/1

Resolve the layer schedule from options.

Priority: `:schedule` > `:ratio` > `:attention_every` > default (3:1).

#### Examples
```elixir
iex> HybridBuilder.resolve_schedule(schedule: [:mamba, :mamba, :attn])
[:mamba, :mamba, :attn]

iex> HybridBuilder.resolve_schedule(num_layers: 10, ratio: {9, 1}, backbone: :mamba)
[:mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :attn]

iex> HybridBuilder.resolve_schedule(num_layers: 6, attention_every: 3, backbone: :gru)
[:gru, :gru, :attn, :gru, :gru, :attn]
```
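The `:attention_every` fallback can be sketched in a few lines: every Nth layer becomes attention and the rest use the default backbone. The helper name below is illustrative, not the module's internal function:

```elixir
# Sketch of the :attention_every fallback used when neither :schedule
# nor :ratio is given. Layers are 1-indexed so the Nth slot is attention.
defmodule EveryNSketch do
  def every_n_schedule(num_layers, n, backbone \\ :mamba) do
    Enum.map(1..num_layers, fn i ->
      if rem(i, n) == 0, do: :attn, else: backbone
    end)
  end
end

EveryNSketch.every_n_schedule(6, 3, :gru)
#=> [:gru, :gru, :attn, :gru, :gru, :attn]
```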