Edifice.Meta.HybridBuilder (Edifice v0.2.0)


Configurable Hybrid Builder — flexible hybrid architecture composition.

A meta-module that builds hybrid sequence models with arbitrary layer schedules, going beyond the fixed attention_every: N pattern. Supports ratio-based specification, explicit schedules, parallel SSM+attention blocks, and multi-backbone mixing.

Scheduling Modes

Mode       Option                                    Example
Ratio      ratio: {9, 1}                             9 backbone : 1 attention
Every-N    attention_every: 4                        Same as the existing Hybrid module
Explicit   schedule: [:mamba, :mamba, :attn, ...]    Full control
Parallel   mode: :parallel                           SSM + attention in parallel per block

Architecture (Interleaved Mode)

Input [batch, seq_len, embed_dim]
        │
        ▼
┌─────────────────────────┐
│ schedule[0] block       │ ← backbone or attention
├─────────────────────────┤
│ schedule[1] block       │ ← backbone or attention
├─────────────────────────┤
│ ...                     │
└─────────────────────────┘
        │
        ▼
[batch, hidden_size]

Architecture (Parallel Mode)

Input [batch, seq_len, embed_dim]
        │
        ▼
┌───────────────────────────────────┐
│ Per block:                        │
│   norm(x) → SSM path              │
│   norm(x) → Attention path        │
│   gate * ssm + (1 - gate) * attn  │
│   + FFN + residual                │
└───────────────────────────────────┘
        │
        ▼
[batch, hidden_size]
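The gated fusion in a parallel block can be sketched in a few lines. This is an illustrative Python re-implementation, not the library's code: the function names (parallel_block, and the ssm_fn/attn_fn/norm_fn/ffn_fn stand-ins) are hypothetical, and in the real model the sub-layers are Axon layers and gate is a learned parameter.

```python
import numpy as np

def parallel_block(x, ssm_fn, attn_fn, norm_fn, ffn_fn, gate):
    """One Hymba-style parallel block, as in the diagram above.

    ssm_fn / attn_fn / ffn_fn / norm_fn stand in for the real
    sub-layers; gate is a learned mixing weight in [0, 1].
    """
    h = norm_fn(x)
    ssm_out = ssm_fn(h)                           # SSM path
    attn_out = attn_fn(h)                         # attention path
    mixed = gate * ssm_out + (1.0 - gate) * attn_out
    return x + ffn_fn(mixed)                      # FFN + residual
```

With gate = 1.0 the block degenerates to a pure SSM block (plus FFN and residual); with gate = 0.0 it degenerates to pure attention, which is why every layer in parallel mode can serve both roles.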

Usage

# Ratio-based: 90% Mamba, 10% attention (10 layers → 9 Mamba + 1 attn)
model = HybridBuilder.build(
  embed_dim: 256,
  hidden_size: 256,
  num_layers: 10,
  ratio: {9, 1}
)

# Explicit schedule with multi-backbone
model = HybridBuilder.build(
  embed_dim: 256,
  hidden_size: 256,
  schedule: [:mamba, :mamba, :gru, :attn, :mamba, :mamba, :gru, :attn]
)

# Parallel mode (Hymba-style, all layers have both SSM + attention)
model = HybridBuilder.build(
  embed_dim: 256,
  hidden_size: 256,
  num_layers: 6,
  mode: :parallel
)

References

  • Jamba (AI21, 2024) — sequential Mamba+attention interleaving
  • Zamba (Zyphra, 2024) — shared attention layer
  • Hymba (NVIDIA, 2024) — parallel Mamba+attention per block
  • Nemotron-H (NVIDIA, 2025) — 90:10 SSM:attention ratio

Summary

Types

build_opt()

Options for build/1.

Functions

build(opts \\ [])

Build a configurable hybrid model.

describe_schedule(opts)

Describe the layer pattern for a given configuration.

output_size(opts \\ [])

Get the output size of the model.

preset(atom)

Get recommended defaults for common hybrid patterns.

resolve_schedule(opts)

Resolve the layer schedule from options.

Types

build_opt()

@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:backbone, atom()}
  | {:mode, :interleaved | :parallel}
  | {:schedule, [atom()]}
  | {:ratio, {pos_integer(), pos_integer()}}
  | {:attention_every, pos_integer()}
  | {:state_size, pos_integer()}
  | {:expand_factor, pos_integer()}
  | {:conv_size, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:head_dim, pos_integer()}
  | {:window_size, pos_integer()}
  | {:dropout, float()}
  | {:seq_len, pos_integer()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a configurable hybrid model.

Options

Scheduling (mutually exclusive — first match wins):

  • :schedule - Explicit layer schedule as list of atoms. Valid entries: :attn, :mamba, :gru, :rwkv, :delta_net, :gated_delta_net, :griffin_lru.
  • :ratio - {backbone_count, attn_count} tuple. Layers are distributed so attention is evenly spaced. E.g., {9, 1} with 10 layers → 9 backbone + 1 attn.
  • :attention_every - Insert attention every N layers (fallback, same as Hybrid).
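Taken together, these three options resolve to a single layer list. The following Python sketch shows one resolution rule that is consistent with the doctests in resolve_schedule/1 below; the function shape and the exact even-spacing rule (attention at the end of each backbone+attention group) are assumptions about the implementation, not the library's actual code.

```python
def resolve_schedule(num_layers=6, backbone="mamba", schedule=None,
                     ratio=None, attention_every=None):
    """Hypothetical sketch of the priority rule:
    schedule > ratio > attention_every > default 3:1."""
    if schedule is not None:
        return schedule                      # explicit schedule wins outright
    if ratio is None and attention_every is not None:
        ratio = (attention_every - 1, 1)     # every-N is a special case of ratio
    if ratio is None:
        ratio = (3, 1)                       # default: 3:1 backbone:attention
    b, a = ratio
    group = b + a
    # place attention at the end of each (b + a)-sized group,
    # so attention layers come out evenly spaced
    return ["attn" if i % group >= b else backbone
            for i in range(num_layers)]
```

For example, ratio (9, 1) over 10 layers yields nine backbone layers followed by one attention layer, and attention_every: 3 reduces to ratio (2, 1).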

Mode:

  • :mode - :interleaved (default) or :parallel. Parallel mode runs SSM + attention in every block with learned gating.

Architecture:

  • :embed_dim - Input embedding dimension (required)
  • :hidden_size - Internal hidden dimension (default: 256)
  • :num_layers - Total number of layers (default: 6). Ignored if :schedule is given.
  • :backbone - Default backbone type (default: :mamba)

SSM-specific (for Mamba backbone):

  • :state_size - SSM state dimension (default: 16)
  • :expand_factor - Mamba expansion factor (default: 2)
  • :conv_size - Causal conv kernel size (default: 4)

Attention-specific:

  • :num_heads - Number of attention heads (default: 4)
  • :head_dim - Dimension per attention head (default: 64)
  • :window_size - Attention window size (default: 60)

General:

  • :dropout - Dropout rate (default: 0.1)
  • :seq_len - Fixed sequence length (default: window_size)

Returns

An Axon model outputting [batch, hidden_size].

describe_schedule(opts)

@spec describe_schedule(keyword()) :: map()

Describe the layer pattern for a given configuration.

Useful for debugging and visualization.

Examples

iex> HybridBuilder.describe_schedule(num_layers: 10, ratio: {9, 1})
%{
  schedule: [:mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :attn],
  num_backbone: 9,
  num_attention: 1,
  backbone_pct: 90.0,
  mode: :interleaved
}
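The derived fields in that map follow directly from the resolved schedule. A minimal Python sketch (the helper name describe is hypothetical; only the count and percentage fields are shown):

```python
def describe(schedule):
    """Summary statistics for a resolved layer schedule."""
    num_attention = schedule.count("attn")
    num_backbone = len(schedule) - num_attention
    return {
        "num_backbone": num_backbone,
        "num_attention": num_attention,
        "backbone_pct": 100.0 * num_backbone / len(schedule),
    }
```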

output_size(opts \\ [])

@spec output_size(keyword()) :: pos_integer()

Get the output size of the model.

preset(atom)

@spec preset(atom()) :: keyword()

Get recommended defaults for common hybrid patterns.

Patterns

  • :nemotron_h — 90:10 Mamba:attention (Nemotron-H style)
  • :jamba — 3:1 interleaved (Jamba style)
  • :parallel — Hymba-style parallel in every block
  • :minimal_attn — Backbone-heavy with rare attention

Examples

HybridBuilder.build(HybridBuilder.preset(:nemotron_h) ++ [embed_dim: 256])

resolve_schedule(opts)

@spec resolve_schedule(keyword()) :: [atom()]

Resolve the layer schedule from options.

Priority: :schedule > :ratio > :attention_every > default (3:1).

Examples

iex> HybridBuilder.resolve_schedule(schedule: [:mamba, :mamba, :attn])
[:mamba, :mamba, :attn]

iex> HybridBuilder.resolve_schedule(num_layers: 10, ratio: {9, 1}, backbone: :mamba)
[:mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :mamba, :attn]

iex> HybridBuilder.resolve_schedule(num_layers: 6, attention_every: 3, backbone: :gru)
[:gru, :gru, :attn, :gru, :gru, :attn]