Edifice.Blocks.ModelBuilder (Edifice v0.2.0)

High-level model building utilities for sequence and vision architectures.

Provides standardized model skeletons that handle input creation, projection, block stacking, final normalization, and output extraction. Architecture-specific logic is provided via block builder callbacks.

Sequence Model

Input [batch, seq_len, embed_dim]
  -> Optional projection to hidden_size
  -> Stack N blocks (via block_builder callback)
  -> Final LayerNorm (skipped when final_norm: false)
  -> Output extraction (last_timestep / all / mean_pool)

Vision Model

Input [batch, channels, height, width]
  -> Patch embedding
  -> Stack N blocks (via block_builder callback)
  -> Final LayerNorm (skipped when final_norm: false)
  -> Pooling (cls_token / mean_pool)
  -> Optional classifier head
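
The patch embedding step turns an image into a sequence of patch tokens. As a rough sketch of the arithmetic involved, assuming non-overlapping square patches (the usual ViT-style scheme, which this page does not confirm) and the documented defaults for :image_size, :patch_size, and :in_channels, the token count and per-patch input size work out as follows. The variable names are illustrative only.

```elixir
# Documented defaults for build_vision_model/1.
image_size = 224
patch_size = 16
in_channels = 3

# Assumption: patches tile the image exactly, with no overlap or padding.
0 = rem(image_size, patch_size)

patches_per_side = div(image_size, patch_size)
num_patches = patches_per_side * patches_per_side

# Number of raw pixel values flattened into each patch before projection.
patch_dim = patch_size * patch_size * in_channels

IO.inspect({num_patches, patch_dim})
# => {196, 768}
```

Under these assumptions the stacked blocks would see a sequence of 196 tokens, each projected to hidden_size.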

Usage

# Build a sequence model with custom blocks
# (MyBlock is a placeholder for your own block module)
alias Edifice.Blocks.ModelBuilder

model = ModelBuilder.build_sequence_model(
  embed_dim: 287,
  hidden_size: 256,
  num_layers: 4,
  block_builder: fn input, opts -> MyBlock.layer(input, opts) end
)
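
MyBlock.layer/2 in the usage example is not defined on this page. As a minimal sketch, assuming the callback receives an Axon node of shape [batch, seq_len, hidden_size] and should return a node of the same shape, a pre-norm residual MLP block might look like the following. The module body, the :hidden_size key read from opts, the fallback of 256, and the 4x expansion factor are all illustrative assumptions; which keys ModelBuilder actually passes in opts is not stated here.

```elixir
defmodule MyBlock do
  # Hypothetical block: LayerNorm -> Dense (expand, GELU) -> Dense (project) -> residual add.
  def layer(input, opts) do
    # Assumption: opts may carry :hidden_size; fall back to 256 if it does not.
    hidden = opts[:hidden_size] || 256

    mlp =
      input
      |> Axon.layer_norm()
      |> Axon.dense(hidden * 4, activation: :gelu)
      |> Axon.dense(hidden)

    # The residual connection keeps the output shape equal to the input shape.
    Axon.add(input, mlp)
  end
end
```

Because the block preserves the hidden dimension, it can be stacked any number of times via :num_layers.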

Design

Generalizes the pattern from Edifice.SSM.Common.build_model/2 to work with any block type (SSM, attention, MLP mixer, etc.).

Summary

Functions

build_sequence_model(opts)

Build a sequence processing model.

build_vision_model(opts)

Build a vision model with patch embedding.

Functions

build_sequence_model(opts)

@spec build_sequence_model(keyword()) :: Axon.t()

Build a sequence processing model.

Options

:embed_dim - Input embedding dimension (required)
:hidden_size - Internal hidden dimension (default: embed_dim)
:num_layers - Number of blocks to stack (required)
:block_builder - Function (input, opts) -> Axon.t() that builds one block (required)
:seq_len - Expected sequence length for JIT optimization (default: 60)
:output_mode - Output extraction: :last_timestep, :all, :mean_pool (default: :last_timestep)
:final_norm - Whether to apply final layer norm (default: true)
:dropout - Dropout rate between blocks (default: 0.0)
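
To illustrate how these options combine, here is a hedged sketch that sets every documented option explicitly. The block builder is a stand-in (a single dense layer that ignores opts) and the specific values are arbitrary; only the option names come from the list above.

```elixir
alias Edifice.Blocks.ModelBuilder

hidden_size = 128

# Hypothetical block: one dense layer that keeps the hidden dimension unchanged.
block_builder = fn input, _opts ->
  Axon.dense(input, hidden_size, activation: :relu)
end

model =
  ModelBuilder.build_sequence_model(
    embed_dim: 64,
    # Differs from embed_dim, so the optional input projection applies.
    hidden_size: hidden_size,
    num_layers: 2,
    block_builder: block_builder,
    seq_len: 32,
    # Keep every timestep instead of only the last one.
    output_mode: :all,
    final_norm: true,
    dropout: 0.1
  )
```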

Returns

An Axon model. Output shape depends on :output_mode:

:last_timestep -> [batch, hidden_size]
:all -> [batch, seq_len, hidden_size]
:mean_pool -> [batch, hidden_size]
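
The three output shapes can be checked on a plain tensor. The sketch below shows one way each extraction could be expressed with Nx on dummy activations; it illustrates the shapes only and is not taken from the ModelBuilder source.

```elixir
# Dummy activations with shape [batch = 2, seq_len = 3, hidden_size = 4].
x = Nx.iota({2, 3, 4}, type: :f32)
{_batch, seq_len, _hidden} = Nx.shape(x)

# :last_timestep -> take the final position along the sequence axis, then drop that axis.
last_timestep =
  x
  |> Nx.slice_along_axis(seq_len - 1, 1, axis: 1)
  |> Nx.squeeze(axes: [1])

# :mean_pool -> average over the sequence axis.
mean_pool = Nx.mean(x, axes: [1])

# :all -> the tensor is returned unchanged.
all = x

IO.inspect(Nx.shape(last_timestep)) # => {2, 4}
IO.inspect(Nx.shape(mean_pool))     # => {2, 4}
IO.inspect(Nx.shape(all))           # => {2, 3, 4}
```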

build_vision_model(opts)

@spec build_vision_model(keyword()) :: Axon.t()

Build a vision model with patch embedding.

Options

:image_size - Input image size (square, default: 224)
:patch_size - Patch size (default: 16)
:in_channels - Number of input channels (default: 3)
:hidden_size - Hidden dimension (required)
:num_layers - Number of blocks to stack (required)
:block_builder - Function (input, opts) -> Axon.t() that builds one block (required)
:num_classes - If provided, adds a classifier head
:output_mode - Pooling mode: :mean_pool, :cls_token (default: :mean_pool)
:final_norm - Whether to apply final layer norm (default: true)

Returns

An Axon model outputting [batch, num_classes] when :num_classes is provided, otherwise [batch, hidden_size].
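
A usage sketch for the vision builder, using only the options documented above. The block builder is a hypothetical pre-norm residual MLP that ignores opts, and the hidden size, layer count, and class count are arbitrary choices for illustration.

```elixir
alias Edifice.Blocks.ModelBuilder

hidden_size = 192

# Hypothetical block applied to the patch sequence; preserves the hidden dimension.
block_builder = fn input, _opts ->
  mlp =
    input
    |> Axon.layer_norm()
    |> Axon.dense(hidden_size * 4, activation: :gelu)
    |> Axon.dense(hidden_size)

  Axon.add(input, mlp)
end

model =
  ModelBuilder.build_vision_model(
    image_size: 224,
    patch_size: 16,
    in_channels: 3,
    hidden_size: hidden_size,
    num_layers: 2,
    block_builder: block_builder,
    # Providing :num_classes adds the classifier head.
    num_classes: 10,
    output_mode: :mean_pool
  )
```

With :num_classes set to 10, the documented output shape is [batch, 10]; omitting it would leave the pooled [batch, hidden_size] features.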