Edifice.Blocks.ModelBuilder (Edifice v0.2.0)


High-level model building utilities for sequence and vision architectures.

Provides standardized model skeletons that handle input creation, projection, block stacking, final normalization, and output extraction. Architecture-specific logic is provided via block builder callbacks.

Sequence Model

Input [batch, seq_len, embed_dim]
  -> Optional projection to hidden_size
  -> Stack N blocks (via block_builder callback)
  -> Final LayerNorm
  -> Output extraction (last_timestep / all / mean_pool)

Vision Model

Input [batch, channels, height, width]
  -> Patch embedding
  -> Stack N blocks (via block_builder callback)
  -> Final LayerNorm
  -> Pooling (cls_token / mean_pool)
  -> Optional classifier head

Usage

# Build a sequence model with custom blocks.
# MyBlock stands in for your own block module.
alias Edifice.Blocks.ModelBuilder

model = ModelBuilder.build_sequence_model(
  embed_dim: 287,
  hidden_size: 256,
  num_layers: 4,
  block_builder: fn input, opts -> MyBlock.layer(input, opts) end
)
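The usage above calls MyBlock.layer/2 without defining it. As a rough illustration only, a block could be a pre-norm residual MLP built from standard Axon layers. Everything here is an assumption rather than Edifice's own code: the module body, the 4x expansion factor, and in particular the :hidden_size key read from opts, since this page does not say which keys the builder passes to the callback.

```elixir
# Assumption: standalone script; in a Mix project, add Axon to deps instead.
Mix.install([{:axon, "~> 0.7"}])

defmodule MyBlock do
  # Hypothetical block: LayerNorm -> Dense (expand) -> GELU -> Dense (project) + residual.
  # The :hidden_size key in opts is an assumption; it falls back to 256 if absent.
  def layer(input, opts) do
    hidden_size = Keyword.get(opts, :hidden_size, 256)

    mlp =
      input
      |> Axon.layer_norm()
      |> Axon.dense(hidden_size * 4)
      |> Axon.activation(:gelu)
      |> Axon.dense(hidden_size)

    # Residual connection: the block's output keeps the input's shape.
    Axon.add(input, mlp)
  end
end

# Build one block on a [batch, seq_len, hidden_size] input to check that it composes.
input = Axon.input("sequence", shape: {nil, 60, 256})
block = MyBlock.layer(input, hidden_size: 256)
```

Layer names are left for Axon to generate, which avoids name clashes when the same callback is invoked once per stacked layer.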

Design

Generalizes the pattern from Edifice.SSM.Common.build_model/2 to work with any block type (SSM, attention, MLP mixer, etc.).

Summary

Functions

build_sequence_model(opts)
  Build a sequence processing model.

build_vision_model(opts)
  Build a vision model with patch embedding.

Functions

build_sequence_model(opts)

@spec build_sequence_model(keyword()) :: Axon.t()

Build a sequence processing model.

Options

  • :embed_dim - Input embedding dimension (required)
  • :hidden_size - Internal hidden dimension (default: embed_dim)
  • :num_layers - Number of blocks to stack (required)
  • :block_builder - Function (input, opts) -> Axon.t() that builds one block (required)
  • :seq_len - Expected sequence length for JIT optimization (default: 60)
  • :output_mode - Output extraction: :last_timestep, :all, :mean_pool (default: :last_timestep)
  • :final_norm - Whether to apply final layer norm (default: true)
  • :dropout - Dropout rate between blocks (default: 0.0)

Returns

An Axon model. Output shape depends on :output_mode:

  • :last_timestep -> [batch, hidden_size]
  • :all -> [batch, seq_len, hidden_size]
  • :mean_pool -> [batch, hidden_size]
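To make the three output modes concrete, here is a small sketch on plain nested lists (batch of sequences of feature vectors) rather than tensors. The ExtractSketch module is hypothetical and not part of Edifice. It assumes that :last_timestep takes the final position along the sequence axis and that :mean_pool averages over that axis, which is the conventional reading of these names.

```elixir
defmodule ExtractSketch do
  # Hypothetical helper: batch is a list of sequences, each a list of feature vectors.

  # :all keeps every timestep -> [batch, seq_len, hidden_size]
  def extract(batch, :all), do: batch

  # :last_timestep keeps only the final vector of each sequence -> [batch, hidden_size]
  def extract(batch, :last_timestep), do: Enum.map(batch, &List.last/1)

  # :mean_pool averages each feature across timesteps -> [batch, hidden_size]
  def extract(batch, :mean_pool) do
    Enum.map(batch, fn seq ->
      n = length(seq)

      seq
      |> Enum.zip_with(&Enum.sum/1)
      |> Enum.map(&(&1 / n))
    end)
  end
end

# One sequence of two timesteps with two features each.
batch = [[[1.0, 2.0], [3.0, 4.0]]]

IO.inspect(ExtractSketch.extract(batch, :last_timestep)) # [[3.0, 4.0]]
IO.inspect(ExtractSketch.extract(batch, :mean_pool))     # [[2.0, 3.0]]
```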

build_vision_model(opts)

@spec build_vision_model(keyword()) :: Axon.t()

Build a vision model with patch embedding.

Options

  • :image_size - Input image size (square, default: 224)
  • :patch_size - Patch size (default: 16)
  • :in_channels - Number of input channels (default: 3)
  • :hidden_size - Hidden dimension (required)
  • :num_layers - Number of blocks to stack (required)
  • :block_builder - Function (input, opts) -> Axon.t() that builds one block (required)
  • :num_classes - If provided, adds a classifier head
  • :output_mode - Pooling mode: :mean_pool, :cls_token (default: :mean_pool)
  • :final_norm - Whether to apply final layer norm (default: true)
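The defaults above determine how many tokens the stacked blocks see. The sketch below works through that arithmetic, assuming non-overlapping square patches in the usual ViT style. The PatchSketch module is hypothetical, and the requirement that :image_size be divisible by :patch_size is an assumption, not something this page states.

```elixir
defmodule PatchSketch do
  # Hypothetical helper: number of non-overlapping patches in a square image.
  def num_patches(image_size, patch_size) when rem(image_size, patch_size) == 0 do
    per_side = div(image_size, patch_size)
    per_side * per_side
  end

  # Flattened size of one patch before it is projected to hidden_size.
  def patch_dim(patch_size, in_channels), do: patch_size * patch_size * in_channels
end

# With the documented defaults (image_size: 224, patch_size: 16, in_channels: 3):
IO.inspect(PatchSketch.num_patches(224, 16)) # 196
IO.inspect(PatchSketch.patch_dim(16, 3))     # 768
```

Under these assumptions each block receives a sequence of 196 patch tokens, plus one more if a class token is prepended for :cls_token pooling.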

Returns

An Axon model. The output shape is [batch, num_classes] when :num_classes is provided, and [batch, hidden_size] otherwise.