All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.2.0] - 2026-02-25

Added

186 registered architectures across 25 families (up from 92 across 16 in v0.1.0). 94 new architectures grouped by family:

  • Attention (35 total, +20): Hawk, RetNet v2, Megalodon, Lightning Attention, GLA v2, HGRN v2, Flash Linear Attention, KDA (Kernelized Deformable Attention), Gated Attention, SSMax (Scalable-Softmax), Softpick (non-saturating sparse normalization), RNoPE-SWA (sliding window without positional encoding), YaRN (context window extension via frequency-scaled RoPE), NSA (Native Sparse Attention from DeepSeek-V3/V4), TMRoPE (Time-aligned Multimodal RoPE), Dual Chunk Attention, Based (Taylor expansion linear attention), InfiniAttention (compressive memory + local attention), Conformer (conv + transformer for audio), Mega (EMA + single-head gated attention), RingAttention (chunked ring-distributed), MLA (Multi-Head Latent Attention), DiffTransformer (dual softmax noise-cancelling)
  • Audio (3, NEW family): SoundStorm (parallel audio generation via masked prediction), EnCodec (neural audio codec), VALL-E (zero-shot TTS via neural codec language modeling)
  • Contrastive (8, +3): JEPA (Joint Embedding Predictive Architecture), Temporal JEPA, SigLIP (sigmoid contrastive loss for language-image pretraining)
  • Generative (22, +11): MMDiT (multi-modal DiT), SoFlow, VAR (Visual Autoregressive Modeling, NeurIPS 2024 Best Paper), Linear DiT/SANA (DiT with linear attention), SiT (Scalable Interpolant Transformer), Transfusion (unified AR text + diffusion image), MAR (Masked Autoregressive Generation), CogVideoX (text-to-video diffusion with 3D causal VAE), TRELLIS (structured 3D latents with sparse transformer + rectified flow), DiT v2, Consistency Model
  • Graph (9, +2): GIN v2 (GIN with edge features), EGNN (E(n)-equivariant graph neural network)
  • Inference (1, NEW family): Medusa (multi-head speculative decoding for 2-3x speedup)
  • Interpretability (2, NEW family): Sparse Autoencoder, Transcoder
  • Memory (3, +1): Engram (O(1) hash-based associative memory via locality-sensitive hashing)
  • Meta (22, +11): DPO (Direct Preference Optimization), KTO (Kahneman-Tversky Optimization), GRPO (Group Relative Policy Optimization), MoE v2 (aux-loss-free load balancing), DoRA, Speculative Decoding, Test-Time Compute, Mixture of Tokenizers, Speculative Head, Distillation Head, QAT (Quantization-Aware Training), Hybrid Builder (flexible hybrid architecture composition), MixtureOfDepths, MixtureOfAgents, RLHFHead
  • Multimodal (1, NEW family): Multimodal Fusion
  • Recurrent (15, +7): sLSTM, xLSTM v2, Gated DeltaNet, TTT-E2E (end-to-end test-time training), Native Recurrence, plus previously added recurrent variants
  • RL (1, NEW family): PolicyValue
  • Robotics (2, NEW family): ACT (Action Chunking Transformer for robot imitation learning), OpenVLA (Vision-Language-Action model)
  • Scientific (1, NEW family): FNO (Fourier Neural Operator)
  • SSM (19, +5): StripedHyena (gated conv + Hyena hybrid), Mamba-3 (complex state dynamics, trapezoidal discretization, MIMO rank-r), GSS (Gated State Spaces), Hyena v2, Hymba, SS Transformer
  • Transformer (4, NEW family): Decoder-Only (GPT-style with GQA, RoPE, SwiGLU, RMSNorm), Multi-Token Prediction, Byte Latent Transformer, Nemotron-H (NVIDIA's hybrid Mamba-Transformer); a minimal Decoder-Only build sketch follows this list
  • Vision (15, +9): FocalNet (focal modulation), PoolFormer (pooling-based MetaFormer), NeRF (positional encoding + MLP for radiance fields), Gaussian Splatting (real-time differentiable radiance field rendering), MambaVision, DINOv2 (self-supervised vision backbone via self-distillation), MetaFormer + CAFormer (pluggable token mixer framework), EfficientViT (O(n) linear attention with cascaded group attention)
  • World Model (1, NEW family): World Model
  • Feedforward: KAT (Kolmogorov-Arnold Transformer), BitNet (ternary/binary weight quantization)
  • Blocks: CausalMask (unified mask creation), DepthwiseConv (1D depthwise separable convolution)
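
A minimal registry call for one of the new architectures, for orientation. Edifice.build/2 is the documented entry point; the :decoder_only registry atom and the option names below are illustrative assumptions, not confirmed by this changelog:

    # Hypothetical registry atom and options; consult the module docs for the real ones.
    model = Edifice.build(:decoder_only, hidden_size: 256, num_heads: 8, num_layers: 4)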

Infrastructure and tooling:

  • GGUF export for decoder-only models
  • KV cache for inference
  • Quantization toolkit (QAT module)
  • shell.nix for reproducible Erlang 27 + Elixir 1.18 + CUDA dev environment
  • livebook.sh script for attached-mode Livebook with EXLA/CUDA
  • ARCHITECTURE_ROADMAP.md tracking remaining architectures by priority tier
  • .credo.exs configuration and CONTRIBUTING.md with architecture addition guide

Notebooks (12 Livebook notebooks):

  • Architecture zoo guided tour
  • Architecture comparison (decision boundaries)
  • Sequence modeling (RNN vs SSM vs Transformer)
  • MLP training end-to-end walkthrough
  • Graph classification (GCN vs GAT vs GIN)
  • Generative models (VAE)
  • Small language model (Transformer + Mamba char-level LM)
  • Liquid neural networks
  • LM architecture shootout
  • Softmax shootout (Softmax vs SSMax vs Softpick)
  • Guided tour demo with detailed ML explanations
  • Notebook index with descriptions and categories

Documentation:

  • 18 conceptual guide documents (up from 12) covering architecture taxonomy, ML foundations, learning path, meta-learning, and reading Edifice source
  • Architecture landscape survey and research docs
  • 100% moduledoc coverage across all 211 modules
  • 100% @spec coverage on all public functions
  • Typed @type build_opt for all build/1 modules (shape sketched below)
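
The pattern each builder module follows, roughly. The module name, option atoms, and defaults here are placeholders, not the library's actual code:

    defmodule Edifice.ExampleArch do
      @moduledoc false

      @type build_opt :: {:hidden_size, pos_integer()} | {:input_size, pos_integer()}

      @spec build([build_opt()]) :: Axon.t()
      def build(opts \\ []) do
        hidden = Keyword.get(opts, :hidden_size, 32)
        input = Keyword.get(opts, :input_size, 16)

        Axon.input("input", shape: {nil, input})
        |> Axon.dense(hidden, activation: :relu)
      end
    end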

Benchmarks:

  • Full architecture sweep benchmark covering all families
  • Training throughput and memory profile benchmarks
  • GPU runtime warmup phases for accurate measurements

Testing:

  • 2822+ tests (up from ~1160 in v0.1.0)
  • Gradient smoke tests with JIT-wrapped value_and_grad (pattern sketched after this list)
  • Parameter sensitivity tests and EXLA.Backend variants for conv models
  • Dialyzer added to CI, zero warnings enforced
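
The gradient smoke-test pattern, roughly (a simplified sketch, not the suite's actual code):

    defmodule GradSmokeSketch do
      import Nx.Defn

      # value_and_grad/2 returns {loss, gradient w.r.t. params} in a single pass.
      defn loss_and_grad(params, x) do
        value_and_grad(params, fn p -> Nx.sum(Nx.pow(Nx.subtract(x, p), 2)) end)
      end
    end

    # JIT-wrap the whole value-and-grad computation (runs through EXLA when configured).
    jitted = Nx.Defn.jit(&GradSmokeSketch.loss_and_grad/2)
    {loss, grad} = jitted.(Nx.tensor([0.5, -0.5]), Nx.tensor([1.0, 2.0]))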

Enhanced

  • Decoder-Only transformer: added :attention_type option and iRoPE (interleaved RoPE) support
  • MultiHead and GQA attention: added :rope option for built-in RoPE integration
  • TTT (Test-Time Training): added :variant option for :linear and :mlp inner models
  • TransformerBlock: added :custom_ffn callback for non-standard feed-forward networks
  • xLSTM: added :mlstm registry alias (Edifice.build(:mlstm, opts))
  • sLSTM: log-domain stabilization (m_t state), recurrent connections (R * h_{t-1}), proper normalization (max(|n_t|, 1)); the stabilizer is sketched after this list
  • MoE v2: aux-loss-free load balancing via bias mode
  • DiffTransformer: simplified V2 with scalar lambda and RMSNorm only
  • Liquid Neural Networks: added an exact analytical ODE solver and made it the default
  • API option names normalized across all modules for consistency
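
The sLSTM stabilizer per step, roughly as in the xLSTM paper (a sketch over gate pre-activations, not the module's exact code):

    defmodule SLSTMStabilizerSketch do
      import Nx.Defn

      # i_tilde / f_tilde are input/forget gate pre-activations; m_prev is the running m_t state.
      defn stabilize(i_tilde, f_tilde, m_prev) do
        m = Nx.max(f_tilde + m_prev, i_tilde)
        i = Nx.exp(i_tilde - m)             # stabilized input gate
        f = Nx.exp(f_tilde + m_prev - m)    # stabilized forget gate
        {i, f, m}
      end
    end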

Changed

  • Removed unnecessary require Axon from 104 modules (Axon has zero macros)
  • BitNet bitlinear_impl comment clarified: STE is implicit via Axon's param/callback architecture
  • Removed broken sliding_window registry alias
  • Dependency constraints tightened to match tested versions
  • All 72 Credo warnings resolved across 41 files
  • All Dialyzer errors resolved; strict formatting enforced
  • Notebooks default to 10 epochs, make EXLA optional, and include dual setup cells (standalone / attached mode)

Fixed

  • EnCodec: channels-first bug fixed across all conv/conv_transpose layers
  • Gaussian Splatting: render pipeline rewritten for JIT/EXLA compatibility; render_layer arity mismatch resolved
  • Gradient smoke tests: JIT-wrapped value_and_grad for conv model gradients; put_nested no longer destroys sibling params
  • MessagePassing: batch axes added to Nx.dot in aggregate for correct batched matrix multiplication; global_pool refactored for 100% coverage
  • RetNet: corrected recurrent_retention_step batching
  • TTT: paper-faithful initialization for numerical stability
  • FNet: replaced Nx.fft with real DFT matrix multiply for compatibility; Nx.real taken after each FFT to avoid complex intermediates
  • RWKV: fixed seq_len=1 compile failure; silenced Range warnings in parallel scans
  • sLSTM: log-domain stabilization for numerical stability
  • MoE routing: top-k uses Nx.top_k with one-hot mask; hash routing properly selects expert; Switch MoE uses straight-through top-1 selection
  • Paper-faithfulness corrections across 8 architecture modules
  • 5 GPU test failures resolved in capsule and conv gradient tests
  • VAE training fixed (single Axon graph); graph viz range bug resolved
  • FocalNet bench spec corrected to match flat-architecture API

[0.1.1] - 2026-02-14

Fixed

  • MoE top-k routing: top_k_forward now uses Nx.top_k indices with one-hot mask for correct expert selection (was ignoring indices and averaging the first k experts); see the sketch after this list
  • MoE hash routing: hash_forward now properly selects expert by hash (was always returning first expert)
  • SwitchMoE routing: Replaced soft weighted average with hard top-1 selection via straight-through estimator, restoring the sparsity that defines Switch Transformer
  • SchNet filter generation: Added learned 2-layer filter-generating network (RBF -> Dense -> SiLU -> Dense) replacing naive mean aggregation
  • ConvNeXt layer scale: Changed from frozen constant to learnable Axon.param, matching Liu et al. 2022
  • MessagePassing aggregate: Added batch axes to Nx.dot for correct batched matrix multiplication
  • SNN docstring: Corrected reset mechanism description from hard reset to soft reset (subtract threshold)
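
A sketch of the corrected top-k selection (illustrative only, not the library's actual top_k_forward):

    defmodule MoERoutingSketch do
      # Builds a {batch, num_experts} multi-hot mask from Nx.top_k indices,
      # so only the selected experts contribute to the output.
      def top_k_mask(router_logits, k) do
        {_values, indices} = Nx.top_k(router_logits, k: k)
        num_experts = Nx.axis_size(router_logits, -1)

        indices
        |> Nx.new_axis(-1)
        |> Nx.equal(Nx.iota({num_experts}))
        |> Nx.sum(axes: [-2])
      end
    end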

Changed

  • KAN default basis: Changed from :sine (Fourier features) to :bspline (cubic B-spline via Cox-de Boor), faithful to Liu et al. 2024. Previous bases (:sine, :chebyshev, :fourier, :rbf) remain available as options; the Cox-de Boor recursion is sketched after this list
  • TTT W_0 initialization: Changed from 0.01 * Identity to :glorot_uniform per Sun et al. 2024
  • TTT output RMS norm: Made optional via :output_rms_norm option (default: false), was unconditionally applied
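
The Cox-de Boor recursion behind the new default, in scalar form (the library's version is vectorized over Nx tensors; this is only the reference recurrence):

    defmodule CoxDeBoorSketch do
      # basis(i, k, x, knots): i-th B-spline basis function of degree k evaluated at x.
      def basis(i, 0, x, knots) do
        if Enum.at(knots, i) <= x and x < Enum.at(knots, i + 1), do: 1.0, else: 0.0
      end

      def basis(i, k, x, knots) do
        left = ratio(x - Enum.at(knots, i), Enum.at(knots, i + k) - Enum.at(knots, i))
        right = ratio(Enum.at(knots, i + k + 1) - x, Enum.at(knots, i + k + 1) - Enum.at(knots, i + 1))
        left * basis(i, k - 1, x, knots) + right * basis(i + 1, k - 1, x, knots)
      end

      # Standard 0/0 convention in the recursion: divisions by zero count as zero.
      defp ratio(_num, denom) when denom == 0, do: 0.0
      defp ratio(num, denom), do: num / denom
    end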

Removed

  • Unused _x and _dt parameters from Liquid integrate_ode

[0.1.0] - 2026-02-14

Added

  • 92 registered architectures across 16 families
  • Unified interface: Edifice.build(:name, opts) and Edifice.list_architectures() (usage sketch after this list)
  • Feedforward: MLP, KAN (Kolmogorov-Arnold Networks), TabNet
  • Convolutional: Conv1D/2D, ResNet, DenseNet, TCN, MobileNet, EfficientNet
  • Recurrent: LSTM, GRU, xLSTM, MinGRU, MinLSTM, DeltaNet, TTT, Titans, Reservoir (ESN)
  • State Space Models: Mamba (parallel scan), Mamba-2 (SSD), MambaCumsum, MambaHillisSteele, S4, S4D, S5, H3, Hyena, BiMamba, GatedSSM, Jamba, Zamba
  • Attention: Multi-Head (sliding window, hybrid), GQA, Perceiver, FNet, LinearTransformer, Nystromformer, Performer, RetNet, RWKV-7, GLA, HGRN-2, Griffin/Hawk
  • Vision: ViT, DeiT, Swin Transformer, U-Net, ConvNeXt, MLP-Mixer
  • Generative: VAE, VQ-VAE, GAN (WGAN-GP), DDPM Diffusion, DDIM, DiT, Latent Diffusion, Consistency Model, Score SDE, Flow Matching, Normalizing Flows
  • Contrastive: SimCLR, BYOL, Barlow Twins, MAE (Masked Autoencoder), VICReg
  • Graph: GCN, GAT, GIN, GraphSAGE, Graph Transformer, PNA, SchNet
  • Sets: DeepSets, PointNet
  • Energy: EBM (contrastive divergence), Modern Hopfield Networks, Neural ODE
  • Probabilistic: Bayesian Neural Networks, MC Dropout, Evidential Neural Networks
  • Memory: Neural Turing Machine, Memory Networks
  • Meta: MoE, Switch MoE, Soft MoE, LoRA, Adapter, Hypernetworks, Capsule Networks
  • Liquid: Liquid Neural Networks (continuous-time ODE)
  • Neuromorphic: SNN (LIF neurons), ANN2SNN conversion
  • Building Blocks: RMSNorm, SwiGLU, FFN, RoPE, ALiBi, PatchEmbed, SinusoidalPE, AdaptiveNorm, CrossAttention
  • 12 conceptual guide documents covering theory, evolution, and decision tables for all families
  • CONTRIBUTING.md with architecture addition guide, test patterns, and Nx/Axon gotchas
  • ~1160 tests covering all architecture families
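
Basic usage of the unified interface. The option names below are placeholders (each architecture documents its own), and builders are assumed to return Axon models:

    Edifice.list_architectures()
    # => [:mlp, :kan, :lstm, ...]   (illustrative output)

    model = Edifice.build(:mlp, hidden_sizes: [64, 32], output_size: 10)
    {init_fn, predict_fn} = Axon.build(model)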