
A comprehensive ML architecture library for Elixir, built on Nx and Axon.

186 neural network architectures across 25 families — from MLPs to Mamba, transformers to graph networks, VAEs to spiking neurons, audio codecs to robotics, scientific ML to 3D generation.

Why Edifice?

The Elixir ML ecosystem has excellent numerical computing (Nx) and model building (Axon) foundations, but no comprehensive collection of ready-to-use architectures. Edifice fills that gap:

  • One dependency for all major architecture families
  • Consistent API — every architecture follows Module.build(opts) returning an Axon model
  • Unified registry — Edifice.build(:mamba, opts) discovers and builds any architecture by name
  • Pure Elixir — no Python, no ONNX imports, just Nx/Axon all the way down
  • GPU-ready — works with EXLA/CUDA out of the box

Installation

Add edifice to your dependencies in mix.exs:

def deps do
  [
    {:edifice, "~> 0.2.0"}
  ]
end

Edifice requires Nx ~> 0.10 and Axon ~> 0.8. For GPU acceleration, add EXLA:

{:exla, "~> 0.10"}
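
With EXLA installed, you will typically also set it as the default Nx backend so tensor operations run on the GPU. A minimal sketch — this is standard Nx configuration, not an Edifice-specific API:

# config/config.exs
import Config

config :nx, default_backend: EXLA.Backend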

Tip: On Elixir 1.19+, set MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4 to compile dependencies in parallel (up to 4x faster first build).

Quick Start

# Build any architecture by name
model = Edifice.build(:mamba, embed_size: 256, hidden_size: 512, num_layers: 4)

# Or use the module directly for more control
model = Edifice.SSM.Mamba.build(
  embed_size: 256,
  hidden_size: 512,
  state_size: 16,
  num_layers: 4,
  window_size: 60
)

# Build and run
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 60, 256}, :f32), Axon.ModelState.empty())
output = predict_fn.(params, input)

# Explore what's available
Edifice.list_architectures()
# => [:attention, :bayesian, :capsule, :deep_sets, :densenet, :diffusion, ...]

Edifice.list_families()
# => %{ssm: [:mamba, :mamba_ssd, :s5, ...], attention: [:attention, :retnet, ...], ...}
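
Models built by Edifice are ordinary Axon graphs, so the standard Axon training loop applies. A minimal sketch, where train_data is a placeholder for a stream of {input, target} batches matching the template above, and the loss and optimizer choices are purely illustrative:

model = Edifice.build(:mamba, embed_size: 256, hidden_size: 512, num_layers: 4)

trained_state =
  model
  |> Axon.Loop.trainer(:mean_squared_error, Polaris.Optimizers.adam(learning_rate: 1.0e-3))
  |> Axon.Loop.run(train_data, %{}, epochs: 5, compiler: EXLA)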

Architecture Families

Feedforward

| Architecture | Module | Key Feature |
|---|---|---|
| MLP | Edifice.Feedforward.MLP | Multi-layer perceptron with configurable hidden sizes |
| KAN | Edifice.Feedforward.KAN | Kolmogorov-Arnold Networks, learnable activation functions |
| KAT | Edifice.Feedforward.KAT | Kolmogorov-Arnold Transformer (KAN + attention) |
| TabNet | Edifice.Feedforward.TabNet | Attentive feature selection for tabular data |
| BitNet | Edifice.Feedforward.BitNet | Ternary/binary weight quantization (1.58-bit) |

Transformer

| Architecture | Module | Key Feature |
|---|---|---|
| Decoder-Only | Edifice.Transformer.DecoderOnly | GPT-style with GQA, RoPE/iRoPE, SwiGLU, RMSNorm |
| Multi-Token Prediction | Edifice.Transformer.MultiTokenPrediction | Predict next N tokens simultaneously |
| Byte Latent Transformer | Edifice.Transformer.ByteLatentTransformer | Byte-level processing via encoder-latent-decoder |
| Nemotron-H | Edifice.Transformer.NemotronH | NVIDIA's hybrid Mamba-Transformer |

State Space Models

| Architecture | Module | Key Feature |
|---|---|---|
| S4 | Edifice.SSM.S4 | HiPPO DPLR initialization, long-range memory |
| S4D | Edifice.SSM.S4D | Diagonal state space, simplified S4 |
| S5 | Edifice.SSM.S5 | MIMO diagonal SSM with D skip connection |
| H3 | Edifice.SSM.H3 | Two SSMs with multiplicative gating + short convolution |
| Hyena | Edifice.SSM.Hyena | Long convolution hierarchy, implicit filters |
| Mamba | Edifice.SSM.Mamba | Selective SSM, parallel associative scan |
| Mamba-2 (SSD) | Edifice.SSM.MambaSSD | Structured state space duality, chunk-wise matmul |
| Mamba (Cumsum) | Edifice.SSM.MambaCumsum | Mamba with configurable scan algorithm |
| Mamba (Hillis-Steele) | Edifice.SSM.MambaHillisSteele | Mamba with max-parallelism scan |
| BiMamba | Edifice.SSM.BiMamba | Bidirectional Mamba for non-causal tasks |
| GatedSSM | Edifice.SSM.GatedSSM | Gated temporal SSM with gradient checkpointing |
| Jamba | Edifice.SSM.Hybrid | Mamba + Attention hybrid (configurable ratio) |
| Zamba | Edifice.SSM.Zamba | Mamba + single shared attention layer |
| StripedHyena | Edifice.SSM.StripedHyena | Interleaved Hyena long conv + gated conv |
| Mamba-3 | Edifice.SSM.Mamba3 | Complex states, trapezoidal discretization, MIMO |
| GSS | Edifice.SSM.GSS | Gated State Space (simplified S4 with gating) |
| Hymba | Edifice.SSM.Hymba | Hybrid Mamba + attention with learnable meta tokens |
| SS Transformer | Edifice.SSM.SSTransformer | State Space Transformer |
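
Because every family member shares the Module.build(opts) convention, SSM variants can be swapped through the registry. A sketch using names reported by Edifice.list_families/0, assuming these variants accept the same core options as :mamba (per-architecture options may differ):

for arch <- [:mamba, :mamba_ssd, :s5] do
  model = Edifice.build(arch, embed_size: 128, hidden_size: 256, num_layers: 2)
  {init_fn, predict_fn} = Axon.build(model)
  params = init_fn.(Nx.template({1, 60, 128}, :f32), Axon.ModelState.empty())
  predict_fn.(params, Nx.broadcast(0.5, {1, 60, 128}))
end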

Attention & Linear Attention

| Architecture | Module | Key Feature |
|---|---|---|
| Multi-Head Attention | Edifice.Attention.MultiHead | Sliding window, QK LayerNorm |
| GQA | Edifice.Attention.GQA | Grouped Query Attention, fewer KV heads |
| Perceiver | Edifice.Attention.Perceiver | Cross-attention to learned latents, input-agnostic |
| FNet | Edifice.Attention.FNet | Fourier Transform replacing attention |
| Linear Transformer | Edifice.Attention.LinearTransformer | Kernel-based O(N) attention |
| Nystromformer | Edifice.Attention.Nystromformer | Nystrom approximation of attention matrix |
| Performer | Edifice.Attention.Performer | FAVOR+ random feature attention |
| RetNet | Edifice.Attention.RetNet | Multi-scale retention, O(1) recurrent inference |
| RWKV-7 | Edifice.Attention.RWKV | Linear attention, O(1) space, "Goose" architecture |
| GLA | Edifice.Attention.GLA | Gated Linear Attention with data-dependent decay |
| HGRN-2 | Edifice.Attention.HGRN | Hierarchically gated linear RNN, state expansion |
| Griffin/Hawk | Edifice.Attention.Griffin | RG-LRU + local attention (Griffin) or pure RG-LRU (Hawk) |
| Diff Transformer | Edifice.Attention.DiffTransformer | Noise-cancelling dual softmax subtraction |
| MLA | Edifice.Attention.MLA | Multi-Head Latent Attention (DeepSeek KV compression) |
| Based | Edifice.Attention.Based | Taylor expansion linear attention |
| Mega | Edifice.Attention.Mega | Moving average + gated attention |
| InfiniAttention | Edifice.Attention.InfiniAttention | Compressive memory for unbounded context |
| Conformer | Edifice.Attention.Conformer | Conv-augmented transformer for audio/speech |
| Ring Attention | Edifice.Attention.RingAttention | Distributed chunked attention for long sequences |
| Lightning Attention | Edifice.Attention.LightningAttention | Hybrid linear/softmax with I/O-aware tiling |
| Gated Attention | Edifice.Attention.GatedAttention | Sigmoid post-attention gate (NeurIPS 2025) |
| NSA | Edifice.Attention.NSA | Native Sparse Attention (DeepSeek three-path) |
| KDA | Edifice.Attention.KDA | Kimi Delta Attention, channel-wise decay |
| Flash Linear Attention | Edifice.Attention.FlashLinearAttention | Optimized linear attention |
| YaRN | Edifice.Attention.YARN | RoPE context extension via frequency scaling |
| Dual Chunk | Edifice.Attention.DualChunk | Dual Chunk Attention for long contexts |
| TMRoPE | Edifice.Attention.TMRoPE | Time-aligned Multimodal RoPE |
| RNoPE-SWA | Edifice.Attention.RNoPESWA | No positional encoding + sliding window |

Recurrent Networks

| Architecture | Module | Key Feature |
|---|---|---|
| LSTM/GRU | Edifice.Recurrent | Classic recurrent cells with multi-layer stacking |
| xLSTM | Edifice.Recurrent.XLSTM | Exponential gating, matrix memory (sLSTM/mLSTM) |
| MinGRU | Edifice.Recurrent.MinGRU | Minimal GRU, parallel-scannable |
| MinLSTM | Edifice.Recurrent.MinLSTM | Minimal LSTM, parallel-scannable |
| DeltaNet | Edifice.Recurrent.DeltaNet | Delta rule-based linear RNN |
| TTT | Edifice.Recurrent.TTT | Test-Time Training, self-supervised at inference |
| Titans | Edifice.Recurrent.Titans | Neural long-term memory, surprise-gated |
| Reservoir | Edifice.Recurrent.Reservoir | Echo State Networks with fixed random reservoir |
| sLSTM | Edifice.Recurrent.SLSTM | Scalar LSTM with exponential gating |
| xLSTM v2 | Edifice.Recurrent.XLSTMv2 | Updated mLSTM with matrix memory |
| Gated DeltaNet | Edifice.Recurrent.GatedDeltaNet | Linear attention with data-dependent gating |
| TTT-E2E | Edifice.Recurrent.TTTE2E | End-to-end test-time training |
| Native Recurrence | Edifice.Recurrent.NativeRecurrence | Native recurrence block |

Vision

| Architecture | Module | Key Feature |
|---|---|---|
| ViT | Edifice.Vision.ViT | Vision Transformer, patch embedding |
| DeiT | Edifice.Vision.DeiT | Data-efficient ViT with distillation token |
| Swin | Edifice.Vision.SwinTransformer | Shifted window attention, hierarchical features |
| U-Net | Edifice.Vision.UNet | Encoder-decoder with skip connections |
| ConvNeXt | Edifice.Vision.ConvNeXt | Modernized ConvNet with transformer-inspired design |
| MLP-Mixer | Edifice.Vision.MLPMixer | Pure MLP with token/channel mixing |
| FocalNet | Edifice.Vision.FocalNet | Focal modulation, hierarchical context |
| PoolFormer | Edifice.Vision.PoolFormer | Average pooling token mixer (MetaFormer) |
| NeRF | Edifice.Vision.NeRF | Neural radiance field, coordinate-to-color mapping |
| Gaussian Splat | Edifice.Vision.GaussianSplat | 3D Gaussian Splatting (NeRF successor) |
| MambaVision | Edifice.Vision.MambaVision | 4-stage hierarchical CNN+Mamba+Attention |
| DINOv2 | Edifice.Vision.DINOv2 | Self-distillation vision backbone |
| MetaFormer | Edifice.Vision.MetaFormer | Architecture-first framework (+ CAFormer variant) |
| EfficientViT | Edifice.Vision.EfficientViT | Linear attention ViT |
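
Vision backbones follow the same build pattern but take image-shaped input. A sketch for ViT — the option names below (image_size, patch_size, num_classes) and the NHWC input layout are assumptions for illustration; check the Edifice.Vision.ViT docs for the exact keys:

# Option names are hypothetical — consult the module docs
model = Edifice.Vision.ViT.build(
  image_size: 224,
  patch_size: 16,
  embed_size: 384,
  num_layers: 6,
  num_classes: 10
)

{init_fn, predict_fn} = Axon.build(model)
# Input layout assumed NHWC
params = init_fn.(Nx.template({1, 224, 224, 3}, :f32), Axon.ModelState.empty())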

Convolutional

| Architecture | Module | Key Feature |
|---|---|---|
| Conv1D/2D | Edifice.Convolutional.Conv | Configurable convolution blocks with BN, activation, dropout |
| ResNet | Edifice.Convolutional.ResNet | Residual/bottleneck blocks, configurable depth |
| DenseNet | Edifice.Convolutional.DenseNet | Dense connections, feature reuse |
| TCN | Edifice.Convolutional.TCN | Dilated causal convolutions for sequences |
| MobileNet | Edifice.Convolutional.MobileNet | Depthwise separable convolutions |
| EfficientNet | Edifice.Convolutional.EfficientNet | Compound scaling (depth, width, resolution) |

Generative Models

| Architecture | Module | Key Feature |
|---|---|---|
| VAE | Edifice.Generative.VAE | Reparameterization trick, KL divergence, beta-VAE |
| VQ-VAE | Edifice.Generative.VQVAE | Discrete codebook, straight-through estimator |
| GAN | Edifice.Generative.GAN | Generator/discriminator, WGAN-GP support |
| Diffusion (DDPM) | Edifice.Generative.Diffusion | Denoising diffusion, sinusoidal time embedding |
| DDIM | Edifice.Generative.DDIM | Deterministic diffusion sampling, fast inference |
| DiT | Edifice.Generative.DiT | Diffusion Transformer, AdaLN-Zero conditioning |
| Latent Diffusion | Edifice.Generative.LatentDiffusion | Diffusion in compressed latent space |
| Consistency Model | Edifice.Generative.ConsistencyModel | Single-step generation via consistency training |
| Score SDE | Edifice.Generative.ScoreSDE | Continuous SDE framework (VP-SDE, VE-SDE) |
| Flow Matching | Edifice.Generative.FlowMatching | ODE-based generation, multiple loss variants |
| Normalizing Flow | Edifice.Generative.NormalizingFlow | Affine coupling layers (RealNVP-style) |
| MMDiT | Edifice.Generative.MMDiT | Multimodal Diffusion Transformer (FLUX.1, SD3) |
| SoFlow | Edifice.Generative.SoFlow | Flow matching + consistency loss |
| VAR | Edifice.Generative.VAR | Visual Autoregressive (next-scale prediction) |
| Linear DiT (SANA) | Edifice.Generative.LinearDiT | Linear attention for diffusion, 100x speedup |
| SiT | Edifice.Generative.SiT | Scalable Interpolant Transformer |
| Transfusion | Edifice.Generative.Transfusion | Unified AR text + diffusion images |
| MAR | Edifice.Generative.MAR | Masked Autoregressive generation |
| CogVideoX | Edifice.Generative.CogVideoX | 3D causal VAE + expert transformer for video |
| TRELLIS | Edifice.Generative.TRELLIS | Sparse 3D lattice + rectified flow |

Contrastive & Self-Supervised

| Architecture | Module | Key Feature |
|---|---|---|
| SimCLR | Edifice.Contrastive.SimCLR | NT-Xent contrastive loss, projection head |
| BYOL | Edifice.Contrastive.BYOL | No negatives, momentum encoder |
| Barlow Twins | Edifice.Contrastive.BarlowTwins | Cross-correlation redundancy reduction |
| MAE | Edifice.Contrastive.MAE | Masked Autoencoder, 75% patch masking |
| VICReg | Edifice.Contrastive.VICReg | Variance-Invariance-Covariance regularization |
| JEPA | Edifice.Contrastive.JEPA | Joint Embedding Predictive Architecture |
| Temporal JEPA | Edifice.Contrastive.TemporalJEPA | V-JEPA for video/temporal sequences |
| SigLIP | Edifice.Contrastive.SigLIP | Sigmoid contrastive learning (CLIP improvement) |

Graph & Set Networks

| Architecture | Module | Key Feature |
|---|---|---|
| GCN | Edifice.Graph.GCN | Spectral graph convolutions (Kipf & Welling) |
| GAT | Edifice.Graph.GAT | Graph attention with multi-head support |
| GIN | Edifice.Graph.GIN | Graph Isomorphism Network, maximally expressive |
| GraphSAGE | Edifice.Graph.GraphSAGE | Inductive learning, neighborhood sampling |
| Graph Transformer | Edifice.Graph.GraphTransformer | Full attention over nodes with edge features |
| PNA | Edifice.Graph.PNA | Principal Neighbourhood Aggregation |
| GINv2 | Edifice.Graph.GINv2 | GIN with edge features |
| SchNet | Edifice.Graph.SchNet | Continuous-filter convolutions for molecules |
| EGNN | Edifice.Graph.EGNN | E(n)-equivariant GNN for molecular simulation |
| DeepSets | Edifice.Sets.DeepSets | Permutation-invariant set functions |
| PointNet | Edifice.Sets.PointNet | Point cloud processing with T-Net alignment |

Energy, Probabilistic & Memory

| Architecture | Module | Key Feature |
|---|---|---|
| EBM | Edifice.Energy.EBM | Energy-based models, contrastive divergence |
| Hopfield | Edifice.Energy.Hopfield | Modern continuous Hopfield networks |
| Neural ODE | Edifice.Energy.NeuralODE | Continuous-depth networks via ODE solvers |
| Bayesian NN | Edifice.Probabilistic.Bayesian | Weight uncertainty, variational inference |
| MC Dropout | Edifice.Probabilistic.MCDropout | Uncertainty estimation via dropout at inference |
| Evidential NN | Edifice.Probabilistic.EvidentialNN | Dirichlet priors for uncertainty |
| NTM | Edifice.Memory.NTM | Neural Turing Machine, differentiable memory |
| Memory Network | Edifice.Memory.MemoryNetwork | End-to-end memory with multi-hop attention |
| Engram | Edifice.Memory.Engram | O(1) hash-based associative memory |

Meta-Learning & Specialized

| Architecture | Module | Key Feature |
|---|---|---|
| MoE | Edifice.Meta.MoE | Mixture of Experts with top-k/hash routing |
| Switch MoE | Edifice.Meta.SwitchMoE | Top-1 routing with load balancing |
| Soft MoE | Edifice.Meta.SoftMoE | Fully differentiable soft token routing |
| LoRA | Edifice.Meta.LoRA | Low-Rank Adaptation for parameter-efficient fine-tuning |
| Adapter | Edifice.Meta.Adapter | Bottleneck adapter modules for transfer learning |
| Hypernetwork | Edifice.Meta.Hypernetwork | Networks that generate other networks' weights |
| Capsule | Edifice.Meta.Capsule | Dynamic routing between capsules |
| MixtureOfDepths | Edifice.Meta.MixtureOfDepths | Dynamic per-token compute allocation |
| MixtureOfAgents | Edifice.Meta.MixtureOfAgents | Multi-model proposer + aggregator |
| RLHF Head | Edifice.Meta.RLHFHead | Reward model and preference heads |
| DPO | Edifice.Meta.DPO | Direct Preference Optimization |
| GRPO | Edifice.Meta.GRPO | Group Relative Policy Optimization (DeepSeek-R1) |
| KTO | Edifice.Meta.KTO | Kahneman-Tversky Optimization (binary feedback) |
| MoE v2 | Edifice.Meta.MoEv2 | Expert-choice routing + shared experts + bias balancing |
| DoRA | Edifice.Meta.DoRA | Weight-decomposed LoRA |
| Speculative Decoding | Edifice.Meta.SpeculativeDecoding | Draft + verify inference acceleration |
| Test-Time Compute | Edifice.Meta.TestTimeCompute | Adaptive test-time compute |
| Mixture of Tokenizers | Edifice.Meta.MixtureOfTokenizers | Multi-tokenization expert routing |
| QAT | Edifice.Meta.QAT | Quantization-Aware Training |
| Hybrid Builder | Edifice.Meta.HybridBuilder | Configurable SSM/Attention ratio |
| Liquid NN | Edifice.Liquid | Continuous-time ODE dynamics (LTC cells) |
| SNN | Edifice.Neuromorphic.SNN | Leaky integrate-and-fire, surrogate gradients |
| ANN2SNN | Edifice.Neuromorphic.ANN2SNN | Convert trained ANNs to spiking networks |

Interpretability

| Architecture | Module | Key Feature |
|---|---|---|
| Sparse Autoencoder | Edifice.Interpretability.SparseAutoencoder | Feature extraction from model activations |
| Transcoder | Edifice.Interpretability.Transcoder | Cross-layer mechanistic interpretability |

Scientific ML

| Architecture | Module | Key Feature |
|---|---|---|
| FNO | Edifice.Scientific.FNO | Fourier Neural Operator for solving PDEs |

Audio

| Architecture | Module | Key Feature |
|---|---|---|
| EnCodec | Edifice.Audio.EnCodec | Neural audio codec (encoder → RVQ → decoder) |
| VALL-E | Edifice.Audio.VALLE | Codec language model for zero-shot TTS |
| SoundStorm | Edifice.Audio.SoundStorm | Parallel audio token generation |

Robotics

| Architecture | Module | Key Feature |
|---|---|---|
| ACT | Edifice.Robotics.ACT | Action Chunking Transformer for imitation learning |
| OpenVLA | Edifice.Robotics.OpenVLA | Vision-Language-Action model for robot control |

RL & World Models

| Architecture | Module | Key Feature |
|---|---|---|
| PolicyValue | Edifice.RL.PolicyValue | Actor-critic policy-value network |
| World Model | Edifice.WorldModel.WorldModel | Encoder + dynamics + reward head |
| Medusa | Edifice.Inference.Medusa | Multi-head speculative decoding |

Multimodal

| Architecture | Module | Key Feature |
|---|---|---|
| Multimodal Fusion | Edifice.Multimodal.Fusion | MLP projection, cross-attention, Perceiver resampler |

Building Blocks

| Block | Module | Key Feature |
|---|---|---|
| RMSNorm | Edifice.Blocks.RMSNorm | Root Mean Square normalization |
| SwiGLU | Edifice.Blocks.SwiGLU | Gated FFN with SiLU activation |
| RoPE | Edifice.Blocks.RoPE | Rotary position embedding |
| ALiBi | Edifice.Blocks.ALiBi | Attention with linear biases |
| Patch Embed | Edifice.Blocks.PatchEmbed | Image-to-patch tokenization |
| Sinusoidal PE | Edifice.Blocks.SinusoidalPE | Fixed sinusoidal position encoding |
| Adaptive Norm | Edifice.Blocks.AdaptiveNorm | Condition-dependent normalization (AdaLN) |
| Cross Attention | Edifice.Blocks.CrossAttention | Cross-attention between two sequences |
| Conv1D/2D | Edifice.Convolutional.Conv | Configurable convolution blocks |
| FFN | Edifice.Blocks.FFN | Standard and gated feed-forward networks |
| Transformer Block | Edifice.Blocks.TransformerBlock | Pre-norm block with pluggable attention |
| Causal Mask | Edifice.Blocks.CausalMask | Unified causal mask creation |
| Depthwise Conv | Edifice.Blocks.DepthwiseConv | 1D depthwise separable convolution |
| Model Builder | Edifice.Blocks.ModelBuilder | Sequence/vision model skeletons |
| Message Passing | Edifice.Graph.MessagePassing | Generic MPNN framework, global pooling |
| Scalable-Softmax | Edifice.Blocks.SSMax | Drop-in softmax replacement for long sequences |
| Softpick | Edifice.Blocks.Softpick | Non-saturating sparse attention function |
| KV Cache | Edifice.Blocks.KVCache | Inference-time KV caching |
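
Blocks are intended for composing custom Axon graphs. The sketch below shows the pattern only — the layer builder names are hypothetical placeholders; see each block module's docs for the real entry points, and the API Design section below for a documented layer-level builder:

input = Axon.input("tokens", shape: {nil, 60, 256})

# Hypothetical builder names — consult the Edifice.Blocks docs
x = Edifice.Blocks.RMSNorm.layer(input)
x = Edifice.Blocks.SwiGLU.layer(x, hidden_size: 1024)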

Guides

New to ML?

Start here if you're new to machine learning. These guides build from zero to fluency with Edifice's API and architecture families.

  1. ML Foundations — What neural networks are, how they learn, tensors and shapes
  2. Core Vocabulary — Essential terminology used across all guides
  3. The Problem Landscape — Classification, generation, sequence modeling — which architectures solve which problems
  4. Reading Edifice — The build/init/predict pattern, Axon graphs, shapes, and runnable examples
  5. Learning Path — A guided tour through the architecture families

Reference

  • Architecture Taxonomy — Comprehensive catalog of architectures: descriptions, paper references, strengths/weaknesses, adoption context, and gap analysis

Architecture Guides

Conceptual guides covering theory, architecture evolution, and decision tables for each family:

  • Sequence Processing
  • Representation Learning
  • Generative & Dynamic
  • Composition & Enhancement

Examples

See examples/ for runnable scripts including mlp_basics.exs, sequence_comparison.exs, graph_classification.exs, vae_generation.exs, and architecture_tour.exs.

Mamba for Sequence Modeling

model = Edifice.SSM.Mamba.build(
  embed_size: 128,
  hidden_size: 256,
  state_size: 16,
  num_layers: 4,
  window_size: 100
)

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 100, 128}, :f32), Axon.ModelState.empty())
output = predict_fn.(params, Nx.broadcast(0.5, {1, 100, 128}))
# => {1, 256}

Graph Classification with GCN

model = Edifice.Graph.GCN.build_classifier(
  input_dim: 16,
  hidden_dims: [64, 64],
  num_classes: 2,
  pool: :mean
)

{init_fn, predict_fn} = Axon.build(model)

params = init_fn.(
  %{
    "nodes" => Nx.template({4, 10, 16}, :f32),
    "adjacency" => Nx.template({4, 10, 10}, :f32)
  },
  Axon.ModelState.empty()
)

output = predict_fn.(params, %{
  "nodes" => Nx.broadcast(0.5, {4, 10, 16}),
  "adjacency" => Nx.eye(10) |> Nx.broadcast({4, 10, 10})
})
# => {4, 2}

VAE with Reparameterization

{encoder, decoder} = Edifice.Generative.VAE.build(
  input_size: 784,
  latent_size: 32,
  encoder_sizes: [512, 256],
  decoder_sizes: [256, 512]
)

# Encoder outputs mu and log_var
{init_fn, predict_fn} = Axon.build(encoder)
params = init_fn.(Nx.template({1, 784}, :f32), Axon.ModelState.empty())
%{mu: mu, log_var: log_var} = predict_fn.(params, Nx.broadcast(0.5, {1, 784}))

# Sample latent vector (requires PRNG key for stochastic sampling)
key = Nx.Random.key(42)
{z, _new_key} = Edifice.Generative.VAE.reparameterize(mu, log_var, key)

# KL divergence for training
kl_loss = Edifice.Generative.VAE.kl_divergence(mu, log_var)
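
To close the loop, decode the sample and score it with the library's documented loss helper (see API Design below). Continuing from the snippet above, where target is simply the original input batch:

# Decode the sampled latent back to input space
{dec_init_fn, dec_predict_fn} = Axon.build(decoder)
dec_params = dec_init_fn.(Nx.template({1, 32}, :f32), Axon.ModelState.empty())
reconstruction = dec_predict_fn.(dec_params, z)

# Reconstruction + KL objective
target = Nx.broadcast(0.5, {1, 784})
loss = Edifice.Generative.VAE.loss(reconstruction, target, mu, log_var)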

Permutation-Invariant Set Processing

model = Edifice.Sets.DeepSets.build(
  input_dim: 3,
  hidden_dim: 64,
  output_dim: 10,
  pool: :mean
)

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({4, 20, 3}, :f32), Axon.ModelState.empty())
# Process sets of 20 3D points
output = predict_fn.(params, Nx.broadcast(0.5, {4, 20, 3}))
# => {4, 10}

API Design

Every architecture module follows the same pattern:

# Module.build(opts) returns an Axon model
model = Edifice.SSM.Mamba.build(embed_size: 256, hidden_size: 512)

# Some modules expose layer-level builders for composition
layer = Edifice.Graph.GCN.gcn_layer(nodes, adjacency, output_dim)

# Generative models may return tuples
{encoder, decoder} = Edifice.Generative.VAE.build(input_size: 784)

# Utility functions for training
loss = Edifice.Generative.VAE.loss(reconstruction, target, mu, log_var)
energy = Edifice.Energy.Hopfield.energy(query, patterns, beta)

The unified registry lets you build any architecture by name:

# Useful for hyperparameter search, config-driven experiments
for arch <- [:mamba, :retnet, :griffin, :gla] do
  model = Edifice.build(arch, embed_size: 256, hidden_size: 512, num_layers: 4)
  # ... train and evaluate
end

Requirements

  • Elixir >= 1.18
  • Nx ~> 0.10
  • Axon ~> 0.8
  • Polaris ~> 0.1
  • EXLA ~> 0.10 (optional, for GPU acceleration)

License

MIT License. See LICENSE for details.