# Edifice v0.2.0 - Table of Contents

186 neural network architectures for Nx/Axon: transformers, Mamba, diffusion, GNNs, audio, robotics, and more

## Pages

- [Edifice](readme.md)
- [Changelog](changelog.md)
- [LICENSE](license.md)
- Getting Started
  - [ML Foundations](ml_foundations.md)
  - [Core Vocabulary](core_vocabulary.md)
  - [The Problem Landscape](problem_landscape.md)
  - [Reading Edifice](reading_edifice.md)
  - [Learning Path](learning_path.md)
- Reference
  - [Architecture Taxonomy](architecture_taxonomy.md)
- Guides: Sequence Processing
  - [State Space Models](state_space_models.md)
  - [Attention Mechanisms](attention_mechanisms.md)
  - [Recurrent Networks](recurrent_networks.md)
- Guides: Representation Learning
  - [Vision Architectures](vision_architectures.md)
  - [Convolutional Networks](convolutional_networks.md)
  - [Contrastive and Self-Supervised Learning](contrastive_learning.md)
  - [Graph and Set Networks](graph_and_set_networks.md)
- Guides: Generative & Dynamic
  - [Generative Models](generative_models.md)
  - [Dynamic and Continuous Architectures](dynamic_and_continuous.md)
- Guides: Composition & Enhancement
  - [Building Blocks](building_blocks.md)
  - [Meta-Learning and Conditional Computation](meta_learning.md)
  - [Uncertainty, Memory, and Feedforward Foundations](uncertainty_and_memory.md)

## Modules

- [Edifice](Edifice.md): Edifice - A comprehensive ML architecture library for Elixir.
- [Edifice.Attention.Based](Edifice.Attention.Based.md): Based: Linear attention with Taylor expansion feature map.
- [Edifice.Attention.Conformer](Edifice.Attention.Conformer.md): Conformer: convolution-augmented transformer for audio/speech processing.
- [Edifice.Attention.DiffTransformer](Edifice.Attention.DiffTransformer.md): Differential Transformer V2: simplified noise-cancelling attention.
- [Edifice.Attention.DualChunk](Edifice.Attention.DualChunk.md): Dual Chunk Attention — context extension via intra-chunk and inter-chunk attention.
- [Edifice.Attention.FlashLinearAttention](Edifice.Attention.FlashLinearAttention.md): Flash Linear Attention — chunked linear attention with feature maps.
- [Edifice.Attention.GLAv2](Edifice.Attention.GLAv2.md): GLA v2: Improved Gated Linear Attention.
- [Edifice.Attention.GatedAttention](Edifice.Attention.GatedAttention.md): Gated Attention: learned gating over attention output.
- [Edifice.Attention.HGRNv2](Edifice.Attention.HGRNv2.md): HGRN v2: Multi-Resolution Hierarchical Gating with Outer Product State.
- [Edifice.Attention.Hawk](Edifice.Attention.Hawk.md): Hawk: Pure RG-LRU Recurrent Model (RecurrentGemma).
- [Edifice.Attention.InfiniAttention](Edifice.Attention.InfiniAttention.md): Infini-Attention: local windowed attention + compressive memory.
- [Edifice.Attention.KDA](Edifice.Attention.KDA.md): KDA: Kimi Delta Attention.
- [Edifice.Attention.LightningAttention](Edifice.Attention.LightningAttention.md): Lightning Attention — hybrid linear/softmax block attention.
- [Edifice.Attention.MLA](Edifice.Attention.MLA.md): Multi-Head Latent Attention (MLA) from DeepSeek-V2/V3.
- [Edifice.Attention.Mega](Edifice.Attention.Mega.md): Mega: Moving Average Equipped Gated Attention.
- [Edifice.Attention.Megalodon](Edifice.Attention.Megalodon.md): MEGALODON: Mega-scale Model with Complex EMA and Timestep Normalization.
- [Edifice.Attention.NSA](Edifice.Attention.NSA.md): NSA: Native Sparse Attention (DeepSeek-V3/V4).
- [Edifice.Attention.RNoPESWA](Edifice.Attention.RNoPESWA.md): RNoPE-SWA: Sliding Window Attention without positional encoding.
- [Edifice.Attention.RetNetV2](Edifice.Attention.RetNetV2.md): RetNet v2: Improved Retentive Network with Enhanced Chunkwise Retention.
- [Edifice.Attention.RingAttention](Edifice.Attention.RingAttention.md): Ring Attention: chunked attention simulating ring-distributed computation (Liu et al., 2023).
- [Edifice.Attention.TMRoPE](Edifice.Attention.TMRoPE.md): TMRoPE: Time-aligned Multimodal RoPE for unified position encoding across modalities.
- [Edifice.Attention.YARN](Edifice.Attention.YARN.md): YaRN: Yet another RoPE extensioN for context window extension.
- [Edifice.Audio.EnCodec](Edifice.Audio.EnCodec.md): EnCodec: High-Fidelity Neural Audio Compression.
- [Edifice.Audio.SoundStorm](Edifice.Audio.SoundStorm.md): SoundStorm: Efficient Parallel Audio Generation via masked prediction.
- [Edifice.Audio.VALLE](Edifice.Audio.VALLE.md): VALL-E: Neural Codec Language Models for Zero-Shot Text-to-Speech.
- [Edifice.Blocks.CausalMask](Edifice.Blocks.CausalMask.md): Causal and window attention mask utilities.
- [Edifice.Blocks.DepthwiseConv](Edifice.Blocks.DepthwiseConv.md): 1D depthwise separable convolution block for sequence models.
- [Edifice.Blocks.KVCache](Edifice.Blocks.KVCache.md): KV Cache: Inference-time Key-Value Caching for Autoregressive Decoding.
- [Edifice.Blocks.SSMax](Edifice.Blocks.SSMax.md): Scalable-Softmax (SSMax): sequence-length-aware softmax.
- [Edifice.Blocks.Softpick](Edifice.Blocks.Softpick.md): Softpick: non-saturating, naturally sparse normalization.
- [Edifice.Contrastive.JEPA](Edifice.Contrastive.JEPA.md): JEPA - Joint Embedding Predictive Architecture.
- [Edifice.Contrastive.SigLIP](Edifice.Contrastive.SigLIP.md): SigLIP - Sigmoid Loss for Language-Image Pre-training.
- [Edifice.Contrastive.TemporalJEPA](Edifice.Contrastive.TemporalJEPA.md): Temporal JEPA — Joint Embedding Predictive Architecture for sequences.
- [Edifice.Export.GGUF](Edifice.Export.GGUF.md): GGUF (GPT-Generated Unified Format) exporter for Edifice models.
- [Edifice.Feedforward.BitNet](Edifice.Feedforward.BitNet.md): BitNet: 1-bit/1.58-bit transformer with ternary weight quantization.
- [Edifice.Feedforward.KAT](Edifice.Feedforward.KAT.md): KAT: KAN-Attention Transformer — attention blocks with KAN replacing FFN.
- [Edifice.Generative.CogVideoX](Edifice.Generative.CogVideoX.md): CogVideoX: Text-to-Video Diffusion with Expert Transformer.
- [Edifice.Generative.DiTv2](Edifice.Generative.DiTv2.md): DiT v2: Improved Diffusion Transformer with Unified AdaLN and QK-Norm.
- [Edifice.Generative.LinearDiT](Edifice.Generative.LinearDiT.md): Linear DiT / SANA: Diffusion Transformer with Linear Attention.
- [Edifice.Generative.MAR](Edifice.Generative.MAR.md): MAR: Masked Autoregressive Generation.
- [Edifice.Generative.MMDiT](Edifice.Generative.MMDiT.md): MMDiT: Multimodal Diffusion Transformer.
- [Edifice.Generative.SiT](Edifice.Generative.SiT.md): SiT: Scalable Interpolant Transformer.
- [Edifice.Generative.SoFlow](Edifice.Generative.SoFlow.md): SoFlow: Solution Flow Models for One-Step Generative Modeling.
- [Edifice.Generative.TRELLIS](Edifice.Generative.TRELLIS.md): TRELLIS: Structured 3D Latents for Scalable 3D Generation.
- [Edifice.Generative.Transfusion](Edifice.Generative.Transfusion.md): Transfusion: Unified Autoregressive Text + Diffusion Image Generation.
- [Edifice.Generative.VAR](Edifice.Generative.VAR.md): VAR: Visual Autoregressive Modeling via Next-Scale Prediction.
- [Edifice.Graph.EGNN](Edifice.Graph.EGNN.md): E(n) Equivariant Graph Neural Network.
- [Edifice.Graph.GINv2](Edifice.Graph.GINv2.md): GINv2: Graph Isomorphism Network with edge features (Hu et al., 2020).
- [Edifice.Inference.Medusa](Edifice.Inference.Medusa.md): Medusa: Multi-Head Speculative Decoding for 2-3x inference speedup.
- [Edifice.Interpretability.SparseAutoencoder](Edifice.Interpretability.SparseAutoencoder.md): Sparse Autoencoder (SAE) for mechanistic interpretability.
- [Edifice.Interpretability.Transcoder](Edifice.Interpretability.Transcoder.md): Transcoder for cross-layer mechanistic interpretability.
- [Edifice.Memory.Engram](Edifice.Memory.Engram.md): Engram: O(1) Hash-Based Associative Memory via Locality-Sensitive Hashing.
- [Edifice.Meta.DPO](Edifice.Meta.DPO.md): DPO: Direct Preference Optimization.
- [Edifice.Meta.DistillationHead](Edifice.Meta.DistillationHead.md): Distillation Head — projects student hidden states to match teacher representations.
- [Edifice.Meta.DoRA](Edifice.Meta.DoRA.md): DoRA: Weight-Decomposed Low-Rank Adaptation.
- [Edifice.Meta.GRPO](Edifice.Meta.GRPO.md): GRPO: Group Relative Policy Optimization.
- [Edifice.Meta.HybridBuilder](Edifice.Meta.HybridBuilder.md): Configurable Hybrid Builder — flexible hybrid architecture composition.
- [Edifice.Meta.KTO](Edifice.Meta.KTO.md): KTO: Kahneman-Tversky Optimization for RLHF from binary feedback.
- [Edifice.Meta.MixtureOfAgents](Edifice.Meta.MixtureOfAgents.md): Mixture of Agents: N proposer models feed into an aggregator.
- [Edifice.Meta.MixtureOfDepths](Edifice.Meta.MixtureOfDepths.md): Mixture of Depths: per-token routing where only top-C% tokens are processed.
- [Edifice.Meta.MixtureOfTokenizers](Edifice.Meta.MixtureOfTokenizers.md): Mixture of Tokenizers — multiple parallel embedding pathways with learned routing.
- [Edifice.Meta.MoEv2](Edifice.Meta.MoEv2.md): MoE v2: Expert Choice Routing + Shared Experts + Aux-Loss-Free Load Balancing.
- [Edifice.Meta.QAT](Edifice.Meta.QAT.md): Quantization-Aware Training (QAT) — transformer with quantized linear layers.
- [Edifice.Meta.RLHFHead](Edifice.Meta.RLHFHead.md): RLHF heads: reward model and DPO preference heads for alignment.
- [Edifice.Meta.SpeculativeDecoding](Edifice.Meta.SpeculativeDecoding.md): Speculative Decoding — accelerate autoregressive generation with draft+verify.
- [Edifice.Meta.SpeculativeHead](Edifice.Meta.SpeculativeHead.md): Speculative Head — multi-head parallel draft with per-head MLPs (Medusa/EAGLE).
- [Edifice.Meta.TestTimeCompute](Edifice.Meta.TestTimeCompute.md): Test-Time Compute — backbone + scoring network for inference-time scaling.
- [Edifice.Multimodal.Fusion](Edifice.Multimodal.Fusion.md): Multimodal Fusion Layers for Vision-Language Models.
- [Edifice.RL.Environment](Edifice.RL.Environment.md): Behaviour for reinforcement learning environments.
- [Edifice.RL.Environments.CartPole](Edifice.RL.Environments.CartPole.md): Classic CartPole balancing environment.
- [Edifice.RL.Environments.GridWorld](Edifice.RL.Environments.GridWorld.md): Simple grid world environment for testing RL algorithms.
- [Edifice.RL.GAE](Edifice.RL.GAE.md): Generalized Advantage Estimation (GAE).
- [Edifice.RL.PPOTrainer](Edifice.RL.PPOTrainer.md): Proximal Policy Optimization (PPO) trainer.
- [Edifice.RL.PolicyValue](Edifice.RL.PolicyValue.md): Policy-Value network for reinforcement learning.
- [Edifice.Recurrent.GatedDeltaNet](Edifice.Recurrent.GatedDeltaNet.md): Gated DeltaNet - Linear Attention with Gated Delta Rule.
- [Edifice.Recurrent.NativeRecurrence](Edifice.Recurrent.NativeRecurrence.md): Native Recurrence — unified module for multiple minimal recurrence types.
- [Edifice.Recurrent.SLSTM](Edifice.Recurrent.SLSTM.md): sLSTM: Scalar LSTM with Exponential Gating.
- [Edifice.Recurrent.TTTE2E](Edifice.Recurrent.TTTE2E.md): TTT-E2E: End-to-End Test-Time Training for Long Context.
- [Edifice.Recurrent.XLSTMv2](Edifice.Recurrent.XLSTMv2.md): xLSTM v2: Improved Extended Long Short-Term Memory.
- [Edifice.Robotics.ACT](Edifice.Robotics.ACT.md): ACT: Action Chunking with Transformers for robot imitation learning.
- [Edifice.Robotics.OpenVLA](Edifice.Robotics.OpenVLA.md): OpenVLA: Open-Source Vision-Language-Action Model.
- [Edifice.SSM.GSS](Edifice.SSM.GSS.md): GSS: Gated State Space Model.
- [Edifice.SSM.HyenaV2](Edifice.SSM.HyenaV2.md): Hyena v2: Improved Implicit Long Convolution with Short Conv and Better Decay.
- [Edifice.SSM.Hymba](Edifice.SSM.Hymba.md): Hymba: Hybrid-head Architecture with Parallel Mamba + Attention.
- [Edifice.SSM.Mamba3](Edifice.SSM.Mamba3.md): Mamba-3: Advanced Selective State Space Model with complex state dynamics.
- [Edifice.SSM.SSTransformer](Edifice.SSM.SSTransformer.md): State Space Transformer — parallel SSM + attention with learned gating per block.
- [Edifice.SSM.StripedHyena](Edifice.SSM.StripedHyena.md): Striped Hyena: interleaved Hyena long convolution and gated convolution layers.
- [Edifice.Scientific.FNO](Edifice.Scientific.FNO.md): FNO: Fourier Neural Operator.
- [Edifice.Transformer.ByteLatentTransformer](Edifice.Transformer.ByteLatentTransformer.md): Byte Latent Transformer (BLT) — byte-level processing via encode-process-decode.
- [Edifice.Transformer.DecoderOnly](Edifice.Transformer.DecoderOnly.md): GPT-style decoder-only transformer with GQA + RoPE + SwiGLU + RMSNorm.
- [Edifice.Transformer.MultiTokenPrediction](Edifice.Transformer.MultiTokenPrediction.md): Multi-Token Prediction (MTP) — predict multiple future tokens simultaneously.
- [Edifice.Transformer.NemotronH](Edifice.Transformer.NemotronH.md): Nemotron-H: NVIDIA's Hybrid Mamba-Transformer Architecture.
- [Edifice.Utils.Quantization](Edifice.Utils.Quantization.md): Post-Training Quantization Toolkit.
- [Edifice.Vision.DINOv2](Edifice.Vision.DINOv2.md): DINOv2: Self-supervised vision backbone via self-distillation.
- [Edifice.Vision.EfficientViT](Edifice.Vision.EfficientViT.md): EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction.
- [Edifice.Vision.FocalNet](Edifice.Vision.FocalNet.md): FocalNet: Focal Modulation Networks for vision (Yang et al., 2022).
- [Edifice.Vision.GaussianSplat](Edifice.Vision.GaussianSplat.md): 3D Gaussian Splatting for real-time radiance field rendering.
- [Edifice.Vision.MambaVision](Edifice.Vision.MambaVision.md): MambaVision: A Hybrid Mamba-Transformer Vision Backbone.
- [Edifice.Vision.MetaFormer](Edifice.Vision.MetaFormer.md): MetaFormer: The general architecture behind ViT's success.
- [Edifice.Vision.NeRF](Edifice.Vision.NeRF.md): NeRF: Neural Radiance Fields network (Mildenhall et al., 2020).
- [Edifice.Vision.PoolFormer](Edifice.Vision.PoolFormer.md): PoolFormer: MetaFormer with average pooling as token mixer (Yu et al., 2022).
- [Edifice.WorldModel.WorldModel](Edifice.WorldModel.WorldModel.md): World Model — learns a latent dynamics model of an environment.
- Feedforward
  - [Edifice.Feedforward.KAN](Edifice.Feedforward.KAN.md): KAN: Kolmogorov-Arnold Networks with learnable activation functions.
  - [Edifice.Feedforward.MLP](Edifice.Feedforward.MLP.md): Multi-Layer Perceptron (feedforward neural network).
  - [Edifice.Feedforward.TabNet](Edifice.Feedforward.TabNet.md): TabNet - Attentive Interpretable Tabular Learning.
- Convolutional
  - [Edifice.Convolutional.Conv](Edifice.Convolutional.Conv.md): Conv1D and Conv2D building blocks for convolutional neural networks.
  - [Edifice.Convolutional.DenseNet](Edifice.Convolutional.DenseNet.md): DenseNet (Densely Connected Convolutional Network) implementation.
  - [Edifice.Convolutional.EfficientNet](Edifice.Convolutional.EfficientNet.md): EfficientNet - Compound Scaling of Neural Networks.
  - [Edifice.Convolutional.MobileNet](Edifice.Convolutional.MobileNet.md): MobileNet - Depthwise Separable Convolutions for Efficient Inference.
  - [Edifice.Convolutional.ResNet](Edifice.Convolutional.ResNet.md): Residual Network (ResNet) implementation.
  - [Edifice.Convolutional.TCN](Edifice.Convolutional.TCN.md): Temporal Convolutional Network (TCN) for sequence modeling.
- Recurrent
  - [Edifice.Recurrent](Edifice.Recurrent.md): Recurrent neural network layers for temporal sequence processing.
  - [Edifice.Recurrent.DeltaNet](Edifice.Recurrent.DeltaNet.md): DeltaNet - Linear Attention with Delta Rule.
  - [Edifice.Recurrent.MinGRU](Edifice.Recurrent.MinGRU.md): Minimal GRU (MinGRU) - A simplified GRU with a single gate.
  - [Edifice.Recurrent.MinLSTM](Edifice.Recurrent.MinLSTM.md): Minimal LSTM (MinLSTM) - A simplified LSTM that is parallel-scannable.
  - [Edifice.Recurrent.Reservoir](Edifice.Recurrent.Reservoir.md): Echo State Networks / Reservoir Computing.
  - [Edifice.Recurrent.TTT](Edifice.Recurrent.TTT.md): Test-Time Training (TTT) Layers.
  - [Edifice.Recurrent.Titans](Edifice.Recurrent.Titans.md): Titans - Neural Long-Term Memory with Surprise-Gated Updates.
  - [Edifice.Recurrent.XLSTM](Edifice.Recurrent.XLSTM.md): xLSTM: Extended Long Short-Term Memory.
- State Space Models
  - [Edifice.SSM.BiMamba](Edifice.SSM.BiMamba.md): BiMamba: Bidirectional Mamba for non-causal sequence modeling.
  - [Edifice.SSM.GatedSSM](Edifice.SSM.GatedSSM.md): GatedSSM: Simplified gated temporal network inspired by state space models.
  - [Edifice.SSM.H3](Edifice.SSM.H3.md): H3: Hungry Hungry Hippos.
  - [Edifice.SSM.Hybrid](Edifice.SSM.Hybrid.md): Configurable Hybrid Backbone+Attention architecture for efficient sequence modeling.
  - [Edifice.SSM.Hyena](Edifice.SSM.Hyena.md): Hyena: Sub-quadratic attention alternative via long convolutions and gating.
  - [Edifice.SSM.Mamba](Edifice.SSM.Mamba.md): Mamba: True Selective State Space Model with optimized parallel scan.
  - [Edifice.SSM.MambaCumsum](Edifice.SSM.MambaCumsum.md): Mamba variant for experimenting with alternative scan algorithms.
  - [Edifice.SSM.MambaHillisSteele](Edifice.SSM.MambaHillisSteele.md): Mamba variant using Hillis-Steele parallel scan.
  - [Edifice.SSM.MambaSSD](Edifice.SSM.MambaSSD.md): Mamba variant using State Space Duality (SSD) algorithm from Mamba-2.
  - [Edifice.SSM.S4](Edifice.SSM.S4.md): S4: Structured State Spaces for Sequences.
  - [Edifice.SSM.S4D](Edifice.SSM.S4D.md): S4D: S4 with Diagonal State Matrix.
  - [Edifice.SSM.S5](Edifice.SSM.S5.md): S5: Simplified State Space Sequence model.
  - [Edifice.SSM.Zamba](Edifice.SSM.Zamba.md): Zamba: Mamba with Single Shared Attention layer.
- Attention
  - [Edifice.Attention.FNet](Edifice.Attention.FNet.md): FNet: Replacing Attention with Fourier Transform.
  - [Edifice.Attention.GLA](Edifice.Attention.GLA.md): GLA: Gated Linear Attention with data-dependent gating.
  - [Edifice.Attention.GQA](Edifice.Attention.GQA.md): GQA: Grouped Query Attention.
  - [Edifice.Attention.Griffin](Edifice.Attention.Griffin.md): Griffin: Hybrid RG-LRU + Local Attention Architecture.
  - [Edifice.Attention.HGRN](Edifice.Attention.HGRN.md): HGRN-2: Hierarchically Gated Linear RNN with State Expansion.
  - [Edifice.Attention.LinearTransformer](Edifice.Attention.LinearTransformer.md): Linear Transformer: Linear attention using kernel feature maps.
  - [Edifice.Attention.MultiHead](Edifice.Attention.MultiHead.md): Temporal attention mechanisms for sequence processing.
  - [Edifice.Attention.Nystromformer](Edifice.Attention.Nystromformer.md): Nystromformer: Nystrom-based approximation for O(N) attention.
  - [Edifice.Attention.Perceiver](Edifice.Attention.Perceiver.md): Perceiver IO: General-purpose architecture with learned latent array.
  - [Edifice.Attention.Performer](Edifice.Attention.Performer.md): Performer: Fast Attention Via Positive Orthogonal Random Features (FAVOR+).
  - [Edifice.Attention.RWKV](Edifice.Attention.RWKV.md): RWKV-7 "Goose": Linear attention with O(1) space complexity.
  - [Edifice.Attention.RetNet](Edifice.Attention.RetNet.md): RetNet: Retentive Network - A Successor to Transformer.
- Vision
  - [Edifice.Vision.ConvNeXt](Edifice.Vision.ConvNeXt.md): ConvNeXt - A Modernized ResNet implementation.
  - [Edifice.Vision.DeiT](Edifice.Vision.DeiT.md): Data-efficient Image Transformer (DeiT) implementation.
  - [Edifice.Vision.MLPMixer](Edifice.Vision.MLPMixer.md): MLP-Mixer - All-MLP architecture for vision.
  - [Edifice.Vision.SwinTransformer](Edifice.Vision.SwinTransformer.md): Swin Transformer (Shifted Window Transformer) implementation.
  - [Edifice.Vision.UNet](Edifice.Vision.UNet.md): U-Net encoder-decoder architecture with skip connections.
  - [Edifice.Vision.ViT](Edifice.Vision.ViT.md): Vision Transformer (ViT) implementation.
- Generative
  - [Edifice.Generative.ConsistencyModel](Edifice.Generative.ConsistencyModel.md): Consistency Model: Single-step generation via consistency function.
  - [Edifice.Generative.DDIM](Edifice.Generative.DDIM.md): DDIM: Denoising Diffusion Implicit Models.
  - [Edifice.Generative.DiT](Edifice.Generative.DiT.md): DiT: Diffusion Transformer.
  - [Edifice.Generative.Diffusion](Edifice.Generative.Diffusion.md): Diffusion Policy: Action generation via denoising diffusion.
  - [Edifice.Generative.FlowMatching](Edifice.Generative.FlowMatching.md): Flow Matching: Action generation via continuous normalizing flows.
  - [Edifice.Generative.GAN](Edifice.Generative.GAN.md): Generative Adversarial Network framework.
  - [Edifice.Generative.LatentDiffusion](Edifice.Generative.LatentDiffusion.md): Latent Diffusion: Diffusion in VAE latent space.
  - [Edifice.Generative.NormalizingFlow](Edifice.Generative.NormalizingFlow.md): Normalizing Flows with RealNVP-style affine coupling layers.
  - [Edifice.Generative.ScoreSDE](Edifice.Generative.ScoreSDE.md): Score-based SDE: Unified score matching framework for generative modeling.
  - [Edifice.Generative.VAE](Edifice.Generative.VAE.md): Variational Autoencoder (VAE).
  - [Edifice.Generative.VQVAE](Edifice.Generative.VQVAE.md): Vector-Quantized Variational Autoencoder (VQ-VAE).
- Contrastive
  - [Edifice.Contrastive.BYOL](Edifice.Contrastive.BYOL.md): BYOL - Bootstrap Your Own Latent.
  - [Edifice.Contrastive.BarlowTwins](Edifice.Contrastive.BarlowTwins.md): Barlow Twins - Redundancy Reduction for Self-Supervised Learning.
  - [Edifice.Contrastive.MAE](Edifice.Contrastive.MAE.md): MAE - Masked Autoencoder.
  - [Edifice.Contrastive.SimCLR](Edifice.Contrastive.SimCLR.md): SimCLR - Simple Contrastive Learning of Representations.
  - [Edifice.Contrastive.VICReg](Edifice.Contrastive.VICReg.md): VICReg - Variance-Invariance-Covariance Regularization.
- Graph
  - [Edifice.Graph.GAT](Edifice.Graph.GAT.md): Graph Attention Network (Velickovic et al., 2018).
  - [Edifice.Graph.GCN](Edifice.Graph.GCN.md): Graph Convolutional Network (Kipf & Welling, 2017).
  - [Edifice.Graph.GIN](Edifice.Graph.GIN.md): Graph Isomorphism Network (Xu et al., 2019).
  - [Edifice.Graph.GraphSAGE](Edifice.Graph.GraphSAGE.md): GraphSAGE - Inductive Representation Learning on Large Graphs.
  - [Edifice.Graph.GraphTransformer](Edifice.Graph.GraphTransformer.md): Graph Transformer with structural encoding.
  - [Edifice.Graph.PNA](Edifice.Graph.PNA.md): Principal Neighbourhood Aggregation (Corso et al., 2020).
  - [Edifice.Graph.SchNet](Edifice.Graph.SchNet.md): SchNet - Continuous-Filter Convolutional Neural Network.
- Sets
  - [Edifice.Sets.DeepSets](Edifice.Sets.DeepSets.md): Permutation-invariant set processing (Zaheer et al., 2017).
  - [Edifice.Sets.PointNet](Edifice.Sets.PointNet.md): Point cloud processing network (Qi et al., 2017).
- Energy
  - [Edifice.Energy.EBM](Edifice.Energy.EBM.md): Energy-Based Model (EBM).
  - [Edifice.Energy.Hopfield](Edifice.Energy.Hopfield.md): Modern Continuous Hopfield Network (Ramsauer et al., 2020).
  - [Edifice.Energy.NeuralODE](Edifice.Energy.NeuralODE.md): Neural ODE - Continuous-Depth Residual Networks.
- Probabilistic
  - [Edifice.Probabilistic.Bayesian](Edifice.Probabilistic.Bayesian.md): Bayesian Neural Network layers with weight uncertainty.
  - [Edifice.Probabilistic.EvidentialNN](Edifice.Probabilistic.EvidentialNN.md): Evidential Deep Learning with Dirichlet Priors.
  - [Edifice.Probabilistic.MCDropout](Edifice.Probabilistic.MCDropout.md): MC Dropout for uncertainty estimation (Gal & Ghahramani, 2016).
- Memory
  - [Edifice.Memory.MemoryNetwork](Edifice.Memory.MemoryNetwork.md): End-to-End Memory Networks (Sukhbaatar et al., 2015).
  - [Edifice.Memory.NTM](Edifice.Memory.NTM.md): Neural Turing Machine (Graves et al., 2014).
- Meta
  - [Edifice.Meta.Adapter](Edifice.Meta.Adapter.md): Bottleneck Adapter modules for parameter-efficient finetuning.
  - [Edifice.Meta.Capsule](Edifice.Meta.Capsule.md): Capsule Networks with dynamic routing (Sabour et al., 2017).
  - [Edifice.Meta.Hypernetwork](Edifice.Meta.Hypernetwork.md): Hypernetworks that generate weights for a target network.
  - [Edifice.Meta.LoRA](Edifice.Meta.LoRA.md): Low-Rank Adaptation (LoRA) for parameter-efficient finetuning.
  - [Edifice.Meta.MoE](Edifice.Meta.MoE.md): Mixture of Experts (MoE) for adaptive expert selection.
  - [Edifice.Meta.SoftMoE](Edifice.Meta.SoftMoE.md): Soft Mixture of Experts (Puigcerver et al., 2024).
  - [Edifice.Meta.SwitchMoE](Edifice.Meta.SwitchMoE.md): Switch Transformer - Top-1 Expert Routing.
- Liquid
  - [Edifice.Liquid](Edifice.Liquid.md): Liquid Neural Networks (LNN) - Continuous-time adaptive neural networks.
- Neuromorphic
  - [Edifice.Neuromorphic.ANN2SNN](Edifice.Neuromorphic.ANN2SNN.md): ANN-to-SNN Conversion via Rate Coding.
  - [Edifice.Neuromorphic.SNN](Edifice.Neuromorphic.SNN.md): Spiking Neural Network with surrogate gradients.
- Building Blocks
  - [Edifice.Blocks.ALiBi](Edifice.Blocks.ALiBi.md): Attention with Linear Biases (ALiBi).
  - [Edifice.Blocks.AdaptiveNorm](Edifice.Blocks.AdaptiveNorm.md): Adaptive Layer Normalization (AdaLN / AdaLN-Zero).
  - [Edifice.Blocks.CrossAttention](Edifice.Blocks.CrossAttention.md): Cross-Attention Layer.
  - [Edifice.Blocks.FFN](Edifice.Blocks.FFN.md): Feed-Forward Network building blocks for transformer architectures.
  - [Edifice.Blocks.ModelBuilder](Edifice.Blocks.ModelBuilder.md): High-level model building utilities for sequence and vision architectures.
  - [Edifice.Blocks.PatchEmbed](Edifice.Blocks.PatchEmbed.md): Patch Embedding for Vision Transformers.
  - [Edifice.Blocks.RMSNorm](Edifice.Blocks.RMSNorm.md): Root Mean Square Layer Normalization.
  - [Edifice.Blocks.RoPE](Edifice.Blocks.RoPE.md): Rotary Position Embedding (RoPE).
  - [Edifice.Blocks.SinusoidalPE](Edifice.Blocks.SinusoidalPE.md): Sinusoidal Positional Encoding.
  - [Edifice.Blocks.SwiGLU](Edifice.Blocks.SwiGLU.md): SwiGLU / GeGLU / ReGLU gated feed-forward networks.
  - [Edifice.Blocks.TransformerBlock](Edifice.Blocks.TransformerBlock.md): Composable transformer block with configurable attention/mixing function.
- Internals
  - [Edifice.Graph.MessagePassing](Edifice.Graph.MessagePassing.md): Generic Message Passing Neural Network (MPNN) framework.
  - [Edifice.SSM.Common](Edifice.SSM.Common.md): Shared components for all Mamba architecture variants.
  - [Edifice.SSM.HybridBuilder](Edifice.SSM.HybridBuilder.md): Flexible hybrid architecture builder for combining different layer types.
  - [Edifice.Utils.Common](Edifice.Utils.Common.md): Common utility functions shared across architecture implementations.
  - [Edifice.Utils.FusedOps](Edifice.Utils.FusedOps.md): Fused operations for GPU kernel optimization.
  - [Edifice.Utils.ODESolver](Edifice.Utils.ODESolver.md): ODE Solver for Nx tensors - used by Liquid Neural Networks.

## Mix Tasks

- [mix edifice.export_gguf](Mix.Tasks.Edifice.ExportGguf.md): Export an Edifice model checkpoint to GGUF format.