Feed-Forward Network building blocks for transformer architectures.
Provides standard and gated FFN variants used in the feed-forward sublayer
of transformer blocks. This module unifies the duplicated build_ffn/3
pattern found across attention architectures.
Variants
- Standard: dense(hidden * expansion) -> activation -> dropout -> dense(hidden) (sketched below)
- Gated: Delegates to SwiGLU.layer/2 for gated linear unit variants
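For orientation, a hand-rolled equivalent of the standard variant using plain Axon calls (a sketch only; the layer names, default sizes, and option handling inside FFN.layer/2 are illustrative assumptions):

# Sketch: what the standard variant expands to.
input = Axon.input("input", shape: {nil, nil, 256})
hidden_size = 256
inner_size = hidden_size * 4   # default expansion_factor of 4

ffn =
  input
  |> Axon.dense(inner_size, name: "ffn_in")      # expand to inner dimension
  |> Axon.activation(:gelu)                      # default activation
  |> Axon.dropout(rate: 0.1)                     # dropout (default rate is 0.0)
  |> Axon.dense(hidden_size, name: "ffn_out")    # project back to hidden_size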
Usage
# Standard FFN (default in most transformers)
ffn = FFN.layer(input, hidden_size: 256)
# With custom expansion factor and activation
ffn = FFN.layer(input, hidden_size: 256, expansion_factor: 8, activation: :relu)
# Gated variant (SwiGLU/GeGLU/ReGLU)
ffn = FFN.gated_layer(input, hidden_size: 256, activation: :silu)
References
- "Attention Is All You Need" (Vaswani et al., 2017) - original FFN
- "GLU Variants Improve Transformer" (Shazeer, 2020) - gated variants
Summary
Functions
Build a gated feed-forward network (SwiGLU/GeGLU/ReGLU).
Build a standard feed-forward network as an Axon layer.
Functions
Build a gated feed-forward network (SwiGLU/GeGLU/ReGLU).
Delegates to Edifice.Blocks.SwiGLU.layer/2 with a unified API.
Options
- :hidden_size - Input/output dimension (required)
- :inner_size - Intermediate dimension (default: hidden_size * 2.667)
- :activation - Gate activation: :silu, :gelu, :relu (default: :silu)
- :dropout - Dropout rate (default: 0.0)
- :name - Layer name prefix (default: "gated_ffn")
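For reference, the gated path computes roughly the SwiGLU pattern below (a sketch based on Shazeer, 2020; the projection names and the exact internals of Edifice.Blocks.SwiGLU.layer/2 are assumptions here):

# Sketch: the gated (SwiGLU-style) computation.
input = Axon.input("input", shape: {nil, nil, 256})
hidden_size = 256
inner_size = round(hidden_size * 2.667)

gate = Axon.dense(input, inner_size, name: "gated_ffn_gate")   # gate projection
up = Axon.dense(input, inner_size, name: "gated_ffn_up")       # up projection

gated_ffn =
  Axon.multiply(Axon.activation(gate, :silu), up)              # silu(gate) * up
  |> Axon.dense(hidden_size, name: "gated_ffn_out")            # project back down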
Build a standard feed-forward network as an Axon layer.
Options
- :hidden_size - Input/output dimension (required)
- :expansion_factor - Inner dimension multiplier (default: 4)
- :inner_size - Explicit inner dimension (overrides expansion_factor)
- :activation - Activation function (default: :gelu)
- :dropout - Dropout rate (default: 0.0)
- :name - Layer name prefix (default: "ffn")
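As a usage note, the FFN sublayer typically sits behind a residual connection in a transformer block; a minimal sketch, assuming a pre-norm arrangement and plain Axon combinators (the norm placement and names are not prescribed by this module):

# Sketch: FFN sublayer with pre-norm and a residual connection.
x = Axon.input("hidden_states", shape: {nil, nil, 256})

ffn_out =
  x
  |> Axon.layer_norm(name: "ffn_norm")
  |> FFN.layer(hidden_size: 256, dropout: 0.1)

block_out = Axon.add(x, ffn_out)   # residual add around the FFN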