# `Edifice.Blocks.SwiGLU`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/blocks/swiglu.ex#L1)

SwiGLU / GeGLU / ReGLU gated feed-forward networks.

Gated Linear Units with various gate activations, as used in the
feed-forward blocks of modern transformers (LLaMA, PaLM, Mistral).
The gating mechanism provides better gradient flow and expressiveness
than a standard dense layer followed by an activation.

## Formula

    SwiGLU(x) = (xW1 * SiLU(xV)) W2
    GeGLU(x)  = (xW1 * GELU(xV)) W2
    ReGLU(x)  = (xW1 * ReLU(xV)) W2
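
Here `*` denotes element-wise multiplication, and `W1`, `V`, and `W2` are
learned projection matrices. A minimal Nx sketch of the SwiGLU variant
(sizes, variable names, and random weights are purely illustrative, not the
module's internals):

```elixir
key = Nx.Random.key(0)

# Illustrative sizes: dim = 4, inner = 8, a batch of 2 tokens.
{x, key}  = Nx.Random.normal(key, shape: {2, 4})
{w1, key} = Nx.Random.normal(key, shape: {4, 8})
{v, key}  = Nx.Random.normal(key, shape: {4, 8})
{w2, _}   = Nx.Random.normal(key, shape: {8, 4})

value    = Nx.dot(x, w1)                               # xW1      -> {2, 8}
pre_gate = Nx.dot(x, v)                                # xV       -> {2, 8}
gate     = Nx.multiply(pre_gate, Nx.sigmoid(pre_gate)) # SiLU(xV) -> {2, 8}

value |> Nx.multiply(gate) |> Nx.dot(w2)               # output   -> {2, 4}
```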

## Architecture

```
Input [batch, ..., dim]
      |
      +-------+-------+
      |               |
   Dense W1       Dense V (gate)
      |               |
      |          Activation (SiLU/GELU/ReLU)
      |               |
      +---> Multiply <+
                |
             Dense W2
                |
Output [batch, ..., dim]
```
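
The diagram can be wired up directly from Axon combinators. This is a hedged
sketch of the structure above, not the module's actual implementation; layer
names, bias settings, and sizes are assumptions:

```elixir
input = Axon.input("x", shape: {nil, 256})

# Value path: plain projection to the inner dimension.
value = Axon.dense(input, 1024, use_bias: false, name: "w1")

# Gate path: projection followed by the gate activation.
gate =
  input
  |> Axon.dense(1024, use_bias: false, name: "v")
  |> Axon.activation(:silu)

# Element-wise gating, then project back to the model dimension.
output =
  value
  |> Axon.multiply(gate)
  |> Axon.dense(256, use_bias: false, name: "w2")
```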

## Usage

    ffn = SwiGLU.layer(input, hidden_size: 256, inner_size: 1024)
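
A slightly fuller sketch of building and running the block end to end; the
alias, input name, and shapes are assumptions for illustration:

```elixir
alias Edifice.Blocks.SwiGLU

input = Axon.input("hidden", shape: {nil, nil, 256})
model = SwiGLU.layer(input, hidden_size: 256, inner_size: 1024)

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(%{"hidden" => Nx.template({1, 16, 256}, :f32)}, %{})

x = Nx.broadcast(0.5, {1, 16, 256})
predict_fn.(params, %{"hidden" => x})  # output keeps the input shape {1, 16, 256}
```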

## References
- "GLU Variants Improve Transformer" (Shazeer, 2020)
- https://arxiv.org/abs/2002.05202

# `layer`

```elixir
@spec layer(
  Axon.t(),
  keyword()
) :: Axon.t()
```

Build a SwiGLU feed-forward block as an Axon layer.

## Options
  - `:hidden_size` - Input/output dimension (required)
  - `:inner_size` - Intermediate dimension (default: `hidden_size * 2.667`, rounded to a multiple of 8)
  - `:activation` - Gate activation, one of `:silu`, `:gelu`, or `:relu` (default: `:silu`)
  - `:dropout` - Dropout rate (default: `0.0`)
  - `:name` - Layer name prefix (default: `"swiglu"`)
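
A hedged sketch overriding the defaults (the option names are those listed
above; the exact inner size and rounding shown here are illustrative):

```elixir
alias Edifice.Blocks.SwiGLU

input = Axon.input("hidden", shape: {nil, nil, 512})

ffn =
  SwiGLU.layer(input,
    hidden_size: 512,
    inner_size: 1368,   # roughly 512 * 2.667, kept as a multiple of 8
    activation: :gelu,  # GeGLU variant
    dropout: 0.1,
    name: "ffn"
  )
```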

---

*Consult [api-reference.md](api-reference.md) for complete listing*
