# `Edifice.Attention.Conformer`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/attention/conformer.ex#L1)

Conformer: convolution-augmented transformer for audio/speech processing.

The Conformer combines self-attention, which models global context, with
convolution, which captures local patterns. Each block uses a Macaron-style
layout in which two half-step feed-forward modules sandwich the attention and
convolution modules.

## Architecture (Macaron Block)

```
Input [batch, seq_len, hidden_size]
      |
+------------------------------------------------+
|   Conformer Block (x num_layers)               |
|                                                |
|   1. Half-FFN: norm -> FFN -> scale(0.5)       |
|      -> residual                               |
|   2. MHSA: norm -> self_attention -> residual  |
|   3. Conv module:                              |
|      norm -> pointwise_up -> GLU               |
|      -> depthwise_conv -> norm -> act         |
|      -> pointwise_down -> residual            |
|   4. Half-FFN: norm -> FFN -> scale(0.5)       |
|      -> residual                               |
|   5. Final LayerNorm                           |
+------------------------------------------------+
      |
Final LayerNorm
      |
Last timestep -> [batch, hidden_size]
```
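The half-step feed-forward modules (steps 1 and 4) can be sketched with Axon combinators. This is a hedged illustration, not this module's actual internals: the 4x expansion factor, the `:silu` activation, and the `half_ffn` name are assumptions (the expansion and activation follow the Conformer paper's defaults).

```elixir
# Hypothetical sketch of the half-step FFN (steps 1 and 4). Only the
# norm -> FFN -> scale(0.5) -> residual order comes from the docs above.
def half_ffn(input, hidden_size, dropout) do
  input
  |> Axon.layer_norm()
  |> Axon.dense(hidden_size * 4, activation: :silu)
  |> Axon.dense(hidden_size)
  |> Axon.dropout(rate: dropout)
  |> Axon.nx(&Nx.multiply(&1, 0.5))  # scale(0.5)
  |> then(&Axon.add(&1, input))      # residual connection
end
```

Scaling by 0.5 lets the two half-FFNs together contribute roughly one full feed-forward step per block, which is the point of the Macaron design.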

## Usage

    model = Conformer.build(
      embed_dim: 287,
      hidden_size: 256,
      num_heads: 4,
      conv_kernel_size: 31,
      num_layers: 4
    )

## References
- "Conformer: Convolution-augmented Transformer for Speech Recognition"
  (Gulati et al., 2020)

# `build_opt`

```elixir
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:conv_kernel_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:dropout, float()}
  | {:window_size, pos_integer()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a Conformer model.

## Options

  - `:embed_dim` - Size of input embedding per timestep (required)
  - `:hidden_size` - Internal hidden dimension (default: 256)
  - `:num_heads` - Number of attention heads (default: 4)
  - `:conv_kernel_size` - Kernel size for depthwise convolution (default: 31)
  - `:num_layers` - Number of Conformer blocks (default: 4)
  - `:dropout` - Dropout rate (default: 0.1)
  - `:window_size` - Expected sequence length for JIT optimization (default: 60)

## Returns

  An Axon model that outputs `[batch, hidden_size]` from the last position.
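A hedged end-to-end sketch of building and running the model, using the shapes documented above (`[batch, seq_len, embed_dim]` in, `[batch, hidden_size]` out). Passing a bare tensor/template for a single-input model follows Axon's conventions and is an assumption here:

```elixir
# Hypothetical usage: build, initialize, and run on a dummy batch.
model =
  Edifice.Attention.Conformer.build(
    embed_dim: 287,
    hidden_size: 256,
    window_size: 60
  )

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 60, 287}, :f32), %{})
output = predict_fn.(params, Nx.broadcast(0.0, {1, 60, 287}))
# Nx.shape(output) should be {1, 256} per the documented output shape
```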

# `build_conformer_block`

```elixir
@spec build_conformer_block(
  Axon.t(),
  keyword()
) :: Axon.t()
```

Build a single Conformer block with the Macaron structure shown above:
half-step FFN, multi-head self-attention, convolution module, half-step FFN,
and a closing LayerNorm. Takes the block input as an `Axon.t()` and a keyword
list of options.
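The convolution module (step 3) can be sketched in Axon as below. This is a hedged illustration, not the module's source: `feature_group_size: hidden_size` is one way to express a depthwise convolution, the GLU is written out with Nx slices, and the `:silu` activation, channels-last layout, and function names are assumptions.

```elixir
# Hypothetical sketch of the convolution module (step 3). Only the
# norm -> pointwise_up -> GLU -> depthwise_conv -> norm -> act
# -> pointwise_down -> residual order comes from the docs above.
def conv_module(input, hidden_size, kernel_size, dropout) do
  input
  |> Axon.layer_norm()
  |> Axon.dense(hidden_size * 2)        # pointwise_up (GLU halves it back)
  |> Axon.nx(&glu/1)
  |> Axon.conv(hidden_size,
    kernel_size: kernel_size,
    padding: :same,
    feature_group_size: hidden_size     # one group per channel => depthwise
  )
  |> Axon.layer_norm()
  |> Axon.activation(:silu)
  |> Axon.dense(hidden_size)            # pointwise_down
  |> Axon.dropout(rate: dropout)
  |> then(&Axon.add(&1, input))         # residual connection
end

# GLU over the channel axis: the first half is gated by sigmoid(second half).
defp glu(t) do
  half = div(Nx.axis_size(t, -1), 2)
  a = Nx.slice_along_axis(t, 0, half, axis: -1)
  b = Nx.slice_along_axis(t, half, half, axis: -1)
  Nx.multiply(a, Nx.sigmoid(b))
end
```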

# `output_size`

```elixir
@spec output_size(keyword()) :: pos_integer()
```

Get the output size of a Conformer model.
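Since `build/1` produces `[batch, hidden_size]`, a reasonable reading is that `output_size/1` reports that final dimension. A hedged usage sketch (the exact option handling is an assumption, not confirmed by this doc):

```elixir
# Assumption: output_size/1 mirrors the :hidden_size option (default 256),
# matching the [batch, hidden_size] output shape documented above.
size = Edifice.Attention.Conformer.output_size(hidden_size: 256)
```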

---

*Consult [api-reference.md](api-reference.md) for complete listing*
