Conformer: convolution-augmented transformer for audio/speech processing.
The Conformer combines self-attention with convolution to capture both global and local patterns. It uses a Macaron-style architecture with two half-step feed-forward modules sandwiching the attention and convolution modules.
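The Macaron ordering described above can be sketched in plain Elixir, without Axon or Nx. This is an illustration only: `MacaronSketch`, `residual/3`, and `block/5` are hypothetical names, the sublayers are stand-in functions on lists of floats, and the per-sublayer normalization and the final LayerNorm are omitted.

```elixir
defmodule MacaronSketch do
  # Hypothetical helper module, not part of the documented API.
  # Each sublayer is a function from a list of floats to a list of the same length.

  # Residual connection: x + scale * f(x), element-wise.
  def residual(x, f, scale \\ 1.0) do
    x
    |> Enum.zip(f.(x))
    |> Enum.map(fn {xi, yi} -> xi + scale * yi end)
  end

  # One block: half-step FFN -> attention -> convolution -> half-step FFN.
  def block(x, ffn1, attn, conv, ffn2) do
    x
    |> residual(ffn1, 0.5)
    |> residual(attn)
    |> residual(conv)
    |> residual(ffn2, 0.5)
  end
end

# Stand-in sublayers: one doubles its input, the other returns zeros.
double = fn xs -> Enum.map(xs, &(&1 * 2.0)) end
zero = fn xs -> Enum.map(xs, fn _ -> 0.0 end) end

MacaronSketch.block([1.0, 2.0], double, zero, zero, double)
# => [4.0, 8.0]
```

Each feed-forward module contributes only half of its output to the residual stream, which is why the two of them together act like a single full feed-forward step split around the attention and convolution modules.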
Architecture (Macaron Block)
Input [batch, seq_len, hidden_size]
      |
+------------------------------------------------+
| Conformer Block (x num_layers)                 |
|                                                |
| 1. Half-FFN: norm -> FFN -> scale(0.5)         |
|    -> residual                                 |
| 2. MHSA: norm -> self_attention -> residual    |
| 3. Conv module:                                |
|    norm -> pointwise_up -> GLU                 |
|    -> depthwise_conv -> norm -> act            |
|    -> pointwise_down -> residual               |
| 4. Half-FFN: norm -> FFN -> scale(0.5)         |
|    -> residual                                 |
| 5. Final LayerNorm                             |
+------------------------------------------------+
      |
Final LayerNorm
      |
Last timestep -> [batch, hidden_size]
Usage
model = Conformer.build(
embed_dim: 287,
hidden_size: 256,
num_heads: 4,
conv_kernel_size: 31,
num_layers: 4
)
References
- "Conformer: Convolution-augmented Transformer for Speech Recognition" (Gulati et al., 2020)
Summary
Functions
Build a Conformer model.
Build a single Conformer block with the Macaron structure.
Get the output size of a Conformer model.
Types
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:conv_kernel_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:dropout, float()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
Build a Conformer model.
Options
- :embed_dim - Size of input embedding per timestep (required)
- :hidden_size - Internal hidden dimension (default: 256)
- :num_heads - Number of attention heads (default: 4)
- :conv_kernel_size - Kernel size for depthwise convolution (default: 31)
- :num_layers - Number of Conformer blocks (default: 4)
- :dropout - Dropout rate (default: 0.1)
- :window_size - Expected sequence length for JIT optimization (default: 60)
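As a rough illustration of how the required option and the documented defaults combine, the sketch below merges caller options over a defaults list. `ConformerOptsSketch` and `resolve/1` are hypothetical names introduced here; how the library itself resolves options is not shown in this page.

```elixir
defmodule ConformerOptsSketch do
  # Hypothetical helper module, not part of the documented API.
  # Default values copied from the option list above.
  @defaults [
    hidden_size: 256,
    num_heads: 4,
    conv_kernel_size: 31,
    num_layers: 4,
    dropout: 0.1,
    window_size: 60
  ]

  # Fill in defaults and require :embed_dim.
  def resolve(opts) do
    # Keyword.fetch!/2 raises KeyError when :embed_dim is missing.
    _ = Keyword.fetch!(opts, :embed_dim)
    # Values in opts override the defaults.
    Keyword.merge(@defaults, opts)
  end
end

opts = ConformerOptsSketch.resolve(embed_dim: 287, num_layers: 2)
opts[:hidden_size] # => 256 (default)
opts[:num_layers]  # => 2 (overridden)
```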
Returns
An Axon model that outputs [batch, hidden_size] from the last position.
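To make the shape change concrete, here is a minimal sketch of last-position selection using nested lists in place of tensors. `LastStepSketch` and `last_timestep/1` are hypothetical names; the real model operates on tensors inside the Axon graph, and this only mirrors the `[batch, seq_len, hidden_size] -> [batch, hidden_size]` reduction.

```elixir
defmodule LastStepSketch do
  # Hypothetical helper module, not part of the documented API.
  # Input: a list of sequences; each sequence is a list of timestep vectors.
  # Keeps only the final timestep vector of every sequence.
  def last_timestep(batch) do
    Enum.map(batch, &List.last/1)
  end
end

# batch = 2, seq_len = 3, hidden_size = 2
batch = [
  [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
  [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]]
]

LastStepSketch.last_timestep(batch)
# => [[0.5, 0.6], [1.4, 1.5]]
```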
Build a single Conformer block with the Macaron structure.
@spec output_size(keyword()) :: pos_integer()
Get the output size of a Conformer model.