Hyena: Sub-quadratic attention alternative via long convolutions and gating.
Implements the Hyena Hierarchy from "Hyena Hierarchy: Towards Larger Convolutional Language Models" (Poli et al., ICML 2023). Hyena replaces attention with a hierarchy of long convolutions and element-wise gating, achieving sub-quadratic complexity in sequence length.
Key Innovation: Implicit Long Convolution + Gating
Instead of attention's O(L^2) pairwise interactions, Hyena uses:
- A learned implicit filter (small MLP) that generates long convolution kernels
- Element-wise gating for non-linearity
- Multiple "orders" of this operation for expressivity
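The three ingredients above can be illustrated with a toy, list-based sketch in plain Elixir. It is not the library's implementation: the module name `HyenaSketch`, its function names, and the exponential-decay filter (standing in for the small MLP) are all hypothetical. The convolution is written in its direct quadratic form for readability and assumes non-empty inputs.

```elixir
defmodule HyenaSketch do
  @moduledoc "Toy sketch of a Hyena operator on plain lists (hypothetical, for illustration)."

  # Hypothetical implicit filter: a fixed function of position standing in for
  # the small MLP. Returns `len` kernel taps that decay with distance.
  def implicit_filter(len, decay \\ 0.5) do
    for t <- 0..(len - 1)//1, do: :math.exp(-decay * t)
  end

  # Causal long convolution: out[t] = sum over s in 0..t of kernel[s] * signal[t - s].
  # Direct O(L^2) form for clarity; an FFT-based version would be O(L log L).
  def long_conv(signal, kernel) do
    for t <- 0..(length(signal) - 1)//1 do
      0..t
      |> Enum.map(fn s -> Enum.at(kernel, s, 0.0) * Enum.at(signal, t - s) end)
      |> Enum.sum()
    end
  end

  # Element-wise gating: multiply two equal-length sequences position by position.
  def gate(ys, xs), do: Enum.zip_with(ys, xs, fn y, x -> y * x end)

  # Order-N recurrence: start from v, then for each {filter, gate} pair
  # apply y = long_conv(y, filter) * gate.
  def hyena(v, filters, gates) do
    filters
    |> Enum.zip(gates)
    |> Enum.reduce(v, fn {h, x}, y -> y |> long_conv(h) |> gate(x) end)
  end
end
```

Passing two filters and two gates to `HyenaSketch.hyena/3` gives an order-2 operator; each additional pair adds one more order.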
Order 2 Hyena:
v, x1, x2 = linear_projections(input) # 3 projections
y = v
y = long_conv(y, filter_1) * x1 # First order
y = long_conv(y, filter_2) * x2 # Second order
output = linear(y)
Architecture
Input [batch, seq_len, embed_dim]
            |
            v
+-----------------------+
|   Input Projection    |
+-----------------------+
            |
            v
+-----------------------+
|    Hyena Block x N    |
|   ShortConv(input)    |
|   Split: v, x1, x2    |
|   y = v               |
|   y = LongConv(y)*x1  |  <- Implicit filter via MLP
|   y = LongConv(y)*x2  |
|   OutProj + Residual  |
|   FFN                 |
+-----------------------+
            |
            v
[batch, hidden_size] (last timestep)
Complexity
| Operation | Attention | Hyena |
|---|---|---|
| Training | O(L^2) | O(L log L) via FFT |
| Inference | O(L^2) | O(L) with recurrence |
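To give a feel for the training row, the sketch below compares the two growth rates as rough operation counts. Constant factors are ignored, so the numbers indicate scaling only and are not measured costs. The module and function names are hypothetical.

```elixir
defmodule ComplexitySketch do
  # Pairwise interactions in attention grow as L^2.
  def attention_ops(l), do: l * l

  # FFT-based long convolution grows as L * log2(L) (constants dropped).
  def fft_conv_ops(l), do: round(l * :math.log2(l))

  # Ratio of the two counts for a given sequence length.
  def ratio(l), do: attention_ops(l) / fft_conv_ops(l)
end
```

For a sequence length of 1024, `ComplexitySketch.attention_ops/1` gives 1,048,576 and `ComplexitySketch.fft_conv_ops/1` gives 10,240, a ratio of roughly 100.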
Usage
model = Hyena.build(
  embed_dim: 287,
  hidden_size: 256,
  order: 2,
  filter_size: 64,
  num_layers: 4
)
Reference
- Paper: "Hyena Hierarchy: Towards Larger Convolutional Language Models"
- arXiv: https://arxiv.org/abs/2302.10866
Types
@type build_opt() ::
  {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:filter_size, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:order, pos_integer()}
  | {:seq_len, pos_integer()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
Build a Hyena model for sequence processing.
Options
- :embed_dim - Size of input embedding per frame (required)
- :hidden_size - Internal hidden dimension (default: 256)
- :order - Number of gating levels (default: 2)
- :filter_size - Implicit filter MLP hidden size (default: 64)
- :num_layers - Number of Hyena blocks (default: 4)
- :dropout - Dropout rate (default: 0.1)
- :window_size - Expected sequence length (default: 60)
Returns
An Axon model that outputs [batch, hidden_size] from the last position.
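As a rough illustration of how the documented options and defaults might be resolved before a model is built, here is a minimal sketch. `BuildOptsSketch` and `resolve/1` are hypothetical names and this is not the library's actual option handling. The default values are copied from the option list above.

```elixir
defmodule BuildOptsSketch do
  # Defaults copied from the documented option list; :embed_dim has no default.
  @defaults [
    hidden_size: 256,
    order: 2,
    filter_size: 64,
    num_layers: 4,
    dropout: 0.1,
    window_size: 60
  ]

  # Returns the full option list, with caller-supplied values taking precedence.
  def resolve(opts) do
    # Raises KeyError when the required :embed_dim is missing.
    _ = Keyword.fetch!(opts, :embed_dim)
    Keyword.merge(@defaults, opts)
  end
end
```

Calling `BuildOptsSketch.resolve(embed_dim: 287, order: 3)` keeps the documented defaults for everything except `:order`.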
Build a single Hyena block with implicit long convolution and gating.
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of a Hyena model.
@spec param_count(keyword()) :: non_neg_integer()
Calculate approximate parameter count for a Hyena model.
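The formula behind this function is not shown here, so the following is only a back-of-the-envelope sketch of how such an estimate could be assembled from the architecture diagram. Every layer shape in it is an assumption: dense layers with biases, a depthwise short convolution of kernel size 3, a filter MLP mapping one positional input through `filter_size` units to `hidden_size`, and an FFN with 4x expansion. The module name `ParamCountSketch` is hypothetical, and normalization layers are ignored.

```elixir
defmodule ParamCountSketch do
  # Weights plus biases of a dense layer.
  def dense(n_in, n_out), do: n_in * n_out + n_out

  def param_count(opts) do
    e = Keyword.fetch!(opts, :embed_dim)
    h = Keyword.get(opts, :hidden_size, 256)
    order = Keyword.get(opts, :order, 2)
    f = Keyword.get(opts, :filter_size, 64)
    n = Keyword.get(opts, :num_layers, 4)

    branches = order + 1                          # v plus one gate per order
    in_proj = dense(h, branches * h)              # assumed fused projection
    short_conv = 3 * branches * h                 # assumed depthwise, kernel size 3
    filters = order * (dense(1, f) + dense(f, h)) # assumed two-layer filter MLP
    out_proj = dense(h, h)
    ffn = dense(h, 4 * h) + dense(4 * h, h)       # assumed 4x expansion

    # Input projection once, then the per-block total for each layer.
    dense(e, h) + n * (in_proj + short_conv + filters + out_proj + ffn)
  end
end
```

Under these assumptions the projection and FFN terms scale with the square of `hidden_size` and dominate the total, while the implicit filters add comparatively few parameters.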
@spec recommended_defaults() :: keyword()
Get recommended defaults.