Edifice.SSM.Hyena (Edifice v0.2.0)


Hyena: Sub-quadratic attention alternative via long convolutions and gating.

Implements the Hyena Hierarchy from "Hyena Hierarchy: Towards Larger Convolutional Language Models" (Poli et al., ICML 2023). Hyena replaces attention with a hierarchy of long convolutions and element-wise gating, achieving sub-quadratic complexity in sequence length.

Key Innovation: Implicit Long Convolution + Gating

Instead of attention's O(L^2) pairwise interactions, Hyena uses:

  1. A learned implicit filter (small MLP) that generates long convolution kernels
  2. Element-wise gating for non-linearity
  3. Multiple "orders" of this operation for expressivity

Order 2 Hyena:

  v, x1, x2 = linear_projections(input)  # 3 projections
  y = v
  y = long_conv(y, filter_1) * x1        # First order
  y = long_conv(y, filter_2) * x2        # Second order
  output = linear(y)
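
The pseudocode above can be made concrete on a single channel. The following is a minimal sketch using plain Elixir lists rather than Nx tensors. The module `HyenaSketch` and its functions are hypothetical names introduced for illustration and are not part of Edifice. The convolution is written in its direct O(L^2) form for readability, and the input and output linear projections are omitted.

```elixir
defmodule HyenaSketch do
  @moduledoc "Illustrative only: single-channel Hyena recurrence on plain lists."

  # Causal convolution: out[t] = sum over s in 0..t of kernel[s] * signal[t - s].
  # Direct O(L^2) form; an FFT-based version would compute the same values.
  def causal_conv(signal, kernel) do
    len = length(signal)

    for t <- 0..(len - 1)//1 do
      0..t
      |> Enum.map(fn s -> Enum.at(kernel, s, 0.0) * Enum.at(signal, t - s) end)
      |> Enum.sum()
    end
  end

  # Element-wise gating of two equal-length lists.
  def gate(a, b), do: Enum.zip_with(a, b, fn x, y -> x * y end)

  # y = v, then for each {filter, gate} pair: y = long_conv(y, filter) * gate.
  # The number of pairs corresponds to the order.
  def hyena(v, filters, gates) do
    filters
    |> Enum.zip(gates)
    |> Enum.reduce(v, fn {filter, x}, y -> y |> causal_conv(filter) |> gate(x) end)
  end
end

v = [1.0, 2.0, 3.0]
x1 = [1.0, 0.5, 2.0]
x2 = [2.0, 1.0, 1.0]
filter_1 = [1.0, 1.0, 1.0] # acts as a running sum
filter_2 = [1.0, 0.0, 0.0] # acts as the identity

y = HyenaSketch.hyena(v, [filter_1, filter_2], [x1, x2])
IO.inspect(y)
# => [2.0, 1.5, 12.0]
```

Passing three filters and three gates to `hyena/3` would give an order 3 operator with no other change, which is how the order controls expressivity in this sketch.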

Architecture

Input [batch, seq_len, embed_dim]
      |
      v
+-----------------------+
| Input Projection      |
+-----------------------+
      |
      v
+-----------------------+
| Hyena Block x N       |
|  ShortConv(input)     |
|  Split: v, x1, x2     |
|  y = v                |
|  y = LongConv(y)*x1   |  <- Implicit filter via MLP
|  y = LongConv(y)*x2   |
|  OutProj + Residual   |
|  FFN                  |
+-----------------------+
      |
      v
[batch, hidden_size]    (last timestep)
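
The "implicit filter via MLP" step in the diagram means the kernel is not stored tap by tap. A small network maps each position to a kernel value, so the kernel can span the whole sequence while the parameter count stays fixed. The following is a minimal sketch of that idea in plain Elixir. The module `ImplicitFilterSketch`, the single hidden layer, the sine activation, and the parameter values are all assumptions for illustration. The actual filter network in Edifice may differ, for example by adding positional features or a decay window.

```elixir
defmodule ImplicitFilterSketch do
  # One-hidden-layer MLP mapping a position in [0, 1] to one kernel value.
  # w1 and b1 hold one entry per hidden unit; w2 maps hidden units to the output.
  def kernel(seq_len, %{w1: w1, b1: b1, w2: w2}) do
    for t <- 0..(seq_len - 1)//1 do
      # Normalize the position so the same weights work for any length.
      pos = if seq_len > 1, do: t / (seq_len - 1), else: 0.0

      hidden = Enum.zip_with(w1, b1, fn w, b -> :math.sin(w * pos + b) end)

      hidden
      |> Enum.zip_with(w2, fn h, w -> h * w end)
      |> Enum.sum()
    end
  end
end

# Hypothetical parameters: 4 hidden units (12 numbers in total).
params = %{
  w1: [1.0, 2.0, 4.0, 8.0],
  b1: [0.0, 0.5, 1.0, 1.5],
  w2: [0.5, -0.25, 0.125, 0.1]
}

# The same 12 parameters generate a 60-tap kernel or a 600-tap kernel.
short_kernel = ImplicitFilterSketch.kernel(60, params)
long_kernel = ImplicitFilterSketch.kernel(600, params)
IO.inspect({length(short_kernel), length(long_kernel)})
```

In this sketch the hidden width plays the role that `:filter_size` plays in the options, under the assumption that `:filter_size` sets the hidden size of the filter MLP as the option description states.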

Complexity

| Operation | Attention | Hyena                |
|-----------|-----------|----------------------|
| Training  | O(L^2)    | O(L log L) via FFT   |
| Inference | O(L^2)    | O(L) with recurrence |
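
To give a feel for the gap between the two training costs, the sketch below compares the growth terms L^2 and L * log2(L) for a few sequence lengths. It counts abstract operations only and ignores constant factors, so it is a rough illustration rather than a benchmark. The module `ComplexitySketch` is a hypothetical name.

```elixir
defmodule ComplexitySketch do
  # Pairwise interactions, as in attention-style mixing.
  def quadratic(len), do: len * len

  # L * log2(L), the growth term of an FFT-based convolution (constants omitted).
  def fft_like(len), do: round(len * :math.log2(len))
end

for len <- [64, 1024, 16_384] do
  q = ComplexitySketch.quadratic(len)
  f = ComplexitySketch.fft_like(len)
  IO.puts("L=#{len}: L^2=#{q}, L*log2(L)=#{f}")
end
```

At L = 1024 the two terms are 1,048,576 and 10,240, roughly a factor of 100 apart, and the ratio keeps widening as L grows.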

Usage

alias Edifice.SSM.Hyena

model = Hyena.build(
  embed_dim: 287,
  hidden_size: 256,
  order: 2,
  filter_size: 64,
  num_layers: 4
)

Reference

Poli et al., "Hyena Hierarchy: Towards Larger Convolutional Language Models", ICML 2023.

Summary

Types

build_opt() - Options for build/1.

Functions

build(opts \\ []) - Build a Hyena model for sequence processing.

build_hyena_block(input, opts) - Build a single Hyena block with implicit long convolution and gating.

output_size(opts \\ []) - Get the output size of a Hyena model.

param_count(opts) - Calculate approximate parameter count for a Hyena model.

Get recommended defaults.

Types

build_opt()

@type build_opt() ::
  {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:filter_size, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:order, pos_integer()}
  | {:seq_len, pos_integer()}
  | {:window_size, pos_integer()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a Hyena model for sequence processing.

Options

  • :embed_dim - Size of input embedding per frame (required)
  • :hidden_size - Internal hidden dimension (default: 256)
  • :order - Number of gating levels (default: 2)
  • :filter_size - Implicit filter MLP hidden size (default: 64)
  • :num_layers - Number of Hyena blocks (default: 4)
  • :dropout - Dropout rate (default: 0.1)
  • :window_size - Expected sequence length (default: 60)

Returns

An Axon model that outputs [batch, hidden_size] from the last position.
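
Taking the output "from the last position" reduces a [batch, seq_len, hidden_size] sequence to [batch, hidden_size] by keeping only the final timestep of each batch item. The sketch below illustrates the shape change with nested lists in place of tensors. The module `LastStepSketch` is a hypothetical name, and the real model performs this step inside the Axon graph.

```elixir
defmodule LastStepSketch do
  # sequence_output: a list of batch items, each a list of per-timestep feature lists.
  def last_position(sequence_output), do: Enum.map(sequence_output, &List.last/1)
end

# batch = 2, seq_len = 3, hidden_size = 2
seq = [
  [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
  [[1.1, 1.2], [1.3, 1.4], [1.5, 1.6]]
]

IO.inspect(LastStepSketch.last_position(seq))
# => [[0.5, 0.6], [1.5, 1.6]]
```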

build_hyena_block(input, opts)

@spec build_hyena_block(
  Axon.t(),
  keyword()
) :: Axon.t()

Build a single Hyena block with implicit long convolution and gating.

output_size(opts \\ [])

@spec output_size(keyword()) :: non_neg_integer()

Get the output size of a Hyena model.

param_count(opts)

@spec param_count(keyword()) :: non_neg_integer()

Calculate approximate parameter count for a Hyena model.