KAT: KAN-Attention Transformer — attention blocks with KAN replacing FFN.
Combines standard multi-head self-attention with Kolmogorov-Arnold Network (KAN) layers in place of the usual MLP feed-forward sublayer. KAN layers put learnable activation functions on edges, built from basis functions such as B-splines, sines, or Chebyshev polynomials, instead of fixed activations on nodes.
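For intuition, here is a minimal Nx sketch of a single KAN edge with a sine basis, phi(x) = sum_k c_k * sin(k * x). The module name and coefficient layout are illustrative assumptions, not this library's internals:

```elixir
# Sketch of one KAN edge activation with a sine basis:
# phi(x) = sum_k c_k * sin(k * x). Illustrative only.
defmodule KanEdgeSketch do
  import Nx.Defn

  # x:      {batch} float inputs, one scalar per example
  # coeffs: {grid_size} learnable coefficients c_1..c_K
  defn activate(x, coeffs) do
    grid_size = Nx.axis_size(coeffs, 0)
    freqs = Nx.iota({grid_size}) + 1                            # k = 1..K
    basis = Nx.sin(Nx.new_axis(x, 1) * Nx.new_axis(freqs, 0))  # {batch, K}
    Nx.dot(basis, coeffs)                                       # {batch}
  end
end

# KanEdgeSketch.activate(Nx.tensor([0.5, -1.0]), Nx.tensor([0.3, 0.1, 0.2]))
```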
Architecture
```
Input [batch, seq, embed_dim]
        |
        v
+-------------------------------------+
| TransformerBlock (per layer):       |
|   norm -> MultiHead Attention       |
|   norm -> KAN Layer (replaces FFN)  |
+-------------------------------------+
        | (repeat num_layers)
        v
Final Norm -> Last Timestep
Output [batch, hidden_size]
```

Why KAN Instead of FFN?
| Aspect | Standard FFN | KAN FFN |
|---|---|---|
| Activation | Fixed (ReLU/GELU) on nodes | Learnable on edges |
| Expressiveness | Needs extra width for accuracy | Learns activation shape per edge |
| Interpretability | Low | Higher (visualizable) |
| Parameters | O(n^2) | O(n^2 * grid_size) |
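Back-of-the-envelope example: with hidden_size = 256 and grid_size = 8, a square 256 -> 256 KAN sublayer carries roughly 256 × 256 × 8 ≈ 524k basis coefficients, where a plain 256 -> 256 linear map has about 65k weights; the grid size multiplies the per-edge parameter count (biases and the usual 4× FFN expansion are ignored here).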
Usage
```elixir
model = KAT.build(
  embed_dim: 287,
  hidden_size: 256,
  num_heads: 4,
  grid_size: 8,
  basis: :bspline,
  num_layers: 4
)
```

References
- Liu et al., "KAN: Kolmogorov-Arnold Networks" (2024)
- Vaswani et al., "Attention Is All You Need" (2017)
Summary
Functions

build(opts) - Build a KAT (KAN-Attention Transformer) model.
output_size(opts) - Get the output size of a KAT model.
recommended_defaults() - Get recommended defaults for KAT.
Types
```elixir
@type build_opt() ::
        {:basis, :bspline | :sine | :chebyshev | :fourier | :rbf}
        | {:dropout, float()}
        | {:embed_dim, pos_integer()}
        | {:grid_size, pos_integer()}
        | {:hidden_size, pos_integer()}
        | {:num_heads, pos_integer()}
        | {:num_layers, pos_integer()}
        | {:window_size, pos_integer()}
```
Options for build/1.
Functions
Build a KAT (KAN-Attention Transformer) model.
Options
- :embed_dim - Input embedding dimension (required)
- :hidden_size - Internal hidden dimension (default: 256)
- :num_heads - Number of attention heads (default: 4)
- :grid_size - Number of KAN basis functions per edge (default: 8; see the sketch after this list)
- :basis - KAN basis function: :bspline, :sine, :chebyshev, :fourier, or :rbf (default: :bspline)
- :num_layers - Number of transformer layers (default: 4)
- :dropout - Dropout rate (default: 0.1)
- :window_size - Sequence length (default: 60)
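To make :grid_size and :basis concrete, here is a minimal Nx sketch of a full KAN sublayer using a sine basis. The coefficient layout, module name, and basis choice are assumptions for illustration, not this library's internals:

```elixir
# Illustrative KAN sublayer: every input->output edge gets its own
# learnable activation built from grid_size sine basis functions.
# The {in, grid, out} coefficient layout is an assumption of this sketch.
defmodule KanLayerSketch do
  import Nx.Defn

  # x:      {batch, in}
  # coeffs: {in, grid_size, out}
  # returns {batch, out} where y[b, j] = sum_i sum_k C[i, k, j] * sin(k * x[b, i])
  defn forward(x, coeffs) do
    grid_size = Nx.axis_size(coeffs, 1)
    freqs = Nx.iota({grid_size}) + 1
    # {batch, in, grid}: one basis expansion per input feature
    basis = Nx.sin(Nx.new_axis(x, 2) * Nx.reshape(freqs, {1, 1, grid_size}))
    # contract over {in, grid} against coeffs {in, grid, out} -> {batch, out}
    Nx.dot(basis, [1, 2], coeffs, [0, 1])
  end
end
```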
Returns
An Axon model outputting [batch, hidden_size].
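A hedged end-to-end sketch, assuming build/1 returns a single-input Axon model as documented; everything past the KAT.build/1 call is the standard Axon workflow:

```elixir
model = KAT.build(embed_dim: 287, hidden_size: 256)

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 60, 287}, :f32), %{})

# batch of 8 sequences, window_size 60, embed_dim 287
input = Nx.broadcast(0.0, {8, 60, 287})
output = predict_fn.(params, input)
# output shape: {8, 256} -> [batch, hidden_size]
```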
@spec output_size(keyword()) :: pos_integer()
Get the output size of a KAT model.
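Presumably this mirrors the :hidden_size option, since the model outputs [batch, hidden_size]; treat the exact behavior as an assumption:

```elixir
KAT.output_size(hidden_size: 512)
#=> 512 (assumed: follows :hidden_size, default 256)
```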
@spec recommended_defaults() :: keyword()
Get recommended defaults for KAT.
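Since recommended_defaults/0 returns a keyword list and build/1 accepts one, a natural pattern (assumed, not documented here) is merging overrides before building:

```elixir
KAT.recommended_defaults()
|> Keyword.merge(embed_dim: 287, basis: :chebyshev)
|> KAT.build()
```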