Titans - Neural Long-Term Memory with Surprise-Gated Updates.
Implements the Titans architecture from "Titans: Learning to Memorize at Test Time" (Behrouz et al., 2025). Titans extend TTT-style test-time learning with a surprise-based gating mechanism: the memory is updated more aggressively when the model encounters surprising (high-error) inputs.
Key Innovations
- Surprise-gated memory: Memory update magnitude scales with prediction error
- Long-term memory module: Persistent memory that adapts to data distribution
- Momentum-based updates: Uses gradient momentum for smoother memory evolution
- Covariance-aware: Optional second-order information for better updates
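The covariance-aware option is listed without a formula here. Purely as an illustration of what second-order information can mean in this setting (an assumption, not necessarily this module's actual update rule), a memory update can precondition the gradient with a running covariance of the keys:

```python
import numpy as np

# Hypothetical sketch: the docs mention a "covariance-aware" option but
# give no formula, so this preconditioner is an illustrative assumption,
# not the module's actual second-order update.
rng = np.random.default_rng(1)
d = 4
M = np.zeros((d, d))   # memory matrix
C = np.zeros((d, d))   # running covariance of the keys
beta, eta, lam = 0.95, 0.1, 1e-2

for _ in range(10):
    k = rng.normal(size=d)
    v = rng.normal(size=d)
    C = beta * C + np.outer(k, k)     # accumulate second-order key statistics
    grad = np.outer(M @ k - v, k)     # first-order gradient, as in the plain update
    precond = np.linalg.inv(C + lam * np.eye(d))
    M = M - eta * grad @ precond      # covariance-preconditioned step
```

The effect is roughly to normalize the step by how often similar key directions have been seen, so frequently-written directions get smaller steps.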
Equations
# Project inputs
q_t = W_q x_t # Query
k_t = W_k x_t # Key
v_t = W_v x_t # Value
# Memory read: retrieve current prediction
pred_t = M_{t-1} @ k_t
# Surprise: squared prediction error
surprise_t = ||pred_t - v_t||^2
# Surprise gate: higher surprise -> larger update
gate_t = sigmoid(W_g * [x_t, surprise_t])
# Memory update with surprise gating
grad_t = (pred_t - v_t) @ k_t^T
momentum_t = alpha * momentum_{t-1} + grad_t
M_t = M_{t-1} - gate_t * eta * momentum_t
# Output from updated memory
o_t = M_t @ q_t
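The per-timestep equations can be sketched numerically. This is an illustrative NumPy translation only, not the module's Elixir/Axon implementation; the random matrices stand in for the learned parameters W_q, W_k, W_v, and W_g:

```python
import numpy as np

# Illustrative sketch of one Titans layer's surprise-gated memory update.
rng = np.random.default_rng(0)
d_model, d_mem, seq_len = 8, 4, 5

W_q = rng.normal(size=(d_mem, d_model))
W_k = rng.normal(size=(d_mem, d_model))
W_v = rng.normal(size=(d_mem, d_model))
W_g = rng.normal(size=(d_model + 1,))   # acts on [x_t, surprise_t]

alpha, eta = 0.9, 0.1                   # momentum coefficient, step size
M = np.zeros((d_mem, d_mem))            # memory matrix M_0
momentum = np.zeros_like(M)

x = rng.normal(size=(seq_len, d_model))
outputs = []
for t in range(seq_len):
    q, k, v = W_q @ x[t], W_k @ x[t], W_v @ x[t]
    pred = M @ k                                  # memory read
    surprise = float(np.sum((pred - v) ** 2))     # squared prediction error
    gate = 1.0 / (1.0 + np.exp(-(W_g @ np.concatenate([x[t], [surprise]]))))
    grad = np.outer(pred - v, k)                  # gradient of the error w.r.t. M
    momentum = alpha * momentum + grad
    M = M - gate * eta * momentum                 # surprise-gated update
    outputs.append(M @ q)                         # output from the updated memory

outputs = np.stack(outputs)                       # [seq_len, d_mem]
```

Note how a surprising input (large prediction error) pushes the sigmoid gate toward 1, so the memory moves further toward explaining it, while unsurprising inputs leave the memory nearly unchanged.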
Architecture

Input [batch, seq_len, embed_dim]
        |
        v
[Input Projection] -> hidden_size
        |
        v
+--------------------------------+
| Titans Layer                   |
|   Project to Q, K, V           |
|   For each timestep:           |
|     pred = M @ k               |
|     surprise = ||pred - v||^2  |
|     gate = f(x, surprise)      |
|     M -= gate * eta * momentum |
|     output = M @ q             |
+--------------------------------+
        | (repeat num_layers)
        v
[Layer Norm] -> [Last Timestep]
        |
        v
Output [batch, hidden_size]

Usage
model = Titans.build(
  embed_dim: 287,
  hidden_size: 256,
  memory_size: 64,
  num_layers: 4,
  dropout: 0.1
)

References

Behrouz, A., Zhong, P., and Mirrokni, V. (2025). Titans: Learning to Memorize at Test Time. arXiv:2501.00663.
Summary
Functions
Build a Titans model for sequence processing.
Default dropout rate
Default hidden dimension
Default memory key/value dimension
Default momentum coefficient
Default number of layers
Get the output size of a Titans model.
Types
@type build_opt() ::
  {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:memory_size, pos_integer()}
  | {:momentum, float()}
  | {:num_layers, pos_integer()}
  | {:seq_len, pos_integer()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
Build a Titans model for sequence processing.
Options
- :embed_dim - Size of input embedding per frame (required)
- :hidden_size - Internal hidden dimension (default: 256)
- :memory_size - Memory key/value dimension (default: 64)
- :num_layers - Number of Titans layers (default: 4)
- :dropout - Dropout rate between layers (default: 0.1)
- :momentum - Momentum coefficient for memory updates (default: 0.9)
- :window_size - Expected sequence length (default: 60)
Returns
An Axon model that processes sequences and outputs the last hidden state.
@spec default_dropout() :: float()
Default dropout rate
@spec default_hidden_size() :: pos_integer()
Default hidden dimension
@spec default_memory_size() :: pos_integer()
Default memory key/value dimension
@spec default_momentum() :: float()
Default momentum coefficient
@spec default_num_layers() :: pos_integer()
Default number of layers
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of a Titans model.