Titans - Neural Long-Term Memory with Surprise-Gated Updates.
Implements the Titans architecture from "Titans: Learning to Memorize at Test Time" (Behrouz et al., 2025). Titans extend TTT-style test-time learning with a surprise-based gating mechanism: the memory is updated more aggressively when the model encounters surprising (high-error) inputs.
Key Innovations
- Surprise-gated memory: Memory update magnitude scales with prediction error
- Long-term memory module: Persistent memory that adapts to data distribution
- Momentum-based updates: Uses gradient momentum for smoother memory evolution
- Covariance-aware: Optional second-order information for better updates
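The covariance-aware option is listed without a formula here. Purely as an illustration of what second-order information can mean in this setting (an assumption, not necessarily this module's actual update rule), a memory update can precondition the gradient with a running covariance of the keys:

```python
import numpy as np

# Hypothetical sketch: the docs mention a "covariance-aware" option but
# give no formula, so this preconditioner is an illustrative assumption,
# not the module's actual second-order update.
rng = np.random.default_rng(1)
d = 4
M = np.zeros((d, d))   # memory matrix
C = np.zeros((d, d))   # running covariance of the keys
beta, eta, lam = 0.95, 0.1, 1e-2

for _ in range(10):
    k = rng.normal(size=d)
    v = rng.normal(size=d)
    C = beta * C + np.outer(k, k)     # accumulate second-order key statistics
    grad = np.outer(M @ k - v, k)     # first-order gradient, as in the plain update
    precond = np.linalg.inv(C + lam * np.eye(d))
    M = M - eta * grad @ precond      # covariance-preconditioned step
```

The effect is roughly to normalize the step by how often similar key directions have been seen, so frequently-written directions get smaller steps.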
Equations
# Project inputs
q_t = W_q x_t # Query
k_t = W_k x_t # Key
v_t = W_v x_t # Value
# Memory read: retrieve current prediction
pred_t = M_{t-1} @ k_t
# Surprise: squared prediction error
surprise_t = ||pred_t - v_t||^2
# Surprise gate: higher surprise -> larger update
gate_t = sigmoid(W_g * [x_t, surprise_t])
# Memory update with surprise gating
grad_t = (pred_t - v_t) @ k_t^T
momentum_t = alpha * momentum_{t-1} + grad_t
M_t = M_{t-1} - gate_t * eta * momentum_t
# Output from updated memory
o_t = M_t @ q_t
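The per-timestep equations can be sketched numerically. This is an illustrative NumPy translation only, not the module's Elixir/Axon implementation; the random matrices stand in for the learned parameters W_q, W_k, W_v, and W_g:

```python
import numpy as np

# Illustrative sketch of one Titans layer's surprise-gated memory update.
rng = np.random.default_rng(0)
d_model, d_mem, seq_len = 8, 4, 5

W_q = rng.normal(size=(d_mem, d_model))
W_k = rng.normal(size=(d_mem, d_model))
W_v = rng.normal(size=(d_mem, d_model))
W_g = rng.normal(size=(d_model + 1,))   # acts on [x_t, surprise_t]

alpha, eta = 0.9, 0.1                   # momentum coefficient, step size
M = np.zeros((d_mem, d_mem))            # memory matrix M_0
momentum = np.zeros_like(M)

x = rng.normal(size=(seq_len, d_model))
outputs = []
for t in range(seq_len):
    q, k, v = W_q @ x[t], W_k @ x[t], W_v @ x[t]
    pred = M @ k                                  # memory read
    surprise = float(np.sum((pred - v) ** 2))     # squared prediction error
    gate = 1.0 / (1.0 + np.exp(-(W_g @ np.concatenate([x[t], [surprise]]))))
    grad = np.outer(pred - v, k)                  # gradient of the error w.r.t. M
    momentum = alpha * momentum + grad
    M = M - gate * eta * momentum                 # surprise-gated update
    outputs.append(M @ q)                         # output from the updated memory

outputs = np.stack(outputs)                       # [seq_len, d_mem]
```

Note how a surprising input (large prediction error) pushes the sigmoid gate toward 1, so the memory moves further toward explaining it, while unsurprising inputs leave the memory nearly unchanged.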
Architecture

Input [batch, seq_len, embed_dim]
        |
        v
[Input Projection] -> hidden_size
        |
        v
+--------------------------------+
| Titans Layer                   |
|   Project to Q, K, V           |
|   For each timestep:           |
|     pred = M @ k               |
|     surprise = ||pred - v||^2  |
|     gate = f(x, surprise)      |
|     M -= gate * eta * momentum |
|     output = M @ q             |
+--------------------------------+
        | (repeat num_layers)
        v
[Layer Norm] -> [Last Timestep]
        |
        v
Output [batch, hidden_size]

Usage
model = Titans.build(
  embed_dim: 287,
  hidden_size: 256,
  memory_size: 64,
  num_layers: 4,
  dropout: 0.1
)

References

Behrouz, A., Zhong, P., and Mirrokni, V. (2025). Titans: Learning to Memorize at Test Time. arXiv:2501.00663.
Summary
Functions
Build a Titans model for sequence processing.
Default dropout rate
Default hidden dimension
Default memory key/value dimension
Default momentum coefficient
Default number of layers
Get the output size of a Titans model.
Types
@type build_opt() ::
  {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:memory_size, pos_integer()}
  | {:momentum, float()}
  | {:num_layers, pos_integer()}
  | {:seq_len, pos_integer()}
  | {:window_size, pos_integer()}
Options for build/1.
Functions
Build a Titans model for sequence processing.
Options
- :embed_dim - Size of input embedding per frame (required)
- :hidden_size - Internal hidden dimension (default: 256)
- :memory_size - Memory key/value dimension (default: 64)
- :num_layers - Number of Titans layers (default: 4)
- :dropout - Dropout rate between layers (default: 0.1)
- :momentum - Momentum coefficient for memory updates (default: 0.9)
- :window_size - Expected sequence length (default: 60)
Returns
An Axon model that processes sequences and outputs the last hidden state.
@spec default_dropout() :: float()
Default dropout rate
@spec default_hidden_size() :: pos_integer()
Default hidden dimension
@spec default_memory_size() :: pos_integer()
Default memory key/value dimension
@spec default_momentum() :: float()
Default momentum coefficient
@spec default_num_layers() :: pos_integer()
Default number of layers
@spec output_size(keyword()) :: non_neg_integer()
Get the output size of a Titans model.