# `Edifice.Recurrent.Titans`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/recurrent/titans.ex#L1)

Titans - Neural Long-Term Memory with Surprise-Gated Updates.

Implements the Titans architecture from "Titans: Learning to Memorize
at Test Time" (Behrouz et al., 2025). Titans extend TTT-style test-time
learning with a surprise-based gating mechanism: the memory is updated
more aggressively when the model encounters surprising (high-error) inputs.

## Key Innovations

- **Surprise-gated memory**: Memory update magnitude scales with prediction error
- **Long-term memory module**: Persistent memory that adapts to data distribution
- **Momentum-based updates**: Uses gradient momentum for smoother memory evolution
- **Covariance-aware**: Optional second-order information for better updates

## Equations

```
# Project inputs
q_t = W_q x_t                          # Query
k_t = W_k x_t                          # Key
v_t = W_v x_t                          # Value

# Memory read: retrieve current prediction
pred_t = M_{t-1} @ k_t

# Surprise: squared prediction error
surprise_t = ||pred_t - v_t||^2

# Surprise gate: higher surprise -> larger update
gate_t = sigmoid(W_g * [x_t, surprise_t])

# Memory update with surprise gating
grad_t = (pred_t - v_t) @ k_t^T
momentum_t = alpha * momentum_{t-1} + grad_t
M_t = M_{t-1} - gate_t * eta * momentum_t

# Output from updated memory
o_t = M_t @ q_t
```
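As a sanity check, the update rule above can be traced in plain Python (an illustrative sketch, not Edifice's implementation, which is Elixir/Axon; one assumed simplification: the gate here is a sigmoid of the surprise alone, whereas the equations condition the gate on `x_t` through learned weights `W_g`; `alpha`, `eta`, and the tiny dimensions are arbitrary):

```python
import math

def titans_step(M, momentum, k, v, q, alpha=0.9, eta=0.1):
    """One surprise-gated update of a d_v x d_k memory matrix M."""
    d_v, d_k = len(M), len(M[0])
    # Memory read: pred_t = M_{t-1} @ k_t
    pred = [sum(M[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Surprise: squared prediction error ||pred_t - v_t||^2
    err = [pred[i] - v[i] for i in range(d_v)]
    surprise = sum(e * e for e in err)
    # Simplified gate: sigmoid of the surprise alone (the real gate
    # also sees x_t through learned weights W_g)
    gate = 1.0 / (1.0 + math.exp(-surprise))
    # grad_t = (pred_t - v_t) @ k_t^T
    grad = [[err[i] * k[j] for j in range(d_k)] for i in range(d_v)]
    # momentum_t = alpha * momentum_{t-1} + grad_t
    momentum = [[alpha * momentum[i][j] + grad[i][j] for j in range(d_k)]
                for i in range(d_v)]
    # M_t = M_{t-1} - gate_t * eta * momentum_t
    M = [[M[i][j] - gate * eta * momentum[i][j] for j in range(d_k)]
         for i in range(d_v)]
    # o_t = M_t @ q_t
    out = [sum(M[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return M, momentum, out, surprise
```

Feeding the same `(k, v)` pair repeatedly drives the surprise toward zero: the memory has "memorized" the association, which is the test-time-learning behavior the equations describe.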

## Architecture

```
Input [batch, seq_len, embed_dim]
      |
      v
[Input Projection] -> hidden_size
      |
      v
+----------------------------------+
|      Titans Layer                |
|  Project to Q, K, V              |
|  For each timestep:              |
|    pred = M @ k                  |
|    surprise = ||pred - v||^2     |
|    gate = f(x, surprise)         |
|    M -= gate * eta * grad        |
|    output = M @ q                |
+----------------------------------+
      | (repeat num_layers)
      v
[Layer Norm] -> [Last Timestep]
      |
      v
Output [batch, hidden_size]
```
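The per-timestep loop in the diagram can be exercised end to end with a deliberately minimal plain-Python sketch (assumptions: a scalar memory in place of the `memory_size` matrix, identity projections in place of the learned Q/K/V projections, and a gate driven by surprise alone). It shows the qualitative behavior the gating is designed for: surprise decays while the key-to-value mapping is stationary, then spikes when the mapping shifts.

```python
import math

def scan(ks, vs, alpha=0.9, eta=0.05):
    """Run the per-timestep loop with a scalar memory; return surprises."""
    M, momentum, surprises = 0.0, 0.0, []
    for k, v in zip(ks, vs):
        pred = M * k                           # pred = M @ k
        err = pred - v
        surprise = err * err                   # ||pred - v||^2
        gate = 1.0 / (1.0 + math.exp(-surprise))
        momentum = alpha * momentum + err * k  # momentum over grad
        M -= gate * eta * momentum             # M -= gate * eta * grad
        surprises.append(surprise)
    return surprises

# Stationary mapping v = 2k for 40 steps, then the mapping flips sign:
ks = [1.0] * 60
vs = [2.0] * 40 + [-2.0] * 20
s = scan(ks, vs)
```

The surprise trace shrinks over the first 40 steps as the memory converges on the stationary mapping, then jumps at step 40 when the distribution shifts, so the gate opens and the memory rewrites itself aggressively.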

## Usage

    model = Titans.build(
      embed_dim: 287,
      hidden_size: 256,
      memory_size: 64,
      num_layers: 4,
      dropout: 0.1
    )

## References
- Paper: https://arxiv.org/abs/2501.00663

# `build_opt`

```elixir
@type build_opt() ::
  {:dropout, float()}
  | {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:memory_size, pos_integer()}
  | {:momentum, float()}
  | {:num_layers, pos_integer()}
  | {:seq_len, pos_integer()}
  | {:window_size, pos_integer()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a Titans model for sequence processing.

## Options
  - `:embed_dim` - Size of input embedding per frame (required)
  - `:hidden_size` - Internal hidden dimension (default: 256)
  - `:memory_size` - Memory key/value dimension (default: 64)
  - `:num_layers` - Number of Titans layers (default: 4)
  - `:dropout` - Dropout rate between layers (default: 0.1)
  - `:momentum` - Momentum coefficient for memory updates (default: 0.9)
  - `:window_size` - Expected sequence length (default: 60)

## Returns
  An Axon model that processes sequences and outputs the last hidden state.

# `default_dropout`

```elixir
@spec default_dropout() :: float()
```

Default dropout rate.

# `default_hidden_size`

```elixir
@spec default_hidden_size() :: pos_integer()
```

Default hidden dimension.

# `default_memory_size`

```elixir
@spec default_memory_size() :: pos_integer()
```

Default memory key/value dimension.

# `default_momentum`

```elixir
@spec default_momentum() :: float()
```

Default momentum coefficient.

# `default_num_layers`

```elixir
@spec default_num_layers() :: pos_integer()
```

Default number of layers.

# `output_size`

```elixir
@spec output_size(keyword()) :: non_neg_integer()
```

Get the output size of a Titans model.

---

*Consult [api-reference.md](api-reference.md) for the complete listing*
