# `Edifice.Attention.RNoPESWA`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/attention/rnope_swa.ex#L1)

RNoPE-SWA: Sliding Window Attention without positional encoding.

A minimalist attention mechanism that combines:
- **Sliding Window Attention**: Each position only attends to the last `window_size` positions
- **No Positional Encoding**: Pure content-based attention without position bias

## Key Innovation

By removing positional encoding, the model learns purely content-based attention patterns.
Combined with sliding window, this creates an efficient local attention mechanism that:
- Has O(L * W) attention cost instead of O(L^2), where L is the sequence length and W = `window_size` (see the mask sketch below)
- Can generalize to sequence lengths beyond those seen in training, since no position embedding ties the model to a fixed length
- Forces the model to rely on content similarity, not position heuristics
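The windowed causal pattern can be pictured as a banded lower-triangular mask. The sketch below is illustrative only (not part of this module's API): it builds such a mask with `Nx`, where query position `i` may attend to key position `j` only when `i - W < j <= i`.

```elixir
seq_len = 8
window_size = 3

rows = Nx.iota({seq_len, seq_len}, axis: 0)   # query index i per row
cols = Nx.iota({seq_len, seq_len}, axis: 1)   # key index j per column

causal = Nx.less_equal(cols, rows)                             # j <= i
in_window = Nx.greater(cols, Nx.subtract(rows, window_size))   # j > i - W
mask = Nx.logical_and(causal, in_window)
# Each row of `mask` has at most `window_size` ones, which is where the
# O(L * W) attention cost comes from.
```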

## Architecture

```
Input [batch, seq_len, embed_dim]
      |
      v (no positional encoding)
+--------------------------------+
|  Sliding Window Attention      |
|                                |
|  Each position attends to      |
|  last W positions only         |
|  Q, K, V projections           |
|  Attention(Q, K, V)            |
|  Output projection             |
+--------------------------------+
      |
[batch, seq_len, hidden_size]
```
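For intuition, here is a minimal single-head, single-example sketch of the computation inside the block: plain scaled dot-product attention over Q, K, V with the window mask applied and no positional term anywhere. The actual layer is multi-head, batched, and built as an Axon graph; all names below are local to the example.

```elixir
key = Nx.Random.key(0)
{q, key} = Nx.Random.normal(key, 0.0, 1.0, shape: {8, 16})   # [seq_len, head_dim]
{k, key} = Nx.Random.normal(key, 0.0, 1.0, shape: {8, 16})
{v, _key} = Nx.Random.normal(key, 0.0, 1.0, shape: {8, 16})

# Banded causal mask as in the sketch above (seq_len = 8, window_size = 3).
rows = Nx.iota({8, 8}, axis: 0)
cols = Nx.iota({8, 8}, axis: 1)
mask = Nx.logical_and(Nx.less_equal(cols, rows), Nx.greater(cols, Nx.subtract(rows, 3)))

# Purely content-based logits: no position bias is added at any point.
scores = Nx.divide(Nx.dot(q, [1], k, [1]), Nx.sqrt(16))
scores = Nx.select(mask, scores, Nx.Constants.neg_infinity())
weights = Axon.Activations.softmax(scores, axis: -1)
out = Nx.dot(weights, v)                                      # [seq_len, head_dim]
```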

## When to Use

- Long sequences where full attention is too expensive
- Tasks where local context is most important (e.g., language modeling)
- When you want length generalization at inference time
- When you want to ablate the effect of positional encoding

## Usage

    alias Edifice.Attention.RNoPESWA

    model = RNoPESWA.build(
      embed_dim: 256,
      hidden_size: 256,
      num_heads: 4,
      window_size: 128,
      num_layers: 6
    )
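The returned value is an ordinary `Axon.t()`, so it composes with other Axon layers. For example (illustrative; the head below is not part of this module), a small classifier on top of the pooled output:

```elixir
classifier =
  model
  |> Axon.dense(10, activation: :softmax, name: "head")

# `classifier` can then be trained with the usual Axon.Loop machinery.
```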

## Reference

- "RoPE is Overrated: Positional Encoding Ablations" (2025)
- "Longformer: The Long-Document Transformer" (Beltagy et al., 2020)

# `build_opt`

```elixir
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:window_size, pos_integer()}
  | {:dropout, float()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build an RNoPE-SWA model.

## Options

  - `:embed_dim` - Size of input embedding per timestep (required)
  - `:hidden_size` - Internal hidden dimension (default: 256)
  - `:num_heads` - Number of attention heads (default: 4)
  - `:num_layers` - Number of transformer blocks (default: 6)
  - `:window_size` - Attention window size (default: 128)
  - `:dropout` - Dropout rate (default: 0.1)

## Returns

  An Axon model that outputs `[batch, hidden_size]` from the last position.
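A hypothetical end-to-end shape check; the exact init/predict plumbing differs slightly across Axon versions.

```elixir
model =
  Edifice.Attention.RNoPESWA.build(
    embed_dim: 64,
    hidden_size: 128,
    num_heads: 4,
    num_layers: 2,
    window_size: 32
  )

{init_fn, predict_fn} = Axon.build(model)
template = Nx.template({1, 16, 64}, :f32)             # [batch, seq_len, embed_dim]
params = init_fn.(template, Axon.ModelState.empty())  # on older Axon, pass %{} instead

out = predict_fn.(params, Nx.broadcast(0.0, {1, 16, 64}))
Nx.shape(out)
#=> {1, 128}
```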

# `build_sliding_window_attention`

```elixir
@spec build_sliding_window_attention(
  Axon.t(),
  keyword()
) :: Axon.t()
```

Build a sliding window attention layer without positional encoding.

## Options

  - `:hidden_size` - Hidden dimension (default: 256)
  - `:num_heads` - Number of attention heads (default: 4)
  - `:window_size` - Attention window size (default: 128)
  - `:rope` - Whether to use RoPE (default: false for RNoPE-SWA)
  - `:name` - Layer name prefix
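This function takes and returns an `Axon.t()`, so a single windowed-attention block can be dropped into a custom graph. A sketch (input name and option values are illustrative):

```elixir
input = Axon.input("tokens", shape: {nil, nil, 256})

attn =
  Edifice.Attention.RNoPESWA.build_sliding_window_attention(input,
    hidden_size: 256,
    num_heads: 4,
    window_size: 64,
    name: "swa_0"
  )
```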

# `output_size`

```elixir
@spec output_size(keyword()) :: non_neg_integer()
```

Get the output dimension for a model configuration.
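Useful when sizing the next layer in a larger network. The example below assumes the result mirrors `:hidden_size`, based on the output shape documented for `build/1`; check the source for the exact behavior.

```elixir
Edifice.Attention.RNoPESWA.output_size(hidden_size: 512)
#=> 512
```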

# `recommended_defaults`

```elixir
@spec recommended_defaults() :: keyword()
```

Recommended default configuration.
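Since this returns a keyword list, it can be merged with task-specific overrides before calling `build/1` (assuming the returned keys line up with `build_opt/0`):

```elixir
opts =
  Edifice.Attention.RNoPESWA.recommended_defaults()
  |> Keyword.merge(embed_dim: 256, window_size: 64)

model = Edifice.Attention.RNoPESWA.build(opts)
```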

---

*Consult [api-reference.md](api-reference.md) for complete listing*
