RNoPE-SWA: Sliding Window Attention without positional encoding.
A minimalist attention mechanism that combines:
- Sliding Window Attention: Each position only attends to the last window_size positions
- No Positional Encoding: Pure content-based attention without position bias
Key Innovation
By removing positional encoding, the model learns purely content-based attention patterns. Combined with sliding window, this creates an efficient local attention mechanism that:
- Has O(L * W) complexity instead of O(L^2), where W = window_size
- Extrapolates to sequence lengths beyond those seen in training, since no learned or fixed positional encoding ties the model to a training length
- Forces the model to rely on content similarity rather than position heuristics (the mask sketch after this list shows the resulting attention pattern)
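The attention pattern can be made concrete with a minimal sketch, assuming Nx is available. The module and function names here are illustrative, not part of this library: position i attends to position j only when j <= i and i - j < window_size.

defmodule SWAMaskSketch do
  # Illustrative only: builds a {seq_len, seq_len} boolean mask where
  # entry {i, j} is true iff position i may attend to position j.
  def build(seq_len, window_size) do
    rows = Nx.iota({seq_len, seq_len}, axis: 0)
    cols = Nx.iota({seq_len, seq_len}, axis: 1)

    # Causal: j <= i. Windowed: i - j < window_size.
    Nx.logical_and(
      Nx.greater_equal(rows, cols),
      Nx.less(Nx.subtract(rows, cols), window_size)
    )
  end
end

# Each row has at most window_size ones, hence O(L * W) scored pairs.
SWAMaskSketch.build(6, 3)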
Architecture
Input [batch, seq_len, embed_dim]
|
v (no positional encoding)
+--------------------------------+
| Sliding Window Attention |
| |
| Each position attends to |
| last W positions only |
| Q, K, V projections |
| Attention(Q, K, V) |
| Output projection |
+--------------------------------+
|
[batch, seq_len, hidden_size]

When to Use
- Long sequences where full attention is too expensive
- Tasks where local context is most important (e.g., language modeling)
- When you want length generalization at inference time
- When you want to ablate the effect of positional encoding
Usage
model = RNoPESWA.build(
embed_dim: 256,
hidden_size: 256,
num_heads: 4,
window_size: 128,
num_layers: 6
)
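A sketch of driving the built model, assuming the standard Axon compile-and-run flow (the exact arguments to the init function vary across Axon versions):

{init_fn, predict_fn} = Axon.build(model)

# Initialize parameters from an input template: [batch, seq_len, embed_dim].
params = init_fn.(Nx.template({2, 512, 256}, :f32), %{})

# Any seq_len works at inference time; the output is [batch, hidden_size].
input = Nx.broadcast(0.0, {2, 512, 256})
output = predict_fn.(params, input)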
- "RoPE is Overrated: Positional Encoding Ablations" (2025)
- "Longformer: The Long-Document Transformer" (Beltagy et al., 2020)
Summary
Functions
Build an RNoPE-SWA model.
Build a sliding window attention layer without positional encoding.
Get the output dimension for a model configuration.
Recommended default configuration.
Types
@type build_opt() ::
        {:embed_dim, pos_integer()}
        | {:hidden_size, pos_integer()}
        | {:num_heads, pos_integer()}
        | {:num_layers, pos_integer()}
        | {:window_size, pos_integer()}
        | {:dropout, float()}
Options for build/1.
Functions
Build an RNoPE-SWA model.
Options
- :embed_dim - Size of input embedding per timestep (required)
- :hidden_size - Internal hidden dimension (default: 256)
- :num_heads - Number of attention heads (default: 4)
- :num_layers - Number of transformer blocks (default: 6)
- :window_size - Attention window size (default: 128)
- :dropout - Dropout rate (default: 0.1)
Returns
An Axon model that outputs [batch, hidden_size] from the last position.
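Since build/1 returns an ordinary Axon model, it composes with further layers. For example, a sketch of attaching a classification head (the layer choice here is illustrative):

model =
  RNoPESWA.build(embed_dim: 256, hidden_size: 256, num_heads: 4)
  |> Axon.dense(10, activation: :softmax)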
Build a sliding window attention layer without positional encoding.
Options
- :hidden_size - Hidden dimension (default: 256)
- :num_heads - Number of attention heads (default: 4)
- :window_size - Attention window size (default: 128)
- :rope - Whether to use RoPE (default: false for RNoPE-SWA)
- :name - Layer name prefix
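The layer builder's name is not shown above; assuming it is exposed as RNoPESWA.attention/2 (a hypothetical name, used here only for illustration), a single layer could be wired into an Axon graph like this:

input = Axon.input("sequence", shape: {nil, nil, 256})

# `attention/2` is a hypothetical name; flip :rope to true to ablate
# the effect of positional encoding, per the options above.
out =
  RNoPESWA.attention(input,
    hidden_size: 256,
    num_heads: 4,
    window_size: 128,
    rope: false
  )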
@spec output_size(keyword()) :: non_neg_integer()
Get the output dimension for a model configuration.
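Assuming output_size/1 simply reports the :hidden_size the model will emit (consistent with the [batch, hidden_size] return shape documented above):

RNoPESWA.output_size(hidden_size: 512)
#=> 512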
@spec recommended_defaults() :: keyword()
Recommended default configuration.
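Assuming recommended_defaults/0 returns the defaults documented above as a keyword list, it can seed build/1 with only the required option added:

opts = Keyword.put(RNoPESWA.recommended_defaults(), :embed_dim, 256)
model = RNoPESWA.build(opts)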