# `Edifice.Attention.LightningAttention`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/attention/lightning_attention.ex#L1)

Lightning Attention — hybrid linear/softmax block attention.

Splits the sequence into fixed-size blocks and uses two complementary
attention mechanisms:

- **Intra-block:** Standard softmax attention within each block (O(B²·d) per block)
- **Inter-block:** Linear attention via a cumulative KV state carried across blocks (O(B·d²) per block)

The overall cost thus grows linearly with sequence length while retaining
the expressivity of softmax attention at the local level, as the sketch
below illustrates.
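
The mechanism can be sketched in plain Nx (illustrative only: single head,
batch dimension omitted, `q`/`k`/`v` assumed pre-reshaped to
`[blocks, block_size, d]`, and causal masking inside blocks plus any decay
factors left out):

```elixir
defmodule LightningSketch do
  # Hypothetical kernel combining the two pathways. q, k, v: [blocks, B, d].
  def block_attention(q, k, v) do
    d = Nx.axis_size(q, -1)

    # Intra-block: full softmax attention inside each block (O(B^2 * d) per block).
    intra =
      q
      |> Nx.dot([2], [0], k, [2], [0])               # [blocks, B, B]
      |> Nx.divide(:math.sqrt(d))
      |> Axon.Activations.softmax(axis: -1)
      |> Nx.dot([2], [0], v, [1], [0])               # [blocks, B, d]

    # Inter-block: attend to the K^T V state accumulated over *previous*
    # blocks (O(B * d^2) per block); the exclusive prefix sum keeps
    # block-level causality.
    kv = Nx.dot(k, [1], [0], v, [1], [0])            # [blocks, d, d]
    kv_prev = Nx.subtract(Nx.cumulative_sum(kv, axis: 0), kv)
    inter = Nx.dot(q, [2], [0], kv_prev, [1], [0])   # [blocks, B, d]

    Nx.add(intra, inter)
  end
end
```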

## Architecture

```
Input [batch, seq_len, embed_dim]
      |
Input Projection to hidden_size
      |
+--------------------------------------------+
|  Lightning Attention Block (x num_layers)  |
|                                            |
|  LayerNorm -> Q,K,V projections            |
|  Reshape to [batch, heads, blocks, B, d]   |
|                                            |
|  Intra-block: softmax(Q_b @ K_b^T) @ V_b   |
|  Inter-block: Q_b @ cumsum(K_j^T @ V_j)    |
|  Output = intra + inter                    |
|                                            |
|  -> Residual                               |
|  LayerNorm -> FFN -> Residual              |
+--------------------------------------------+
      |
Final LayerNorm
      |
Last timestep -> [batch, hidden_size]
```
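
In Axon terms, one block of the diagram could be composed roughly like this
(a hedged sketch: the pre-norm layout follows the diagram, but the 4x FFN
expansion, the GELU activation, and the hypothetical `lightning_attention/2`
helper, sketched under `build_lightning_attention` below, are assumptions):

```elixir
defp block(x, opts) do
  hidden = opts[:hidden_size]

  # LayerNorm -> attention -> residual
  attn =
    x
    |> Axon.layer_norm()
    |> lightning_attention(opts)

  x = Axon.add(x, attn)

  # LayerNorm -> FFN -> residual
  ffn =
    x
    |> Axon.layer_norm()
    |> Axon.dense(hidden * 4, activation: :gelu)
    |> Axon.dense(hidden)

  Axon.add(x, ffn)
end
```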

## Constraints

`seq_len` must be divisible by `block_size`, so that the sequence can be
reshaped into a whole number of blocks.
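
A build step could enforce this along these lines (hypothetical; the actual
library may validate differently):

```elixir
# seq_len must split into a whole number of blocks.
if rem(seq_len, block_size) != 0 do
  raise ArgumentError,
        "seq_len #{seq_len} is not divisible by block_size #{block_size}"
end

num_blocks = div(seq_len, block_size)
```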

## Usage

    alias Edifice.Attention.LightningAttention

    model = LightningAttention.build(
      embed_dim: 287,
      hidden_size: 256,
      num_heads: 8,
      num_layers: 4,
      block_size: 64,
      # default seq_len (60) is not divisible by 64, so set it explicitly
      seq_len: 256
    )
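
A quick shape check with the standard Axon workflow (hedged: this assumes
the model exposes a single input, so the tensor can be passed directly, and
the exact init signature varies slightly across Axon versions):

```elixir
{init_fn, _predict_fn} = Axon.build(model)

input = Nx.broadcast(0.0, {2, 256, 287})   # [batch, seq_len, embed_dim]
params = init_fn.(Nx.to_template(input), %{})

model
|> Axon.predict(params, input)
|> Nx.shape()
#=> {2, 256}                               # [batch, hidden_size]
```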

## References

- Qin et al., "Lightning Attention-2: A Free Lunch for Handling Unlimited
  Sequence Lengths in Large Language Models" (2024)

# `build_opt`

```elixir
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:block_size, pos_integer()}
  | {:dropout, float()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a Lightning Attention model.

## Options

  - `:embed_dim` - Input embedding dimension (required)
  - `:hidden_size` - Internal hidden dimension (default: 256)
  - `:num_heads` - Number of attention heads (default: 8)
  - `:num_layers` - Number of Lightning Attention blocks (default: 4)
  - `:block_size` - Block size B for chunked attention (default: 64).
    `seq_len` must be divisible by this value.
  - `:dropout` - Dropout rate (default: 0.1)
  - `:seq_len` / `:window_size` - Expected sequence length (default: 60)

## Returns

  An Axon model outputting `[batch, hidden_size]`.

# `build_lightning_attention`

```elixir
@spec build_lightning_attention(
  Axon.t(),
  keyword()
) :: Axon.t()
```

Build the lightning attention sublayer.

This creates the core attention mechanism with both intra-block (softmax)
and inter-block (linear) attention pathways.
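
A hedged sketch of how such a sublayer can be wired in Axon (the projection
names and the `block_attention_impl/4` kernel are hypothetical, and the real
implementation also reshapes per head and per block):

```elixir
defp lightning_attention(input, opts) do
  hidden = opts[:hidden_size]

  # Q, K, V projections off the normalized hidden state.
  q = Axon.dense(input, hidden, use_bias: false, name: "q_proj")
  k = Axon.dense(input, hidden, use_bias: false, name: "k_proj")
  v = Axon.dense(input, hidden, use_bias: false, name: "v_proj")

  # Custom layer: the fun receives the three inputs plus Axon's opts map,
  # so a kernel like block_attention/3 above gains one trailing argument.
  Axon.layer(&block_attention_impl/4, [q, k, v],
    name: "lightning_attn",
    op_name: :lightning_attention
  )
end
```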

# `output_size`

```elixir
@spec output_size(keyword()) :: pos_integer()
```

Get the output size of the model.
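
Since the model outputs `[batch, hidden_size]`, this presumably echoes the
resolved `:hidden_size` option; a hypothetical call:

```elixir
LightningAttention.output_size(hidden_size: 512)
#=> 512
```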

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
