Cross-Attention Layer.
Standard encoder-decoder attention in which queries come from one sequence and keys/values come from another. Used for U-Net conditioning (e.g., on CLIP text embeddings in latent diffusion models), in the Perceiver, and in sequence-to-sequence models.
Architecture
```
Query source                            KV source
[batch, seq_q, dim_q]                   [batch, seq_kv, dim_kv]
        |                                       |
    Dense Wq                              Dense Wk, Wv
        |                                   |       |
        Q                                   K       V
        |                                   |       |
        +-------- Attention(Q, K, V) -------+-------+
                          |
              Dense Wo (output projection)
                          |
          Output [batch, seq_q, hidden_size]
```
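Conceptually, the diagram maps onto Axon primitives as below. This is a minimal single-head sketch, not this module's actual implementation: the `CrossAttentionSketch` name and the `attend/4` helper are illustrative, and the `:num_heads` and `:dropout` options are omitted for brevity.

```elixir
# Minimal single-head cross-attention sketch (illustrative, not the
# library's real implementation).
defmodule CrossAttentionSketch do
  def layer(query_input, kv_input, opts \\ []) do
    hidden = Keyword.fetch!(opts, :hidden_size)
    name = Keyword.get(opts, :name, "cross_attn")

    # Queries are projected from one sequence; keys/values from the other.
    q = Axon.dense(query_input, hidden, use_bias: false, name: "#{name}_wq")
    k = Axon.dense(kv_input, hidden, use_bias: false, name: "#{name}_wk")
    v = Axon.dense(kv_input, hidden, use_bias: false, name: "#{name}_wv")

    # Parameterless custom layer computing scaled dot-product attention.
    attended = Axon.layer(&attend/4, [q, k, v], name: "#{name}_attend")

    # Output projection back to hidden_size.
    Axon.dense(attended, hidden, name: "#{name}_wo")
  end

  # q: [batch, seq_q, hidden], k/v: [batch, seq_kv, hidden]
  defp attend(q, k, v, _opts) do
    scale = :math.sqrt(Nx.axis_size(q, -1))

    # scores: [batch, seq_q, seq_kv] via a batched dot product
    scores =
      q
      |> Nx.dot([2], [0], k, [2], [0])
      |> Nx.divide(scale)

    weights = Axon.Activations.softmax(scores, axis: -1)

    # Weighted sum over the KV sequence -> [batch, seq_q, hidden]
    Nx.dot(weights, [2], [0], v, [1], [0])
  end
end
```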
Usage

```elixir
output =
  CrossAttention.layer(queries, context,
    hidden_size: 256,
    num_heads: 4,
    name: "cross_attn"
  )
```
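Here `queries` and `context` are Axon graph nodes. A complete wiring might look like the following; the shapes (10x64 queries, 20x128 context) are arbitrary placeholders, not requirements:

```elixir
# Hypothetical input shapes; any sequence lengths and dims work.
queries = Axon.input("queries", shape: {nil, 10, 64})
context = Axon.input("context", shape: {nil, 20, 128})

model =
  CrossAttention.layer(queries, context,
    hidden_size: 256,
    num_heads: 4
  )
```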
- "Attention Is All You Need" (Vaswani et al., 2017)
Functions

layer(query_input, kv_input, opts \\ [])

Build a cross-attention Axon layer.
Parameters
- `query_input` - Query sequence Axon node, shape `[batch, seq_q, dim_q]`
- `kv_input` - Key-value sequence Axon node, shape `[batch, seq_kv, dim_kv]`
Options
- `:hidden_size` - Hidden dimension for Q, K, V projections (required)
- `:num_heads` - Number of attention heads (default: 1)
- `:dropout` - Dropout rate (default: 0.0)
- `:name` - Layer name prefix (default: "cross_attn")
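To sanity-check output shapes, a model like the one from the Usage section can be built and run with Axon's standard build/predict flow. The tensors below are placeholder data matching the example input shapes above:

```elixir
{init_fn, predict_fn} = Axon.build(model)

# Initialize parameters from shape templates.
params =
  init_fn.(
    %{
      "queries" => Nx.template({2, 10, 64}, :f32),
      "context" => Nx.template({2, 20, 128}, :f32)
    },
    %{}
  )

inputs = %{
  "queries" => Nx.iota({2, 10, 64}, type: :f32),
  "context" => Nx.iota({2, 20, 128}, type: :f32)
}

# Output shape: {2, 10, 256} -> [batch, seq_q, hidden_size]
predict_fn.(params, inputs)
```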