Cross-Attention Layer.
Standard encoder-decoder attention in which queries come from one sequence and keys/values come from another. Used for U-Net conditioning (e.g., on CLIP text embeddings in latent diffusion models), in the Perceiver, and in sequence-to-sequence models.
Architecture
```
Query source                            KV source
[batch, seq_q, dim_q]                   [batch, seq_kv, dim_kv]
        |                                       |
    Dense Wq                              Dense Wk, Wv
        |                                   |       |
        Q                                   K       V
        |                                   |       |
        +-------- Attention(Q, K, V) -------+-------+
                          |
              Dense Wo (output projection)
                          |
          Output [batch, seq_q, hidden_size]
```
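Conceptually, the diagram maps onto Axon primitives as below. This is a minimal single-head sketch, not this module's actual implementation: the `CrossAttentionSketch` name and the `attend/4` helper are illustrative, and the `:num_heads` and `:dropout` options are omitted for brevity.

```elixir
# Minimal single-head cross-attention sketch (illustrative, not the
# library's real implementation).
defmodule CrossAttentionSketch do
  def layer(query_input, kv_input, opts \\ []) do
    hidden = Keyword.fetch!(opts, :hidden_size)
    name = Keyword.get(opts, :name, "cross_attn")

    # Queries are projected from one sequence; keys/values from the other.
    q = Axon.dense(query_input, hidden, use_bias: false, name: "#{name}_wq")
    k = Axon.dense(kv_input, hidden, use_bias: false, name: "#{name}_wk")
    v = Axon.dense(kv_input, hidden, use_bias: false, name: "#{name}_wv")

    # Parameterless custom layer computing scaled dot-product attention.
    attended = Axon.layer(&attend/4, [q, k, v], name: "#{name}_attend")

    # Output projection back to hidden_size.
    Axon.dense(attended, hidden, name: "#{name}_wo")
  end

  # q: [batch, seq_q, hidden], k/v: [batch, seq_kv, hidden]
  defp attend(q, k, v, _opts) do
    scale = :math.sqrt(Nx.axis_size(q, -1))

    # scores: [batch, seq_q, seq_kv] via a batched dot product
    scores =
      q
      |> Nx.dot([2], [0], k, [2], [0])
      |> Nx.divide(scale)

    weights = Axon.Activations.softmax(scores, axis: -1)

    # Weighted sum over the KV sequence -> [batch, seq_q, hidden]
    Nx.dot(weights, [2], [0], v, [1], [0])
  end
end
```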
Usage

```elixir
output =
  CrossAttention.layer(queries, context,
    hidden_size: 256,
    num_heads: 4,
    name: "cross_attn"
  )
```
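Here `queries` and `context` are Axon graph nodes. A complete wiring might look like the following; the shapes (10x64 queries, 20x128 context) are arbitrary placeholders, not requirements:

```elixir
# Hypothetical input shapes; any sequence lengths and dims work.
queries = Axon.input("queries", shape: {nil, 10, 64})
context = Axon.input("context", shape: {nil, 20, 128})

model =
  CrossAttention.layer(queries, context,
    hidden_size: 256,
    num_heads: 4
  )
```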
- "Attention Is All You Need" (Vaswani et al., 2017)
Functions

layer(query_input, kv_input, opts \\ [])

Build a cross-attention Axon layer.
Parameters
- `query_input` - Query sequence Axon node, shape `[batch, seq_q, dim_q]`
- `kv_input` - Key-value sequence Axon node, shape `[batch, seq_kv, dim_kv]`
Options
- `:hidden_size` - Hidden dimension for Q, K, V projections (required)
- `:num_heads` - Number of attention heads (default: 1)
- `:dropout` - Dropout rate (default: 0.0)
- `:name` - Layer name prefix (default: "cross_attn")
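To sanity-check output shapes, a model like the one from the Usage section can be built and run with Axon's standard build/predict flow. The tensors below are placeholder data matching the example input shapes above:

```elixir
{init_fn, predict_fn} = Axon.build(model)

# Initialize parameters from shape templates.
params =
  init_fn.(
    %{
      "queries" => Nx.template({2, 10, 64}, :f32),
      "context" => Nx.template({2, 20, 128}, :f32)
    },
    %{}
  )

inputs = %{
  "queries" => Nx.iota({2, 10, 64}, type: :f32),
  "context" => Nx.iota({2, 20, 128}, type: :f32)
}

# Output shape: {2, 10, 256} -> [batch, seq_q, hidden_size]
predict_fn.(params, inputs)
```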