Graph Attention Network (Velickovic et al., 2018).
Implements attention-based message passing where each node attends to its neighbors with learned attention weights. Unlike GCN, which uses fixed normalization coefficients, GAT learns to weight neighbor contributions adaptively.
Architecture
Node Features [batch, num_nodes, input_dim]
Adjacency [batch, num_nodes, num_nodes]
|
v
+--------------------------------------+
| GAT Layer (K heads):                 |
|                                      |
| For each head k:                     |
|  1. Project:   z_i = W_k h_i         |
|  2. Attention: e_ij =                |
|     LeakyReLU(a^T [z_i || z_j])      |
|  3. Normalize: alpha_ij =            |
|     softmax_j(e_ij) over neighbors   |
|     (non-neighbors masked to -inf    |
|     before the softmax)              |
|  4. Aggregate: h_i' =                |
|     sigma(SUM_j alpha_ij z_j)        |
|                                      |
| Concatenate heads: [h1 || ... || hK] |
+--------------------------------------+
|
v
Node Embeddings [batch, num_nodes, num_heads * hidden_size]

Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions.
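The masked softmax in step 3 is the subtle part: non-neighbor scores must be pushed to negative infinity before normalizing, not zeroed out afterwards, or the weights over true neighbors will not sum to 1. A minimal Nx sketch of that step, assuming dense {batch, num_nodes, num_nodes} tensors (the module and function names here are illustrative, not this library's internals):

defmodule GATSketch do
  import Nx.Defn

  # Step 3 of the diagram: softmax over neighbors only. Non-neighbors
  # receive -inf *before* the softmax, so they get exactly zero weight.
  defn masked_softmax(scores, adjacency) do
    masked = Nx.select(adjacency > 0, scores, Nx.Constants.neg_infinity())

    # Subtract the per-row max for numerical stability. Assumes every node
    # has at least one neighbor (e.g. a self-loop); otherwise a row is all
    # -inf and the result is NaN.
    max = Nx.reduce_max(masked, axes: [-1], keep_axes: true)
    exp = Nx.exp(masked - max)
    exp / Nx.sum(exp, axes: [-1], keep_axes: true)
  end
end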
Usage
# Build a GAT for node classification
model = GAT.build(
  input_dim: 16,
  hidden_size: 8,
  num_heads: 8,
  num_classes: 7,
  dropout: 0.6
)
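Since the model has two inputs, it is fed with a map keyed by input name. A minimal smoke test, assuming standard Axon build/run conventions (the shapes are arbitrary; the "nodes"/"adjacency" names come from the Returns section of build/1 below):

nodes = Nx.broadcast(0.0, {1, 34, 16})
adjacency = Nx.eye(34, type: :f32) |> Nx.new_axis(0)
inputs = %{"nodes" => nodes, "adjacency" => adjacency}

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(inputs, %{})
predict_fn.(params, inputs)
# => tensor of shape {1, 34, 7}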
- "Graph Attention Networks" (Velickovic et al., ICLR 2018)
Summary
Functions
attention_coefficients(nodes, adjacency, hidden_size, opts) - Compute attention coefficients between connected nodes.
build(opts) - Build a Graph Attention Network.
gat_layer(nodes, adjacency, output_dim, opts) - Single Graph Attention layer with multi-head attention.
Types
@type build_opt() ::
  {:activation, atom()}
  | {:dropout, float()}
  | {:hidden_size, pos_integer()}
  | {:input_dim, pos_integer()}
  | {:num_classes, pos_integer() | nil}
  | {:num_heads, pos_integer()}
  | {:num_layers, pos_integer()}
Options for build/1.
Functions
@spec attention_coefficients(Axon.t(), Axon.t(), pos_integer(), keyword()) :: Axon.t()
Compute attention coefficients between connected nodes.
Returns the raw (pre-softmax) attention scores for visualization or analysis. The attention mechanism is:
e_ij = LeakyReLU(a^T [W h_i || W h_j])

Parameters
nodes - Node features Axon node, {batch, num_nodes, feature_dim}
adjacency - Adjacency matrix Axon node, {batch, num_nodes, num_nodes}
hidden_size - Projection dimension
opts - Options
Options
:name - Layer name prefix (default: "gat_attn")
:negative_slope - LeakyReLU slope (default: 0.2)
Returns
Axon node with attention coefficients {batch, num_nodes, num_nodes}.
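For example, to expose the raw scores as a graph node for inspection (the input names and shapes here are illustrative):

nodes = Axon.input("nodes", shape: {nil, 34, 16})
adjacency = Axon.input("adjacency", shape: {nil, 34, 34})

# Axon node carrying the pre-softmax scores, shape {batch, 34, 34}
scores = GAT.attention_coefficients(nodes, adjacency, 8, name: "viz_attn")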
Build a Graph Attention Network.
Constructs a two-layer GAT with multi-head attention in the first layer (heads concatenated) and single-head attention in the output layer (heads averaged), following the original paper's design.
Options
:input_dim - Input feature dimension per node (required)
:hidden_size - Hidden dimension per attention head (default: 8)
:num_heads - Number of attention heads (default: 8)
:num_classes - Number of output classes (required)
:activation - Activation function (default: :elu)
:dropout - Dropout rate for features and attention (default: 0.0)
:num_layers - Number of GAT layers (default: 2)
Returns
An Axon model with two inputs ("nodes" and "adjacency"). Output shape is
{batch, num_nodes, num_classes} for node classification.
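A hedged sketch of how training might be wired up with Axon.Loop; the loss assumes one-hot labels, and Polaris.Optimizers is the optimizer package used by recent Axon versions (train_data is a hypothetical stream you would supply):

# train_data must be an Enumerable of {%{"nodes" => _, "adjacency" => _}, labels}
model
|> Axon.Loop.trainer(:categorical_cross_entropy, Polaris.Optimizers.adam(learning_rate: 5.0e-3))
|> Axon.Loop.run(train_data, %{}, epochs: 100)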
@spec gat_layer(Axon.t(), Axon.t(), pos_integer(), keyword()) :: Axon.t()
Single Graph Attention layer with multi-head attention.
Each attention head independently computes attention coefficients over neighbors and produces an output. Heads are either concatenated (hidden layers) or averaged (output layer).
Parameters
nodes - Node features Axon node, {batch, num_nodes, in_dim}
adjacency - Adjacency matrix Axon node, {batch, num_nodes, num_nodes}
output_dim - Output dimension per head
Options
:num_heads - Number of attention heads (default: 8)
:name - Layer name prefix (default: "gat")
:activation - Activation function, nil for none (default: :elu)
:dropout - Dropout rate (default: 0.0)
:concat_heads - Concatenate heads (true) or average (false) (default: true)
:negative_slope - LeakyReLU negative slope (default: 0.2)
Returns
If concat_heads is true: {batch, num_nodes, num_heads * output_dim}
If concat_heads is false: {batch, num_nodes, output_dim}
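Composing the layer directly reproduces what build/1 is described to do above: concatenated heads in the hidden layer, averaged heads at the output. A sketch with illustrative shapes:

nodes = Axon.input("nodes", shape: {nil, 34, 16})
adjacency = Axon.input("adjacency", shape: {nil, 34, 34})

# Hidden layer: 8 heads of size 8, concatenated -> {batch, 34, 64}
hidden = GAT.gat_layer(nodes, adjacency, 8, num_heads: 8, concat_heads: true, name: "gat1")

# Output layer: single averaged head, no activation -> {batch, 34, 7}
output = GAT.gat_layer(hidden, adjacency, 7, num_heads: 1, concat_heads: false, activation: nil, name: "gat2")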