Edifice.Graph.GAT (Edifice v0.2.0)


Graph Attention Network (Velickovic et al., 2018).

Implements attention-based message passing where each node attends to its neighbors with learned attention weights. Unlike GCN, which uses fixed normalization, GAT learns to weight neighbor contributions adaptively.

Architecture

Node Features [batch, num_nodes, input_dim]
Adjacency     [batch, num_nodes, num_nodes]
      |
      v
+--------------------------------------+
| GAT Layer (K heads):                 |
|                                      |
|   For each head k:                   |
|     1. Project: z_i = W_k h_i        |
|     2. Attention: e_ij =             |
|        LeakyReLU(a^T [z_i || z_j])   |
|     3. Normalize: alpha_ij =         |
|        softmax_j(e_ij) * A_ij        |
|     4. Aggregate: h_i' =             |
|        sigma(SUM_j alpha_ij z_j)     |
|                                      |
|   Concatenate heads: [h1 || ... hK]  |
+--------------------------------------+
      |
      v
Node Embeddings [batch, num_nodes, num_heads * hidden_size]

Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions.
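
The per-head computation in the diagram maps to a handful of Nx operations. Below is a minimal, illustrative sketch of a single attention head in plain Nx; the module, function, and parameter names are hypothetical (not this library's API), and it masks non-edges before the softmax, one common way to restrict attention to neighbors. It uses the default LeakyReLU slope of 0.2.

# Hypothetical sketch, not Edifice's implementation.
defmodule GatHeadSketch do
  import Nx.Defn

  # nodes: {batch, n, in_dim}, adjacency: {batch, n, n}
  # w: {in_dim, hidden}, a_src/a_dst: {hidden} (split of the attention vector a)
  defn head(nodes, adjacency, w, a_src, a_dst) do
    # 1. Project node features: z_i = W h_i
    z = Nx.dot(nodes, [2], w, [0])

    # 2. Raw scores e_ij = LeakyReLU(a_src . z_i + a_dst . z_j),
    #    the usual decomposition of a^T [z_i || z_j]
    src = Nx.dot(z, [2], a_src, [0])
    dst = Nx.dot(z, [2], a_dst, [0])
    e = Nx.new_axis(src, 2) + Nx.new_axis(dst, 1)
    e = Nx.max(e, e * 0.2)

    # 3. Mask non-edges, then normalize over neighbors j with a softmax
    e = Nx.select(Nx.greater(adjacency, 0), e, -1.0e9)
    e = e - Nx.reduce_max(e, axes: [2], keep_axes: true)
    exp = Nx.exp(e)
    alpha = exp / Nx.sum(exp, axes: [2], keep_axes: true)

    # 4. Aggregate neighbor features with the attention weights
    Nx.dot(alpha, [2], [0], z, [1], [0])
  end
end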

Usage

# Build a GAT for node classification
alias Edifice.Graph.GAT

model = GAT.build(
  input_dim: 16,
  hidden_size: 8,
  num_heads: 8,
  num_classes: 7,
  dropout: 0.6
)
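
A hedged sketch of running the model above with Axon's standard build/init/predict workflow; the 5-node graph, tensor contents, and self-loop adjacency are purely illustrative.

inputs = %{
  "nodes" => Nx.iota({1, 5, 16}, type: :f32),
  # Identity adjacency (self-loops only), just to demonstrate shapes
  "adjacency" => Nx.eye(5, type: :f32) |> Nx.broadcast({1, 5, 5})
}

# Axon.build/1 returns an init function and a predict function
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(inputs, %{})
logits = predict_fn.(params, inputs)
# logits has shape {1, 5, 7}: one score per class for every node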

References

  • "Graph Attention Networks" (Velickovic et al., ICLR 2018)

Summary

Types

Options for build/1.

Functions

Compute attention coefficients between connected nodes.

Build a Graph Attention Network.

Single Graph Attention layer with multi-head attention.

Types

build_opt()

@type build_opt() ::
  {:activation, atom()}
  | {:dropout, float()}
  | {:hidden_size, pos_integer()}
  | {:input_dim, pos_integer()}
  | {:num_classes, pos_integer() | nil}
  | {:num_heads, pos_integer()}
  | {:num_layers, pos_integer()}

Options for build/1.

Functions

attention_coefficients(nodes, adjacency, hidden_size, opts \\ [])

@spec attention_coefficients(Axon.t(), Axon.t(), pos_integer(), keyword()) :: Axon.t()

Compute attention coefficients between connected nodes.

Returns the raw (pre-softmax) attention scores for visualization or analysis. The attention mechanism is:

e_ij = LeakyReLU(a^T [W h_i || W h_j])

Parameters

  • nodes - Node features Axon node {batch, num_nodes, feature_dim}
  • adjacency - Adjacency matrix Axon node {batch, num_nodes, num_nodes}
  • hidden_size - Projection dimension
  • opts - Options

Options

  • :name - Layer name prefix (default: "gat_attn")
  • :negative_slope - LeakyReLU slope (default: 0.2)

Returns

Axon node with attention coefficients {batch, num_nodes, num_nodes}.
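
As an illustrative sketch, the raw scores can be exposed as their own Axon graph alongside the model's inputs; the 5-node shapes and the "viz_attn" name below are made up for the example.

nodes = Axon.input("nodes", shape: {nil, 5, 16})
adjacency = Axon.input("adjacency", shape: {nil, 5, 5})

attn = Edifice.Graph.GAT.attention_coefficients(nodes, adjacency, 8, name: "viz_attn")
# attn is an Axon node yielding a {batch, 5, 5} tensor of pre-softmax scores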

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a Graph Attention Network.

Constructs a GAT (two layers by default, see :num_layers) with multi-head attention in the hidden layers (heads concatenated) and single-head attention in the output layer (heads averaged), following the original paper's design.

Options

  • :input_dim - Input feature dimension per node (required)
  • :hidden_size - Hidden dimension per attention head (default: 8)
  • :num_heads - Number of attention heads (default: 8)
  • :num_classes - Number of output classes (required)
  • :activation - Activation function (default: :elu)
  • :dropout - Dropout rate for features and attention (default: 0.0)
  • :num_layers - Number of GAT layers (default: 2)

Returns

An Axon model with two inputs ("nodes" and "adjacency"). Output shape is {batch, num_nodes, num_classes} for node classification.
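
One hedged way to sanity-check the constructed graph is Axon's display helper (shapes here are illustrative; Axon.Display requires the optional :table_rex dependency).

model = Edifice.Graph.GAT.build(input_dim: 16, num_classes: 7)

templates = %{
  "nodes" => Nx.template({1, 5, 16}, :f32),
  "adjacency" => Nx.template({1, 5, 5}, :f32)
}

model |> Axon.Display.as_table(templates) |> IO.puts()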

gat_layer(nodes, adjacency, output_dim, opts \\ [])

@spec gat_layer(Axon.t(), Axon.t(), pos_integer(), keyword()) :: Axon.t()

Single Graph Attention layer with multi-head attention.

Each attention head independently computes attention coefficients over neighbors and produces an output. Heads are either concatenated (hidden layers) or averaged (output layer).

Parameters

  • nodes - Node features Axon node {batch, num_nodes, in_dim}
  • adjacency - Adjacency matrix Axon node {batch, num_nodes, num_nodes}
  • output_dim - Output dimension per head

Options

  • :num_heads - Number of attention heads (default: 8)
  • :name - Layer name prefix (default: "gat")
  • :activation - Activation function, nil for none (default: :elu)
  • :dropout - Dropout rate (default: 0.0)
  • :concat_heads - Concatenate heads (true) or average (false) (default: true)
  • :negative_slope - LeakyReLU negative slope (default: 0.2)

Returns

  • If concat_heads is true: {batch, num_nodes, num_heads * output_dim}
  • If concat_heads is false: {batch, num_nodes, output_dim}
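
A hedged sketch of composing gat_layer/4 directly, mirroring the two-layer design that build/1 produces; the dimensions and layer names are illustrative.

nodes = Axon.input("nodes", shape: {nil, 5, 16})
adjacency = Axon.input("adjacency", shape: {nil, 5, 5})

# Hidden layer: 8 heads of size 8, concatenated
hidden =
  Edifice.Graph.GAT.gat_layer(nodes, adjacency, 8,
    num_heads: 8,
    concat_heads: true,
    name: "gat_1"
  )
# hidden: {batch, 5, 64}

# Output layer: single averaged head, no activation, one logit per class
output =
  Edifice.Graph.GAT.gat_layer(hidden, adjacency, 7,
    num_heads: 1,
    concat_heads: false,
    activation: nil,
    name: "gat_out"
  )
# output: {batch, 5, 7}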