Graph Transformer with structural encoding.
Applies transformer-style multi-head attention to graph-structured data, using the adjacency matrix as an attention bias/mask to incorporate graph structure. Includes graph positional encoding via random-walk structural encoding (RWSE) or Laplacian-eigenvector features approximated via powers of the adjacency matrix.
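For intuition, here is a minimal Nx sketch of the two structural ingredients: an RWSE-style positional encoding built from powers of the random-walk matrix, and an additive attention bias derived from the adjacency matrix. The module and function names are hypothetical and not part of this module's API; the actual implementation may differ.

defmodule GraphStructureSketch do
  # Random-walk structural encoding for one dense {n, n} adjacency matrix.
  # Column i holds the diagonal of (D^-1 A)^(i+1), i.e. each node's
  # probability of returning to itself after i + 1 random-walk steps.
  def rwse(adjacency, k \\ 8) do
    degrees = Nx.sum(adjacency, axes: [1], keep_axes: true)
    rw = Nx.divide(adjacency, Nx.max(degrees, 1.0e-6))

    {diagonals, _final_power} =
      Enum.map_reduce(1..k, rw, fn _step, power ->
        {Nx.take_diagonal(power), Nx.dot(power, rw)}
      end)

    # {num_nodes, k} positional-encoding tensor.
    Nx.stack(diagonals, axis: 1)
  end

  # Additive attention bias: 0 where an edge exists, a large negative value
  # elsewhere, so softmax attention is effectively restricted to neighbors.
  def attention_bias(adjacency) do
    Nx.select(Nx.greater(adjacency, 0), 0.0, -1.0e9)
  end
end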
Architecture
Node Features [batch, num_nodes, input_dim]
Adjacency     [batch, num_nodes, num_nodes]
                   |
                   v
+--------------------------------------+
|  Input Projection + Positional Enc   |
+--------------------------------------+
                   |
                   v
+--------------------------------------+
|  Graph Transformer Layer 1:          |
|    Pre-Norm -> Multi-Head Attention  |
|      (adjacency as attention bias)   |
|      + Residual                      |
|    Pre-Norm -> FFN + Residual        |
+--------------------------------------+
                   |
                   v
+--------------------------------------+
|  Graph Transformer Layer N           |
+--------------------------------------+
                   |
                   v
Node Embeddings [batch, num_nodes, hidden_size]

Usage
model = GraphTransformer.build(
  input_dim: 16,
  hidden_size: 64,
  num_heads: 4,
  num_layers: 4,
  num_classes: 7
)
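A sketch of initializing and running a model (here built without :num_classes so the output matches the node-embedding shape shown in the diagram). This assumes Axon ~> 0.6 conventions; the shapes and inputs are made up for illustration.

model =
  GraphTransformer.build(
    input_dim: 16,
    hidden_size: 64,
    num_heads: 4,
    num_layers: 4
  )

{init_fn, predict_fn} = Axon.build(model)

# One graph with 10 nodes and 16 input features; self-loop-only adjacency.
inputs = %{
  "nodes" => Nx.broadcast(0.0, {1, 10, 16}),
  "adjacency" => Nx.new_axis(Nx.eye(10, type: :f32), 0)
}

params = init_fn.(inputs, %{})
embeddings = predict_fn.(params, inputs)
# Without :num_classes the output is node embeddings of shape {1, 10, 64}.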
References
- Dwivedi & Bresson, "A Generalization of Transformer Networks to Graphs" (AAAI 2021)
- Ying et al., "Do Transformers Really Perform Bad for Graph Representation?" (NeurIPS 2021)
Summary
Functions
build/1
Build a Graph Transformer.
graph_transformer_layer/4
Single Graph Transformer layer with pre-norm attention + FFN.
output_size/1
Get the output size of a Graph Transformer.
Types
@type build_opt() ::
        {:dropout, float()}
        | {:hidden_size, pos_integer()}
        | {:input_dim, pos_integer()}
        | {:num_classes, pos_integer() | nil}
        | {:num_heads, pos_integer()}
        | {:num_layers, pos_integer()}
        | {:pool, atom()}
Options for build/1.
Functions
Build a Graph Transformer.
Options
:input_dim - Input feature dimension per node (required)
:hidden_size - Hidden dimension (default: 64)
:num_heads - Number of attention heads (default: 4)
:num_layers - Number of transformer layers (default: 4)
:num_classes - If provided, adds a classification head (default: nil)
:dropout - Dropout rate (default: 0.0)
:pool - Global pooling for graph classification (default: nil)
Returns
An Axon model with two inputs ("nodes" and "adjacency").
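For graph-level classification you would presumably combine :num_classes with :pool; the :mean value below is an assumption, so check the implementation for the supported pooling atoms.

model =
  GraphTransformer.build(
    input_dim: 16,
    hidden_size: 64,
    num_heads: 4,
    num_layers: 2,
    num_classes: 7,
    pool: :mean,
    dropout: 0.1
  )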
@spec graph_transformer_layer(Axon.t(), Axon.t(), pos_integer(), keyword()) :: Axon.t()
Single Graph Transformer layer with pre-norm attention + FFN.
Options
:num_heads - Number of attention heads (default: 4)
:dropout - Dropout rate (default: 0.0)
:name - Layer name prefix
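A sketch of composing layers by hand (the argument order is inferred from the @spec above: node graph, adjacency graph, hidden size, then options; verify against the implementation):

nodes = Axon.input("nodes", shape: {nil, nil, 16})
adjacency = Axon.input("adjacency", shape: {nil, nil, nil})

hidden = Axon.dense(nodes, 64, name: "input_projection")

output =
  Enum.reduce(1..4, hidden, fn i, acc ->
    GraphTransformer.graph_transformer_layer(acc, adjacency, 64,
      num_heads: 4,
      dropout: 0.1,
      name: "layer_#{i}"
    )
  end)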
@spec output_size(keyword()) :: pos_integer()
Get the output size of a Graph Transformer.
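Presumably this mirrors build/1's options and returns the trailing dimension of the model's output (for example :hidden_size, or :num_classes when a classification head is present); a hypothetical call:

GraphTransformer.output_size(hidden_size: 64)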