# `Edifice.Graph.GraphTransformer`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/graph/graph_transformer.ex#L1)

Graph Transformer with structural encoding.

Applies transformer-style multi-head attention to graph-structured data,
using the adjacency matrix as an attention bias/mask so that attention
respects graph structure. Includes graph positional encoding via
random-walk structural encoding (RWSE) or Laplacian eigenvectors
approximated via powers of the adjacency matrix.
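
For reference, RWSE assigns each node its k-step random-walk return
probabilities, i.e. the diagonals of `(D^-1 A)^k` for `k = 1..K`. A minimal
Nx sketch of that computation (the `RWSESketch` module is hypothetical,
for illustration only; the module's actual encoding may differ in detail):

```elixir
defmodule RWSESketch do
  # Random-walk structural encoding for a single [n, n] adjacency matrix:
  # for k = 1..steps, the probability that each node returns to itself
  # after k random-walk steps (the diagonal of (D^-1 A)^k).
  def rwse(adj, steps) do
    deg = Nx.sum(adj, axes: [-1], keep_axes: true)
    rw = Nx.divide(adj, Nx.max(deg, 1))   # D^-1 A, guarding zero-degree nodes

    {encodings, _} =
      Enum.map_reduce(1..steps, rw, fn _k, p ->
        {Nx.take_diagonal(p), Nx.dot(p, rw)}
      end)

    Nx.stack(encodings, axis: -1)         # [num_nodes, steps]
  end
end
```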

## Architecture

```
Node Features [batch, num_nodes, input_dim]
Adjacency     [batch, num_nodes, num_nodes]
      |
      v
+--------------------------------------+
| Input Projection + Positional Enc    |
+--------------------------------------+
      |
      v
+--------------------------------------+
| Graph Transformer Layer 1:           |
|   Pre-Norm -> Multi-Head Attention   |
|   (adjacency as attention bias)      |
|   + Residual                         |
|   Pre-Norm -> FFN + Residual         |
+--------------------------------------+
      |
      v
+--------------------------------------+
| Graph Transformer Layer N            |
+--------------------------------------+
      |
      v
Node Embeddings [batch, num_nodes, hidden_size]
```
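
Inside each layer, the scores are standard scaled dot-product attention with
the adjacency matrix injected as the bias/mask. A single-head Nx/Axon sketch,
assuming non-edges are masked to negative infinity (the module may instead
add a learned additive bias, as in Graphormer):

```elixir
defmodule AttentionSketch do
  # Shapes: q, k, v are [batch, n, d]; adj is [batch, n, n], nonzero = edge.
  def adjacency_masked_attention(q, k, v, adj) do
    dk = Nx.axis_size(q, 2)
    scores = Nx.divide(Nx.dot(q, [2], [0], k, [2], [0]), Nx.sqrt(dk))
    masked = Nx.select(Nx.equal(adj, 0), Nx.Constants.neg_infinity(), scores)
    weights = Axon.Activations.softmax(masked, axis: -1)
    Nx.dot(weights, [2], [0], v, [1], [0])   # [batch, n, d]
  end
end
```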

## Usage

    alias Edifice.Graph.GraphTransformer

    model = GraphTransformer.build(
      input_dim: 16,
      hidden_size: 64,
      num_heads: 4,
      num_layers: 4,
      num_classes: 7
    )
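
A hedged end-to-end sketch of running the model, using the alias above.
`Axon.ModelState.empty()` applies to Axon >= 0.7 (older versions take `%{}`);
the input shapes follow the architecture diagram:

```elixir
{init_fn, predict_fn} = Axon.build(model)

inputs = %{
  "nodes" => Nx.broadcast(0.0, {1, 10, 16}),   # [batch, num_nodes, input_dim]
  "adjacency" => Nx.new_axis(Nx.eye(10), 0)    # [batch, num_nodes, num_nodes]
}

params = init_fn.(inputs, Axon.ModelState.empty())
output = predict_fn.(params, inputs)
```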

## References

- Dwivedi & Bresson, "A Generalization of Transformer Networks to Graphs" (AAAI 2021)
- Ying et al., "Do Transformers Really Perform Bad for Graph Representation?" (NeurIPS 2021)

# `build_opt`

```elixir
@type build_opt() ::
  {:dropout, float()}
  | {:hidden_size, pos_integer()}
  | {:input_dim, pos_integer()}
  | {:num_classes, pos_integer() | nil}
  | {:num_heads, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:pool, atom()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a Graph Transformer.

## Options

- `:input_dim` - Input feature dimension per node (required)
- `:hidden_size` - Hidden dimension (default: 64)
- `:num_heads` - Number of attention heads (default: 4)
- `:num_layers` - Number of transformer layers (default: 4)
- `:num_classes` - If provided, adds a classification head (default: nil)
- `:dropout` - Dropout rate (default: 0.0)
- `:pool` - Global pooling for graph classification (default: nil)

## Returns

An Axon model with two inputs: `"nodes"` (`[batch, num_nodes, input_dim]`)
and `"adjacency"` (`[batch, num_nodes, num_nodes]`).
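
For graph-level classification, `:pool` collapses node embeddings into one
vector per graph before the head. A hedged sketch; that `:mean` is an
accepted pooling atom is an assumption, not confirmed above:

```elixir
# Assumption: :mean is a supported pooling atom.
graph_model =
  Edifice.Graph.GraphTransformer.build(
    input_dim: 16,
    hidden_size: 64,
    num_classes: 7,
    pool: :mean
  )

# With pooling plus a classification head, the output is presumably
# [batch, num_classes] rather than per-node embeddings.
```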

# `graph_transformer_layer`

```elixir
@spec graph_transformer_layer(Axon.t(), Axon.t(), pos_integer(), keyword()) ::
  Axon.t()
```

A single Graph Transformer layer: pre-norm multi-head attention (with
adjacency bias) plus residual, followed by a pre-norm FFN plus residual.

## Options

- `:num_heads` - Number of attention heads (default: 4)
- `:dropout` - Dropout rate (default: 0.0)
- `:name` - Layer name prefix
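
A sketch of stacking layers by hand, e.g. to interleave them with custom
layers. The argument order (node stream, adjacency stream, hidden size,
options) follows the typespec above but is an assumption:

```elixir
# Assumed argument order: (node stream, adjacency stream, hidden size, opts).
nodes = Axon.input("nodes", shape: {nil, nil, 64})
adjacency = Axon.input("adjacency", shape: {nil, nil, nil})

stack =
  Enum.reduce(1..4, nodes, fn i, acc ->
    Edifice.Graph.GraphTransformer.graph_transformer_layer(
      acc,
      adjacency,
      64,
      num_heads: 4,
      dropout: 0.1,
      name: "gt_layer_#{i}"
    )
  end)
```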

# `output_size`

```elixir
@spec output_size(keyword()) :: pos_integer()
```

Get the output size of a Graph Transformer for the given build options.
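
A hedged usage sketch, assuming `output_size/1` takes the same options as
`build/1` and returns the hidden size when no classification head is
configured:

```elixir
alias Edifice.Graph.GraphTransformer

# Assumption: returns :hidden_size here; presumably :num_classes when set.
GraphTransformer.output_size(hidden_size: 64)
#=> 64
```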

---

*Consult [api-reference.md](api-reference.md) for complete listing*
