# `Edifice.Vision.FocalNet`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/vision/focalnet.ex#L1)

FocalNet: Focal Modulation Networks for vision (Yang et al., 2022).

Replaces self-attention with focal modulation. Hierarchical depthwise
convolutions extract context at multiple granularity levels, and a gate
aggregates that context before it modulates each token's query. The result
captures both local and global context without computing attention.

## Architecture

```
Image [batch, channels, height, width]
      |
+-----v--------------------+
| Patch Embedding           |  Split into P x P patches, linear project
+---------------------------+
      |
      v
[batch, num_patches, hidden_size]
      |
+-----v--------------------+
| FocalNet Block x N        |
|                           |
| Focal Modulation:         |
|   q = Dense(x)            |
|   For each level l:       |
|     ctx += gelu(conv_l)   |
|   gate = sigmoid(Dense(x))|
|   out = q * gate * ctx    |
|   + Residual              |
|                           |
| FFN:                      |
|   Dense(4*h) -> GELU      |
|   -> Dense(h)             |
|   + Residual              |
+---------------------------+
      |
      v
+---------------------------+
| LayerNorm -> Mean Pool    |
+---------------------------+
      |
      v
[batch, hidden_size]
```
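The focal modulation step in the diagram can be illustrated for a single position and channel. This is a minimal sketch in plain Elixir, not Edifice's implementation: the module `FocalModulationSketch` and its function names are hypothetical, the per-level convolution outputs are passed in as plain floats, and GELU is assumed to be the exact erf-based form.

```elixir
defmodule FocalModulationSketch do
  # Hypothetical helper, not part of Edifice.
  # Scalar illustration of `out = q * gate * ctx` from the diagram.

  # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
  def gelu(x), do: 0.5 * x * (1.0 + :math.erf(x / :math.sqrt(2.0)))

  def sigmoid(x), do: 1.0 / (1.0 + :math.exp(-x))

  # q             - query projection value for this position
  # gate_logit    - gate value before the sigmoid
  # level_outputs - one convolution output per focal level (assumed given)
  def modulate(q, gate_logit, level_outputs) do
    # ctx accumulates gelu(conv_l) over all focal levels
    ctx = level_outputs |> Enum.map(&gelu/1) |> Enum.sum()
    q * sigmoid(gate_logit) * ctx
  end
end
```

In the real model these operations act on whole tensors, and the residual connection shown in the diagram is added afterwards.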

## Usage

    alias Edifice.Vision.FocalNet

    model = FocalNet.build(
      image_size: 224,
      patch_size: 16,
      hidden_size: 256,
      num_layers: 4,
      focal_levels: 3,
      num_classes: 1000
    )

## References

- Yang et al., "Focal Modulation Networks" (NeurIPS 2022)
- https://arxiv.org/abs/2203.11926

# `build_opt`

```elixir
@type build_opt() ::
  {:focal_kernel, pos_integer()}
  | {:focal_levels, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:image_size, pos_integer()}
  | {:in_channels, pos_integer()}
  | {:num_classes, pos_integer() | nil}
  | {:num_layers, pos_integer()}
  | {:patch_size, pos_integer()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a FocalNet model.

## Options

  - `:image_size` - Input image size, square (default: 224)
  - `:patch_size` - Patch size, square (default: 16)
  - `:in_channels` - Number of input channels (default: 3)
  - `:hidden_size` - Hidden dimension per patch (default: 256)
  - `:num_layers` - Number of FocalNet blocks (default: 4)
  - `:focal_levels` - Number of focal context levels (default: 3)
  - `:focal_kernel` - Base kernel size for focal convolutions (default: 3)
  - `:num_classes` - Number of output classes (optional; omit to get pooled features of size `:hidden_size`)
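
The way these options combine can be sketched in plain Elixir. The module below is hypothetical and not part of Edifice. It copies the documented defaults, assumes `:num_classes` defaults to `nil` (the type allows it, but the default is not stated), and assumes non-overlapping square patches when deriving the patch count.

```elixir
defmodule FocalNetOptsSketch do
  # Hypothetical helper, not part of Edifice.
  # Defaults copied from the documented option list;
  # `num_classes: nil` is an assumption.
  @defaults [
    image_size: 224,
    patch_size: 16,
    in_channels: 3,
    hidden_size: 256,
    num_layers: 4,
    focal_levels: 3,
    focal_kernel: 3,
    num_classes: nil
  ]

  # User-supplied options override the defaults key by key.
  def resolve(opts \\ []), do: Keyword.merge(@defaults, opts)

  # Number of patches, assuming non-overlapping patches on a square image.
  def num_patches(opts \\ []) do
    opts = resolve(opts)
    per_side = div(opts[:image_size], opts[:patch_size])
    per_side * per_side
  end
end
```

With the defaults, a 224 x 224 image and 16 x 16 patches give 14 patches per side, so the sequence entering the blocks would have 196 patches under these assumptions.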

## Returns

  An Axon model. Without `:num_classes`, outputs `[batch, hidden_size]`.
  With `:num_classes`, outputs `[batch, num_classes]`.

# `output_size`

```elixir
@spec output_size(keyword()) :: pos_integer()
```

Get the output size of a FocalNet model.

Returns the value of `:num_classes` if set, otherwise the value of `:hidden_size`.
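
The rule can be restated as a small sketch. `OutputSizeSketch` is a hypothetical module, not Edifice's source, and it assumes that a missing `:hidden_size` falls back to the documented default of 256.

```elixir
defmodule OutputSizeSketch do
  # Hypothetical restatement of the documented rule; not Edifice's source.
  def output_size(opts \\ []) do
    # A nil or missing :num_classes falls through to :hidden_size.
    # 256 is the documented default for :hidden_size (fallback is assumed).
    Keyword.get(opts, :num_classes) || Keyword.get(opts, :hidden_size, 256)
  end
end
```

A result like this is typically used to size whatever layer consumes the model's output, such as a downstream head stacked on the pooled features.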
