# `Edifice.Vision.ConvNeXt`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/vision/convnext.ex#L1)

ConvNeXt, a modernized ResNet implementation.

Modernizes the classic ResNet design with techniques borrowed from
vision transformers: large-kernel (7x7) depthwise convolutions, inverted
bottleneck blocks, GELU in place of ReLU, LayerNorm in place of
BatchNorm, and fewer activation and normalization layers.

## Architecture

```
Image [batch, channels, height, width]
      |
+-----v---------------------+
| Stem (4x4 strided conv)   |  [batch, H/4, W/4, dims[0]]
+---------------------------+
      |
+-----v---------------------+
| Stage 1                   |  depths[0] ConvNeXt blocks at dims[0]
|   DW Conv 7x7 -> LN       |
|   -> PW Conv (expand 4x)  |
|   -> GELU                 |
|   -> PW Conv (project)    |
|   -> LayerScale           |
|   -> Residual             |
+---------------------------+
      |
+-----v---------------------+
| Downsample                |  LN + 2x2 strided conv to dims[1]
+---------------------------+
      |
+-----v---------------------+
| Stage 2                   |  depths[1] ConvNeXt blocks at dims[1]
+---------------------------+
      |
      ... (repeat for each stage)
      |
+-----v---------------------+
| Global Average Pool       |  [batch, dims[-1]]
+---------------------------+
      |
+-----v---------------------+
| LayerNorm                 |
+---------------------------+
      |
+-----v---------------------+
| Optional Classifier       |
+---------------------------+
```
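Concretely, the stem divides the spatial resolution by `patch_size`, and each downsample between stages halves it again while the channel width follows `dims`. A small pure-Elixir sketch of this shape bookkeeping (the module and function names are illustrative, not part of this library):

```elixir
# Trace the [H, W, C] shape at each stage for a square input.
# Illustrative helper, not part of Edifice's API.
defmodule ShapeTrace do
  def stage_shapes(image_size, patch_size, dims) do
    stem = div(image_size, patch_size)

    dims
    |> Enum.with_index()
    |> Enum.map(fn {dim, idx} ->
      # Stage 1 runs at stem resolution; each later stage halves it.
      res = div(stem, Integer.pow(2, idx))
      {res, res, dim}
    end)
  end
end

ShapeTrace.stage_shapes(224, 4, [96, 192, 384, 768])
# → [{56, 56, 96}, {28, 28, 192}, {14, 14, 384}, {7, 7, 768}]
```

For the Tiny defaults this gives the familiar 56/28/14/7 feature-map pyramid, with 7x7x768 features entering the global average pool.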

## ConvNeXt Block (faithful to paper)

```
Input [batch, H, W, C]
  |
  +----------- Residual ----------+
  |                               |
Depthwise Conv 7x7                |
  |                               |
LayerNorm                         |
  |                               |
Pointwise Conv (C -> 4C)          |
  |                               |
GELU                              |
  |                               |
Pointwise Conv (4C -> C)          |
  |                               |
LayerScale (learnable gamma)      |
  |                               |
  +---------- Add ----------------+
  |
Output
```
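Under stated assumptions, this block maps onto Axon roughly as follows. Pointwise convolutions are 1x1 convolutions, and LayerScale has no off-the-shelf Axon layer, so it is sketched here as a custom `Axon.layer` with a learnable per-channel `gamma`. All helper names are illustrative, not Edifice's actual internals:

```elixir
# Hedged sketch of one ConvNeXt block in Axon (channels-last).
# `convnext_block/3` and `layer_scale/3` are illustrative helpers.
defmodule BlockSketch do
  def convnext_block(x, dim, gamma_init \\ 1.0e-6) do
    shortcut = x

    x
    |> Axon.depthwise_conv(1, kernel_size: 7, padding: :same, channels: :last)
    |> Axon.layer_norm()
    # Pointwise convs as 1x1 convs: expand to 4C, then project back to C.
    |> Axon.conv(4 * dim, kernel_size: 1, channels: :last)
    |> Axon.activation(:gelu)
    |> Axon.conv(dim, kernel_size: 1, channels: :last)
    |> layer_scale(dim, gamma_init)
    |> Axon.add(shortcut)
  end

  # LayerScale: scale each channel by a learnable gamma, initialized
  # to a small constant so the block starts close to identity.
  defp layer_scale(x, dim, init) do
    gamma = Axon.param("gamma", {dim}, initializer: Axon.Initializers.full(init))
    Axon.layer(fn input, g, _opts -> Nx.multiply(input, g) end, [x, gamma], name: "layer_scale")
  end
end
```

Note the inverted-bottleneck ordering: unlike a ResNet bottleneck, the expanded 4C representation sits in the middle of the block, and the single GELU is the only activation.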

## Usage

    # ConvNeXt-Tiny
    model = ConvNeXt.build(
      image_size: 224,
      in_channels: 3,
      depths: [3, 3, 9, 3],
      dims: [96, 192, 384, 768],
      num_classes: 1000
    )
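Running the result follows the usual Axon workflow; a hedged sketch, assuming the channels-first input layout shown in the architecture diagram (adjust the template if your build is channels-last):

```elixir
# Compile the model into init and predict functions.
{init_fn, predict_fn} = Axon.build(model)

# Initialize parameters from an input template.
template = Nx.template({1, 3, 224, 224}, :f32)
params = init_fn.(template, %{})

# images: a {batch, 3, 224, 224} tensor; logits: {batch, 1000}.
logits = predict_fn.(params, images)
```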

## References

- "A ConvNet for the 2020s" (Liu et al., CVPR 2022)

# `build_opt`

```elixir
@type build_opt() ::
  {:depths, [pos_integer()]}
  | {:dims, [pos_integer()]}
  | {:dropout, float()}
  | {:image_size, pos_integer()}
  | {:in_channels, pos_integer()}
  | {:layer_scale_init, float()}
  | {:num_classes, pos_integer() | nil}
  | {:patch_size, pos_integer()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a ConvNeXt model.

## Options

  - `:image_size` - Input image size, square (default: 224)
  - `:patch_size` - Stem patchify stride (default: 4)
  - `:in_channels` - Number of input channels (default: 3)
  - `:depths` - Number of blocks per stage (default: [3, 3, 9, 3])
  - `:dims` - Channel dimensions per stage (default: [96, 192, 384, 768])
  - `:dropout` - Dropout rate (default: 0.0)
  - `:layer_scale_init` - Initial value for layer scale (default: 1e-6)
  - `:num_classes` - Number of classes for classification head (optional)

## Returns

  An Axon model. Without `:num_classes`, outputs `[batch, dims[-1]]`.
  With `:num_classes`, outputs `[batch, num_classes]`.
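For example, leaving out `:num_classes` turns the model into a feature extractor suitable as a backbone (a hedged sketch; the option values are the Tiny defaults from above):

```elixir
# Without :num_classes the classifier head is skipped and the model
# emits pooled features of size dims[-1] (768 here): {batch, 768}.
backbone = ConvNeXt.build(
  image_size: 224,
  in_channels: 3,
  depths: [3, 3, 9, 3],
  dims: [96, 192, 384, 768]
)
```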

# `output_size`

```elixir
@spec output_size(keyword()) :: pos_integer()
```

Get the output size of a ConvNeXt model.

Returns `:num_classes` if set, otherwise the last stage dimension.
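The documented behavior amounts to a keyword lookup with a fallback; a minimal pure-Elixir sketch (the module name is illustrative, and the `:dims` default mirrors `build/1`):

```elixir
defmodule OutputSizeSketch do
  # Prefer :num_classes when set; otherwise fall back to the last
  # entry of :dims (defaulting to the Tiny configuration).
  def output_size(opts \\ []) do
    dims = Keyword.get(opts, :dims, [96, 192, 384, 768])
    Keyword.get(opts, :num_classes) || List.last(dims)
  end
end

OutputSizeSketch.output_size(num_classes: 1000)
# → 1000
OutputSizeSketch.output_size(dims: [64, 128, 256, 512])
# → 512
```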

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
