Edifice.Vision.ConvNeXt (Edifice v0.2.0)

ConvNeXt - A Modernized ResNet implementation.

Modernizes the classic ResNet design with choices borrowed from vision transformers: a patchify stem, large-kernel (7x7) depthwise convolutions, inverted bottleneck blocks, GELU in place of ReLU, LayerNorm in place of BatchNorm, and fewer activation and normalization layers per block.

Architecture

Image [batch, channels, height, width]
      |
+-----v---------------------+
| Stem (4x4 strided conv)   |  [batch, H/4, W/4, dims[0]]
+---------------------------+
      |
+-----v---------------------+
| Stage 1                   |  depths[0] ConvNeXt blocks at dims[0]
|   DW Conv 7x7 -> LN       |
|   -> PW Conv (expand 4x)  |
|   -> GELU                 |
|   -> PW Conv (project)    |
|   -> LayerScale           |
|   -> Residual             |
+---------------------------+
      |
+-----v---------------------+
| Downsample                |  LN + 2x2 strided conv to dims[1]
+---------------------------+
      |
+-----v---------------------+
| Stage 2                   |  depths[1] ConvNeXt blocks at dims[1]
+---------------------------+
      |
      ... (repeat for each stage)
      |
+-----v---------------------+
| Global Average Pool       |  [batch, dims[-1]]
+---------------------------+
      |
+-----v---------------------+
| LayerNorm                 |
+---------------------------+
      |
+-----v---------------------+
| Optional Classifier       |
+---------------------------+
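
The spatial sizes implied by the diagram can be traced with plain integer arithmetic. The sketch below is illustrative only: `ShapeTraceSketch` and `stage_shapes/1` are hypothetical names, not part of Edifice, and it assumes the stem and downsample convolutions are unpadded, so the stem divides each side by `:patch_size` and every later stage halves it.

```elixir
defmodule ShapeTraceSketch do
  # Hypothetical helper, not part of Edifice.
  # Assumes unpadded strided convs: stem stride = patch_size, downsample stride = 2.

  # Returns one {height, width, channels} tuple per stage.
  def stage_shapes(opts \\ []) do
    image_size = Keyword.get(opts, :image_size, 224)
    patch_size = Keyword.get(opts, :patch_size, 4)
    dims = Keyword.get(opts, :dims, [96, 192, 384, 768])

    # Side length after the patchify stem.
    stem_side = div(image_size, patch_size)

    dims
    |> Enum.with_index()
    |> Enum.map(fn {dim, stage} ->
      # Stage 0 sees the stem output; each later stage follows one more 2x downsample.
      side = div(stem_side, Integer.pow(2, stage))
      {side, side, dim}
    end)
  end
end

IO.inspect(ShapeTraceSketch.stage_shapes())
# [{56, 56, 96}, {28, 28, 192}, {14, 14, 384}, {7, 7, 768}]
```

With the default options the last stage works on a 7x7 map with 768 channels, which global average pooling reduces to [batch, 768].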

ConvNeXt Block (faithful to paper)

Input [batch, H, W, C]
  |
  +----------- Residual ----------+
  |                               |
Depthwise Conv 7x7                |
  |                               |
LayerNorm                         |
  |                               |
Pointwise Conv (C -> 4C)          |
  |                               |
GELU                              |
  |                               |
Pointwise Conv (4C -> C)          |
  |                               |
LayerScale (learnable gamma)      |
  |                               |
  +----------- Add ---------------+
  |
Output
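
To get a feel for the cost of one block, its parameters can be counted layer by layer from the diagram. This is a rough sketch under stated assumptions, not a description of Edifice's internals: `BlockParamsSketch` is a hypothetical name, every convolution is assumed to carry a bias, and LayerNorm is assumed to have both a scale and a shift per channel.

```elixir
defmodule BlockParamsSketch do
  # Hypothetical helper, not part of Edifice.
  # Assumes biases on all convs and a scale + shift LayerNorm.
  def count(c, kernel \\ 7, expansion \\ 4) do
    hidden = expansion * c

    # Depthwise conv: one kernel x kernel filter per channel, plus bias.
    depthwise = kernel * kernel * c + c
    # LayerNorm: per-channel scale and shift.
    layer_norm = 2 * c
    # Pointwise expand (C -> 4C) and project (4C -> C), each with bias.
    expand = c * hidden + hidden
    project = hidden * c + c
    # LayerScale: one learnable gamma per channel.
    layer_scale = c

    depthwise + layer_norm + expand + project + layer_scale
  end
end

IO.puts(BlockParamsSketch.count(96))
# 79296
```

Under these assumptions the total is 8C^2 + 58C, so the two pointwise convolutions dominate and the 7x7 depthwise convolution is comparatively cheap.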

Usage

alias Edifice.Vision.ConvNeXt

# ConvNeXt-Tiny
model = ConvNeXt.build(
  image_size: 224,
  in_channels: 3,
  depths: [3, 3, 9, 3],
  dims: [96, 192, 384, 768],
  num_classes: 1000
)

References

"A ConvNet for the 2020s" (Liu et al., CVPR 2022)

Types

build_opt()

@type build_opt() ::
  {:depths, [pos_integer()]}
  | {:dims, [pos_integer()]}
  | {:dropout, float()}
  | {:image_size, pos_integer()}
  | {:in_channels, pos_integer()}
  | {:layer_scale_init, float()}
  | {:num_classes, pos_integer() | nil}
  | {:patch_size, pos_integer()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a ConvNeXt model.

Options

:image_size - Side length in pixels of the square input image (default: 224)
:patch_size - Patch size (kernel and stride) of the patchify stem convolution (default: 4)
:in_channels - Number of input channels (default: 3)
:depths - Number of blocks per stage (default: [3, 3, 9, 3])
:dims - Channel dimensions per stage (default: [96, 192, 384, 768])
:dropout - Dropout rate (default: 0.0)
:layer_scale_init - Initial value for layer scale (default: 1e-6)
:num_classes - Number of classes for classification head (optional)
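
The :depths and :dims options are enough to express the standard model sizes. The sketch below collects them in a small lookup; `ConvNeXtVariantsSketch` and `opts/2` are hypothetical names, not part of Edifice, and the depth and width values are the Tiny, Small, Base, and Large configurations from the ConvNeXt paper.

```elixir
defmodule ConvNeXtVariantsSketch do
  # Hypothetical helper, not part of Edifice.
  # Depths and widths follow the variants listed in the ConvNeXt paper.
  @variants %{
    tiny: [depths: [3, 3, 9, 3], dims: [96, 192, 384, 768]],
    small: [depths: [3, 3, 27, 3], dims: [96, 192, 384, 768]],
    base: [depths: [3, 3, 27, 3], dims: [128, 256, 512, 1024]],
    large: [depths: [3, 3, 27, 3], dims: [192, 384, 768, 1536]]
  }

  # Returns build options for a variant; overrides win over the preset values.
  def opts(variant, overrides \\ []) do
    @variants
    |> Map.fetch!(variant)
    |> Keyword.merge(overrides)
  end
end

opts = ConvNeXtVariantsSketch.opts(:base, num_classes: 10)
IO.inspect(opts)

# With Edifice available, the result could be passed straight through:
# Edifice.Vision.ConvNeXt.build(opts)
```

Keeping the presets as keyword lists means they can be merged with any other build option without a conversion step.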

Returns

An Axon model. Without :num_classes, outputs [batch, dims[-1]]. With :num_classes, outputs [batch, num_classes].

output_size(opts \\ [])

@spec output_size(keyword()) :: pos_integer()

Get the output size of a ConvNeXt model.

Returns :num_classes if set, otherwise the last stage dimension.
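
The rule above can be sketched in a few lines. This is an illustrative re-implementation, not Edifice's source: `OutputSizeSketch` is a hypothetical module, and it assumes :num_classes is absent or nil when no classifier head is requested and that :dims falls back to the documented default.

```elixir
defmodule OutputSizeSketch do
  # Hypothetical re-implementation for illustration; not Edifice's source.
  # Assumes the documented default for :dims and nil for an unset :num_classes.
  @default_dims [96, 192, 384, 768]

  def output_size(opts \\ []) do
    case Keyword.get(opts, :num_classes) do
      # No classifier head: the width of the last stage is the output size.
      nil -> opts |> Keyword.get(:dims, @default_dims) |> List.last()
      # Classifier head: one output per class.
      num_classes -> num_classes
    end
  end
end

IO.inspect(OutputSizeSketch.output_size())
# 768
IO.inspect(OutputSizeSketch.output_size(num_classes: 1000))
# 1000
```

A helper like this is useful when wiring the backbone into a larger model, since the downstream layer's input width can be read from the same options used to build it.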