ConvNeXt - A Modernized ResNet implementation.
Modernizes the classic ResNet design with techniques borrowed from transformers: depthwise-separable convolutions, inverted bottleneck blocks, GELU activation, LayerNorm, and fewer activation functions.
Architecture
Image [batch, channels, height, width]
      |
+-----v---------------------+
| Stem (4x4 strided conv)   |  [batch, H/4, W/4, dims[0]]
+---------------------------+
      |
+-----v---------------------+
| Stage 1                   |  depths[0] ConvNeXt blocks at dims[0]
|   DW Conv 7x7 -> LN       |
|   -> PW Conv (expand 4x)  |
|   -> GELU                 |
|   -> PW Conv (project)    |
|   -> LayerScale           |
|   -> Residual             |
+---------------------------+
      |
+-----v---------------------+
| Downsample                |  LN + 2x2 strided conv to dims[1]
+---------------------------+
      |
+-----v---------------------+
| Stage 2                   |  depths[1] ConvNeXt blocks at dims[1]
+---------------------------+
      |
     ... (repeat for each stage)
      |
+-----v---------------------+
| Global Average Pool       |  [batch, dims[-1]]
+---------------------------+
      |
+-----v---------------------+
| LayerNorm                 |
+---------------------------+
      |
+-----v---------------------+
| Optional Classifier       |
+---------------------------+
ConvNeXt Block (faithful to paper)
Input [batch, H, W, C]
      |
      +----------- Residual ----------+
      |                               |
Depthwise Conv 7x7                    |
      |                               |
LayerNorm                             |
      |                               |
Pointwise Conv (C -> 4C)              |
      |                               |
GELU                                  |
      |                               |
Pointwise Conv (4C -> C)              |
      |                               |
LayerScale (learnable gamma)          |
      |                               |
      +------------- Add -------------+
      |
Output
Usage
# ConvNeXt-Tiny
model = ConvNeXt.build(
  image_size: 224,
  in_channels: 3,
  depths: [3, 3, 9, 3],
  dims: [96, 192, 384, 768],
  num_classes: 1000
)
References
- "A ConvNet for the 2020s" (Liu et al., CVPR 2022)
Summary
Types
@type build_opt() ::
  {:depths, [pos_integer()]}
  | {:dims, [pos_integer()]}
  | {:dropout, float()}
  | {:image_size, pos_integer()}
  | {:in_channels, pos_integer()}
  | {:layer_scale_init, float()}
  | {:num_classes, pos_integer() | nil}
  | {:patch_size, pos_integer()}
Options for build/1.
Functions
Build a ConvNeXt model.
Options
- :image_size - Input image size, square (default: 224)
- :patch_size - Stem patchify stride (default: 4)
- :in_channels - Number of input channels (default: 3)
- :depths - Number of blocks per stage (default: [3, 3, 9, 3])
- :dims - Channel dimensions per stage (default: [96, 192, 384, 768])
- :dropout - Dropout rate (default: 0.0)
- :layer_scale_init - Initial value for layer scale (default: 1e-6)
- :num_classes - Number of classes for classification head (optional)
Returns
An Axon model. Without :num_classes, outputs [batch, dims[-1]].
With :num_classes, outputs [batch, num_classes].
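To see how the options determine the feature-map shapes that lead to this output, the sketch below traces the spatial size and channel count at each stage. It is plain Elixir with no Axon dependency. The module name ConvNeXtShapeSketch and the function stage_shapes/1 are hypothetical helpers, not part of the documented API. It assumes the stem divides the image size by :patch_size, that each downsample layer between stages halves height and width (as in the architecture diagram), and that sizes divide evenly.

```elixir
defmodule ConvNeXtShapeSketch do
  # Hypothetical helper for illustration only; not part of the library.
  # Defaults are copied from the documented option list.
  @defaults [image_size: 224, patch_size: 4, dims: [96, 192, 384, 768]]

  # Returns one {height, width, channels} tuple per stage.
  def stage_shapes(opts \\ []) do
    opts = Keyword.merge(@defaults, opts)

    # Assumption: the stem reduces each spatial side by :patch_size.
    stem_size = div(opts[:image_size], opts[:patch_size])

    {shapes, _next_size} =
      Enum.map_reduce(opts[:dims], stem_size, fn dim, size ->
        # Assumption: the downsample before the next stage halves H and W.
        {{size, size, dim}, div(size, 2)}
      end)

    shapes
  end
end

IO.inspect(ConvNeXtShapeSketch.stage_shapes())
# prints [{56, 56, 96}, {28, 28, 192}, {14, 14, 384}, {7, 7, 768}]
```

Under these assumptions, global average pooling collapses the final 7x7 map of the default configuration into a vector of length dims[-1] = 768, which matches the [batch, dims[-1]] output described above.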
@spec output_size(keyword()) :: pos_integer()
Get the output size of a ConvNeXt model.
Returns :num_classes if set, otherwise the last stage dimension.
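The rule described for output_size/1 can be sketched in a few lines of plain Elixir. This is an illustration of the documented behaviour, not the library's source: the module name ConvNeXtOutputSizeSketch is hypothetical, and falling back to the default :dims when the option is absent is an assumption based on the option list.

```elixir
defmodule ConvNeXtOutputSizeSketch do
  # Hypothetical stand-in for illustration; the real function lives in the library.
  # Assumption: a missing :dims option falls back to the documented default.
  @default_dims [96, 192, 384, 768]

  def output_size(opts \\ []) do
    case Keyword.get(opts, :num_classes) do
      # No classifier head: the output width is the last stage dimension.
      nil -> opts |> Keyword.get(:dims, @default_dims) |> List.last()
      # Classifier head attached: the output width is the class count.
      num_classes -> num_classes
    end
  end
end

IO.inspect(ConvNeXtOutputSizeSketch.output_size(num_classes: 1000))
# prints 1000
IO.inspect(ConvNeXtOutputSizeSketch.output_size(dims: [64, 128]))
# prints 128
```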