U-Net encoder-decoder architecture with skip connections.
Originally designed for biomedical image segmentation, U-Net pairs a contracting encoder with a symmetric expanding decoder. At each resolution level, a skip connection concatenates the encoder's feature maps with the decoder's upsampled feature maps. This lets the decoder recover fine-grained spatial detail that downsampling would otherwise discard.
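Because the skip connection concatenates along the channel axis, the encoder and decoder feature maps at a level must agree in batch, height and width, and the result carries the sum of their channel counts. The sketch below checks this rule on plain shape tuples in the NHWC layout used inside the model. The module `SkipShapeSketch` and its function are hypothetical illustrations, not part of this library.

```elixir
defmodule SkipShapeSketch do
  # Hypothetical helper, not part of the library. Shapes are assumed to be
  # NHWC tuples: {batch, height, width, channels}.

  @doc "Returns the shape of a channel-axis concat, or raises on a spatial mismatch."
  def concat_skip({b, h, w, c_skip}, {b, h, w, c_up}) do
    # Repeated variables force batch, height and width to be identical;
    # only the channel counts are added together.
    {b, h, w, c_skip + c_up}
  end

  def concat_skip(skip, up) do
    raise ArgumentError,
          "spatial mismatch: skip #{inspect(skip)} vs upsampled #{inspect(up)}"
  end
end
```

For example, joining a 64-channel skip with a 64-channel upsampled map at 128x128 gives `{1, 128, 128, 128}`; an upsampled map that is one pixel short of its skip raises instead of silently misaligning.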
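This implementation uses actual 2D convolutions, 2x2 max-pooling for downsampling, and 2x2 transposed convolutions for upsampling (the paper's "up-convolution"), following the original paper's overall design. It departs from the paper in two ways: each 3x3 convolution is followed by batch normalization, which the paper did not use, and the output has the same spatial size as the input (see Returns), whereas the paper's unpadded convolutions shrink the output and require cropping the skip features before concatenation.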
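The per-level shapes implied by this design can be traced without building a model. The sketch below is a hypothetical helper, not the library's code, and it rests on stated assumptions: 3x3 convolutions keep the spatial size (same padding, implied by the output matching the input size), each max-pool halves the size rounding down, each stride-2 transposed convolution doubles it, and the feature count doubles at every level as in the original paper. The doubling rule in particular is an assumption; this page only documents `base_features` for the first level.

```elixir
defmodule UNetShapeTrace do
  # Hypothetical sketch, not the library's implementation. Assumptions:
  # same-padded 3x3 convs, 2x2 max-pool halves the size (rounding down),
  # 2x2 stride-2 transposed conv doubles it, features double per level.

  @doc "Returns per-level {spatial_size, features} for encoder, bottleneck and decoder."
  def trace(image_size, base_features, depth) when depth >= 1 do
    # Encoder: each level records its skip {size, features}, then pools.
    {skips, bottom_size} =
      Enum.map_reduce(0..(depth - 1), image_size, fn level, size ->
        {{size, base_features * Integer.pow(2, level)}, div(size, 2)}
      end)

    bottleneck = {bottom_size, base_features * Integer.pow(2, depth)}

    # Decoder: walk the skips deepest-first, upsampling by 2 at each level
    # and recording whether the upsampled size matches the skip it joins.
    {decoder, _final_size} =
      skips
      |> Enum.reverse()
      |> Enum.map_reduce(bottom_size, fn {skip_size, features}, size ->
        up = size * 2

        {%{skip_size: skip_size, upsampled_size: up, features: features, matches: up == skip_size},
         up}
      end)

    %{encoder: skips, bottleneck: bottleneck, decoder: decoder}
  end
end
```

Under these assumptions, `UNetShapeTrace.trace(256, 64, 4)` yields encoder levels `{256, 64}`, `{128, 128}`, `{64, 256}`, `{32, 512}` and a `{16, 1024}` bottleneck, and every decoder level lines up with its skip.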
Architecture
Image [batch, channels, height, width]
      |
+-----v----------------------+
| Transpose to NHWC          |  [batch, H, W, C]
+----------------------------+
      |
+-----v----------------------+   Skip connections
| Encoder Level 1            | ----------------+
| Conv 3x3 + BN + ReLU       |                 |
| Conv 3x3 + BN + ReLU       |                 |
| MaxPool 2x2                |                 |
+----------------------------+                 |
      |                                        |
+-----v----------------------+                 |
| Encoder Level 2            | ----------+     |
| Conv 3x3 + BN + ReLU       |           |     |
| Conv 3x3 + BN + ReLU       |           |     |
| MaxPool 2x2                |           |     |
+----------------------------+           |     |
      |                                  |     |
     ... (depth levels)                  |     |
      |                                  |     |
+-----v----------------------+           |     |
| Bottleneck                 |           |     |
| Conv 3x3 + BN + ReLU       |           |     |
| Conv 3x3 + BN + ReLU       |           |     |
+----------------------------+           |     |
      |                                  |     |
+-----v----------------------+           |     |
| Decoder Level 2            |           |     |
| ConvTranspose 2x2 (up)     |           |     |
| Concat skip                | <---------+     |
| Conv 3x3 + BN + ReLU       |                 |
| Conv 3x3 + BN + ReLU       |                 |
+----------------------------+                 |
      |                                        |
+-----v----------------------+                 |
| Decoder Level 1            |                 |
| ConvTranspose 2x2 (up)     |                 |
| Concat skip                | <---------------+
| Conv 3x3 + BN + ReLU       |
| Conv 3x3 + BN + ReLU       |
+----------------------------+
      |
+-----v----------------------+
| Output Conv 1x1            |  [batch, H, W, out_channels]
+----------------------------+
      |
+-----v----------------------+
| Transpose to NCHW          |  [batch, out_channels, H, W]
+----------------------------+

Usage
# Basic U-Net for segmentation
model = UNet.build(
  in_channels: 3,
  out_channels: 1,
  image_size: 256,
  base_features: 64,
  depth: 4
)

# Shallow U-Net for small images
model = UNet.build(
  in_channels: 1,
  out_channels: 10,
  image_size: 28,
  base_features: 32,
  depth: 3
)

References
- "U-Net: Convolutional Networks for Biomedical Image Segmentation" (Ronneberger et al., MICCAI 2015)
Summary
Types
@type build_opt() :: {:base_features, pos_integer()} | {:depth, pos_integer()} | {:dropout, float()} | {:image_size, pos_integer()} | {:in_channels, pos_integer()} | {:out_channels, pos_integer()} | {:use_attention, boolean()}
Options for build/1.
Functions
Build a U-Net model.
Options
- :in_channels - Number of input channels (default: 3)
- :out_channels - Number of output channels (default: 1)
- :image_size - Height and width of the square input image (default: 256)
- :base_features - Feature count at the first encoder level (default: 64)
- :depth - Number of encoder/decoder levels (default: 4)
- :dropout - Dropout rate (default: 0.0)
- :use_attention - Add attention at the bottleneck (default: false)
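The options above can be normalized by merging in the documented defaults and rejecting unknown keys. The sketch below does this with the standard `Keyword.validate!/2` (Elixir 1.13+). The module name `UNetOptsSketch` is hypothetical, and its divisibility check is an assumption drawn from the architecture diagram (one 2x2 max-pool per encoder level), not a documented rule of `build/1`. Under that assumption, `image_size` must be divisible by `2^depth` so that each upsampled decoder map matches its skip. The shallow usage example (`image_size: 28, depth: 3`) would not divide evenly (28 / 8 = 3.5), so it is worth confirming how the implementation handles such sizes.

```elixir
defmodule UNetOptsSketch do
  # Hypothetical option handling, not the library's code. The defaults
  # mirror the documented ones; the divisibility check assumes one 2x2
  # max-pool per encoder level.

  @defaults [
    in_channels: 3,
    out_channels: 1,
    image_size: 256,
    base_features: 64,
    depth: 4,
    dropout: 0.0,
    use_attention: false
  ]

  def normalize!(opts) do
    # Merges the defaults and raises ArgumentError on unknown keys.
    opts = Keyword.validate!(opts, @defaults)
    size = opts[:image_size]
    factor = Integer.pow(2, opts[:depth])

    if rem(size, factor) != 0 do
      raise ArgumentError,
            "image_size #{size} is not divisible by 2^depth = #{factor}"
    end

    opts
  end
end
```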
Returns
An Axon model outputting [batch, out_channels, image_size, image_size].
@spec output_size(keyword()) :: pos_integer()
Returns the per-example output size of a U-Net model: out_channels * image_size * image_size, the number of elements in one [out_channels, image_size, image_size] output when flattened.
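The formula can be restated as a one-line function. The sketch below is a hypothetical re-statement, not the library's code; it assumes missing options fall back to the documented `build/1` defaults (`out_channels: 1`, `image_size: 256`).

```elixir
defmodule UNetOutputSizeSketch do
  # Hypothetical re-statement of the documented formula, not the library's
  # code. Assumes missing options use the documented build/1 defaults.

  def output_size(opts \\ []) do
    out_channels = Keyword.get(opts, :out_channels, 1)
    image_size = Keyword.get(opts, :image_size, 256)
    # One example holds out_channels maps of image_size x image_size values.
    out_channels * image_size * image_size
  end
end
```

With the defaults this gives 65536; for the shallow usage example (`out_channels: 10, image_size: 28`) it gives 7840.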