Edifice.Vision.UNet (Edifice v0.2.0)

U-Net encoder-decoder architecture with skip connections.

Originally designed for biomedical image segmentation, U-Net uses a symmetric encoder-decoder structure with skip connections that concatenate encoder features at each level with decoder features, preserving fine-grained spatial information.

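This implementation uses real 2D convolutions, max-pooling for downsampling, and transposed convolutions for upsampling, as in the original paper. Unlike the paper, each convolution is followed by batch normalization, and the output keeps the input's spatial size.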

Architecture

Image [batch, channels, height, width]
      |
+-----v---------------------+
| Transpose to NHWC         |  [batch, H, W, C]
+---------------------------+
      |
+-----v---------------------+      Skip Connections
| Encoder Level 1           |  ----------+
|   Conv 3x3 + BN + ReLU    |            |
|   Conv 3x3 + BN + ReLU    |            |
|   MaxPool 2x2             |            |
+---------------------------+            |
      |                                  |
+-----v---------------------+            |
| Encoder Level 2           |  -----+    |
|   Conv 3x3 + BN + ReLU    |       |    |
|   Conv 3x3 + BN + ReLU    |       |    |
|   MaxPool 2x2             |       |    |
+---------------------------+       |    |
      |                             |    |
      ... (depth levels)            |    |
      |                             |    |
+-----v---------------------+       |    |
| Bottleneck                |       |    |
|   Conv 3x3 + BN + ReLU    |       |    |
|   Conv 3x3 + BN + ReLU    |       |    |
+---------------------------+       |    |
      |                             |    |
+-----v---------------------+       |    |
| Decoder Level 2           |       |    |
|   ConvTranspose 2x2 (up)  |       |    |
|   Concat skip <-----------+-------+    |
|   Conv 3x3 + BN + ReLU    |            |
|   Conv 3x3 + BN + ReLU    |            |
+---------------------------+            |
      |                                  |
+-----v---------------------+            |
| Decoder Level 1           |            |
|   ConvTranspose 2x2 (up)  |            |
|   Concat skip <-----------+------------+
|   Conv 3x3 + BN + ReLU    |
|   Conv 3x3 + BN + ReLU    |
+---------------------------+
      |
+-----v---------------------+
| Output Conv 1x1           |  [batch, H, W, out_channels]
+---------------------------+
      |
+-----v---------------------+
| Transpose to NCHW         |  [batch, out_channels, H, W]
+---------------------------+
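To make the diagram concrete, the following sketch lists the spatial size and feature count at which each stage would run its convolutions. It is plain Elixir and does not call Edifice; `UNetShapeSketch` and `trace/1` are hypothetical names. It assumes, without confirmation from the docs, that 3x3 convolutions preserve spatial size, that each max-pool halves it, and that the feature count doubles at every level.

```elixir
defmodule UNetShapeSketch do
  @moduledoc """
  Hypothetical helper (not part of Edifice) that lists the spatial size and
  feature count at which each stage of the diagram runs its convolutions.
  """

  # Assumptions, none confirmed by the Edifice docs:
  #   * 3x3 convolutions keep the spatial size ("same" padding),
  #   * each 2x2 max-pool halves the spatial size (integer division),
  #   * the feature count doubles at every level below the first.
  # Defaults mirror the documented defaults of build/1.
  def trace(opts \\ []) do
    size = Keyword.get(opts, :image_size, 256)
    base = Keyword.get(opts, :base_features, 64)
    depth = Keyword.get(opts, :depth, 4)

    encoder = for level <- 1..depth, do: stage(:encoder, level, size, base)
    # The bottleneck is treated as level depth + 1 for sizing purposes.
    bottleneck = stage(:bottleneck, depth + 1, size, base)
    decoder = for level <- depth..1//-1, do: stage(:decoder, level, size, base)

    encoder ++ [bottleneck] ++ decoder
  end

  # A stage at `level` sees the input downsampled (level - 1) times.
  defp stage(kind, level, size, base) do
    factor = Integer.pow(2, level - 1)
    %{stage: kind, level: level, spatial: div(size, factor), features: base * factor}
  end
end

# Print one line per stage for the default options.
for %{stage: kind, level: level, spatial: s, features: f} <- UNetShapeSketch.trace() do
  IO.puts("#{kind} #{level}: #{s}x#{s}, #{f} features")
end
```

Under these assumptions the default configuration runs its bottleneck at 16x16 with 1024 features, and each decoder level returns to the resolution of the encoder level whose skip connection it concatenates.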

Usage

# Alias the module so it can be called as UNet
alias Edifice.Vision.UNet

# Basic U-Net for segmentation
model = UNet.build(
  in_channels: 3,
  out_channels: 1,
  image_size: 256,
  base_features: 64,
  depth: 4
)

# Shallow U-Net for small images
model = UNet.build(
  in_channels: 1,
  out_channels: 10,
  image_size: 28,
  base_features: 32,
  depth: 3
)

References

"U-Net: Convolutional Networks for Biomedical Image Segmentation" (Ronneberger et al., MICCAI 2015)

Types

build_opt()

@type build_opt() ::
  {:base_features, pos_integer()}
  | {:depth, pos_integer()}
  | {:dropout, float()}
  | {:image_size, pos_integer()}
  | {:in_channels, pos_integer()}
  | {:out_channels, pos_integer()}
  | {:use_attention, boolean()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a U-Net model.

Options

:in_channels - Number of input channels (default: 3)
:out_channels - Number of output channels (default: 1)
:image_size - Height and width of the square input image, in pixels (default: 256)
:base_features - Feature count at first encoder level (default: 64)
:depth - Number of encoder/decoder levels (default: 4)
:dropout - Dropout rate (default: 0.0)
:use_attention - Add attention at bottleneck (default: false)

Returns

An Axon model outputting [batch, out_channels, image_size, image_size].

output_size(opts \\ [])

@spec output_size(keyword()) :: pos_integer()

Get the output size of a U-Net model.

Returns out_channels * image_size * image_size, the number of elements in one flattened output (batch dimension excluded).
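As a worked example of this formula, the sketch below reproduces the arithmetic in plain Elixir. `OutputSizeSketch.flat_size/1` is a hypothetical stand-in, not the library function; it assumes the same defaults as build/1 (out_channels: 1, image_size: 256).

```elixir
defmodule OutputSizeSketch do
  # Hypothetical mirror of the documented formula; defaults are taken from
  # the build/1 option list, on the assumption that output_size/1 shares them.
  @defaults [out_channels: 1, image_size: 256]

  def flat_size(opts \\ []) do
    opts = Keyword.merge(@defaults, opts)
    opts[:out_channels] * opts[:image_size] * opts[:image_size]
  end
end

IO.inspect(OutputSizeSketch.flat_size())
# 1 * 256 * 256 = 65536

IO.inspect(OutputSizeSketch.flat_size(out_channels: 10, image_size: 28))
# 10 * 28 * 28 = 7840
```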