Edifice.Sets.PointNet (Edifice v0.2.0)

Point cloud processing network (Qi et al., 2017).

PointNet processes unordered 3D point clouds for classification and segmentation. It achieves permutation invariance through a symmetric function (max pooling) applied after per-point feature extraction.

Architecture

Point Cloud [batch, num_points, point_dim]
      |
      v
+------------------------------+
| Optional T-Net:              |
|   Predict 3x3 transform      |
|   Apply to input points      |
+------------------------------+
      |
      v
+------------------------------+
| Shared MLP (per-point):      |
|   64 -> 64                   |
+------------------------------+
      |
      v
+------------------------------+
| Optional Feature T-Net:      |
|   Predict 64x64 transform    |
|   Apply to point features    |
+------------------------------+
      |
      v
+------------------------------+
| Shared MLP (per-point):      |
|   64 -> 128 -> 1024          |
+------------------------------+
      |
      v
+------------------------------+
| Max Pool (symmetric fn):     |
|   Global feature vector      |
+------------------------------+
      |
      v
+------------------------------+
| Global MLP + Classifier:     |
|   512 -> 256 -> num_classes  |
+------------------------------+
      |
      v
Output [batch, num_classes]
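The data flow above can be traced with a toy, dependency-free sketch: plain lists stand in for Nx tensors, `point_mlp` is a hypothetical stub lifting each 3-D point to a 4-dim feature (the real model uses 64 -> 128 -> 1024), and `head` stands in for the global MLP + classifier.

```elixir
# Toy PointNet forward pass: [num_points, 3] points -> [num_classes] scores.
point_mlp = fn [x, y, z] -> [x + y, y + z, x * z, x - y] end
head = fn feat -> [Enum.sum(feat), Enum.max(feat)] end

points = [[1.0, 2.0, 3.0], [0.5, 0.0, 1.0], [2.0, 1.0, 0.0]]

# Shared MLP applied independently to every point...
features = Enum.map(points, point_mlp)

# ...then max pool over the point axis (the symmetric function),
# producing one global feature vector for the whole cloud.
global =
  features
  |> Enum.zip()
  |> Enum.map(fn tup -> tup |> Tuple.to_list() |> Enum.max() end)

scores = head.(global)
```

Note that only the pooling step mixes information across points; everything before it is strictly per-point, which is what makes the pipeline order-independent.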

Key Insight

The max pooling over points acts as a symmetric function, ensuring the network output is invariant to point ordering. The T-Net learns spatial transformations to canonicalize the input, improving robustness to geometric transformations.
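The invariance claim is easy to check concretely. A minimal sketch (plain Elixir lists in place of tensors; `max_pool` is a hypothetical helper named here for illustration):

```elixir
# Max pooling over the point axis is a symmetric function: reordering the
# per-point feature rows cannot change the pooled global feature.
max_pool = fn features ->
  features
  |> Enum.zip()
  |> Enum.map(fn tup -> tup |> Tuple.to_list() |> Enum.max() end)
end

features = [[0.1, 0.9, 0.2], [0.7, 0.3, 0.8], [0.4, 0.5, 0.6]]
shuffled = Enum.reverse(features)

max_pool.(features) == max_pool.(shuffled)  # both pool to [0.7, 0.9, 0.8]
```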

Usage

# Basic PointNet for 3D classification
model = PointNet.build(
  input_dim: 3,
  num_classes: 40,
  hidden_dims: [64, 128, 1024]
)

# With input transformation network
model = PointNet.build(
  input_dim: 3,
  num_classes: 40,
  use_t_net: true
)

References

  • "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation" (Qi et al., CVPR 2017)

Summary

Types

Options for build/1.

Functions

Build a PointNet model for point cloud classification.

Build a T-Net (Transformation Network) that predicts a transformation matrix.

Types

build_opt()

@type build_opt() ::
  {:activation, atom()}
  | {:dropout, float()}
  | {:global_dims, [pos_integer()]}
  | {:hidden_dims, [pos_integer()]}
  | {:input_dim, pos_integer()}
  | {:num_classes, pos_integer() | nil}
  | {:use_feature_t_net, boolean()}
  | {:use_t_net, boolean()}

Options for build/1.

Functions

build(opts \\ [])

@spec build([build_opt()]) :: Axon.t()

Build a PointNet model for point cloud classification.

Options

  • :input_dim - Dimension of each point (default: 3 for 3D xyz)
  • :num_classes - Number of output classes (required)
  • :hidden_dims - Per-point MLP hidden sizes (default: [64, 128, 1024])
  • :global_dims - Global MLP sizes after pooling (default: [512, 256])
  • :activation - Activation function (default: :relu)
  • :dropout - Dropout rate for global MLP (default: 0.0)
  • :use_t_net - Use input transformation network (default: false)
  • :use_feature_t_net - Use feature transformation network (default: false)

Returns

An Axon model. Input shape: {batch, num_points, input_dim}. Output shape: {batch, num_classes}.

t_net(input, k, opts \\ [])

@spec t_net(Axon.t(), pos_integer(), keyword()) :: Axon.t()

Build a T-Net (Transformation Network) that predicts a transformation matrix.

The T-Net is a mini-PointNet that processes the input to predict a KxK transformation matrix. This matrix is applied to the input to achieve spatial invariance.

Parameters

  • input - Axon node with shape {batch, num_points, k}
  • k - Dimension of the transformation matrix (k x k)
  • opts - Options

Options

  • :name - Layer name prefix (default: "t_net")
  • :hidden_dims - T-Net MLP sizes (default: [64, 128, 256])

Returns

Axon node with shape {batch, k, k} representing the predicted transformation matrix, initialized near identity.
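Applying the predicted matrix amounts to a batched matmul of the {batch, num_points, k} points against the {batch, k, k} transform. A dependency-free sketch of the per-point multiply for a single cloud (the identity case mirrors the T-Net's near-identity initialization; `apply_transform` is a hypothetical helper, not part of this module):

```elixir
# Multiply each k-dim point (row vector) by a k x k transform:
# out[j] = sum_i point[i] * transform[i][j]
apply_transform = fn points, transform ->
  # Transpose the matrix so each output coordinate is a dot product with a column.
  cols = transform |> Enum.zip() |> Enum.map(&Tuple.to_list/1)

  Enum.map(points, fn point ->
    Enum.map(cols, fn col ->
      Enum.zip(point, col) |> Enum.map(fn {a, b} -> a * b end) |> Enum.sum()
    end)
  end)
end

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

apply_transform.(points, identity)  # identity transform leaves the points unchanged
```

Because the transform starts near identity, the network initially passes points through almost unchanged and only gradually learns a canonicalizing alignment.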