Point cloud processing network (Qi et al., 2017).
PointNet processes unordered 3D point clouds for classification and segmentation. It achieves permutation invariance through a symmetric function (max pooling) applied after per-point feature extraction.
Architecture
Point Cloud [batch, num_points, point_dim]
      |
      v
+------------------------------+
| Optional T-Net:              |
|   Predict 3x3 transform      |
|   Apply to input points      |
+------------------------------+
      |
      v
+------------------------------+
| Shared MLP (per-point):      |
|   64 -> 64                   |
+------------------------------+
      |
      v
+------------------------------+
| Optional Feature T-Net:      |
|   Predict 64x64 transform    |
|   Apply to point features    |
+------------------------------+
      |
      v
+------------------------------+
| Shared MLP (per-point):      |
|   64 -> 128 -> 1024          |
+------------------------------+
      |
      v
+------------------------------+
| Max Pool (symmetric fn):     |
|   Global feature vector      |
+------------------------------+
      |
      v
+------------------------------+
| Global MLP + Classifier:     |
|   512 -> 256 -> num_classes  |
+------------------------------+
      |
      v
Output [batch, num_classes]
Key Insight
The max pooling over points acts as a symmetric function, ensuring the network output is invariant to point ordering. The T-Net learns spatial transformations to canonicalize the input, improving robustness to geometric transformations.
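The permutation-invariance argument can be checked with a small sketch in plain Elixir (standard library only, no Axon or Nx). The module `PermutationInvarianceSketch` and its `per_point_features/1` function are hypothetical stand-ins for the shared per-point MLP, not part of this library; the only property that matters is that the same function is applied to every point before a channel-wise max.

```elixir
defmodule PermutationInvarianceSketch do
  # Hypothetical stand-in for the shared per-point MLP: the same function
  # is applied to each point independently of all other points.
  def per_point_features([x, y, z]) do
    [x + y, y * z, abs(x - z)]
  end

  # Channel-wise max over all points. This is the symmetric function:
  # its result does not depend on the order of the rows.
  def global_max_pool(features) do
    Enum.zip_with(features, &Enum.max/1)
  end

  def global_feature(points) do
    points
    |> Enum.map(&per_point_features/1)
    |> global_max_pool()
  end
end

points = [[0.0, 1.0, 2.0], [3.0, -1.0, 0.5], [-2.0, 4.0, 1.0]]

original = PermutationInvarianceSketch.global_feature(points)
reversed = PermutationInvarianceSketch.global_feature(Enum.reverse(points))
shuffled = PermutationInvarianceSketch.global_feature(Enum.shuffle(points))

# All three orderings produce the same global feature vector.
IO.inspect(original == reversed and original == shuffled)
```

Replacing the max with an order-dependent reduction (for example, taking the first point's features) would break this equality, which is why the pooling step has to be symmetric.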
Usage
# Basic PointNet for 3D classification
model = PointNet.build(
  input_dim: 3,
  num_classes: 40,
  hidden_dims: [64, 128, 1024]
)

# With input transformation network
model = PointNet.build(
  input_dim: 3,
  num_classes: 40,
  use_t_net: true
)
References
- "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation" (Qi et al., CVPR 2017)
Summary
Functions
build/1 - Build a PointNet model for point cloud classification.
t_net/3 - Build a T-Net (Transformation Network) that predicts a transformation matrix.
Types
@type build_opt() ::
        {:activation, atom()}
        | {:dropout, float()}
        | {:global_dims, [pos_integer()]}
        | {:hidden_dims, [pos_integer()]}
        | {:input_dim, pos_integer()}
        | {:num_classes, pos_integer() | nil}
        | {:use_feature_t_net, boolean()}
        | {:use_t_net, boolean()}
Options for build/1.
Functions
build/1
Build a PointNet model for point cloud classification.
Options
- :input_dim - Dimension of each point (default: 3 for 3D xyz)
- :num_classes - Number of output classes (required)
- :hidden_dims - Per-point MLP hidden sizes (default: [64, 128, 1024])
- :global_dims - Global MLP sizes after pooling (default: [512, 256])
- :activation - Activation function (default: :relu)
- :dropout - Dropout rate for global MLP (default: 0.0)
- :use_t_net - Use input transformation network (default: false)
- :use_feature_t_net - Use feature transformation network (default: false)
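As an illustration of how these options and defaults fit together, the following sketch merges caller options with the documented defaults using `Keyword.validate!/2` (available in Elixir 1.13 and later). The module `BuildOptsSketch` and its `resolve/1` function are hypothetical and are not part of this library; how the library resolves its options internally is not stated here.

```elixir
defmodule BuildOptsSketch do
  # Defaults copied from the option list above. :num_classes is required,
  # so it is listed without a default value.
  @defaults [
    input_dim: 3,
    hidden_dims: [64, 128, 1024],
    global_dims: [512, 256],
    activation: :relu,
    dropout: 0.0,
    use_t_net: false,
    use_feature_t_net: false
  ]

  # Hypothetical helper: fills in defaults, rejects unknown keys,
  # and checks that :num_classes was supplied.
  def resolve(opts) do
    opts = Keyword.validate!(opts, [:num_classes | @defaults])

    unless Keyword.has_key?(opts, :num_classes) do
      raise ArgumentError, ":num_classes is required"
    end

    opts
  end
end

opts = BuildOptsSketch.resolve(num_classes: 40, use_t_net: true)
IO.inspect(Keyword.fetch!(opts, :hidden_dims))
IO.inspect(Keyword.fetch!(opts, :use_t_net))
```

Under this sketch a misspelled key such as `use_tnet: true` raises an `ArgumentError` instead of being silently ignored.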
Returns
An Axon model. Input shape: {batch, num_points, input_dim}.
Output shape: {batch, num_classes}.
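To make the input and output shapes concrete, here is a minimal shape trace in plain Elixir. It assumes that every entry of `:hidden_dims` is a per-point layer and every entry of `:global_dims` is a layer applied after pooling; the actual mapping of options to layers inside the library may differ. `ShapeTraceSketch.trace/3` is a hypothetical helper that only computes shapes and builds no model.

```elixir
defmodule ShapeTraceSketch do
  # Returns a list of {stage, shape} pairs. Assumption: each hidden dim is
  # one shared per-point layer, each global dim is one post-pooling layer.
  def trace(batch, num_points, opts) do
    input_dim = Keyword.get(opts, :input_dim, 3)
    hidden_dims = Keyword.get(opts, :hidden_dims, [64, 128, 1024])
    global_dims = Keyword.get(opts, :global_dims, [512, 256])
    num_classes = Keyword.fetch!(opts, :num_classes)

    # Per-point layers keep the num_points axis and change only the last axis.
    per_point = for d <- hidden_dims, do: {:shared_mlp, {batch, num_points, d}}

    # After pooling, the num_points axis is gone.
    global = for d <- global_dims, do: {:global_mlp, {batch, d}}

    [{:input, {batch, num_points, input_dim}}] ++
      per_point ++
      [{:max_pool, {batch, List.last(hidden_dims)}}] ++
      global ++
      [{:classifier, {batch, num_classes}}]
  end
end

stages = ShapeTraceSketch.trace(8, 1024, num_classes: 40)
Enum.each(stages, &IO.inspect/1)
```

The max pool is the only stage that removes the `num_points` axis, so the output shape does not depend on how many points each cloud contains.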
@spec t_net(Axon.t(), pos_integer(), keyword()) :: Axon.t()
Build a T-Net (Transformation Network) that predicts a transformation matrix.
The T-Net is a mini-PointNet that processes its input to predict a k x k transformation matrix. Applying this matrix to the input aligns it to a canonical pose, which makes the network less sensitive to geometric transformations of the point cloud.
Parameters
- input - Axon node with shape {batch, num_points, k}
- k - Dimension of the transformation matrix (k x k)
- opts - Options
Options
- :name - Layer name prefix (default: "t_net")
- :hidden_dims - T-Net MLP sizes (default: [64, 128, 256])
Returns
Axon node with shape {batch, k, k} representing the predicted
transformation matrix, initialized near identity.
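The following plain-Elixir sketch shows what applying a k x k transform to a single point cloud looks like, and one common way to get a matrix that starts near identity: adding the identity matrix to the raw prediction, as in the original PointNet paper. Whether this library uses that exact scheme is an assumption; `TNetApplySketch` and its functions are hypothetical and operate on nested lists rather than Axon nodes.

```elixir
defmodule TNetApplySketch do
  # k x k identity matrix as a list of rows.
  def identity(k) do
    for i <- 0..(k - 1) do
      for j <- 0..(k - 1), do: if(i == j, do: 1.0, else: 0.0)
    end
  end

  # Assumed scheme: add the identity to a raw k x k prediction, so an
  # all-zero prediction yields exactly the identity transform.
  def near_identity(raw) do
    Enum.zip_with(raw, identity(length(raw)), fn raw_row, id_row ->
      Enum.zip_with(raw_row, id_row, fn a, b -> a + b end)
    end)
  end

  # Applies a k x k transform to one cloud: {num_points, k} times {k, k}.
  def apply_transform(points, transform) do
    # Transpose so each element of `columns` is one column of the matrix.
    columns = Enum.zip_with(transform, & &1)

    for point <- points do
      for column <- columns, do: dot(point, column)
    end
  end

  defp dot(a, b) do
    a |> Enum.zip_with(b, fn x, y -> x * y end) |> Enum.sum()
  end
end

points = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
zero_prediction = List.duplicate([0.0, 0.0, 0.0], 3)

transform = TNetApplySketch.near_identity(zero_prediction)
aligned = TNetApplySketch.apply_transform(points, transform)

# With an identity transform the points pass through unchanged.
IO.inspect(aligned == points)
```

Starting at the identity means the T-Net initially behaves like a no-op, so early training is not disrupted by an arbitrary random transform of the input.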