# `Edifice.Sets.PointNet`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/sets/pointnet.ex#L1)

Point cloud processing network (Qi et al., 2017).

PointNet processes unordered 3D point clouds for classification and
segmentation. It achieves permutation invariance through a symmetric function
(max pooling) applied after per-point feature extraction.

## Architecture

```
Point Cloud [batch, num_points, point_dim]
      |
      v
+------------------------------+
| Optional T-Net:              |
|   Predict 3x3 transform      |
|   Apply to input points      |
+------------------------------+
      |
      v
+------------------------------+
| Shared MLP (per-point):      |
|   64 -> 64                   |
+------------------------------+
      |
      v
+------------------------------+
| Optional Feature T-Net:      |
|   Predict 64x64 transform    |
|   Apply to point features    |
+------------------------------+
      |
      v
+------------------------------+
| Shared MLP (per-point):      |
|   64 -> 128 -> 1024          |
+------------------------------+
      |
      v
+------------------------------+
| Max Pool (symmetric fn):     |
|   Global feature vector      |
+------------------------------+
      |
      v
+------------------------------+
| Global MLP + Classifier:     |
|   512 -> 256 -> num_classes  |
+------------------------------+
      |
      v
Output [batch, num_classes]
```

## Key Insight

The max pooling over points acts as a symmetric function, ensuring the
network output is invariant to point ordering. The T-Net learns spatial
transformations to canonicalize the input, improving robustness to
geometric transformations.
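The invariance argument is easy to check concretely. Below is a pure-Python stand-in (not the library's Elixir/Axon code): a feature-wise max over the point axis yields the same global feature no matter how the points are ordered.

```python
# Illustrative sketch: max pooling over points is a symmetric function.
import random

def max_pool_points(features):
    """Feature-wise max over a [num_points][feature_dim] list of lists."""
    return [max(col) for col in zip(*features)]

features = [[0.1, 0.9, 0.3],
            [0.7, 0.2, 0.8],
            [0.4, 0.6, 0.5]]

shuffled = features[:]
random.shuffle(shuffled)

# Shuffling the points leaves the pooled global feature unchanged.
assert max_pool_points(features) == max_pool_points(shuffled)
```

Any symmetric aggregation (max, sum, mean) would give permutation invariance; the PointNet paper found max pooling to work best empirically.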

## Usage

    # Basic PointNet for 3D classification
    model = PointNet.build(
      input_dim: 3,
      num_classes: 40,
      hidden_dims: [64, 128, 1024]
    )

    # With input transformation network
    model = PointNet.build(
      input_dim: 3,
      num_classes: 40,
      use_t_net: true
    )

## References

- "PointNet: Deep Learning on Point Sets for 3D Classification and
  Segmentation" (Qi et al., CVPR 2017)


# `build_opt`

```elixir
@type build_opt() ::
  {:activation, atom()}
  | {:dropout, float()}
  | {:global_dims, [pos_integer()]}
  | {:hidden_dims, [pos_integer()]}
  | {:input_dim, pos_integer()}
  | {:num_classes, pos_integer() | nil}
  | {:use_feature_t_net, boolean()}
  | {:use_t_net, boolean()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a PointNet model for point cloud classification.

## Options

- `:input_dim` - Dimension of each point (default: 3 for 3D xyz)
- `:num_classes` - Number of output classes (required)
- `:hidden_dims` - Per-point MLP hidden sizes (default: [64, 128, 1024])
- `:global_dims` - Global MLP sizes after pooling (default: [512, 256])
- `:activation` - Activation function (default: :relu)
- `:dropout` - Dropout rate for global MLP (default: 0.0)
- `:use_t_net` - Use input transformation network (default: false)
- `:use_feature_t_net` - Use feature transformation network (default: false)

## Returns

An Axon model. Input shape: `{batch, num_points, input_dim}`.
Output shape: `{batch, num_classes}`.
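The shape flow through the model follows directly from these options. The sketch below is a hypothetical pure-Python bookkeeping helper (not part of the library) that traces the default configuration from input to output.

```python
# Hypothetical shape walk for build/1 defaults:
# hidden_dims [64, 128, 1024], global_dims [512, 256].
def pointnet_shapes(batch, num_points, input_dim=3, num_classes=40,
                    hidden_dims=(64, 128, 1024), global_dims=(512, 256)):
    shapes = [("input", (batch, num_points, input_dim))]
    for h in hidden_dims:                       # shared per-point MLP
        shapes.append(("shared_mlp", (batch, num_points, h)))
    shapes.append(("max_pool", (batch, hidden_dims[-1])))  # symmetric fn
    for g in global_dims:                       # global MLP after pooling
        shapes.append(("global_mlp", (batch, g)))
    shapes.append(("output", (batch, num_classes)))
    return shapes

for name, shape in pointnet_shapes(8, 1024):
    print(name, shape)
```

Note that the point axis disappears at the max-pool step, which is why the output no longer depends on `num_points`.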

# `t_net`

```elixir
@spec t_net(Axon.t(), pos_integer(), keyword()) :: Axon.t()
```

Build a T-Net (Transformation Network) that predicts a transformation matrix.

The T-Net is a mini-PointNet that processes the input to predict a `k x k`
transformation matrix. Applying this matrix to the input helps canonicalize
it against spatial transformations.

## Parameters

- `input` - Axon node with shape `{batch, num_points, k}`
- `k` - Dimension of the transformation matrix (k x k)
- `opts` - Options

## Options

- `:name` - Layer name prefix (default: "t_net")
- `:hidden_dims` - T-Net MLP sizes (default: [64, 128, 256])

## Returns

Axon node with shape `{batch, k, k}` representing the predicted
transformation matrix, initialized near identity.
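Applying the predicted transform amounts to a matrix multiply of each point set by the `k x k` matrix. The pure-Python stand-in below (not the Axon code) shows the operation and why near-identity initialization matters: at the start of training, points pass through essentially unchanged.

```python
# Illustrative sketch of applying a T-Net transform to a point set.
def matmul(a, b):
    """[n][k] x [k][k] -> [n][k]."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

points = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
transform = identity(3)  # a freshly initialized T-Net predicts ~identity

# With an identity transform the points are unchanged.
assert matmul(points, transform) == points
```

In the batched Axon model this becomes a `{batch, num_points, k} x {batch, k, k}` batched matrix multiply, one transform per example.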

---

*Consult [api-reference.md](api-reference.md) for complete listing*
