# `Edifice.Meta.Capsule`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/meta/capsule.ex#L1)

Capsule Networks with dynamic routing (Sabour et al., 2017).

Capsule Networks replace scalar neuron activations with vector "capsules"
that encode both the probability of an entity's existence (vector length)
and its instantiation parameters (vector direction). This preserves
spatial hierarchies that CNNs lose through max-pooling.

## Key Concepts

- **Capsule**: A group of neurons whose activity vector represents an entity.
  Vector length = probability of entity, direction = entity properties.
- **Squash**: Non-linear activation that preserves direction but squashes
  length into [0, 1): `v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)`
- **Dynamic Routing**: Agreement-based routing where lower capsules send
  output to higher capsules that "agree" with their predictions.

## Architecture

```
Input [batch, height, width, channels]
      |
      v
+----------------------------+
|    Conv Layer              |
+----------------------------+
      |
      v
+----------------------------+
| Primary Capsule Layer      |
| (Conv -> reshape to caps)  |
+----------------------------+
      |
      v
+----------------------------+
| Dynamic Routing            |
| (routing by agreement)     |
+----------------------------+
      |
      v
+----------------------------+
| Digit/Output Capsules      |
+----------------------------+
      |
      v
Capsule vectors [batch, num_digit_caps, digit_cap_dim]
      |
      v  (vector norm)
Output: capsule norms [batch, num_digit_caps] = class probabilities
```
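
The shapes in the diagram can be checked with a little arithmetic. The sketch below (plain Python, not the module's code) walks the default MNIST configuration from the `build/1` docs through the two convolutions, assuming "valid" (no) padding and stride 1 for the initial conv, as in the original paper:

```python
# Shape walk-through for the default CapsNet configuration on 28x28x1 input.
# Valid (no-padding) convolution output size: out = (in - kernel) // stride + 1
def conv_out(size, kernel, stride):
    return (size - kernel) // stride + 1

h = w = 28
# Initial conv: 256 channels, 9x9 kernel, stride 1
h, w = conv_out(h, 9, 1), conv_out(w, 9, 1)   # 20 x 20

# Primary capsule conv: 32 types * 8 dims = 256 channels, 9x9 kernel, stride 2
h, w = conv_out(h, 9, 2), conv_out(w, 9, 2)   # 6 x 6

num_primary = 32 * h * w   # 32 capsule types at each of 6*6 positions
print(num_primary)         # 1152 primary capsules, each of dimension 8
```

So dynamic routing runs from 1152 primary capsules of dimension 8 to 10 output capsules of dimension 16.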

## Usage

    model = Capsule.build(
      input_shape: {nil, 28, 28, 1},
      num_primary_caps: 32,
      primary_cap_dim: 8,
      num_digit_caps: 10,
      digit_cap_dim: 16,
      routing_iterations: 3
    )

## References
- Sabour et al., "Dynamic Routing Between Capsules" (2017)
- https://arxiv.org/abs/1710.09829

# `build_opt`

```elixir
@type build_opt() ::
  {:conv_channels, pos_integer()}
  | {:conv_kernel, pos_integer()}
  | {:digit_cap_dim, pos_integer()}
  | {:input_shape, tuple()}
  | {:num_digit_caps, pos_integer()}
  | {:num_primary_caps, pos_integer()}
  | {:primary_cap_dim, pos_integer()}
  | {:primary_kernel, pos_integer()}
  | {:primary_strides, pos_integer()}
  | {:routing_iterations, pos_integer()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a Capsule Network (CapsNet).

## Options
  - `:input_shape` - Input shape as `{nil, height, width, channels}` (required)
  - `:num_primary_caps` - Number of primary capsule types (default: 32)
  - `:primary_cap_dim` - Dimension of each primary capsule (default: 8)
  - `:num_digit_caps` - Number of output capsules (default: 10)
  - `:digit_cap_dim` - Dimension of each output capsule (default: 16)
  - `:routing_iterations` - Number of dynamic routing iterations (default: 3)
  - `:conv_channels` - Initial convolution channels (default: 256)
  - `:conv_kernel` - Initial convolution kernel size (default: 9)
  - `:primary_kernel` - Primary capsule convolution kernel size (default: 9)
  - `:primary_strides` - Primary capsule convolution strides (default: 2)

## Returns
  An Axon model producing capsule norms `[batch, num_digit_caps]`
  representing class probabilities.

# `dynamic_routing`

```elixir
@spec dynamic_routing(Axon.t(), pos_integer(), pos_integer(), keyword()) :: Axon.t()
```

Dynamic routing by agreement between capsule layers.

Lower-level capsules predict the output of higher-level capsules via
learned transformation matrices. Routing coefficients are iteratively
updated based on agreement between predictions and actual outputs.

## Algorithm
1. Initialize routing logits b_ij = 0
2. For each iteration:
   a. Compute routing coefficients: c_ij = softmax(b_ij)
   b. Compute weighted prediction sum: s_j = sum(c_ij * u_hat_ij)
   c. Apply squash: v_j = squash(s_j)
   d. Update logits: b_ij += u_hat_ij . v_j (agreement)
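
The loop above can be sketched in NumPy. This is a language-agnostic illustration of the algorithm, not the module's Axon implementation; `u_hat` holds the predicted output vectors `u_hat_ij` for a single example, and the transformation matrices that produce them are omitted:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||), eps avoids division by zero
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, iterations=3):
    # u_hat: [num_input_caps, num_output_caps, output_cap_dim]
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))              # 1. routing logits b_ij = 0
    for _ in range(iterations):
        # 2a. coefficients: softmax of b_ij over output capsules j
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        # 2b. weighted prediction sum: s_j = sum_i(c_ij * u_hat_ij)
        s = np.sum(c[..., None] * u_hat, axis=0)
        # 2c. squash: v_j = squash(s_j)
        v = squash(s)
        # 2d. agreement update: b_ij += u_hat_ij . v_j
        b = b + np.sum(u_hat * v[None, ...], axis=-1)
    return v

# 1152 primary capsules routing to 10 output capsules of dimension 16
v = dynamic_routing(np.random.default_rng(0).normal(size=(1152, 10, 16)))
print(v.shape)  # (10, 16)
```

Because the output passes through squash, every returned capsule has norm strictly below 1.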

## Parameters
  - `input_caps` - Axon node with input capsules `[batch, num_input_caps, input_cap_dim]`
  - `num_output_caps` - Number of output capsules
  - `output_cap_dim` - Dimension of each output capsule

## Options
  - `:routing_iterations` - Number of routing iterations (default: 3)
  - `:name` - Layer name prefix

## Returns
  An Axon node with shape `[batch, num_output_caps, output_cap_dim]`

# `primary_capsule_layer`

```elixir
@spec primary_capsule_layer(Axon.t(), pos_integer(), pos_integer(), keyword()) ::
  Axon.t()
```

Build a primary capsule layer.

Converts a standard convolutional feature map into capsule vectors.
Uses convolution to produce `num_caps * cap_dim` channels, then
reshapes into capsule vectors and applies the squash activation.

## Parameters
  - `input` - Axon node with conv features `[batch, height, width, channels]`
  - `num_caps` - Number of capsule types
  - `cap_dim` - Dimension of each capsule vector

## Options
  - `:kernel_size` - Convolution kernel size (default: 9)
  - `:strides` - Convolution strides (default: 2)
  - `:name` - Layer name prefix

## Returns
  An Axon node with shape `[batch, total_num_capsules, cap_dim]`
  where total_num_capsules = num_caps * spatial_positions
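
The reshape step can be illustrated with NumPy shapes. This is a sketch under the default MNIST dimensions (6x6 spatial grid after the primary conv), not the Axon code:

```python
import numpy as np

batch, h, w = 4, 6, 6          # spatial size after the primary capsule conv
num_caps, cap_dim = 32, 8      # 32 capsule types, 8 dims each

# The primary conv emits num_caps * cap_dim = 256 channels per position.
features = np.zeros((batch, h, w, num_caps * cap_dim))

# Reshape: each spatial position contributes num_caps capsule vectors,
# giving total_num_capsules = num_caps * spatial_positions.
capsules = features.reshape(batch, h * w * num_caps, cap_dim)
print(capsules.shape)  # (4, 1152, 8)
```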

# `squash`

```elixir
@spec squash(Nx.Tensor.t()) :: Nx.Tensor.t()
```

Squash activation function for capsule vectors.

Non-linear "squashing" that preserves the direction of the vector
but scales its magnitude to be between 0 and 1.

    v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)

Short vectors get shrunk to near zero length, long vectors get
shrunk to just below 1. Direction is preserved.
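
A direct NumPy translation of the formula demonstrates this behavior; the small epsilon guarding against division by zero at `||s|| = 0` is an implementation detail assumed here, not taken from the module:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

short = squash(np.array([0.1, 0.0]))    # shrunk to near-zero length
long = squash(np.array([100.0, 0.0]))   # length just below 1
print(np.linalg.norm(short), np.linalg.norm(long))
```

A vector of length 0.1 comes out with length about 0.01, while a vector of length 100 comes out with length about 0.9999; both keep their direction.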

## Parameters
  - `tensor` - Input tensor `[..., cap_dim]`

## Returns
  Squashed tensor with same shape, magnitudes in [0, 1)

---

*Consult [api-reference.md](api-reference.md) for the complete listing*
