# `Edifice.Blocks.Softpick`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/blocks/softpick.ex#L1)

Softpick: non-saturating, naturally sparse normalization.

Softpick normalizes inputs by dividing by the total absolute magnitude:

```
Softpick(x)_i = x_i / (1 + sum_j(|x_j|))
```
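A quick worked example of the formula (a standalone sketch using plain Nx, not the module's `compute/2`):

```elixir
# Softpick by hand: divide each element by 1 + the total absolute sum.
x = Nx.tensor([2.0, -1.0, 1.0])

# Denominator: 1 + |2| + |-1| + |1| = 5
denom = Nx.add(1, Nx.sum(Nx.abs(x)))

Nx.divide(x, denom)
# => [0.4, -0.2, 0.2] — signs preserved, magnitudes bounded below 1
```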

## Key Properties

- **Non-saturating**: unlike softmax, gradients do not vanish for large inputs
- **Naturally sparse**: zero inputs map exactly to zero, and outputs preserve sign and relative magnitudes (softmax assigns positive weight to every input)
- **Bounded**: every output magnitude is strictly less than 1, since each element is divided by 1 plus the total absolute sum
- **Simple**: no exponentials, just absolute values and a division

## Comparison with Softmax

| Property | Softmax | Softpick |
|----------|---------|----------|
| Output range | (0, 1) | (-1, 1) |
| Sum of outputs | 1 | varies |
| Preserves sign | No | Yes |
| Saturation | Yes (exp) | No |
| Sparsity | None (all outputs positive) | Natural (zeros map to zero) |

## Use Cases

- Attention alternatives where sign matters
- Routing in mixture-of-experts
- Feature selection where sparsity is desired
- Any normalization where you want bounded outputs without saturation

## Usage as Nx Function

    # Direct computation
    normalized = Softpick.compute(logits)

## Usage in Axon Model

    model = Softpick.build(embed_dim: 256, hidden_size: 256)

## Reference

- "Beyond Softmax: Sparse and Non-Saturating Attention" (2025)

# `build_opt`

```elixir
@type build_opt() ::
  {:embed_dim, pos_integer()}
  | {:hidden_size, pos_integer()}
  | {:num_heads, pos_integer()}
  | {:num_layers, pos_integer()}
  | {:dropout, float()}
  | {:window_size, pos_integer()}
```

Options for `build/1`.

# `build`

```elixir
@spec build([build_opt()]) :: Axon.t()
```

Build a transformer model using Softpick instead of softmax in attention.

## Options

  - `:embed_dim` - Size of input embedding per timestep (required)
  - `:hidden_size` - Internal hidden dimension (default: 256)
  - `:num_heads` - Number of attention heads (default: 4)
  - `:num_layers` - Number of transformer blocks (default: 6)
  - `:dropout` - Dropout rate (default: 0.1)
  - `:window_size` - Expected sequence length for JIT optimization (default: 60)

## Returns

  An Axon model that outputs `[batch, hidden_size]` from the last position.
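As a usage sketch, the built model can be initialized and run like any Axon graph. The template shape `{batch, window_size, embed_dim}` below is an assumption inferred from the options above, not confirmed by this reference:

```elixir
model = Softpick.build(embed_dim: 256, hidden_size: 256)

# Compile into init/predict functions (standard Axon workflow).
{init_fn, predict_fn} = Axon.build(model)

# Assumed input template: {batch, window_size, embed_dim}.
params = init_fn.(Nx.template({1, 60, 256}, :f32), %{})
output = predict_fn.(params, Nx.broadcast(0.0, {1, 60, 256}))
# output shape: {1, 256} — [batch, hidden_size] from the last position
```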

# `build_softpick_attention`

```elixir
@spec build_softpick_attention(
  Axon.t(),
  keyword()
) :: Axon.t()
```

Build attention layer using Softpick instead of softmax.

# `compute`

```elixir
@spec compute(
  Nx.Tensor.t(),
  keyword()
) :: Nx.Tensor.t()
```

Apply Softpick normalization to a tensor.

## Parameters

  - `x` - Input tensor of any shape
  - `opts` - Options:
    - `:axis` - Axis to normalize over (default: -1, last axis)

## Returns

  Normalized tensor: x_i / (1 + sum(|x_j|)) over the specified axis.
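A sketch of what this computes for a 2-D input with the default `axis: -1`, spelled out in plain Nx for illustration:

```elixir
x = Nx.tensor([[1.0, -3.0], [0.5, 0.5]])

# Per-row denominator, keeping the axis so the division broadcasts.
denom = Nx.add(1, Nx.sum(Nx.abs(x), axes: [-1], keep_axes: true))

Nx.divide(x, denom)
# Row 1: 1 + 4 = 5 => [0.2, -0.6]; row 2: 1 + 1 = 2 => [0.25, 0.25]
```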

# `layer`

```elixir
@spec layer(
  Axon.t(),
  keyword()
) :: Axon.t()
```

Create a Softpick Axon layer.

## Options

  - `:name` - Layer name prefix (default: "softpick")
  - `:axis` - Axis to normalize over (default: -1)

## Returns

  An Axon layer that applies Softpick normalization.
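For example, a hedged sketch of slotting the layer into an Axon graph in place of a softmax activation (the input name and shape here are illustrative assumptions):

```elixir
scores = Axon.input("scores", shape: {nil, 10})
model = Softpick.layer(scores, name: "softpick", axis: -1)
```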

# `output_size`

```elixir
@spec output_size(keyword()) :: non_neg_integer()
```

Get the output dimension for a model configuration.

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
