# `Edifice.Blocks.PatchEmbed`
[🔗](https://github.com/blasphemetheus/edifice/blob/main/lib/edifice/blocks/patch_embed.ex#L1)

Patch Embedding for Vision Transformers.

Splits images into fixed-size patches and linearly projects each patch
into an embedding vector. This is the standard input processing for ViT,
DeiT, MAE, and other vision transformer architectures.

## How It Works

1. Split the image into non-overlapping patches of size P x P
2. Flatten each patch into a vector of size P*P*C
3. Linearly project each vector to the embedding dimension

For a 224x224 image with 16x16 patches: 196 patches (14 x 14), each flattened to a 768-dim vector (16*16*3) before projection.
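Steps 1 and 2 can be sketched with plain Nx reshapes (a minimal illustration, not the library's actual implementation; it assumes Nx is available and the image side divides evenly by P):

```elixir
Mix.install([{:nx, "~> 0.7"}])

# Split a [batch, channels, height, width] image into flattened patches,
# matching the 224x224, P = 16, C = 3 example above.
image = Nx.iota({1, 3, 224, 224}, type: :f32)

p = 16
{b, c, h, w} = Nx.shape(image)

patches =
  image
  # Carve H and W into (H/P, P) and (W/P, P) blocks
  |> Nx.reshape({b, c, div(h, p), p, div(w, p), p})
  # Group the patch-grid axes together, channels and pixels last
  |> Nx.transpose(axes: [0, 2, 4, 1, 3, 5])
  # Flatten each P x P x C patch into one vector
  |> Nx.reshape({b, div(h, p) * div(w, p), c * p * p})

Nx.shape(patches)
# => {1, 196, 768}
```

The linear projection in step 3 is then an ordinary dense layer applied to the last axis.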

## Architecture

```
Image [batch, channels, height, width]
      |
      v
+----------------------------------+
| Split into P x P patches         |
| (H/P * W/P = num_patches total)  |
+----------------------------------+
      |
      v
[batch, num_patches, P*P*C]
      |
      v
+----------------------------------+
| Linear projection to embed_dim   |
+----------------------------------+
      |
      v
[batch, num_patches, embed_dim]
```

## Usage

    patches = PatchEmbed.layer(image,
      image_size: 224,
      patch_size: 16,
      in_channels: 3,
      embed_dim: 768
    )

## References
- "An Image is Worth 16x16 Words" (Dosovitskiy et al., 2021)

# `layer`

```elixir
@spec layer(
  Axon.t(),
  keyword()
) :: Axon.t()
```

Build a patch embedding Axon layer.

## Options
  - `:image_size` - Input image size (square, default: 224)
  - `:patch_size` - Patch size (square, default: 16)
  - `:in_channels` - Number of input channels (default: 3)
  - `:embed_dim` - Output embedding dimension (required)
  - `:name` - Layer name prefix (default: "patch_embed")

# `num_patches`

```elixir
@spec num_patches(pos_integer(), pos_integer()) :: pos_integer()
```

Calculate the number of patches for the given image size and patch size (both square).
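A direct reading of the spec is straightforward (a hypothetical stand-in for illustration; prefer the library's own function):

```elixir
# Hypothetical re-implementation mirroring the @spec above; assumes
# image_size divides evenly by patch_size.
defmodule PatchMathExample do
  @spec num_patches(pos_integer(), pos_integer()) :: pos_integer()
  def num_patches(image_size, patch_size) do
    per_side = div(image_size, patch_size)
    per_side * per_side
  end
end

PatchMathExample.num_patches(224, 16)
# => 196
```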

---

*Consult [api-reference.md](api-reference.md) for complete listing*
