Latent Diffusion: Diffusion in VAE latent space.
Implements the Latent Diffusion Model (LDM) concept from "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., CVPR 2022). Instead of diffusing in the full input space, LDM first compresses data with a VAE encoder, runs diffusion in the compact latent space, then decodes back.
Key Innovation: Perceptual Compression + Diffusion
Training:
1. Train VAE: input -> encoder -> z -> decoder -> reconstruction
2. Freeze VAE
3. Train diffusion in latent space:
z_0 = encoder(input)
z_t = add_noise(z_0, t)
eps_hat = denoiser(z_t, t)
loss = MSE(eps, eps_hat)
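The forward-noising step above can be sketched per element in plain Elixir (the module itself operates on Nx tensors; the alpha_bar value here is illustrative):

```elixir
# Forward diffusion: z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
# Plain-Elixir sketch over a list of latent values, not the module's Nx version.
defmodule ForwardNoise do
  def add_noise(z0, eps, alpha_bar) when length(z0) == length(eps) do
    a = :math.sqrt(alpha_bar)
    b = :math.sqrt(1.0 - alpha_bar)

    # Blend signal and noise according to the schedule value alpha_bar.
    Enum.zip_with(z0, eps, fn z, e -> a * z + b * e end)
  end
end

z0 = [0.5, -1.0, 0.25]
eps = [0.1, 0.2, -0.3]
zt = ForwardNoise.add_noise(z0, eps, 0.9)
```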
Inference:
1. Sample z_T ~ N(0, I) in latent space
2. Denoise: z_0 = diffusion_sample(z_T)
3. Decode: output = decoder(z_0)
Advantages
| Feature | Full-space Diffusion | Latent Diffusion |
|---|---|---|
| Compute | O(input_dim) per step | O(latent_dim) per step |
| Quality | Good | Good (perceptual compression) |
| Speed | Slow | Fast (smaller dim) |
| Memory | High | Low |
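One reverse (denoising) step of the inference loop can be sketched per element in plain Elixir; the schedule constants below are illustrative, and a real sampler would call the trained denoiser on Nx tensors:

```elixir
# One DDPM-style reverse step in latent space (deterministic part only):
# z_{t-1} = (z_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
defmodule ReverseStep do
  def denoise(zt, eps_hat, alpha_t, alpha_bar_t) do
    beta_t = 1.0 - alpha_t
    coeff = beta_t / :math.sqrt(1.0 - alpha_bar_t)

    # Remove the predicted noise, then rescale by 1 / sqrt(alpha_t).
    Enum.zip_with(zt, eps_hat, fn z, e ->
      (z - coeff * e) / :math.sqrt(alpha_t)
    end)
  end
end
```

Repeating this step from t = T down to t = 1 yields z_0, which is then passed to the decoder.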
Architecture
Returns a tuple of three models: {encoder, decoder, denoiser}
Input [batch, input_size]
|
v
+-------------------+
| Encoder (frozen) | -> z_0 [batch, latent_size]
+-------------------+
|
v (add noise)
+-------------------+
| Denoiser | -> eps_hat [batch, latent_size]
| (z_t, t) -> eps |
+-------------------+
|
v (denoise)
+-------------------+
| Decoder (frozen) | -> output [batch, input_size]
+-------------------+
Usage
{encoder, decoder, denoiser} = LatentDiffusion.build(
input_size: 287,
latent_size: 32,
hidden_size: 256,
num_layers: 4
)
# Train VAE first, then freeze and train denoiser
Reference
- Paper: "High-Resolution Image Synthesis with Latent Diffusion Models"
- arXiv: https://arxiv.org/abs/2112.10752
Summary
Functions
Build a Latent Diffusion Model.
Build the VAE decoder.
Build the latent-space denoiser.
Build the VAE encoder.
KL divergence for VAE training.
Create diffusion noise schedule.
Get the output size (latent dimension for the denoiser).
Calculate approximate parameter count for the full system.
Get recommended defaults.
Reparameterization trick for the encoder.
Types
@type build_opt() :: {:hidden_size, pos_integer()} | {:input_size, pos_integer()} | {:latent_size, pos_integer()} | {:num_layers, pos_integer()} | {:num_steps, pos_integer()}
Options for build/1.
Functions
Build a Latent Diffusion Model.
Returns {encoder, decoder, denoiser} where:
- Encoder: maps input to latent distribution (mu, log_var)
- Decoder: maps latent vector to reconstructed input
- Denoiser: predicts noise in latent space given (noisy_z, timestep)
Options
- :input_size - Input feature dimension (required)
- :latent_size - Latent space dimension (default: 32)
- :hidden_size - Hidden dimension for all sub-networks (default: 256)
- :num_layers - Number of layers in denoiser (default: 4)
- :num_steps - Number of diffusion timesteps (default: 1000)
Returns
{encoder, decoder, denoiser} - Tuple of Axon models.
@spec build_decoder(pos_integer(), pos_integer(), pos_integer()) :: Axon.t()
Build the VAE decoder.
Maps a latent vector back to input space.
@spec build_denoiser(pos_integer(), pos_integer(), pos_integer(), pos_integer()) :: Axon.t()
Build the latent-space denoiser.
Predicts noise from (noisy_z, timestep).
@spec build_encoder(pos_integer(), pos_integer(), pos_integer()) :: Axon.t()
Build the VAE encoder.
Maps input to latent distribution parameters (mu, log_var).
@spec kl_divergence(Nx.Tensor.t(), Nx.Tensor.t()) :: Nx.Tensor.t()
KL divergence for VAE training.
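The closed-form KL term against a standard normal prior can be sketched in plain Elixir (the module's version takes Nx tensors for mu and log_var):

```elixir
# KL(N(mu, sigma^2) || N(0, I)) = -0.5 * sum(1 + log_var - mu^2 - exp(log_var))
# Plain-Elixir sketch over lists of per-dimension mu and log_var values.
kl = fn mus, log_vars ->
  Enum.zip_with(mus, log_vars, fn mu, lv ->
    -0.5 * (1.0 + lv - mu * mu - :math.exp(lv))
  end)
  |> Enum.sum()
end
```

At mu = 0 and log_var = 0 (i.e. exactly the prior) the term is zero, which is a quick sanity check.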
Create diffusion noise schedule.
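A common choice is a linear beta schedule; the sketch below (plain Elixir, with the usual DDPM default endpoints as assumptions) shows the idea, though the exact schedule shape this module uses is not specified here:

```elixir
# Linear beta schedule: interpolate beta_start..beta_end over num_steps values.
# Endpoints 1.0e-4 and 0.02 are the common DDPM defaults, assumed here.
defmodule Schedule do
  def linear_betas(num_steps, beta_start \\ 1.0e-4, beta_end \\ 0.02) do
    step = (beta_end - beta_start) / (num_steps - 1)
    Enum.map(0..(num_steps - 1), fn t -> beta_start + t * step end)
  end
end
```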
@spec output_size(keyword()) :: non_neg_integer()
Get the output size (latent dimension for the denoiser).
@spec param_count(keyword()) :: non_neg_integer()
Calculate approximate parameter count for the full system.
@spec recommended_defaults() :: keyword()
Get recommended defaults.
@spec reparameterize(Nx.Tensor.t(), Nx.Tensor.t(), Nx.Tensor.t()) :: {Nx.Tensor.t(), Nx.Tensor.t()}
Reparameterization trick for the encoder.
Requires a PRNG key for sampling. Returns {z, new_key}.
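The trick itself can be sketched in plain Elixir using Erlang's :rand for the Gaussian sample (the module's version instead threads an Nx PRNG key and returns {z, new_key}):

```elixir
# Reparameterization: z = mu + exp(0.5 * log_var) * eps, with eps ~ N(0, I).
# Sampling via :rand.normal/0 here; the module uses a keyed Nx generator.
reparameterize = fn mus, log_vars ->
  Enum.zip_with(mus, log_vars, fn mu, lv ->
    eps = :rand.normal()
    mu + :math.exp(0.5 * lv) * eps
  end)
end
```

Writing the sample as a deterministic function of (mu, log_var, eps) is what lets gradients flow through mu and log_var during VAE training.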