PhoenixKit.Emails.Archiver (phoenix_kit v1.6.16)

Archive and compress old email tracking data for optimal storage.

Provides comprehensive data lifecycle management for email tracking:

  • Body Compression - Compress full email bodies after configurable time
  • S3 Archival - Move old logs to S3 cold storage
  • Sampling Optimization - Apply sampling to reduce storage load
  • Cleanup Integration - Work with cleanup tasks for complete lifecycle
  • Performance Optimization - Batch operations for large datasets

Storage Optimization Strategy

  1. Recent Data (0-7 days): Full storage with all fields
  2. Medium Data (7-30 days): Compress body_full, keep metadata
  3. Old Data (30-90 days): Archive to S3, keep local summary
  4. Ancient Data (90+ days): Delete after S3 confirmation
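
These tiers map directly onto the functions documented below. A minimal sketch of a periodic lifecycle job that chains them (the wrapper module, thresholds, and bucket name are illustrative, not part of the library):

# Illustrative wrapper; call it from your own scheduler (cron, Oban, etc.).
defmodule MyApp.EmailLifecycle do
  alias PhoenixKit.Emails.Archiver

  def run do
    # Tier 2: compress full bodies older than 30 days
    {compressed, bytes_saved} = Archiver.compress_old_bodies(30)

    # Tier 3: archive logs older than 90 days to S3 (example bucket name)
    {:ok, archived} = Archiver.archive_to_s3(90, bucket: "my-email-archive")

    %{compressed: compressed, bytes_saved: bytes_saved, archived: archived}
  end
end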

Settings Integration

All archival settings are stored in phoenix_kit_settings:

  • email_compress_body - Days before compressing bodies (default: 30)
  • email_archive_to_s3 - Enable S3 archival (default: false)
  • email_s3_bucket - S3 bucket name
  • email_sampling_rate - Percentage of emails to fully log (default: 100)
  • email_retention_days - Total retention before deletion (default: 90)
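
A sketch of reading these values from application code. The PhoenixKit.Settings.get_setting/2 call is an assumption and may differ in your PhoenixKit version; treat it as a placeholder for whatever settings accessor your installation exposes:

# Assumed accessor -- substitute your actual settings API if it differs.
compress_after_days = PhoenixKit.Settings.get_setting("email_compress_body", "30")
retention_days      = PhoenixKit.Settings.get_setting("email_retention_days", "90")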

Usage Examples

# Compress bodies older than 30 days
{compressed_count, size_saved} = PhoenixKit.Emails.Archiver.compress_old_bodies(30)

# Archive to S3 with automatic cleanup
{:ok, archived_count} = PhoenixKit.Emails.Archiver.archive_to_s3(90, 
  bucket: "my-email-archive",
  prefix: "email-logs/2025/"
)

# Apply sampling to reduce future storage
sampled_email = PhoenixKit.Emails.Archiver.apply_sampling_rate(email)

# Get storage statistics
stats = PhoenixKit.Emails.Archiver.get_storage_stats()
# => %{total_logs: 50000, compressed: 15000, archived: 10000, size_mb: 2341}

S3 Integration

Supports multiple S3-compatible storage providers:

  • Amazon S3
  • DigitalOcean Spaces
  • Google Cloud Storage
  • MinIO
  • Any S3-compatible service
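
For non-AWS providers the difference is usually just the endpoint and credentials. The snippet below assumes an ExAws-based S3 client, which is an assumption about the underlying client rather than something this module documents; the DigitalOcean Spaces host is only an example:

# config/runtime.exs -- example endpoint configuration for an
# S3-compatible provider (assumes the ex_aws / ex_aws_s3 client).
import Config

config :ex_aws, :s3,
  scheme: "https://",
  host: "nyc3.digitaloceanspaces.com",
  region: "nyc3"

config :ex_aws,
  access_key_id: System.get_env("SPACES_ACCESS_KEY_ID"),
  secret_access_key: System.get_env("SPACES_SECRET_ACCESS_KEY")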

Compression Algorithm

Uses gzip compression for email bodies with fallback strategies:

  1. Gzip - Primary compression for text content
  2. Preview Only - Keep only first 500 chars for very old data
  3. Metadata Only - Keep only delivery status and timestamps
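
A minimal sketch of these tiers using only the Erlang/Elixir standard library; the field names and how the archiver actually stores the results are assumptions:

body = email.body_full

compressed = :zlib.gzip(body)              # tier 1: gzip the full body
preview    = String.slice(body, 0, 500)    # tier 2: keep a 500-char preview
ratio      = byte_size(compressed) / byte_size(body)

^body = :zlib.gunzip(compressed)           # gzip round-trips losslessly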

Batch Processing

All operations are designed for efficiency:

  • Process in configurable batch sizes (default: 1000)
  • Progress tracking for long operations
  • Automatic retry on transient failures
  • Memory-efficient streaming for large datasets
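
This follows the standard Ecto streaming pattern. The sketch below is illustrative (repo name, table, and batch handling are placeholders), not the archiver's actual implementation:

import Ecto.Query

# Repo.stream must run inside a transaction; rows are fetched lazily
# and processed in fixed-size chunks to keep memory usage flat.
MyApp.Repo.transaction(fn ->
  from(l in "email_logs", select: %{id: l.id})
  |> MyApp.Repo.stream(max_rows: 1_000)
  |> Stream.chunk_every(1_000)
  |> Enum.each(fn batch ->
    # compress / archive this batch here
    :ok
  end)
end)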

Summary

Functions

apply_sampling_rate(email_attrs, opts \\ [])
  Apply sampling rate to email for storage optimization.

archive_to_s3(days_old, opts \\ [])
  Archive old emails to S3 storage.

compress_old_bodies(days_old \\ nil, opts \\ [])
  Compress email bodies older than specified days.

get_storage_breakdown()
  Get detailed storage breakdown by time periods.

get_storage_stats()
  Get comprehensive storage statistics.

Functions

apply_sampling_rate(email_attrs, opts \\ [])

Apply sampling rate to email for storage optimization.

Returns the modified email attributes with a reduced storage footprint for non-critical emails.

Sampling Strategy

  • Always Full: Error emails, bounces, complaints
  • Always Full: Transactional emails (password resets, etc.)
  • Sampling Applied: Marketing emails, newsletters
  • Metadata Only: Bulk emails when over limit

Examples

# Apply system sampling rate
email = Archiver.apply_sampling_rate(original_email)

# Force specific sampling
email = Archiver.apply_sampling_rate(original_email, force_rate: 50)
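
The strategy above can be pictured as a small decision function. This is an illustration of the idea, not the library's implementation, and the email fields it matches on are assumptions:

defmodule MyApp.SamplingSketch do
  # Errors/bounces and transactional mail always keep the full body;
  # everything else keeps it for roughly `rate` percent of messages.
  def keep_full_body?(%{status: s}, _rate) when s in ["bounced", "complained"], do: true
  def keep_full_body?(%{campaign_id: nil}, _rate), do: true
  def keep_full_body?(_attrs, rate), do: :rand.uniform(100) <= rate
end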

archive_to_s3(days_old, opts \\ [])

Archive old emails to S3 storage.

Returns {:ok, archived_count} on success or {:error, reason} on failure.

Options

  • :bucket - S3 bucket name (required)
  • :prefix - S3 object key prefix
  • :batch_size - Process in batches (default: 500)
  • :format - Archive format: :json (default), :csv, :parquet
  • :delete_after_archive - Delete from DB after successful archive
  • :include_events - Include email events in archive

Examples

# Basic S3 archival
{:ok, count} = Archiver.archive_to_s3(90,
  bucket: "email-archive",
  prefix: "logs/2025/"
)

# Archive with events and cleanup
{:ok, count} = Archiver.archive_to_s3(90,
  bucket: "email-archive", 
  include_events: true,
  delete_after_archive: true
)
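
Since archival can fail (for example, a missing bucket or bad credentials), callers typically match on both result shapes:

require Logger

case Archiver.archive_to_s3(90, bucket: "email-archive") do
  {:ok, count} -> Logger.info("Archived #{count} email logs to S3")
  {:error, reason} -> Logger.error("Email archival failed: #{inspect(reason)}")
end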

compress_old_bodies(days_old \\ nil, opts \\ [])

Compress email bodies older than specified days.

Returns {compressed_count, size_saved_bytes}.

Options

  • :batch_size - Process in batches (default: 1000)
  • :dry_run - Show what would be compressed without doing it
  • :preserve_errors - Don't compress emails with errors/bounces

Examples

# Compress bodies older than 30 days
{count, saved} = Archiver.compress_old_bodies(30)
# => {1523, 45231040}

# Dry run to see impact
{count, estimated_saved} = Archiver.compress_old_bodies(30, dry_run: true)
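
A dry run can gate the real pass, for example only compressing when the estimated savings justify it (the 100 MB threshold below is arbitrary):

{_count, estimated_bytes} = Archiver.compress_old_bodies(30, dry_run: true)

if estimated_bytes > 100 * 1_024 * 1_024 do
  Archiver.compress_old_bodies(30, preserve_errors: true)
end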

get_storage_breakdown()

Get detailed storage breakdown by time periods.

Examples

iex> Archiver.get_storage_breakdown()
%{
  last_7_days: %{logs: 5000, size_mb: 145, compressed: false},
  last_30_days: %{logs: 15000, size_mb: 420, compressed: 8000},
  last_90_days: %{logs: 35000, size_mb: 980, compressed: 25000},
  older: %{logs: 70000, size_mb: 1200, archived: 45000}
}

get_storage_stats()

Get comprehensive storage statistics.

Examples

iex> Archiver.get_storage_stats()
%{
  total_logs: 125000,
  total_events: 450000,
  compressed_bodies: 45000,
  archived_logs: 15000,
  storage_size_mb: 2341,
  oldest_log: ~U[2024-01-15 10:30:00Z],
  compression_ratio: 0.65,
  s3_archived_size_mb: 890
}
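
These stats are convenient for gating lifecycle work, e.g. only archiving once local storage crosses a threshold (the 2 GB figure and bucket name are illustrative):

stats = Archiver.get_storage_stats()

if stats.storage_size_mb > 2_048 do
  Archiver.archive_to_s3(90, bucket: "email-archive")
end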