PhoenixKit.Emails.Archiver (phoenix_kit v1.6.16)
Archive and compress old email tracking data for optimal storage.
Provides comprehensive data lifecycle management for email tracking:
- Body Compression - Compress full email bodies after configurable time
- S3 Archival - Move old logs to S3 cold storage
- Sampling Optimization - Apply sampling to reduce storage load
- Cleanup Integration - Work with cleanup tasks for complete lifecycle
- Performance Optimization - Batch operations for large datasets
Storage Optimization Strategy
- Recent Data (0-7 days): Full storage with all fields
- Medium Data (7-30 days): Compress body_full, keep metadata
- Old Data (30-90 days): Archive to S3, keep local summary
- Ancient Data (90+ days): Delete after S3 confirmation
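A minimal sketch of how an email log's age might map onto these tiers; the module and function names below are illustrative, not part of the Archiver API:

defmodule StorageTierExample do
  # Classify a log into one of the tiers listed above based on its age.
  def tier(inserted_at, now \\ DateTime.utc_now()) do
    age_days = div(DateTime.diff(now, inserted_at), 86_400)

    cond do
      age_days <= 7 -> :recent     # full storage, all fields
      age_days <= 30 -> :medium    # compress body_full, keep metadata
      age_days <= 90 -> :old       # archive to S3, keep local summary
      true -> :ancient             # delete after S3 confirmation
    end
  end
end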
Settings Integration
All archival settings are stored in phoenix_kit_settings:
- email_compress_body - Days before compressing bodies (default: 30)
- email_archive_to_s3 - Enable S3 archival (default: false)
- email_s3_bucket - S3 bucket name
- email_sampling_rate - Percentage to fully log (default: 100)
- email_retention_days - Total retention before deletion (default: 90)
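As an illustration only, the defaults above can be pictured as a map keyed by the phoenix_kit_settings names; how the library actually reads and persists them is not shown here:

defmodule ArchivalConfigExample do
  @defaults %{
    "email_compress_body" => 30,
    "email_archive_to_s3" => false,
    "email_s3_bucket" => nil,
    "email_sampling_rate" => 100,
    "email_retention_days" => 90
  }

  # Merge stored settings (e.g. rows loaded from phoenix_kit_settings) over the defaults.
  def resolve(stored_settings) when is_map(stored_settings) do
    Map.merge(@defaults, stored_settings)
  end
end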
Usage Examples
# Compress bodies older than 30 days
{compressed_count, size_saved} = PhoenixKit.Emails.Archiver.compress_old_bodies(30)
# Archive to S3 with automatic cleanup
{:ok, archived_count} = PhoenixKit.Emails.Archiver.archive_to_s3(90,
bucket: "my-email-archive",
prefix: "email-logs/2025/"
)
# Apply sampling to reduce future storage
sampled_email = PhoenixKit.Emails.Archiver.apply_sampling_rate(email)
# Get storage statistics
stats = PhoenixKit.Emails.Archiver.get_storage_stats()
# => %{total_logs: 50000, compressed: 15000, archived: 10000, size_mb: 2341}
S3 Integration
Supports multiple S3-compatible storage providers:
- Amazon S3
- DigitalOcean Spaces
- Google Cloud Storage
- MinIO
- Any S3-compatible service
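The documentation does not name the underlying S3 client, so the snippet below assumes ExAws purely for illustration; it points an S3-compatible provider (e.g. DigitalOcean Spaces or MinIO) at a custom endpoint:

# config/runtime.exs (assumes ExAws; adjust for whichever client your app uses)
import Config

config :ex_aws, :s3,
  scheme: "https://",
  host: System.get_env("S3_HOST", "nyc3.digitaloceanspaces.com"),
  region: System.get_env("S3_REGION", "nyc3")

config :ex_aws,
  access_key_id: System.get_env("S3_ACCESS_KEY_ID"),
  secret_access_key: System.get_env("S3_SECRET_ACCESS_KEY")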
Compression Algorithm
Uses gzip compression for email bodies with fallback strategies:
- Gzip - Primary compression for text content
- Preview Only - Keep only first 500 chars for very old data
- Metadata Only - Keep only delivery status and timestamps
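A minimal sketch of that fallback chain, using OTP's built-in :zlib; the function and field names are illustrative, not the Archiver's internals:

defmodule BodyCompressionExample do
  @preview_chars 500

  # Primary strategy: gzip the full body (and the matching decompression).
  def compress_body(body) when is_binary(body), do: :zlib.gzip(body)
  def decompress_body(compressed), do: :zlib.gunzip(compressed)

  # Fallback for very old data: keep only the first 500 characters.
  def preview_only(body), do: String.slice(body, 0, @preview_chars)

  # Final fallback: keep delivery status and timestamps, drop the body entirely.
  def metadata_only(email), do: Map.take(email, [:status, :sent_at, :delivered_at])
end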
Batch Processing
All operations are designed for efficiency:
- Process in configurable batch sizes (default: 1000)
- Progress tracking for long operations
- Automatic retry on transient failures
- Memory-efficient streaming for large datasets
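An illustrative sketch of memory-efficient batching with Ecto, assuming a hypothetical MyApp.Repo, EmailLog schema, and process_batch/1 helper (none of which are part of this module):

import Ecto.Query

batch_size = 1_000
query = from(l in EmailLog, where: l.inserted_at < ago(30, "day"), select: l.id)

# Repo.stream/2 must run inside a transaction; chunking keeps memory usage flat.
MyApp.Repo.transaction(
  fn ->
    query
    |> MyApp.Repo.stream(max_rows: batch_size)
    |> Stream.chunk_every(batch_size)
    |> Enum.each(&process_batch/1)
  end,
  timeout: :infinity
)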
Summary
Functions
- Apply sampling rate to email for storage optimization.
- Archive old emails to S3 storage.
- Compress email bodies older than the specified number of days.
- Get detailed storage breakdown by time periods.
- Get comprehensive storage statistics.
Functions
Apply sampling rate to email for storage optimization.
Returns the modified email with a reduced storage footprint for non-critical emails.
Sampling Strategy
- Always Full: Error emails, bounces, complaints
- Always Full: Transactional emails (password resets, etc.)
- Sampling Applied: Marketing emails, newsletters
- Metadata Only: Bulk emails when over limit
Examples
# Apply system sampling rate
email = Archiver.apply_sampling_rate(original_email)
# Force specific sampling
email = Archiver.apply_sampling_rate(original_email, force_rate: 50)
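For illustration, the sampling decision described above could look roughly like this; the status and category fields and the helper itself are hypothetical, not the Archiver's internals:

defmodule SamplingExample do
  # Errors, bounces, complaints, and transactional mail always keep the full record.
  def keep_full?(%{status: status}, _rate) when status in [:bounced, :complained, :failed],
    do: true

  def keep_full?(%{category: :transactional}, _rate), do: true

  # Marketing/newsletter traffic: keep the full record for roughly `rate`% of emails.
  def keep_full?(_email, rate), do: :rand.uniform(100) <= rate
end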
Archive old emails to S3 storage.
Returns {:ok, archived_count} on success or {:error, reason} on failure.
Options
- :bucket - S3 bucket name (required)
- :prefix - S3 object key prefix
- :batch_size - Process in batches (default: 500)
- :format - Archive format: :json (default), :csv, :parquet
- :delete_after_archive - Delete from DB after successful archive
- :include_events - Include email events in archive
Examples
# Basic S3 archival
{:ok, count} = Archiver.archive_to_s3(90,
bucket: "email-archive",
prefix: "logs/2025/"
)
# Archive with events and cleanup
{:ok, count} = Archiver.archive_to_s3(90,
bucket: "email-archive",
include_events: true,
delete_after_archive: true
)
Compress email bodies older than the specified number of days.
Returns {compressed_count, size_saved_bytes}.
Options
- :batch_size - Process in batches (default: 1000)
- :dry_run - Show what would be compressed without doing it
- :preserve_errors - Don't compress emails with errors/bounces
Examples
# Compress bodies older than 30 days
{count, saved} = Archiver.compress_old_bodies(30)
# => {1523, 45231040}
# Dry run to see impact
{count, estimated_saved} = Archiver.compress_old_bodies(30, dry_run: true)
Get detailed storage breakdown by time periods.
Examples
iex> Archiver.get_storage_breakdown()
%{
last_7_days: %{logs: 5000, size_mb: 145, compressed: false},
last_30_days: %{logs: 15000, size_mb: 420, compressed: 8000},
last_90_days: %{logs: 35000, size_mb: 980, compressed: 25000},
older: %{logs: 70000, size_mb: 1200, archived: 45000}
}
Get comprehensive storage statistics.
Examples
iex> Archiver.get_storage_stats()
%{
total_logs: 125000,
total_events: 450000,
compressed_bodies: 45000,
archived_logs: 15000,
storage_size_mb: 2341,
oldest_log: ~U[2024-01-15 10:30:00Z],
compression_ratio: 0.65,
s3_archived_size_mb: 890
}
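As a usage sketch, these stats can drive maintenance decisions; the 2_000 MB threshold below is an arbitrary example value:

stats = PhoenixKit.Emails.Archiver.get_storage_stats()

# Compress month-old bodies once local storage grows past the example threshold.
if stats.storage_size_mb > 2_000 do
  PhoenixKit.Emails.Archiver.compress_old_bodies(30)
end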