This guide covers monitoring, backup, retention, and troubleshooting for TimelessTraces.
## Statistics
Get aggregate storage statistics without reading any blocks:
```elixir
{:ok, stats} = TimelessTraces.stats()
```

Returns a `%TimelessTraces.Stats{}` struct:
| Field | Description |
|---|---|
| total_blocks | Number of stored blocks (raw + compressed) |
| total_entries | Total spans across all blocks |
| total_bytes | Total block storage size |
| disk_size | On-disk storage size |
| index_size | Index snapshot + log file size |
| oldest_timestamp | Timestamp of oldest span (nanoseconds) |
| newest_timestamp | Timestamp of newest span (nanoseconds) |
| raw_blocks | Number of uncompressed raw blocks |
| raw_bytes | Size of raw blocks |
| raw_entries | Entries in raw blocks |
| zstd_blocks | Number of zstd-compressed blocks |
| zstd_bytes | Size of zstd blocks |
| zstd_entries | Entries in zstd blocks |
| openzl_blocks | Number of OpenZL-compressed blocks |
| openzl_bytes | Size of OpenZL blocks |
| openzl_entries | Entries in OpenZL blocks |
| compression_raw_bytes_in | Total raw bytes processed by compactor |
| compression_compressed_bytes_out | Total compressed bytes produced |
| compaction_count | Number of compaction runs |
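The two compactor counters can be turned into an overall compression ratio for a quick health check. A minimal sketch — the field names follow the table above; the `StatsReport` module is our own helper, not part of TimelessTraces:

```elixir
defmodule StatsReport do
  # Overall compression ratio achieved by the compactor, e.g. 4.0 means
  # compressed output is a quarter of the raw input. Guards against
  # division by zero before any compaction has run.
  def compression_ratio(%{
        compression_raw_bytes_in: raw,
        compression_compressed_bytes_out: out
      })
      when is_integer(out) and out > 0 do
    Float.round(raw / out, 2)
  end

  def compression_ratio(_stats), do: nil
end
```

Pass the struct returned by `TimelessTraces.stats/0` to `StatsReport.compression_ratio/1`; it also accepts a plain map with the same keys.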
### HTTP API
```shell
curl http://localhost:10428/health
```
Returns status, blocks, spans, and disk_size.
## Flushing
Force flush the buffer to write pending spans to storage immediately:
```elixir
TimelessTraces.flush()
```

```shell
curl http://localhost:10428/api/v1/flush
```
Use before backups or graceful shutdowns.
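For a graceful shutdown, the flush can be paired with stopping the application. A minimal sketch, assuming the OTP application name is `:timeless_traces` (matching the config keys used elsewhere in this guide):

```elixir
# Drain any buffered spans to storage, then stop the app cleanly,
# e.g. from a release pre-stop step.
TimelessTraces.flush()
Application.stop(:timeless_traces)
```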
## Backup
Create a consistent online backup without stopping the application.
### Elixir API
```elixir
{:ok, result} = TimelessTraces.backup("/tmp/span_backup")
# => {:ok, %{path: "/tmp/span_backup", files: ["index.snapshot", "blocks"], total_bytes: 24000000}}
```

### HTTP API
```shell
curl -X POST http://localhost:10428/api/v1/backup \
  -H 'Content-Type: application/json' \
  -d '{"path": "/tmp/span_backup"}'
```
### Backup procedure
- The buffer is flushed (all pending spans written to storage)
- ETS index is written as a snapshot file (atomic rename)
- Block files are copied in parallel to the target directory
- Returns the backup path, file list, and total bytes
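In a cron job, the HTTP endpoint can be pointed at a dated path so each run lands in its own directory. A sketch — the `/backups` prefix is a placeholder; the endpoint and payload follow this guide:

```shell
#!/bin/sh
# Daily backup into a per-day directory, e.g. /backups/spans-2024-05-01.
DEST="/backups/spans-$(date +%F)"
curl -sf -X POST http://localhost:10428/api/v1/backup \
  -H 'Content-Type: application/json' \
  -d "{\"path\": \"$DEST\"}"
```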
### Restore procedure
- Stop the TimelessTraces application
- Replace the data directory contents with the backup files
- Start the application -- it will load from the restored data
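The steps above might look like this in a deployment script. The service name and directories are placeholders for your setup; only the stop/replace/start order comes from this guide:

```shell
#!/bin/sh
set -e
SERVICE=my_app                      # hypothetical service name
DATA_DIR=/var/lib/timeless_traces   # your configured data_dir
BACKUP_DIR=/tmp/span_backup

systemctl stop "$SERVICE"           # 1. stop the TimelessTraces application
rm -rf "${DATA_DIR:?}"/*            # 2. replace data dir contents with the backup
cp -a "$BACKUP_DIR/." "$DATA_DIR/"
systemctl start "$SERVICE"          # 3. start; it loads the restored data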
## Retention
Retention runs automatically to prevent unbounded disk growth. Two independent policies are enforced:
### Age-based retention
Delete blocks with `ts_max` older than the cutoff:
```elixir
config :timeless_traces,
  retention_max_age: 7 * 86_400  # 7 days (default)
```

### Size-based retention
Delete oldest blocks until total size is under the limit:
```elixir
config :timeless_traces,
  retention_max_size: 512 * 1024 * 1024  # 512 MB (default)
```

### Disable retention
```elixir
config :timeless_traces,
  retention_max_age: nil,   # No age limit
  retention_max_size: nil   # No size limit
```

### Manual trigger
```elixir
TimelessTraces.Retention.run_now()
```

### Check interval
```elixir
config :timeless_traces,
  retention_check_interval: 300_000  # 5 minutes (default)
```

## Telemetry events
TimelessTraces emits telemetry events for monitoring:
| Event | Measurements | Metadata |
|---|---|---|
| [:timeless_traces, :flush, :stop] | duration, entry_count, byte_size | block_id |
| [:timeless_traces, :query, :stop] | duration, total, blocks_read | filters |
| [:timeless_traces, :retention, :stop] | duration, blocks_deleted | |
| [:timeless_traces, :compaction, :stop] | duration, raw_blocks, entry_count, byte_size | |
| [:timeless_traces, :merge_compaction, :stop] | duration, batches_merged, blocks_consumed | |
| [:timeless_traces, :block, :error] | | file_path, reason |
See the Telemetry guide for handler examples.
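As a starting point, a minimal handler for the flush event could look like this. `:telemetry.attach/4` is the standard telemetry API; the handler id and log format are our own, and we assume `duration` is reported in native time units per the usual telemetry convention:

```elixir
# Log every flush with its entry count and duration in milliseconds.
:telemetry.attach(
  "log-timeless-flush",
  [:timeless_traces, :flush, :stop],
  fn _event, measurements, metadata, _config ->
    ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
    IO.puts("flush #{metadata.block_id}: #{measurements.entry_count} spans in #{ms}ms")
  end,
  nil
)
```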
## Key metrics to monitor
| Metric | Source | Alert threshold |
|---|---|---|
| Flush duration | [:flush, :stop] duration | Sustained > 100ms |
| Flush entry count | [:flush, :stop] entry_count | Sustained at max_buffer_size |
| Query latency | [:query, :stop] duration | > 5s for typical queries |
| Blocks read per query | [:query, :stop] blocks_read | Growing linearly |
| Block read errors | [:block, :error] | Any occurrence |
| Retention blocks deleted | [:retention, :stop] blocks_deleted | 0 when disk is growing |
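If you use the `Telemetry.Metrics` definition style (an assumption — the `telemetry_metrics` package is not required by TimelessTraces), the table above maps to definitions such as these. The module name is hypothetical; wire the list into whatever reporter you run:

```elixir
defmodule MyApp.Telemetry do
  import Telemetry.Metrics

  # Metric definitions mirroring the "Key metrics to monitor" table.
  def metrics do
    [
      summary("timeless_traces.flush.stop.duration", unit: {:native, :millisecond}),
      summary("timeless_traces.query.stop.duration", unit: {:native, :millisecond}),
      summary("timeless_traces.query.stop.blocks_read"),
      counter("timeless_traces.block.error.count"),
      sum("timeless_traces.retention.stop.blocks_deleted")
    ]
  end
end
```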
## Troubleshooting
### High memory usage
- Check `raw_blocks` in stats -- many uncompacted raw blocks use more memory
- Trigger compaction: `TimelessTraces.Compactor.compact_now()`
- Reduce `max_buffer_size` to flush smaller batches
- Check for slow subscribers blocking the buffer
### Disk space growing
- Verify retention is configured: check `retention_max_age` and `retention_max_size`
- Trigger retention manually: `TimelessTraces.Retention.run_now()`
- Check stats for `total_bytes` trends
- Reduce retention age or size limits
### Slow queries
- Use `:service`, `:kind`, and `:status` filters to leverage the term index
- Avoid full scans (no filters) on large datasets
- Reduce the time range with `:since` and `:until`
- Check the `raw_blocks` count -- many small raw blocks are slower to query than fewer compressed blocks
- Trigger merge compaction to consolidate small compressed blocks: `TimelessTraces.merge_now()`
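Putting the filter advice together, a narrow query might look like the sketch below. The filter keys and nanosecond timestamps are from this guide, but the `TimelessTraces.query/1` call itself is hypothetical -- check the query guide for the actual function name and return shape:

```elixir
# Query only the last hour of error spans for one service, using the
# term index (:service, :kind, :status) plus a tight time range.
now = System.os_time(:nanosecond)
one_hour_ago = now - 3_600 * 1_000_000_000

TimelessTraces.query(
  service: "checkout",
  kind: :server,
  status: :error,
  since: one_hour_ago,
  until: now
)
```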
### Spans not appearing in queries
- Flush the buffer: `TimelessTraces.flush()`
- Check that the OTel exporter is configured: verify `traces_exporter: {TimelessTraces.Exporter, []}` in config
- Verify the `data_dir` exists and is writable
- Check for block read errors in telemetry events
- Check the `index_publish_interval` -- spans become queryable after the index batches them (default 2s)
### Compaction not running
- Check `raw_blocks` and `raw_entries` in stats
- Verify `compaction_threshold` isn't set too high for your span volume
- Trigger manually: `TimelessTraces.Compactor.compact_now()`
- Check that `compaction_max_raw_age` is reasonable (default: 60 seconds)
### OTel exporter not working
- Verify the dependency: `{:opentelemetry, "~> 1.5"}` must be in your mix.exs
- Check the config: `config :opentelemetry, traces_exporter: {TimelessTraces.Exporter, []}`
- Ensure spans are being created: use `OpenTelemetry.Tracer.with_span/2` in your code
- Check that `TimelessTraces.Application` has started (it must start before spans are exported)
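To confirm spans flow end to end, create one by hand and flush. `OpenTelemetry.Tracer.with_span` and `set_attribute` come from the `opentelemetry_api` package; the span name and attribute are our own:

```elixir
# Emit a test span, then flush so it reaches TimelessTraces storage.
require OpenTelemetry.Tracer

OpenTelemetry.Tracer.with_span "smoke-test" do
  OpenTelemetry.Tracer.set_attribute(:source, "manual-check")
end

TimelessTraces.flush()
```

If the span still doesn't appear after the `index_publish_interval`, work back up the checklist above.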