This guide covers monitoring, backup, maintenance, and troubleshooting for TimelessMetrics.

Store info

Elixir API

info = TimelessMetrics.info(:metrics)

Returns a map with:

Field               Description
series_count        Number of active metric series
total_points        Total stored data points across all series
storage_bytes       Total storage size on disk
raw_buffer_points   Points currently in the raw buffer (not yet compressed)
bytes_per_point     Average bytes per point (compression efficiency)

HTTP API

curl http://localhost:8428/health

Response:

{
  "status": "ok",
  "series": 150,
  "points": 1250000,
  "storage_bytes": 24000000,
  "buffer_points": 320,
  "bytes_per_point": 0.67
}

The /health endpoint is always accessible without authentication, making it suitable for load balancer health checks.
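For automated checks it is often useful to inspect the response body rather than just the HTTP status. A minimal sketch in Python, assuming the JSON shape shown above; the buffer-point threshold is illustrative, not a TimelessMetrics default:

```python
import json

# Decide whether an instance is healthy from the /health response body.
# The 5_000-point buffer threshold is an illustrative alerting choice,
# not a documented TimelessMetrics default.
def healthy(body, max_buffer_points=5_000):
    info = json.loads(body)
    return info.get("status") == "ok" and info.get("buffer_points", 0) <= max_buffer_points

sample = ('{"status": "ok", "series": 150, "points": 1250000, '
          '"storage_bytes": 24000000, "buffer_points": 320, "bytes_per_point": 0.67}')
print(healthy(sample))  # True for the sample response above
```

A plain load balancer can rely on the HTTP status code alone; a check like this is for alerting pipelines that want to catch a backed-up buffer before it becomes an outage.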

Flushing

Force flush all buffered data to compressed blocks on disk:

TimelessMetrics.flush(:metrics)
# There is no dedicated HTTP endpoint for flush; call flush/1 from Elixir
# before a backup to ensure all buffered data is persisted.

Use this before backups or graceful shutdowns to ensure no data is lost in the buffer.

Backup

Elixir API

{:ok, result} = TimelessMetrics.backup(:metrics, "/tmp/metrics_backup")

Returns:

{:ok, %{
  path: "/tmp/metrics_backup",
  files: ["metrics.db", "shard_0/raw/current.wal", ...],
  total_bytes: 24000000
}}

HTTP API

# Backup to a server-side directory
curl -X POST http://localhost:8428/api/v1/backup \
  -H 'Content-Type: application/json' \
  -d '{"path": "/tmp/metrics_backup"}'

# Backup to default location (data_dir/backups/timestamp)
curl -X POST http://localhost:8428/api/v1/backup

Response:

{
  "status": "ok",
  "path": "/tmp/metrics_backup",
  "files": ["metrics.db", "shard_0/raw/current.wal"],
  "total_bytes": 24000000
}

Backup procedure

  1. Flush buffered data: TimelessMetrics.flush(:metrics)
  2. Create backup: TimelessMetrics.backup(:metrics, target_dir)
  3. The backup creates a consistent snapshot of all SQLite databases and data files
  4. Copy the backup directory to offsite storage

Restore procedure

  1. Stop the TimelessMetrics instance
  2. Replace the data directory contents with the backup files
  3. Start the instance -- it will load from the restored data

Merge compaction

Each SegmentBuilder periodically seals completed time windows into immutable .seg files for better compression and faster large-range queries. This runs automatically (every 10 seconds for completed windows, every 60 seconds for pending data).

Manual trigger

TimelessMetrics.merge_now(:metrics)
# => :ok (if any series merged blocks) or :noop (if no merge was needed)

Configuration

Setting            Default            Description
segment_duration   14,400 (4 hours)   Time window per segment file
flush_threshold    10,000             Points per shard before immediate buffer flush
flush_interval     5,000 (5 sec)      Buffer flush interval

See Configuration Reference for tuning guidance.

Daily rollups

Rollups compute daily aggregates (avg, min, max, sum, count, last) for each series. They run automatically on a configurable interval (default: every 5 minutes).

Manual trigger

TimelessMetrics.rollup(:metrics)

Query rollup data

{:ok, rollups} = TimelessMetrics.query_daily(:metrics, "cpu_usage", %{"host" => "web-1"},
  System.os_time(:second) - 30 * 86_400,
  System.os_time(:second))

Returns:

{:ok, [
  %{bucket: 1700006400, avg: 73.5, min: 45.2, max: 98.7, count: 288, sum: 21168.0, last: 72.1},
  %{bucket: 1700092800, avg: 71.2, min: 42.8, max: 95.3, count: 288, sum: 20505.6, last: 70.8}
]}
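Because each bucket carries both sum and count, a correct average over several days weights by count instead of averaging the per-bucket avg fields. A sketch using the sample rows above:

```python
# Combine daily rollup buckets into one overall average.
# Weighting by count is exact; averaging the per-bucket `avg` fields
# is only correct when every bucket has the same count.
def overall_avg(buckets):
    total_sum = sum(b["sum"] for b in buckets)
    total_count = sum(b["count"] for b in buckets)
    return total_sum / total_count if total_count else None

rollups = [
    {"bucket": 1700006400, "avg": 73.5, "min": 45.2, "max": 98.7, "count": 288, "sum": 21168.0, "last": 72.1},
    {"bucket": 1700092800, "avg": 71.2, "min": 42.8, "max": 95.3, "count": 288, "sum": 20505.6, "last": 70.8},
]
print(overall_avg(rollups))  # ~72.35
```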

Retention enforcement

Retention runs automatically (default: every hour) and deletes data older than the configured thresholds.

Manual trigger

TimelessMetrics.enforce_retention(:metrics)

Configuration

Setting                   Default                 Description
raw_retention_seconds     604,800 (7 days)        How long to keep raw point data
daily_retention_seconds   31,536,000 (365 days)   How long to keep daily rollup data

See Configuration Reference for all options.
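Both settings are plain second counts, so it is easy to slip a zero when setting custom values. Deriving them from days keeps the arithmetic honest; the defaults above correspond to 7 and 365 days:

```python
# Retention settings are second counts; derive them from days
# rather than hand-computing large numbers.
def days_to_seconds(days):
    return days * 86_400

print(days_to_seconds(7))    # 604_800    (default raw_retention_seconds)
print(days_to_seconds(365))  # 31_536_000 (default daily_retention_seconds)
```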

Self-monitoring

Prometheus metrics endpoint

GET /metrics exposes metrics in Prometheus text exposition format:

curl http://localhost:8428/metrics
# HELP vm_memory_total_bytes Total BEAM memory usage in bytes.
# TYPE vm_memory_total_bytes gauge
vm_memory_total_bytes 123456789
# HELP vm_memory_processes_bytes BEAM process memory in bytes.
# TYPE vm_memory_processes_bytes gauge
vm_memory_processes_bytes 45678901
# HELP vm_memory_ets_bytes BEAM ETS table memory in bytes.
# TYPE vm_memory_ets_bytes gauge
vm_memory_ets_bytes 12345678
# HELP vm_memory_binary_bytes BEAM binary memory in bytes.
# TYPE vm_memory_binary_bytes gauge
vm_memory_binary_bytes 9876543
# HELP vm_memory_atom_bytes BEAM atom memory in bytes.
# TYPE vm_memory_atom_bytes gauge
vm_memory_atom_bytes 654321
# HELP vm_memory_system_bytes BEAM system memory in bytes.
# TYPE vm_memory_system_bytes gauge
vm_memory_system_bytes 78901234
# HELP vm_process_count Number of BEAM processes.
# TYPE vm_process_count gauge
vm_process_count 342
# HELP vm_port_count Number of BEAM ports.
# TYPE vm_port_count gauge
vm_port_count 12
# HELP vm_run_queue_length Total BEAM scheduler run queue length.
# TYPE vm_run_queue_length gauge
vm_run_queue_length 0
# HELP timeless_series_count Number of active metric series.
# TYPE timeless_series_count gauge
timeless_series_count 150
# HELP timeless_total_points Total stored data points.
# TYPE timeless_total_points gauge
timeless_total_points 1250000
# HELP timeless_storage_bytes Storage size in bytes.
# TYPE timeless_storage_bytes gauge
timeless_storage_bytes 24000000
# HELP timeless_buffer_points Number of points in raw buffer.
# TYPE timeless_buffer_points gauge
timeless_buffer_points 320

Monitoring recommendations

Key metrics to watch:

Metric                   Alert threshold             Description
timeless_series_count    Growing unexpectedly        Cardinality explosion (too many label combinations)
timeless_buffer_points   Consistently high           Flush may be falling behind
vm_memory_total_bytes    System-dependent            BEAM memory usage
vm_process_count         > series_count + baseline   Each series is a process
vm_run_queue_length      Sustained > 0               Schedulers overloaded
timeless_storage_bytes   Disk capacity               Storage growth
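If you are not scraping with Prometheus itself, a script can pull /metrics and apply these thresholds directly. A minimal sketch that parses only simple unlabeled gauge samples out of the text format (it skips HELP/TYPE comments and does not handle labels or histograms):

```python
# Parse simple unlabeled gauge samples from Prometheus text format.
# Comment lines (# HELP / # TYPE) are skipped; labeled samples and
# histogram/summary types are deliberately out of scope for this sketch.
def parse_gauges(text):
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        samples[name] = float(value)
    return samples

text = """\
# HELP timeless_series_count Number of active metric series.
# TYPE timeless_series_count gauge
timeless_series_count 150
# HELP timeless_buffer_points Number of points in raw buffer.
# TYPE timeless_buffer_points gauge
timeless_buffer_points 320
# HELP vm_run_queue_length Total BEAM scheduler run queue length.
# TYPE vm_run_queue_length gauge
vm_run_queue_length 0
"""
g = parse_gauges(text)
print(g["vm_run_queue_length"] > 0)  # False: schedulers are not overloaded
```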

Scrape health

If scraping is enabled, monitor target health:

curl http://localhost:8428/api/v1/scrape_targets

Check the health field for each target: "up", "down", or "unknown".

Self-monitoring metrics written per scrape target:

Metric                    Labels          Description
up                        job, instance   1 if the last scrape succeeded, 0 if it failed
scrape_duration_seconds   job, instance   Duration of the last scrape
scrape_samples_scraped    job, instance   Samples from the last scrape

Troubleshooting

High memory usage

  • Check timeless_series_count -- high cardinality (many unique label combinations) creates many processes
  • Reduce buffer_shards or flush_threshold to limit buffer memory
  • Reduce flush_interval to flush more frequently
  • Set self_monitor: false if the store is monitoring itself recursively

Slow queries

  • Multi-series queries fan out across shards. Use label filters to narrow the query.
  • Use aggregated queries (query_aggregate_multi) instead of raw queries for dashboards
  • For long time ranges, use daily rollups (query_daily) instead of raw data
  • Trigger merge compaction to consolidate small blocks: TimelessMetrics.merge_now(:metrics)
  • Check vm_run_queue_length -- sustained values > 0 indicate CPU saturation

Disk space growing

  • Check retention settings: raw_retention_seconds and daily_retention_seconds
  • Run TimelessMetrics.enforce_retention(:metrics) manually to trigger cleanup
  • Monitor timeless_storage_bytes over time
  • High cardinality (many series) multiplies storage linearly
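Watching timeless_storage_bytes over time also lets you project when the disk will fill. A sketch of a purely linear extrapolation from two samples; the figures in the example are illustrative:

```python
# Estimate days until the disk fills from two timeless_storage_bytes
# samples taken some interval apart. Linear extrapolation only; real
# growth changes with retention cleanup and cardinality.
def days_until_full(bytes_t0, bytes_t1, interval_seconds, disk_capacity_bytes):
    growth_per_sec = (bytes_t1 - bytes_t0) / interval_seconds
    if growth_per_sec <= 0:
        return None  # storage not growing
    remaining = disk_capacity_bytes - bytes_t1
    return remaining / growth_per_sec / 86_400

# 24 MB grew to 25 MB over one day, on a 10 GB disk:
print(days_until_full(24_000_000, 25_000_000, 86_400, 10_000_000_000))  # ~9975 days
```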

Scrape failures

  • Check target health via GET /api/v1/scrape_targets
  • The last_error field shows the failure reason
  • Common causes: target down, DNS resolution failure, timeout, wrong port
  • Increase scrape_timeout for slow targets

Data appears missing

  • Check that data was actually written: TimelessMetrics.list_metrics(:metrics)
  • Verify the time range: timestamps are unix seconds, not milliseconds
  • Check label filters: exact match is required unless using regex filters
  • Flush the buffer: TimelessMetrics.flush(:metrics) to ensure buffered data is queryable
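The seconds-versus-milliseconds mistake is easy to catch heuristically: a unix timestamp in milliseconds for any date after 2001 exceeds 10^12, while a seconds timestamp will not reach that magnitude for tens of thousands of years. A sketch of a normalizer you might run on timestamps before writing:

```python
# Normalize a unix timestamp to seconds. Any value >= 1e12 is assumed
# to be milliseconds (that magnitude corresponds to dates after 2001
# in ms, and to the far future in seconds).
def to_unix_seconds(ts):
    return ts // 1000 if ts >= 1_000_000_000_000 else ts

print(to_unix_seconds(1700006400))     # already seconds -> 1700006400
print(to_unix_seconds(1700006400000))  # milliseconds   -> 1700006400
```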

OpenAPI documentation

Interactive API documentation is available at http://localhost:8428/api/docs, served via Scalar UI. The OpenAPI spec is at /api/openapi.json.