This document covers TimelessTraces' internal architecture: the supervision tree, data flow, storage format, and indexing strategy.
## Supervision tree

```
TimelessTraces.Supervisor (:one_for_one)
├── Registry (TimelessTraces.Registry)   # Pub/sub for live tail
├── Index (GenServer)                    # ETS indexing + snapshot/disk log persistence
├── FlushSupervisor (Task.Supervisor)    # Async flush tasks
├── Buffer (GenServer)                   # Span accumulation
├── Compactor (GenServer)                # Raw → compressed blocks
├── Retention (GenServer)                # Age/size cleanup
└── HTTP (Bandit, optional)              # OTLP ingest + Jaeger query
```

All components start automatically as an OTP application. The HTTP server is only started when `http` is configured.
## Data flow

```
OTel SDK
  ↓
Exporter  (reads spans from OTel ETS table)
  ↓
Buffer    (accumulate, auto-flush every 1 s or 1000 spans)
  ↓  broadcasts to subscribers
Writer    (serialize as raw Erlang terms)
  ↓
Disk (blocks/) or Memory (ETS)
  ↓
Index     (block metadata + inverted term index + trace index → ETS + disk log)
  ↓
Compactor (merge raw → compressed OpenZL/zstd every 30 s or at threshold)
  ↓
Retention (delete old/oversized blocks every 5 min)
```

## Exporter
The `TimelessTraces.Exporter` module implements the `:otel_exporter_traces` behaviour. When the OTel SDK calls `export/3`, the exporter:
- Reads span records from the SDK's ETS table
- Normalizes each span: converts trace/span IDs from integers to hex strings, extracts attributes, events, links, resource, and instrumentation scope
- Converts timestamps from native units to nanoseconds
- Ingests the normalized spans into the Buffer
This is a zero-copy path -- no HTTP, no protobuf serialization, no external collectors.
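The ID and timestamp normalization can be sketched as plain functions (an illustration only; `normalize_trace_id/1` and `native_to_ns/1` are hypothetical helper names, not the exporter's actual internals):

```elixir
defmodule NormalizeSketch do
  # The OTel SDK stores trace IDs as integers; queries use 32-character
  # lowercase hex strings, so pad to 16 bytes (32 hex digits).
  def normalize_trace_id(id) when is_integer(id) do
    id |> Integer.to_string(16) |> String.downcase() |> String.pad_leading(32, "0")
  end

  # Span IDs are 8 bytes (16 hex digits).
  def normalize_span_id(id) when is_integer(id) do
    id |> Integer.to_string(16) |> String.downcase() |> String.pad_leading(16, "0")
  end

  # SDK timestamps are in :native time units; convert to nanoseconds.
  def native_to_ns(t), do: System.convert_time_unit(t, :native, :nanosecond)
end
```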
## Buffer

The Buffer is a GenServer that accumulates spans in memory. It flushes when:
- the buffer reaches `max_buffer_size` (default 1000 spans)
- the `flush_interval` timer fires (default 1 second)
- `TimelessTraces.flush()` is called manually
Before each flush, the Buffer broadcasts all spans to registered subscribers (live tail). Flush operations are dispatched as async tasks via the FlushSupervisor with backpressure -- at most `System.schedulers_online()` flushes run concurrently.
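The accumulate-and-flush loop can be sketched as a minimal GenServer (a simplified illustration, not the actual `TimelessTraces.Buffer`; here flushed spans are just sent to a single subscriber process):

```elixir
defmodule BufferSketch do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def ingest(pid, spans), do: GenServer.cast(pid, {:ingest, spans})

  @impl true
  def init(opts) do
    state = %{
      spans: [],
      max: Keyword.get(opts, :max_buffer_size, 1000),
      interval: Keyword.get(opts, :flush_interval, 1_000),
      subscriber: Keyword.fetch!(opts, :subscriber)
    }

    Process.send_after(self(), :flush_tick, state.interval)
    {:ok, state}
  end

  @impl true
  def handle_cast({:ingest, spans}, state) do
    state = %{state | spans: state.spans ++ spans}

    # Size trigger: flush as soon as max_buffer_size is reached.
    if length(state.spans) >= state.max do
      {:noreply, flush(state)}
    else
      {:noreply, state}
    end
  end

  @impl true
  def handle_info(:flush_tick, state) do
    # Timer trigger: flush whatever accumulated, then re-arm the timer.
    Process.send_after(self(), :flush_tick, state.interval)
    {:noreply, flush(state)}
  end

  defp flush(%{spans: []} = state), do: state

  defp flush(state) do
    # Broadcast to the subscriber (live tail) before clearing the buffer.
    send(state.subscriber, {:flush, state.spans})
    %{state | spans: []}
  end
end
```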
## Write path

Each flush writes a raw block -- a batch of spans serialized with `:erlang.term_to_binary`:
- Serialize: spans are converted to Erlang binary format
- Write: the binary is written to a block file (`blocks/000000000001.raw`)
- Index: block metadata, inverted terms, and trace index rows are sent to the Index
The Index processes block metadata with sub-millisecond ETS inserts, journaled to a disk log for durability.
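A raw-block write can be sketched like this (simplified; the real block metadata carries more fields than shown):

```elixir
defmodule WriteSketch do
  # Serialize a batch of spans and write it as a raw block file,
  # returning minimal metadata for the Index.
  def write_raw_block(dir, block_id, spans) do
    name = String.pad_leading(Integer.to_string(block_id), 12, "0") <> ".raw"
    path = Path.join(dir, name)
    binary = :erlang.term_to_binary(spans)
    File.mkdir_p!(dir)
    File.write!(path, binary)

    %{
      block_id: block_id,
      file_path: path,
      byte_size: byte_size(binary),
      entry_count: length(spans)
    }
  end

  # Reading a raw block is the inverse operation.
  def read_raw_block(path), do: path |> File.read!() |> :erlang.binary_to_term()
end
```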
## Index

All index state lives in ETS tables — the authoritative source of truth at runtime.

### ETS tables

| Table | Type | Purpose |
|---|---|---|
| `timeless_traces_blocks` | `ordered_set` | Block metadata (block_id → file_path, byte_size, entry_count, ts_min, ts_max, format, created_at) |
| `timeless_traces_term_index` | `bag` | Term → block_id mapping |
| `timeless_traces_trace_index` | `bag` | Packed trace_id → block_id mapping (with legacy text-row compatibility) |
| `timeless_traces_compression_stats` | `set` | Compression statistics |
| `timeless_traces_block_data` | `set` | In-memory block data (memory storage mode only) |

All ETS tables are created with `read_concurrency: true` and `write_concurrency: true`.
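Table creation follows the standard `:ets.new/2` pattern (a sketch; the actual option set may differ):

```elixir
# Named tables tuned for concurrent reads and writes.
opts = [:named_table, read_concurrency: true, write_concurrency: true]

# ordered_set keeps block metadata sorted by block_id, so time-ordered
# scans are a simple :ets.first/:ets.next traversal.
:ets.new(:timeless_traces_blocks, [:ordered_set | opts])

# bag tables allow many block_ids per term / per trace_id.
:ets.new(:timeless_traces_term_index, [:bag | opts])
:ets.new(:timeless_traces_trace_index, [:bag | opts])

:ets.new(:timeless_traces_compression_stats, [:set | opts])
```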
### Persistence

Index durability uses a snapshot + write-ahead log strategy:
- `index.snapshot`: periodic full dump of all ETS tables (Erlang `term_to_binary`, compressed). Written every 1000 index operations or on graceful shutdown.
- `index.log`: an Erlang `:disk_log` that journals every index mutation (block inserts, deletes, compactions). Replayed on startup after loading the snapshot.
On startup: load snapshot → replay log entries newer than the snapshot → index is fully reconstructed in ETS. No external database required.
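The journal side can be sketched with the stock `:disk_log` API (illustrative names; the real log entry shapes are internal to the Index):

```elixir
defmodule WalSketch do
  # Open (or create) the write-ahead log file.
  def open(path) do
    {:ok, log} = :disk_log.open(name: :tt_index_log, file: String.to_charlist(path))
    log
  end

  # Journal one mutation; :disk_log.log/2 appends a single Erlang term.
  def journal(log, entry), do: :ok = :disk_log.log(log, entry)

  # Replay every logged term on startup, folding each into ETS state.
  def replay(log, fun), do: do_replay(log, :disk_log.chunk(log, :start), fun)

  defp do_replay(_log, :eof, _fun), do: :ok

  defp do_replay(log, {cont, terms}, fun) do
    Enum.each(terms, fun)
    do_replay(log, :disk_log.chunk(log, cont), fun)
  end
end
```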
### Inverted term index

When a block is indexed, terms are extracted from each span and stored in the term index table (`timeless_traces_term_index`). Terms include:
- `service:<name>` -- the `service.name` attribute or resource
- `kind:<kind>` -- span kind (server, client, etc.)
- `status:<status>` -- span status (ok, error, unset)
- `name:<span_name>` -- the span operation name
Queries use these terms to identify which blocks contain relevant spans, avoiding full scans.
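Term extraction reduces to a pure function over a normalized span (the span map shape here is hypothetical, for illustration only):

```elixir
defmodule TermSketch do
  # Build the inverted-index terms for one span. Each term is later
  # inserted as a {term, block_id} row into the bag-typed term index.
  def terms(span) do
    [
      "service:" <> (span[:service] || "unknown"),
      "kind:" <> to_string(span[:kind]),
      "status:" <> to_string(span[:status]),
      "name:" <> span[:name]
    ]
  end
end
```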
### Trace index

The trace index maps each trace ID to the blocks that contain its spans. This enables fast trace lookup -- `TimelessTraces.trace(trace_id)` reads only the blocks that contain spans for that trace.

For 32-character hex trace IDs, TimelessTraces stores a packed 16-byte binary form in `trace_index` instead of the original text. Query lookup accepts the normal hex string and transparently probes both the packed form and the legacy text rows, so existing datasets continue to work during migration.
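The packing is a plain hex decode that halves the key size (a sketch; `pack/1` and `lookup_keys/1` are illustrative names):

```elixir
defmodule TraceIdSketch do
  # A 32-character hex trace ID packs into 16 raw bytes.
  def pack(hex) when byte_size(hex) == 32, do: Base.decode16!(hex, case: :mixed)

  # Lookup probes both the packed key and the legacy text key,
  # so pre-migration rows are still found.
  def lookup_keys(hex) when byte_size(hex) == 32, do: [pack(hex), hex]
  def lookup_keys(other), do: [other]
end
```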
## Read path

A query follows this path:
- Term lookup: use the ETS term index to find block IDs matching the query filters
- Time range: narrow blocks further by timestamp range using the block metadata
- Parallel read: read and decompress matching blocks in parallel (`System.schedulers_online()` concurrency)
- Filter: apply in-memory filters to individual spans within each block
- Sort & paginate: sort by `start_time` (ascending or descending), apply offset and limit
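The steps above can be sketched end-to-end (simplified: block reading is stubbed as a function argument, and the term-lookup step is assumed to have already produced the candidate block list):

```elixir
defmodule ReadSketch do
  # blocks: [%{block_id, ts_min, ts_max}]; read_fun: block_id -> [span]
  def query(blocks, {from, to}, filter_fun, read_fun, opts \\ []) do
    blocks
    # Time range: keep blocks whose [ts_min, ts_max] overlaps the query window.
    |> Enum.filter(fn b -> b.ts_min <= to and b.ts_max >= from end)
    # Parallel read, bounded by the scheduler count.
    |> Task.async_stream(fn b -> read_fun.(b.block_id) end,
      max_concurrency: System.schedulers_online()
    )
    |> Enum.flat_map(fn {:ok, spans} -> spans end)
    # In-memory span filter, then sort & paginate.
    |> Enum.filter(filter_fun)
    |> Enum.sort_by(& &1.start_time)
    |> Enum.drop(Keyword.get(opts, :offset, 0))
    |> Enum.take(Keyword.get(opts, :limit, 100))
  end
end
```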
## Compaction

The Compactor GenServer periodically checks for raw blocks and compresses them:
- Threshold trigger: raw entries >= `compaction_threshold` (default 500)
- Age trigger: any raw block older than `compaction_max_raw_age` (default 60 seconds)
- Periodic check: every `compaction_interval` ms (default 30,000)
The compaction process:
- Read all raw block entries from disk
- Merge entries into a single batch
- Compress with the configured format (OpenZL columnar or zstd)
- Write a new compressed block file
- Update the index: delete old block metadata, add new
- Delete old raw block files
- Update compression statistics
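The merge-and-compress core can be sketched as follows. Note the compression here is a stand-in: `:erlang.term_to_binary/2` with `:compressed` (zlib), since the real system writes OpenZL or zstd block formats:

```elixir
defmodule CompactSketch do
  # Merge several raw block binaries into one compressed block.
  def compact(raw_binaries) do
    merged = Enum.flat_map(raw_binaries, &:erlang.binary_to_term/1)
    # Stand-in for the OpenZL/zstd step: built-in zlib compression.
    compressed = :erlang.term_to_binary(merged, [:compressed])

    %{
      data: compressed,
      entry_count: length(merged),
      ratio: byte_size(compressed) / max(IO.iodata_length(raw_binaries), 1)
    }
  end

  def decompress(data), do: :erlang.binary_to_term(data)
end
```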
### Merge compaction

After initial compaction produces many small compressed blocks (e.g. one per flush cycle), the Compactor runs a second pass that merges them into fewer, larger blocks. Larger blocks compress better (bigger dictionary window) and reduce per-block I/O overhead during reads.
- Scan for compressed blocks with `entry_count < merge_compaction_target_size`
- If enough small blocks exist (`>= merge_compaction_min_blocks`), group them into batches by `ts_min`
- For each batch: decompress all blocks, merge entries sorted by `start_time`, recompress
- Update the index: remove old block metadata, add the new merged block
- Delete old compressed block files
The merge pass runs automatically after every compaction timer tick and can also be triggered manually via `TimelessTraces.merge_now()`.
| Configuration | Default | Description |
|---|---|---|
| `merge_compaction_target_size` | 2000 | Target entries per merged block |
| `merge_compaction_min_blocks` | 4 | Minimum small blocks before a merge triggers |
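The block-selection step reduces to pure list processing (illustrative; the real Compactor's batching heuristics may differ):

```elixir
defmodule MergeSketch do
  # Pick small compressed blocks and group them, ordered by ts_min,
  # into batches of roughly target_size entries each.
  def batches(blocks, target_size, min_blocks) do
    small = Enum.filter(blocks, &(&1.entry_count < target_size))

    if length(small) >= min_blocks do
      small
      |> Enum.sort_by(& &1.ts_min)
      |> Enum.chunk_while(
        {[], 0},
        fn b, {acc, n} ->
          if n + b.entry_count >= target_size do
            # Batch is full: emit it and start a fresh accumulator.
            {:cont, Enum.reverse([b | acc]), {[], 0}}
          else
            {:cont, {[b | acc], n + b.entry_count}}
          end
        end,
        fn
          {[], _} -> {:cont, {[], 0}}
          {acc, _} -> {:cont, Enum.reverse(acc), {[], 0}}
        end
      )
    else
      []
    end
  end
end
```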
## Retention

The Retention GenServer enforces two independent policies every `retention_check_interval` ms:
- Age-based: delete blocks whose `ts_max` is older than `retention_max_age` seconds ago
- Size-based: delete the oldest blocks until total storage is under `retention_max_size` bytes
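Both policies reduce to simple selections over block metadata (a sketch; timestamps are plain seconds here for clarity):

```elixir
defmodule RetentionSketch do
  # Age policy: a block is expired once its newest span is older than max_age.
  def expired(blocks, now, max_age), do: Enum.filter(blocks, &(&1.ts_max < now - max_age))

  # Size policy: walk blocks oldest-first, marking them for deletion
  # until the remaining total fits under max_size.
  def over_size(blocks, max_size) do
    total = blocks |> Enum.map(& &1.byte_size) |> Enum.sum()

    blocks
    |> Enum.sort_by(& &1.ts_max)
    |> Enum.reduce_while({total, []}, fn b, {t, doomed} ->
      if t <= max_size, do: {:halt, {t, doomed}}, else: {:cont, {t - b.byte_size, [b | doomed]}}
    end)
    |> elem(1)
    |> Enum.reverse()
  end
end
```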
## Disk layout

```
data_dir/
├── index.snapshot          # Periodic ETS table dump (compressed ETF)
├── index.log               # Write-ahead log (Erlang disk_log)
└── blocks/
    ├── 000000000001.raw    # Raw block (temporary)
    ├── 000000000002.raw    # Raw block (temporary)
    ├── 000000000003.ozl    # OpenZL compressed block
    ├── 000000000004.ozl    # OpenZL compressed block
    └── ...
```

Block filenames are 12-digit zero-padded block IDs with format-specific extensions (`.raw`, `.zst`, `.ozl`).
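The naming scheme is easy to reproduce (a sketch; the format-to-extension mapping follows the layout above):

```elixir
defmodule BlockNameSketch do
  # Format-specific extensions per the disk layout.
  @extensions %{raw: ".raw", zstd: ".zst", openzl: ".ozl"}

  # 12-digit zero-padded block ID plus the format's extension.
  def filename(block_id, format) do
    String.pad_leading(Integer.to_string(block_id), 12, "0") <> Map.fetch!(@extensions, format)
  end
end
```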