This document covers TimelessTraces' internal architecture: the supervision tree, data flow, storage format, and indexing strategy.
## Supervision tree

```
TimelessTraces.Supervisor (:one_for_one)
├── Registry (TimelessTraces.Registry)   # Pub/sub for live tail
├── Index (GenServer)                    # ETS indexing + snapshot/disk log persistence
├── FlushSupervisor (Task.Supervisor)    # Async flush tasks
├── Buffer (GenServer)                   # Span accumulation
├── Compactor (GenServer)                # Raw → compressed blocks
├── Retention (GenServer)                # Age/size cleanup
└── HTTP (Bandit, optional)              # OTLP ingest + Jaeger query
```

All components start automatically as an OTP application. The HTTP server is only started when `http` is configured.
## Data flow

```
OTel SDK
  ↓
Exporter  (reads spans from OTel ETS table)
  ↓
Buffer    (accumulate, auto-flush every 1 s or 1000 spans)
  ↓  broadcasts to subscribers
Writer    (serialize as raw Erlang terms)
  ↓
Disk (blocks/) or Memory (ETS)
  ↓
Index     (block metadata + inverted term index + trace index → ETS + disk log)
  ↓
Compactor (merge raw → compressed OpenZL/zstd every 30 s or at threshold)
  ↓
Retention (delete old/oversized blocks every 5 min)
```

## Exporter
The `TimelessTraces.Exporter` module implements the `:otel_exporter_traces` behaviour. When the OTel SDK calls `export/3`, the exporter:
- Reads span records from the SDK's ETS table
- Normalizes each span: converts trace/span IDs from integers to hex strings, extracts attributes, events, links, resource, and instrumentation scope
- Converts timestamps from native units to nanoseconds
- Ingests the normalized spans into the Buffer
This is a zero-copy path -- no HTTP, no protobuf serialization, no external collectors.
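The ID and timestamp normalization can be sketched as plain functions (an illustration only; `normalize_trace_id/1` and `native_to_ns/1` are hypothetical helper names, not the exporter's actual internals):

```elixir
defmodule NormalizeSketch do
  # The OTel SDK stores trace IDs as integers; queries use 32-character
  # lowercase hex strings, so pad to 16 bytes (32 hex digits).
  def normalize_trace_id(id) when is_integer(id) do
    id |> Integer.to_string(16) |> String.downcase() |> String.pad_leading(32, "0")
  end

  # Span IDs are 8 bytes (16 hex digits).
  def normalize_span_id(id) when is_integer(id) do
    id |> Integer.to_string(16) |> String.downcase() |> String.pad_leading(16, "0")
  end

  # SDK timestamps are in :native time units; convert to nanoseconds.
  def native_to_ns(t), do: System.convert_time_unit(t, :native, :nanosecond)
end
```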
## Buffer

The Buffer is a GenServer that accumulates spans in memory. It flushes when:
- the buffer reaches `max_buffer_size` (default 1000 spans)
- the `flush_interval` timer fires (default 1 second)
- `TimelessTraces.flush()` is called manually
Before each flush, the Buffer broadcasts all spans to registered subscribers (live tail). Flush operations are dispatched as async tasks via the FlushSupervisor with backpressure -- at most `System.schedulers_online()` flushes run concurrently.
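The accumulate-and-flush loop can be sketched as a minimal GenServer (a simplified illustration, not the actual `TimelessTraces.Buffer`; here flushed spans are just sent to a single subscriber process):

```elixir
defmodule BufferSketch do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def ingest(pid, spans), do: GenServer.cast(pid, {:ingest, spans})

  @impl true
  def init(opts) do
    state = %{
      spans: [],
      max: Keyword.get(opts, :max_buffer_size, 1000),
      interval: Keyword.get(opts, :flush_interval, 1_000),
      subscriber: Keyword.fetch!(opts, :subscriber)
    }

    Process.send_after(self(), :flush_tick, state.interval)
    {:ok, state}
  end

  @impl true
  def handle_cast({:ingest, spans}, state) do
    state = %{state | spans: state.spans ++ spans}

    # Size trigger: flush as soon as max_buffer_size is reached.
    if length(state.spans) >= state.max do
      {:noreply, flush(state)}
    else
      {:noreply, state}
    end
  end

  @impl true
  def handle_info(:flush_tick, state) do
    # Timer trigger: flush whatever accumulated, then re-arm the timer.
    Process.send_after(self(), :flush_tick, state.interval)
    {:noreply, flush(state)}
  end

  defp flush(%{spans: []} = state), do: state

  defp flush(state) do
    # Broadcast to the subscriber (live tail) before clearing the buffer.
    send(state.subscriber, {:flush, state.spans})
    %{state | spans: []}
  end
end
```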
## Write path

Each flush writes a raw block -- a batch of spans serialized with `:erlang.term_to_binary`:
- Serialize: spans are converted to Erlang binary format
- Write: the binary is written to a block file (`blocks/000000000001.raw`)
- Index: block metadata, inverted terms, and trace index rows are sent to the Index
The Index processes block metadata with sub-millisecond ETS inserts, journaled to a disk log for durability.
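A raw-block write can be sketched like this (simplified; the real block metadata carries more fields than shown):

```elixir
defmodule WriteSketch do
  # Serialize a batch of spans and write it as a raw block file,
  # returning minimal metadata for the Index.
  def write_raw_block(dir, block_id, spans) do
    name = String.pad_leading(Integer.to_string(block_id), 12, "0") <> ".raw"
    path = Path.join(dir, name)
    binary = :erlang.term_to_binary(spans)
    File.mkdir_p!(dir)
    File.write!(path, binary)

    %{
      block_id: block_id,
      file_path: path,
      byte_size: byte_size(binary),
      entry_count: length(spans)
    }
  end

  # Reading a raw block is the inverse operation.
  def read_raw_block(path), do: path |> File.read!() |> :erlang.binary_to_term()
end
```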
## Index

All index state lives in ETS tables — the authoritative source of truth at runtime.

### ETS tables

| Table | Type | Purpose |
|---|---|---|
| `timeless_traces_blocks` | `ordered_set` | Block metadata (block_id → file_path, byte_size, entry_count, ts_min, ts_max, format, created_at) |
| `timeless_traces_term_index` | `bag` | Term → block_id mapping |
| `timeless_traces_trace_index` | `bag` | Packed trace_id → block_id mapping (with legacy text-row compatibility) |
| `timeless_traces_compression_stats` | `set` | Compression statistics |
| `timeless_traces_block_data` | `set` | In-memory block data (memory storage mode only) |

All ETS tables are created with `read_concurrency: true` and `write_concurrency: true`.
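Table creation follows the standard `:ets.new/2` pattern (a sketch; the actual option set may differ):

```elixir
# Named tables tuned for concurrent reads and writes.
opts = [:named_table, read_concurrency: true, write_concurrency: true]

# ordered_set keeps block metadata sorted by block_id, so time-ordered
# scans are a simple :ets.first/:ets.next traversal.
:ets.new(:timeless_traces_blocks, [:ordered_set | opts])

# bag tables allow many block_ids per term / per trace_id.
:ets.new(:timeless_traces_term_index, [:bag | opts])
:ets.new(:timeless_traces_trace_index, [:bag | opts])

:ets.new(:timeless_traces_compression_stats, [:set | opts])
```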
### Persistence

Index durability uses a snapshot + write-ahead log strategy:
- `index.snapshot`: periodic full dump of all ETS tables (Erlang `term_to_binary`, compressed). Written every 1000 index operations or on graceful shutdown.
- `index.log`: an Erlang `:disk_log` that journals every index mutation (block inserts, deletes, compactions). Replayed on startup after loading the snapshot.
On startup: load snapshot → replay log entries newer than the snapshot → index is fully reconstructed in ETS. No external database required.
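The journal side can be sketched with the stock `:disk_log` API (illustrative names; the real log entry shapes are internal to the Index):

```elixir
defmodule WalSketch do
  # Open (or create) the write-ahead log file.
  def open(path) do
    {:ok, log} = :disk_log.open(name: :tt_index_log, file: String.to_charlist(path))
    log
  end

  # Journal one mutation; :disk_log.log/2 appends a single Erlang term.
  def journal(log, entry), do: :ok = :disk_log.log(log, entry)

  # Replay every logged term on startup, folding each into ETS state.
  def replay(log, fun), do: do_replay(log, :disk_log.chunk(log, :start), fun)

  defp do_replay(_log, :eof, _fun), do: :ok

  defp do_replay(log, {cont, terms}, fun) do
    Enum.each(terms, fun)
    do_replay(log, :disk_log.chunk(log, cont), fun)
  end
end
```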
### Inverted term index

When a block is indexed, terms are extracted from each span and stored in the term index table (`timeless_traces_term_index`). Terms include:
- `service:<name>` -- the `service.name` attribute or resource
- `kind:<kind>` -- span kind (server, client, etc.)
- `status:<status>` -- span status (ok, error, unset)
- `name:<span_name>` -- the span operation name
Queries use these terms to identify which blocks contain relevant spans, avoiding full scans.
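Term extraction reduces to a pure function over a normalized span (the span map shape here is hypothetical, for illustration only):

```elixir
defmodule TermSketch do
  # Build the inverted-index terms for one span. Each term is later
  # inserted as a {term, block_id} row into the bag-typed term index.
  def terms(span) do
    [
      "service:" <> (span[:service] || "unknown"),
      "kind:" <> to_string(span[:kind]),
      "status:" <> to_string(span[:status]),
      "name:" <> span[:name]
    ]
  end
end
```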
### Trace index

The trace index maps each trace ID to the blocks that contain its spans. This enables fast trace lookup -- `TimelessTraces.trace(trace_id)` reads only the blocks that contain spans for that trace.

For 32-character hex trace IDs, TimelessTraces stores a packed 16-byte binary form in `trace_index` instead of the original text. Query lookup accepts the normal hex string and transparently probes both the packed form and the legacy text rows, so existing datasets continue to work during migration.
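The packing is a plain hex decode that halves the key size (a sketch; `pack/1` and `lookup_keys/1` are illustrative names):

```elixir
defmodule TraceIdSketch do
  # A 32-character hex trace ID packs into 16 raw bytes.
  def pack(hex) when byte_size(hex) == 32, do: Base.decode16!(hex, case: :mixed)

  # Lookup probes both the packed key and the legacy text key,
  # so pre-migration rows are still found.
  def lookup_keys(hex) when byte_size(hex) == 32, do: [pack(hex), hex]
  def lookup_keys(other), do: [other]
end
```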
## Read path

A query follows this path:
- Term lookup: use the ETS term index to find block IDs matching the query filters
- Time range: narrow blocks further by timestamp range using the block metadata
- Parallel read: read and decompress matching blocks in parallel (`System.schedulers_online()` concurrency)
- Filter: apply in-memory filters to individual spans within each block
- Sort & paginate: sort by `start_time` (ascending or descending), apply offset and limit
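The steps above can be sketched end-to-end (simplified: block reading is stubbed as a function argument, and the term-lookup step is assumed to have already produced the candidate block list):

```elixir
defmodule ReadSketch do
  # blocks: [%{block_id, ts_min, ts_max}]; read_fun: block_id -> [span]
  def query(blocks, {from, to}, filter_fun, read_fun, opts \\ []) do
    blocks
    # Time range: keep blocks whose [ts_min, ts_max] overlaps the query window.
    |> Enum.filter(fn b -> b.ts_min <= to and b.ts_max >= from end)
    # Parallel read, bounded by the scheduler count.
    |> Task.async_stream(fn b -> read_fun.(b.block_id) end,
      max_concurrency: System.schedulers_online()
    )
    |> Enum.flat_map(fn {:ok, spans} -> spans end)
    # In-memory span filter, then sort & paginate.
    |> Enum.filter(filter_fun)
    |> Enum.sort_by(& &1.start_time)
    |> Enum.drop(Keyword.get(opts, :offset, 0))
    |> Enum.take(Keyword.get(opts, :limit, 100))
  end
end
```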
## Compaction

The Compactor GenServer periodically checks for raw blocks and compresses them:
- Threshold trigger: raw entries >= `compaction_threshold` (default 500)
- Age trigger: any raw block older than `compaction_max_raw_age` (default 60 seconds)
- Periodic check: every `compaction_interval` ms (default 30,000)
The compaction process:
- Read all raw block entries from disk
- Merge entries into a single batch
- Compress with the configured format (OpenZL columnar or zstd)
- Write a new compressed block file
- Update the index: delete old block metadata, add new
- Delete old raw block files
- Update compression statistics
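The merge-and-compress core can be sketched as follows. Note the compression here is a stand-in: `:erlang.term_to_binary/2` with `:compressed` (zlib), since the real system writes OpenZL or zstd block formats:

```elixir
defmodule CompactSketch do
  # Merge several raw block binaries into one compressed block.
  def compact(raw_binaries) do
    merged = Enum.flat_map(raw_binaries, &:erlang.binary_to_term/1)
    # Stand-in for the OpenZL/zstd step: built-in zlib compression.
    compressed = :erlang.term_to_binary(merged, [:compressed])

    %{
      data: compressed,
      entry_count: length(merged),
      ratio: byte_size(compressed) / max(IO.iodata_length(raw_binaries), 1)
    }
  end

  def decompress(data), do: :erlang.binary_to_term(data)
end
```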
### Merge compaction

After initial compaction produces many small compressed blocks (e.g. one per flush cycle), the Compactor runs a second pass that merges them into fewer, larger blocks. Larger blocks compress better (bigger dictionary window) and reduce per-block I/O overhead during reads.
- Scan for compressed blocks with `entry_count < merge_compaction_target_size`
- If enough small blocks exist (`>= merge_compaction_min_blocks`), group them into batches by `ts_min`
- For each batch: decompress all blocks, merge entries sorted by `start_time`, recompress
- Update the index: remove old block metadata, add the new merged block
- Delete old compressed block files
The merge pass runs automatically after every compaction timer tick and can also be triggered manually via `TimelessTraces.merge_now()`.
| Configuration | Default | Description |
|---|---|---|
| `merge_compaction_target_size` | 2000 | Target entries per merged block |
| `merge_compaction_min_blocks` | 4 | Minimum small blocks before a merge triggers |
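The block-selection step reduces to pure list processing (illustrative; the real Compactor's batching heuristics may differ):

```elixir
defmodule MergeSketch do
  # Pick small compressed blocks and group them, ordered by ts_min,
  # into batches of roughly target_size entries each.
  def batches(blocks, target_size, min_blocks) do
    small = Enum.filter(blocks, &(&1.entry_count < target_size))

    if length(small) >= min_blocks do
      small
      |> Enum.sort_by(& &1.ts_min)
      |> Enum.chunk_while(
        {[], 0},
        fn b, {acc, n} ->
          if n + b.entry_count >= target_size do
            # Batch is full: emit it and start a fresh accumulator.
            {:cont, Enum.reverse([b | acc]), {[], 0}}
          else
            {:cont, {[b | acc], n + b.entry_count}}
          end
        end,
        fn
          {[], _} -> {:cont, {[], 0}}
          {acc, _} -> {:cont, Enum.reverse(acc), {[], 0}}
        end
      )
    else
      []
    end
  end
end
```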
## Retention

The Retention GenServer enforces two independent policies every `retention_check_interval` ms:
- Age-based: delete blocks whose `ts_max` is older than `retention_max_age` seconds ago
- Size-based: delete the oldest blocks until total storage is under `retention_max_size` bytes
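Both policies reduce to simple selections over block metadata (a sketch; timestamps are plain seconds here for clarity):

```elixir
defmodule RetentionSketch do
  # Age policy: a block is expired once its newest span is older than max_age.
  def expired(blocks, now, max_age), do: Enum.filter(blocks, &(&1.ts_max < now - max_age))

  # Size policy: walk blocks oldest-first, marking them for deletion
  # until the remaining total fits under max_size.
  def over_size(blocks, max_size) do
    total = blocks |> Enum.map(& &1.byte_size) |> Enum.sum()

    blocks
    |> Enum.sort_by(& &1.ts_max)
    |> Enum.reduce_while({total, []}, fn b, {t, doomed} ->
      if t <= max_size, do: {:halt, {t, doomed}}, else: {:cont, {t - b.byte_size, [b | doomed]}}
    end)
    |> elem(1)
    |> Enum.reverse()
  end
end
```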
## Disk layout

```
data_dir/
├── index.snapshot          # Periodic ETS table dump (compressed ETF)
├── index.log               # Write-ahead log (Erlang disk_log)
└── blocks/
    ├── 000000000001.raw    # Raw block (temporary)
    ├── 000000000002.raw    # Raw block (temporary)
    ├── 000000000003.ozl    # OpenZL compressed block
    ├── 000000000004.ozl    # OpenZL compressed block
    └── ...
```

Block filenames are 12-digit zero-padded block IDs with format-specific extensions (`.raw`, `.zst`, `.ozl`).
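The naming scheme is easy to reproduce (a sketch; the format-to-extension mapping follows the layout above):

```elixir
defmodule BlockNameSketch do
  # Format-specific extensions per the disk layout.
  @extensions %{raw: ".raw", zstd: ".zst", openzl: ".ozl"}

  # 12-digit zero-padded block ID plus the format's extension.
  def filename(block_id, format) do
    String.pad_leading(Integer.to_string(block_id), 12, "0") <> Map.fetch!(@extensions, format)
  end
end
```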