Design and Internals
View SourceThis guide describes the architecture, design decisions, and internal workings of the instrument library for developers who want to understand or extend the system.
Architecture Overview
+------------------+ +------------------+ +------------------+
| Application | | Tracing | | Metrics |
|------------------| |------------------| |------------------|
| instrument_app |---->| instrument_tracer| | instrument_counter|
| instrument_sup | | instrument_sampler | instrument_gauge |
| instrument_config| | instrument_span_*| | instrument_hist...|
+------------------+ +------------------+ +------------------+
| | |
v v v
+------------------+ +------------------+ +------------------+
| Context | | Processing | | Registry |
|------------------| |------------------| |------------------|
| instrument_ctx |<--->| span_processor | | instrument_reg...|
| instrument_prop.. | instrument_exp...|<--->| instrument_lib |
+------------------+ +------------------+ +------------------+
|
v
+------------------+
| Flight Recorder |
|------------------|
| flight_recorder |
| tracer_pool |
| tracer_nif (C) |
+------------------+Startup Sequence
- instrument_app:start/2 - Initializes configuration via
instrument_config:init() - Creates the
instrument_span_exportersETS table for span export hooks - instrument_sup:start_link/0 - Starts the supervisor tree
The supervisor (instrument_sup) starts these children with one_for_all strategy:
| Order | Child ID | Module | Purpose |
|---|---|---|---|
| 1 | registry | instrument_registry | Metric registration and lookup |
| 2 | exporter | instrument_exporter | Span export batching |
| 3 | metrics_exporter | instrument_metrics_exporter | Metrics export |
| 4 | log_exporter | instrument_log_exporter | Log export |
| 5 | span_processor | instrument_span_processor | Span processing chain |
| 6 | flight_recorder | instrument_flight_recorder | Message tracing |
Design Principles
Lock-free Operations: Metrics use NIF-based C11 atomics for updates, avoiding locks entirely.
Scheduler-aware Storage: ETS tables are partitioned per-scheduler (instrument_registry_1 through instrument_registry_N) to eliminate cross-scheduler contention.
Persistent Term Caching: Metric lookups use persistent_term for O(1) read access with zero copying overhead.
Process Dictionary Context: Span context is stored in the process dictionary for zero-cost reads within a process.
Module Organization
By Subsystem
| Subsystem | Modules | Purpose |
|---|---|---|
| Application | instrument_app, instrument_sup, instrument_config | Lifecycle and configuration |
| Tracing | instrument_tracer, instrument_sampler, instrument_id | Span creation and sampling |
| Metrics | instrument_counter, instrument_gauge, instrument_histogram | Metric types |
| Context | instrument_context, instrument_propagator, instrument_propagator_* | Context propagation |
| Processing | instrument_span_processor, instrument_span_processor_*, instrument_exporter | Span processing pipeline |
| Flight Recorder | instrument_flight_recorder, instrument_tracer_pool, instrument_tracer_nif | Message capture |
| Registry | instrument_registry, instrument_lib | Metric storage |
Module Dependencies
instrument_tracer
|
+---> instrument_context (span storage)
+---> instrument_sampler (sampling decisions)
+---> instrument_id (ID generation)
+---> instrument_span_processor (on_start/on_end hooks)
+---> instrument_flight_recorder (message tracing)Key Modules
instrument_tracer: Core span lifecycle management. Handles span creation, context attachment, and export hooks. Uses the process dictionary via instrument_context to maintain the current span stack.
instrument_registry: Central metric storage using a gen_server with scheduler-partitioned ETS tables and persistent_term caching. The gen_server serializes writes while reads bypass it entirely.
instrument_span_processor: Manages a chain of span processors, invoking on_start/2 and on_end/1 callbacks. Supports both simple (immediate) and batch processing.
instrument_exporter: Batches completed spans and distributes them to registered exporters. Default batch size is 512 spans with a 5-second timeout.
instrument_flight_recorder: Low-overhead message tracing using erlang:trace with a custom erl_tracer NIF. Distributes trace events across a worker pool to avoid single-process bottlenecks.
Key Data Structures
Tracing Records (instrument_otel.hrl)
#span_ctx{} - W3C TraceContext identifier
-record(span_ctx, {
trace_id :: <<_:128>>, % 16 bytes (128 bits)
span_id :: <<_:64>>, % 8 bytes (64 bits)
trace_flags = 1 :: 0 | 1, % sampled flag
trace_state = [] :: [{binary(), binary()}], % vendor key-value pairs
is_remote = false :: boolean() % extracted from remote?
}).#span{} - Complete span record
-record(span, {
name :: binary(),
ctx :: #span_ctx{},
parent_ctx :: #span_ctx{} | undefined,
kind = internal :: client | server | producer | consumer | internal,
start_time :: integer(), % monotonic nanoseconds
end_time :: integer() | undefined,
attributes = #{} :: map(),
events = [] :: [#span_event{}],
links = [] :: [#span_link{}],
status = unset :: unset | ok | {error, binary()},
is_recording = true :: boolean()
}).#span_event{} - Timestamped annotation
-record(span_event, {
name :: binary(),
timestamp :: integer(), % monotonic nanoseconds
attributes = #{} :: map()
}).#sampling_result{} - Sampler output
-record(sampling_result, {
decision :: drop | record_only | record_and_sample,
attributes = #{} :: map(),
trace_state = [] :: [{binary(), binary()}]
}).Metrics Records (instrument.hrl)
#metric{} - Core metric wrapper
-record(metric, {
name,
handle :: term(), % NIF resource or #vector{}
collect :: tuple(), % {Module, Function, Args}
description :: binary() | undefined,
unit :: binary() | undefined,
meter :: binary() | undefined,
attributes = #{} :: map()
}).#vector{} - Labeled/dimensional metrics
-record(vector, {
name,
help,
metric, % counter | gauge | histogram
buckets = [], % histogram boundaries
labels = [], % label keys
labels_map = #{} % {LabelValues} => #metric{}
}).Internal Workflows
Span Lifecycle
start_span(Name, Opts)
|
v
[Check tracing enabled] --no--> [Create noop span] --> [Attach] --> Return
|
yes
v
[Get parent context from opts or current span]
|
v
[Generate trace_id (new trace) or inherit from parent]
[Generate span_id]
|
v
[Call sampler: should_sample/6]
|
+---> decision = drop --> is_recording = false, trace_flags = 0
+---> decision = record_only --> is_recording = true, trace_flags = 0
+---> decision = record_and_sample --> is_recording = true, trace_flags = 1
|
v
[Create #span{} with merged attributes]
|
v
[If is_recording: call span_processor:on_start/2]
|
v
[Attach span to context (process dictionary)]
|
v
[If flight recorder enabled: enable_flight_tracing/1]
|
v
Return #span{}Span End Flow
end_span(Span)
|
v
[If is_recording = false] --> [Detach from context] --> Return ok
|
v
[Set end_time = now()]
|
v
[Call span_processor:on_end/1]
|
v
[Mark is_recording = false]
|
v
[Call all registered exporters via ETS lookup]
|
v
[If flight recorder owner: disable tracing]
|
v
[Detach span from context, restore parent if any]
|
v
Return okMetrics Recording Flow
instrument_counter:inc(Name, Value)
|
v
[Lookup metric via persistent_term]
|
+---> #metric{handle = NIF_Resource}
| |
| v
| [Call NIF: atomic increment]
|
+---> #metric{handle = #vector{}}
|
v
[Lookup labeled metric from labels_map]
|
v
[Call NIF on resolved metric]Context Propagation Flow
Outbound (inject):
instrument_propagator:inject(Carrier)
|
v
[Get current context from process dictionary]
|
v
[For each registered propagator:]
+---> propagator:inject(Ctx, Carrier) --> Updated Carrier
|
v
Return final Carrier with all headers
Inbound (extract):
instrument_propagator:extract(Carrier)
|
v
[Create new empty context]
|
v
[For each registered propagator:]
+---> propagator:extract(Carrier, Ctx) --> Updated Ctx
|
v
Return context with span_ctx, baggage, etc.Flight Recorder Message Flow
[Root span starts]
|
v
[enable_flight_tracing(TraceId)]
|
v
[Store label in process dictionary]
|
v
[Get tracer state from instrument_tracer_pool]
|
v
[erlang:trace(self(), true, [send, 'receive', set_on_spawn,
{tracer, instrument_tracer_nif, State}])]
|
v
[All send/receive events captured by NIF]
|
+---> [NIF hashes tracee PID to select worker]
| |
| v
| [Send {trace, ...} to worker]
| |
| v
| [Worker inserts into ETS ring buffer]
|
v
[On span end: erlang:trace(self(), false, ...)]Extension Points
Custom Sampler
Implement the instrument_sampler behavior:
-module(my_sampler).
-behaviour(instrument_sampler).
-export([should_sample/7, get_description/1]).
-include_lib("instrument/include/instrument_otel.hrl").
%% Rate limit to N spans per second
should_sample(Config, TraceId, SpanName, SpanKind, Attributes, Links, ParentCtx) ->
RateLimit = maps:get(rate_limit, Config, 100),
case check_rate_limit(RateLimit) of
allow ->
#sampling_result{
decision = record_and_sample,
attributes = #{},
trace_state = []
};
deny ->
#sampling_result{decision = drop}
end.
get_description(Config) ->
RateLimit = maps:get(rate_limit, Config, 100),
iolist_to_binary(io_lib:format("RateLimitSampler{~p/s}", [RateLimit])).
%% Register:
%% instrument_sampler:set_sampler(my_sampler, #{rate_limit => 50}).Custom Span Processor
Implement the instrument_span_processor behavior:
-module(my_span_processor).
-behaviour(instrument_span_processor).
-export([init/1, on_start/2, on_end/1, shutdown/1, force_flush/1]).
-include_lib("instrument/include/instrument_otel.hrl").
init(Config) ->
%% Initialize state (e.g., open connections)
{ok, #{filter => maps:get(filter, Config, fun(_) -> true end)}}.
on_start(Span, _ParentCtx) ->
%% Called synchronously at span start
%% Add attributes, modify span, etc.
Span#span{attributes = maps:put(<<"processor">>, <<"custom">>,
Span#span.attributes)}.
on_end(Span) ->
%% Called asynchronously at span end
%% Filter, transform, forward to external systems
case should_export(Span) of
true -> send_to_analytics(Span);
false -> ok
end.
shutdown(_State) ->
ok.
force_flush(_State) ->
ok.
%% Register:
%% instrument_span_processor:register(my_span_processor, #{}).Custom Exporter
Implement the exporter callbacks:
-module(my_exporter).
-export([init/1, export/2, shutdown/1, force_flush/1]).
-include_lib("instrument/include/instrument_otel.hrl").
init(Config) ->
Endpoint = maps:get(endpoint, Config),
{ok, #{endpoint => Endpoint, conn => connect(Endpoint)}}.
export(Spans, State) ->
%% Spans is a list of #span{} records
#{endpoint := Endpoint, conn := Conn} = State,
Payload = encode_spans(Spans),
case send_request(Conn, Endpoint, Payload) of
ok ->
{ok, State};
{error, Reason} ->
logger:warning("Export failed: ~p", [Reason]),
{error, Reason, State}
end.
shutdown(#{conn := Conn}) ->
close(Conn),
ok.
force_flush(State) ->
{ok, State}.
%% Register:
%% instrument_exporter:register(#{module => my_exporter,
%% config => #{endpoint => "..."}}).Custom Propagator
Implement the instrument_propagator behavior:
-module(my_propagator).
-behaviour(instrument_propagator).
-export([inject/2, extract/2, fields/0]).
-define(HEADER, <<"x-my-trace">>).
inject(Ctx, Carrier) ->
case maps:get(span_ctx, Ctx, undefined) of
undefined ->
Carrier;
#span_ctx{trace_id = TraceId, span_id = SpanId} ->
Value = encode(TraceId, SpanId),
maps:put(?HEADER, Value, Carrier)
end.
extract(Carrier, Ctx) ->
case maps:get(?HEADER, Carrier, undefined) of
undefined ->
Ctx;
Value ->
{TraceId, SpanId} = decode(Value),
SpanCtx = #span_ctx{
trace_id = TraceId,
span_id = SpanId,
is_remote = true
},
maps:put(span_ctx, SpanCtx, Ctx)
end.
fields() ->
[?HEADER].
%% Register:
%% instrument_propagator:register(my_propagator).Performance Design Choices
NIF-based Atomic Metrics
Metrics use C11 atomics via NIFs for lock-free updates:
// From c_src/gauge.c
static void instrument_gauge_change(instrument_gauge_t *g, double delta) {
double current = atomic_load(&g->value);
while (!atomic_compare_exchange_weak(&g->value, ¤t, current + delta))
;
}This provides:
- Lock-free updates using CAS (compare-and-swap)
- No contention between concurrent writers
- Sub-microsecond update latency
Scheduler-aware ETS Tables
The registry creates one ETS table per scheduler:
% From instrument_lib.erl
tables() ->
[table(S) || S <- lists:seq(1,erlang:system_info(schedulers))].
table() ->
table(erlang:system_info(scheduler_id)).
table(1) -> instrument_registry_1;
table(2) -> instrument_registry_2;
% ... up to instrument_registry_64Benefits:
- Each scheduler accesses its own table
- Eliminates cross-scheduler ETS lock contention
- Linear scaling with core count
Persistent Term Caching
Metric lookups use persistent_term for O(1) access:
% From instrument_registry.erl
lookup(Name) ->
persistent_term:get({instrument_metric, Name}, undefined).
cache_label(Name, LabelValues, Metric) ->
persistent_term:put({instrument_label, Name, LabelValues}, Metric).Characteristics:
- Reads are essentially free (no copying)
- Writes trigger global GC but are rare
- Perfect for read-heavy metric lookups
Process Dictionary Context
Context is stored in the process dictionary:
% From instrument_context.erl
current() ->
case erlang:get(?CONTEXT_KEY) of
undefined -> new();
Ctx -> Ctx
end.
set_current(Ctx) when is_map(Ctx) ->
erlang:put(?CONTEXT_KEY, Ctx),
ok.Benefits:
- Zero-cost reads (direct memory access)
- No message passing or function calls
- Natural process isolation
Worker Pool Distribution
The flight recorder distributes trace events across workers:
// From c_src/instrument_tracer_nif.c
ErlNifUInt64 hash = enif_hash(ERL_NIF_INTERNAL_HASH, tracee, 0);
unsigned int worker_idx = hash % pool_size;This avoids the single-process bottleneck that would occur if all trace events went to one handler.
Batched Export
Spans are batched before export:
| Setting | Default | Purpose |
|---|---|---|
| max_export_batch_size | 512 | Spans per batch |
| schedule_delay_millis | 5000 | Time trigger |
| max_queue_size | 2048 | Queue limit |
| export_timeout_millis | 30000 | Export timeout |
Performance Summary
| Operation | Mechanism | Typical Latency | Concurrency |
|---|---|---|---|
| Metric increment | NIF atomic CAS | <100ns | Lock-free |
| Metric lookup | persistent_term | <10ns | Copy-free |
| Context read | Process dictionary | <10ns | Process-local |
| Span start | Erlang + sampling | ~1-5us | Per-process |
| Span end | Async cast | <1us | Non-blocking |
| Flight trace event | NIF + hash | ~500ns | Pool-distributed |
| Batch export | Async process | Background | Configurable |