Why Observability Matters
Before writing code, let's understand what observability means and when you need it.
The Problem
Your Erlang application is running in production. Users report slow responses. Where do you look?
- Is the database slow?
- Is a specific endpoint causing issues?
- Are certain users affected more than others?
- Did the problem start after a recent deployment?
Without observability, you are flying blind.
The Three Pillars
Observability rests on three types of telemetry:
Metrics
Metrics are numeric measurements over time. They answer "how much" and "how many" questions:
- How many requests per second?
- What is the average response time?
- How many active connections?
- What percentage of requests fail?
Metrics are lightweight. You can collect thousands of metrics with minimal overhead.
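To make the idea concrete, here is a minimal sketch of counting requests and failures. It uses OTP's built-in counters module rather than the instrument library, and the module name, slot layout, and sample outcomes are made up for illustration.

```erlang
-module(metrics_sketch).
-export([demo/0]).

%% Illustrative only: built on OTP's `counters` module (OTP 21.2+),
%% not on the instrument library's API.
demo() ->
    %% One counter array with two slots (assumed layout):
    %% slot 1 = total requests, slot 2 = failed requests.
    Ref = counters:new(2, []),
    %% Hypothetical request outcomes standing in for real traffic.
    Outcomes = [ok, ok, error, ok],
    lists:foreach(
        fun(Outcome) ->
            counters:add(Ref, 1, 1),
            case Outcome of
                error -> counters:add(Ref, 2, 1);
                ok -> ok
            end
        end,
        Outcomes),
    Total = counters:get(Ref, 1),
    Failed = counters:get(Ref, 2),
    %% Answers "how many" and "what percentage" from two integers.
    #{total => Total, failed => Failed, failure_pct => 100 * Failed / Total}.
```

Each update is a single in-place integer addition, which is why this kind of telemetry stays cheap even at high request rates.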
Traces
Traces follow a request through your system. They answer "what happened" questions:
- Which services did this request touch?
- Where did it spend the most time?
- What data did it process?
- Where did it fail?
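Each trace is made of spans, and each span represents one unit of work, such as a function call or a database query. Spans form a tree, with child spans nested under the span that started them, showing the request's path.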
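The tree structure can be sketched with plain maps. This is not how instrument represents spans internally; the field names (id, parent, name) and the span names are assumptions chosen for the example.

```erlang
-module(span_tree_sketch).
-export([demo/0, children/2, path_to/2]).

%% Hypothetical spans for one trace. Each span points at its parent;
%% the root span has no parent.
demo() ->
    Spans = [
        #{id => 1, parent => undefined, name => <<"handle_request">>},
        #{id => 2, parent => 1, name => <<"authenticate">>},
        #{id => 3, parent => 1, name => <<"load_profile">>},
        #{id => 4, parent => 3, name => <<"db_query">>}
    ],
    {children(1, Spans), path_to(4, Spans)}.

%% Names of the spans started directly under ParentId.
children(ParentId, Spans) ->
    [Name || #{parent := P, name := Name} <- Spans, P =:= ParentId].

%% Names from the root span down to the span with the given id.
path_to(undefined, _Spans) ->
    [];
path_to(Id, Spans) ->
    [#{parent := Parent, name := Name}] =
        [S || #{id := I} = S <- Spans, I =:= Id],
    path_to(Parent, Spans) ++ [Name].
```

Following parent links upward from any span recovers the path the request took to reach it, which is what a trace viewer draws as nested bars.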
Logs
Logs are timestamped records of events. They provide detailed context:
- What values did the function receive?
- What error message was returned?
- What decisions did the code make?
Logs become powerful when correlated with traces, letting you find the exact log lines for a problematic request.
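One way to picture the correlation: attach the current trace ID to the logging metadata of the process handling the request, so every log event from that process carries it. The sketch below does this by hand with OTP's logger module. The trace ID value and user name are made up, and whether instrument works this way internally is an assumption.

```erlang
-module(log_correlation_sketch).
-export([demo/0]).

demo() ->
    %% Hypothetical trace ID; a tracer would normally generate this.
    TraceId = <<"4bf92f3577b34da6a3ce929d0e0e4736">>,
    %% Every log event emitted by this process now carries trace_id
    %% in its metadata.
    logger:update_process_metadata(#{trace_id => TraceId}),
    %% Only printed if the logger level allows info; OTP's default
    %% primary level is notice.
    logger:info("Processing user ~s", ["alice"]),
    %% Return the metadata so the attached trace_id can be inspected.
    logger:get_process_metadata().
```

With the ID in the metadata, a formatter or log backend can include it in each line, and you can search your logs by the trace ID of a slow or failed request.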
When to Use Each
Use metrics when:
- You need aggregated data (averages, percentiles, counts)
- You want to set up alerting thresholds
- You care about system-wide behavior
- You need low-overhead collection
Use traces when:
- You need to understand request flow
- You are debugging latency issues
- You have multiple services communicating
- You need to see the full picture of one request
Use logs when:
- You need detailed context about specific events
- You are debugging business logic
- You need human-readable records
- You want to capture unexpected conditions
Why instrument?
The instrument library gives you all three pillars in one package:
%% Metrics
Counter = instrument_metric:new_counter(requests_total, <<"Total requests">>),
instrument_metric:inc_counter(Counter).
%% Traces
instrument_tracer:with_span(<<"handle_request">>, fun() ->
    instrument_tracer:set_attribute(<<"user.id">>, UserId),
    process_request()
end).
%% Logs (with trace correlation)
instrument_logger:install(),
logger:info("Processing user ~s", [UserId]). %% Includes trace_id
The Cost of Not Observing
Without observability:
- Debugging takes hours instead of minutes
- You can't prove whether fixes work
- You react to problems instead of preventing them
- You can't understand your system's behavior
What You Will Build
By the end of this book, you will have instrumented an Erlang application with:
- Request counters and latency histograms
- Distributed traces across services
- Correlated logs
- Export to Prometheus and Jaeger
Let's start by creating your first metrics.