ADR-0003: Telemetry-First Observability
View SourceStatus
Accepted
Date
2024-12-25
Context
DeepEvalEx evaluates LLM outputs, which involves:
- Multiple API calls to LLM providers (potentially expensive)
- Variable latency per evaluation (seconds to minutes)
- Potential for failures and exceptions
- Need for cost tracking and performance monitoring
Users need visibility into:
- Which metrics are being evaluated
- How long each evaluation takes
- Token usage and estimated costs
- Error rates and failure patterns
Decision
Emit telemetry events at every measurement and evaluation level. No Logger calls inside core metrics/evaluator code.
Events emitted:
# Metric-level events
[:deep_eval_ex, :metric, :start]
[:deep_eval_ex, :metric, :stop]
[:deep_eval_ex, :metric, :exception]
# Batch evaluation events
[:deep_eval_ex, :evaluation, :start]
[:deep_eval_ex, :evaluation, :stop]Event metadata includes:
%{
metric: "Faithfulness",
test_case_id: "uuid",
duration: 1_234_567, # nanoseconds
result: %Result{},
token_usage: %{prompt: 100, completion: 50}
}Consequences
Positive
- Decoupled observability: Handlers are optional and user-defined
- Composable: Works with Logger, StatsD, Prometheus, Phoenix.LiveDashboard
- Non-invasive: Zero overhead when no handlers attached
- Standard: Follows Elixir ecosystem conventions (Phoenix, Ecto, Oban)
- Flexible: Users choose what to log, measure, or alert on
Negative
- Handler setup required: Users must attach handlers to see any output
- More complex: Simple
Logger.infowould be easier for basic debugging - Event discovery: Users must know which events exist and their metadata
Neutral
- Default telemetry handler provided for basic logging (opt-in)
- Metrics are automatically instrumented via BaseMetric macro
- No changes needed to metric implementations for observability
Alternatives Considered
Direct Logger calls
- Rejected: Hardcodes logging decisions. Users can't easily change log levels, formats, or destinations without code changes.
Callback-based hooks
- Rejected: Would require custom callback registration mechanism. Telemetry already provides this with better tooling.
OpenTelemetry integration
- Rejected for now: OpenTelemetry is more complex and not yet standard in Elixir ecosystem. Can be added later via telemetry handlers.