Accrue emits :telemetry events for every public entry point. This guide
documents the conventions, the high-signal [:accrue, :ops, :*] namespace,
the OpenTelemetry span naming rules, and the default
Telemetry.Metrics recipe.
If you only read one section: jump to Using the default metrics recipe
below — that's the short host wiring snippet that appends Accrue.Telemetry.Metrics.defaults/0
(~20 metric definitions, including every documented ops counter) with minimal glue code.
Doc contract: Published Hex docs for accrue describe the telemetry
contracts for that package version. The main branch may add or rename
events before they appear in a Hex release; operators should prefer the
guides/telemetry.md revision that matches the accrue version pinned in
their host mix.lock (see Hex for the published
guide snapshot).
Namespace split
Accrue divides telemetry into two namespaces:
[:accrue, :*]— the firehose. Every public Accrue entry point emits:start/:stop/:exceptionhere viaAccrue.Telemetry.span/3. High cardinality. Use for tracing, debug, and traffic shape — NOT for paging on-call.[:accrue, :ops, :*]— SRE-actionable signals. Low cardinality, high value. Every event in this namespace is an alertable condition that should wake somebody up (or at least file a ticket). Subscribe handlers here, set thresholds, sleep well.
The split exists because Accrue's firehose is too noisy for alerting — a
busy SaaS dispatching webhooks, finalizing invoices, and reporting usage
events emits hundreds of [:accrue, :*] events per second. The ops namespace
is curated: every event is a real ops signal, not a heartbeat.
Firehose and diagnostic events (not in [:accrue, :ops, :*])
These are useful for tracing, dashboards, and anomaly detection, not typically for paging:
- Billing spans —
Accrue.Telemetry.span/3on every publicAccrue.Billingentry point:[:accrue, :billing, :<resource>, :<action>, :start | :stop | :exception](seeAccrue.Telemetrymodule doc). - Webhooks —
[:accrue, :webhook, :receive], handler exceptions, orphan reducers under[:accrue, :webhooks, :orphan_*],[:accrue, :webhooks, :stale_event], etc. - Mail / PDF —
[:accrue, :mailer, :deliver, …],[:accrue, :pdf, :render, …], email fallbacks[:accrue, :email, :locale_fallback | :timezone_fallback | :format_money_failed].
Subscribe in your host Telemetry or OpenTelemetry pipeline when you need
latency percentiles, diagnostics, or high-cardinality dashboards — keep
on-call paging on [:accrue, :ops, :*] above. The firehose stays useful
for tracing and anomaly detection, but its volume is intentionally unsuitable
for wake-the-oncall thresholds.
Ops event catalog ([:accrue, :ops, :*])
All ops events fire inside the same Repo.transact/2 as the state write
they correspond to — they are idempotent under webhook replay via the
accrue_webhook_events unique-index short-circuit.
| Event | Measurements | Metadata | Primary owner |
|---|---|---|---|
[:accrue, :ops, :revenue_loss] | count, amount_minor, currency | subject_type, subject_id, reason | Accrue.Telemetry.Ops |
[:accrue, :ops, :dunning_exhaustion] | count | subscription_id, from_status, to_status, source (:accrue_sweeper | :stripe_native | :manual) | Accrue.Webhook.DefaultHandler |
[:accrue, :ops, :incomplete_expired] | count | subscription_id | Accrue.Telemetry.Ops |
[:accrue, :ops, :charge_failed] | count | charge_id, customer_id, failure_code | Accrue.Telemetry.Ops |
[:accrue, :ops, :meter_reporting_failed] | count | meter_event_id, event_name, source (:reconciler | :webhook | :sync) | Accrue.Webhook.DefaultHandler / Accrue.Billing.MeterEventActions / Accrue.Jobs.MeterEventsReconciler |
[:accrue, :ops, :webhook_dlq, :dead_lettered] | count | event_id, processor_event_id, type, attempt | Accrue.Webhook.DispatchWorker |
[:accrue, :ops, :webhook_dlq, :replay] | count, duration, requeued_count, skipped_count | actor, filter, dry_run? | Accrue.Webhooks.DLQ |
[:accrue, :ops, :webhook_dlq, :prune] | dead_deleted, succeeded_deleted, duration | retention_days | Accrue.Webhook.Pruner |
[:accrue, :ops, :pdf_adapter_unavailable] | count | type (email template key), operation_id when set | Accrue.Workers.Mailer |
[:accrue, :ops, :events_upcast_failed] | count | event_id, type, schema_version | Accrue.Events |
[:accrue, :ops, :connect_account_deauthorized] | count | stripe_account_id, deauthorized_at or unresolved: true | Accrue.Webhook.ConnectHandler |
[:accrue, :ops, :connect_capability_lost] | count | stripe_account_id, capability, from, to | Accrue.Webhook.ConnectHandler |
[:accrue, :ops, :connect_payout_failed] | count | stripe_account_id, payout_id, amount, currency, failure_code | Accrue.Webhook.ConnectHandler |
Meter reporting failures: semantics & sources
[:accrue, :ops, :meter_reporting_failed] fires on the first durable transition of a meter row into terminal failed (as enforced by Accrue.Billing.MeterEvents guarded updates)—not once per HTTP retry, Stripe redelivery, webhook replay, or idempotent report_usage replay. Treat it as a terminal-epoch signal, not retry noise.
:sync— immediate host call path: failures originate inAccrue.Billing.MeterEventActionswhile handlingAccrue.Billing.report_usage/3inside the same transaction that attempted the processor report.:reconciler— background reconciliation:Accrue.Jobs.MeterEventsReconcileron queue:accrue_metersretries stuckpendingrows and emits this ops tuple when a durablefailedtransition is recorded.:webhook— Stripe-reported meter errors:Accrue.Webhook.DefaultHandleringests the billing path andAccrue.Webhook.DispatchWorkercarries the async context when the handler marks the rowfailedwith telemetry.
Read this block before tuning Grafana annotations—alert links should point here (tuple + semantics), then operator-runbooks.md for ordered triage.
Connect ops rows above are emitted via Accrue.Telemetry.Ops.emit/3 from
Accrue.Webhook.ConnectHandler. PDF and ledger rows use :telemetry.execute/3
directly with the same [:accrue, :ops] prefix — treat them as first-class
ops signals for paging and dashboards.
Note: [:accrue, :ops, :revenue_loss], :incomplete_expired, and
:charge_failed are part of the supported host + Accrue ops vocabulary
(Ops.emit/3 and metrics defaults). Prefer Ops.emit/3 from host billing
code so operation_id merges consistently; search the codebase for concrete
emit sites when wiring alerts.
Every ops event also carries an automatically-merged operation_id field
in metadata, sourced from Accrue.Actor.current_operation_id/0 (the same
seed used for processor idempotency keys). This lets you correlate
ops events with the originating webhook, Oban job, or admin action across
service boundaries.
Last reconciled with v1.9 gap audit §1: 2026-04-21 — PR #14.
The §1 ops inventory in .planning/research/v1.9-TELEMETRY-GAP-AUDIT.md is
reflected in the ops catalog table above (including Connect, PDF, ledger,
and DLQ rows).
Span naming conventions (OpenTelemetry)
Accrue's OpenTelemetry span helpers (gated on the :opentelemetry optional
dep) wrap every Billing context function with consistent naming:
accrue.<domain>.<resource>.<action>Domains emitted through Accrue.Telemetry.span/3 today are :billing,
:connect, :mailer, :pdf, :processor, and :storage (see
Accrue.Telemetry module doc). Illustrative, non-exhaustive billing
examples below — for the enforced billing span inventory, see
test/accrue/telemetry/billing_span_coverage_test.exs.
accrue.billing.subscription.createaccrue.billing.subscription.cancelaccrue.billing.invoice.finalizeaccrue.billing.charge.refundaccrue.billing.meter_event.report_usage— verified billing span fromAccrue.Billing.report_usage/3; failures surface as the ops signal[:accrue, :ops, :meter_reporting_failed](see ops catalog table).accrue.billing.payment_method.list—[:accrue, :billing, :payment_method, :list]fromAccrue.Billing.list_payment_methods/2(processor-backed read; no extra ops tuple).- <a id="billing-billing-portal-create"></a>
accrue.billing.billing_portal.create—[:accrue, :billing, :billing_portal, :create]fromAccrue.Billing.create_billing_portal_session/2(Phase 78 / BIL-04). - <a id="billing-checkout-session-create"></a>
accrue.billing.checkout_session.create—[:accrue, :billing, :checkout_session, :create]fromAccrue.Billing.create_checkout_session/2(BIL-06 / Phase 80 emission; BIL-07 / Phase 81 catalog). Checkout-only span metadata merged from validated attrs is exactlycheckout_mode,checkout_ui_mode, andcheckout_line_items_count— no checkout URLs,client_secret, or raw attrs blob (behavioral SSOT:checkout_session_facade_test.exs+merge_checkout_session_create_metadata/4). Prefer these on spans; do not promote them unmodified toTelemetry.Metricstags (see Cardinality discipline below). accrue.billing.payment_method.attach—[:accrue, :billing, :payment_method, :attach]fromAccrue.Billing.attach_payment_method/3.accrue.billing.payment_method.detach—[:accrue, :billing, :payment_method, :detach]fromAccrue.Billing.detach_payment_method/2.accrue.billing.payment_method.set_default—[:accrue, :billing, :payment_method, :set_default]fromAccrue.Billing.set_default_payment_method/3.NOT an OTel span name—accrue.webhooks.dlq.replayis the dotted OpenTelemetry span name only when a host maps the ops event[:accrue, :ops, :webhook_dlq, :replay](viaOps.emit/3/:telemetry.execute) into OTel separately. It is not produced byAccrue.Telemetry.span/3.
This mirrors the :telemetry event naming
([:accrue, :billing, :subscription, :create]) so a single name maps cleanly
to both the telemetry event and the OTel span — no translation table.
Last reconciled (billing span examples): 2026-04-24 — Phase 81 BIL-07
(checkout_session catalog row; billing_span_coverage_test.exs unchanged).
Span kind:
INTERNALfor Accrue context functionsCLIENTfor outboundlattice_stripecalls (the underlying HTTP layer emits these via its own instrumentation)
Attribute conventions
Accrue spans attach a small, fixed set of business-meaningful attributes. Hosts adding their own spans on top of Accrue should follow the same discipline — both for grep-ability across services and for the PII contract below.
Allowed attributes (host-queryable, audit-useful, PII-free):
accrue.subscription.id— Accrue's internal UUIDaccrue.customer.id— Accrue's internal UUIDaccrue.invoice.idaccrue.charge.idaccrue.event_type— webhook event type string (e.g."invoice.paid")accrue.processor—:stripe|:fakestripe.subscription.id— upstream Stripe ID (for support bridging)stripe.customer.idstripe.charge.idstripe.invoice.idstripe.payment_intent.id
PROHIBITED attributes (NEVER attach to spans, telemetry events, or metric tags):
- Any customer email, name, phone, postal address
- Any card PAN, CVC, expiry, fingerprint metadata
- Any Stripe
PaymentMethodraw response data - Any webhook raw body or signature
- Any
Accrue.Moneyamounts that identify a specific customer's purchase history (aggregate amounts at the metric level are fine; per-customer amounts at the span/event level are not) - Any free-text reason fields supplied by end users (refund notes, support tickets, etc.)
The rule of thumb: attach PII-free identifiers ONLY. If an attribute could be reversed into a person, don't attach it. This is a host responsibility too — Accrue cannot inspect your custom span attributes for PII at runtime, so code review and the grep-able allowlist above are the enforcement mechanism.
Using the default metrics recipe
Accrue.Telemetry.Metrics.defaults/0 returns a list of ready-to-use
Telemetry.Metrics definitions covering the billing context, webhook
pipeline, and ops namespace. It is conditionally compiled on the optional
:telemetry_metrics dep.
Add the dep to your host app:
# mix.exs
{:telemetry_metrics, "~> 1.1"},
{:telemetry_metrics_prometheus, "~> 1.1"} # or your reporter of choiceThen wire it in:
defmodule MyApp.Telemetry do
use Supervisor
import Telemetry.Metrics
def start_link(arg), do: Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
@impl true
def init(_arg) do
children = [
{TelemetryMetricsPrometheus, [metrics: metrics()]}
]
Supervisor.init(children, strategy: :one_for_one)
end
defp metrics do
[
counter("my_app.request.count")
# ... your other host metrics ...
] ++ Accrue.Telemetry.Metrics.defaults()
end
endThis wires in the default metric set covering the billing context, webhook pipeline, and full ops namespace (including Connect and PDF ops signals). Distributions and percentile summaries beyond these are host choice — Accrue doesn't prescribe binning strategies because appropriate buckets depend heavily on your traffic shape and SLO targets.
The default recipe includes counters for every [:accrue, :ops, :*] event
documented in the table above (including Connect, PDF fallback, and ledger
upcast failures), so Prometheus-style scrapers stay aligned with the ops
catalog.
Cardinality discipline
The default metric definitions only attach low-cardinality tags (:status,
:source, :type, :stripe_status). Customer IDs, subscription IDs, and
other unbounded identifiers are never promoted to metric tags — they
belong on spans, not metrics. If you add custom Accrue-derived metrics in
your host app, follow the same rule: anything with more than ~50 distinct
values per day is a span attribute, not a metric tag.
Emitting custom ops events
If your host app fires billing-adjacent ops events (e.g. a custom revenue
recognition reconciler), prefer Accrue.Telemetry.Ops.emit/3 over raw
:telemetry.execute/3 — it enforces the namespace prefix and auto-merges
operation_id from the process dict:
Accrue.Telemetry.Ops.emit(
:revenue_loss,
%{count: 1, amount_minor: 9900, currency: "usd"},
%{subject_type: "Subscription", subject_id: sub.id, reason: :fraud_refund}
)
# → emits [:accrue, :ops, :revenue_loss] with operation_id auto-mergedFor multi-segment events (sub-namespaces), pass a list:
Accrue.Telemetry.Ops.emit(
[:webhook_dlq, :replay],
%{count: 12, requeued_count: 12, skipped_count: 0, duration: 142_000},
%{actor: :admin, filter: %{type: "invoice.paid"}, dry_run?: false}
)
# → emits [:accrue, :ops, :webhook_dlq, :replay]The [:accrue, :ops] prefix is hardcoded — callers cannot inject events
outside the namespace via this helper. If you need to emit
under [:accrue, :*] for the firehose, use Accrue.Telemetry.span/3
instead.
Cross-domain host subscription
Phoenix controllers, LiveViews, Channels, and plain processes often need a
small, high-signal Accrue hook without subscribing to the full billing
firehose. Use the public modules only — Accrue.Telemetry (span naming),
Accrue.Telemetry.Metrics (default counters), and Accrue.Telemetry.Ops
(emit/3 contract) — and the same :telemetry APIs you already use elsewhere.
Append Accrue.Telemetry.Metrics.defaults/0 to your host metric list (see
Using the default metrics recipe above) so scrapers stay aligned with the
ops catalog. The authoritative tuple list lives in the Ops event catalog
table earlier in this guide — do not fork that table into a second
inventory here.
Ops attach (webhook DLQ dead-lettered)
This pattern mirrors the checked-in examples/accrue_host app: start a tiny
process once from supervision, call :telemetry.attach/4 with a stable
handler id, and detach on shutdown so dev hot reload does not stack duplicate
handlers.
:telemetry.attach(
"accrue-host-ops-dlq-dead-lettered",
[:accrue, :ops, :webhook_dlq, :dead_lettered],
fn _event, measurements, _metadata, _config ->
require Logger
# Low-cardinality only — never log full metadata maps from billing.
Logger.info("accrue ops webhook_dlq dead_lettered count=#{measurements.count}")
end,
nil
)Optional: billing spans without metric-tag explosions
Accrue.Telemetry.span/3 on billing entry points emits
[:accrue, :billing, :*, :start | :stop | :exception] — high cardinality.
If you add a handler, subscribe to :stop and :exception only for
dashboards or logs. Do not copy customer IDs, subscription IDs, or other
unbounded fields into metric tags; identifiers belong in traces or
scrubbed log lines, not Prometheus labels.
OpenTelemetry–first hosts
Teams standardizing on OpenTelemetry may skip Telemetry.Metrics
entirely and attach handlers or OTel bridges directly to :telemetry events
instead — the ops catalog tuples still apply. Prefer spans for per-customer
detail; keep paging on [:accrue, :ops, :*].
For ordered triage, default Oban queue placement (anchor #oban-queue-topology in operator-runbooks.md), and expanded Stripe verification, use Operator runbooks. This section keeps a compact signal → first-action table as your starting point — adjust for your support model and Stripe objects. Prefer Stripe Dashboard / Sigma for finance reporting; Accrue focuses on state + webhooks + replay in your app.
Operator runbooks (first actions)
| Ops event | Suggested first actions |
|---|---|
[:accrue, :ops, :webhook_dlq, :dead_lettered] | Inspect accrue_webhook_events row; fix handler bug or data; use admin Replay or DLQ tools; watch replay telemetry; (Oban defaults: queue topology). |
[:accrue, :ops, :webhook_dlq, :replay] | Validate requeued_count vs expectation; if dry-run, follow up with real replay. |
[:accrue, :ops, :meter_reporting_failed] | Check source (:sync, :webhook, :reconciler); inspect accrue_meter_events; verify Stripe meter + API keys; retry after fix; (Oban defaults: queue topology). |
[:accrue, :ops, :dunning_exhaustion] | Confirm subscription status transition; notify customer success; verify payment method in Stripe. |
[:accrue, :ops, :revenue_loss] | Triage reason + subject_*; fraud vs refund policy; reconcile with Stripe balance transactions; (Oban defaults: queue topology). |
[:accrue, :ops, :charge_failed] | Map failure_code; prompt card update or alternative PM; check Radar rules in Stripe if unexpected. |
[:accrue, :ops, :incomplete_expired] | Incomplete checkout/subscription expired; clean up local rows; marketing follow-up if abandoned cart. |
[:accrue, :ops, :pdf_adapter_unavailable] | Start ChromicPDF (or switch PDF adapter); emails still send with hosted invoice link fallback. |
[:accrue, :ops, :events_upcast_failed] | Data migration issue — unknown schema_version for type; deploy compatible upcaster before replaying events; (Oban defaults: queue topology). |
[:accrue, :ops, :connect_account_deauthorized] | Disconnect Connect account in product UI; stop destination charges; audit open Connect transfers. |
[:accrue, :ops, :connect_capability_lost] | Read capability + to status; Stripe Connect onboarding / requirements. |
[:accrue, :ops, :connect_payout_failed] | Use payout_id + failure_code in Stripe; update bank account or resolve restriction. |
See also
Accrue.Telemetry—span/3helper for the firehose namespaceAccrue.Telemetry.Ops—emit/3helper for the ops namespaceAccrue.Telemetry.Metrics— defaultTelemetry.Metricsrecipe:telemetry— underlying event librarytelemetry_metrics— metric DSL