This guide is the RUN-01 procedural companion to guides/telemetry.md. That file owns the ops event catalog for [:accrue, :ops, :*], and this guide does not duplicate that table. Use this document for ordered triage, Oban queue defaults, expanded Stripe verification, and the four mini-playbooks where sequence matters.

Library vs host: Accrue ships workers and suggested queue names; your host application configures and starts Oban (queues, concurrency, pruning). Queue names below are defaults Accrue documents in code — you may remap them in host config; treat symptoms and checks as patterns, not hard-coded production names.

Oban queue topology

Queue names are host-configurable; the table lists the defaults that Accrue currently declares through use Oban.Worker in accrue/lib.

| Queue (default name) | Worker module | Role / when to look | Typical symptoms | Safe first checks |
| --- | --- | --- | --- | --- |
| :accrue_webhooks | Accrue.Webhook.DispatchWorker | Async webhook handler dispatch after ingest | Webhooks stuck :processing, DLQ growth, dead-letter ops | Inspect accrue_webhook_events, Oban retries for this queue, handler logs (no raw bodies) |
| :accrue_mailers | Accrue.Workers.Mailer | Transactional email delivery | Mail backlog, PDF/email failures surfacing as ops | Oban job args shape, mailer adapter, ChromicPDF availability |
| :accrue_meters | Accrue.Jobs.MeterEventsReconciler, Accrue.Jobs.MeteredRenewalReconciler, Accrue.Jobs.ProcessMeteredRenewal | Meter usage reconciliation, stale renewal repair, and metered settlement | meter_reporting_failed, metered renewal repair, metered settlement recovery | Reconciler jobs, Stripe meter API health, Braintree renewal evidence, accrue_meter_events, accrue_metered_renewals |
| :accrue_dunning | Accrue.Jobs.DunningSweeper | Subscription dunning sweeps | Unexpected dunning transitions | Scheduled runs, subscription state vs Stripe |
| :accrue_reconcilers | Accrue.Jobs.ReconcileChargeFees | Fee reconciliation for charges | Fee drift vs Stripe balance | Reconciler errors, Stripe charge/balance transaction lookups |
| :accrue_reconcilers | Accrue.Jobs.ReconcileRefundFees | Fee reconciliation for refunds | Refund fee mismatches | Same as above for refund path |
| :accrue_scheduled | Accrue.Jobs.DetectExpiringCards | Card expiry notices / hygiene | Missing expiry emails, card warnings | Job schedule, customer PM metadata (PII-safe) |
| :accrue_maintenance | Accrue.Webhook.Pruner | Webhook event retention pruning | Prune telemetry anomalies | Retention config, maintenance window, dry-run if offered |
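
Because the host owns Oban, the default queue names above only take effect once the host declares them. A minimal sketch of that wiring follows, assuming a host application named :my_app with a repo MyApp.Repo (both placeholder names). The concurrency limits and the pruner retention are illustrative values, not figures Accrue prescribes.

```elixir
# config/config.exs of the host application.
# :my_app and MyApp.Repo are placeholder names for this sketch.
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  # Queue names are Accrue's documented defaults; the concurrency
  # limits are assumed values chosen only for illustration.
  queues: [
    accrue_webhooks: 10,
    accrue_mailers: 5,
    accrue_meters: 5,
    accrue_dunning: 2,
    accrue_reconcilers: 2,
    accrue_scheduled: 1,
    accrue_maintenance: 1
  ],
  # Oban's built-in job pruner; seven days (in seconds) is an assumed retention.
  plugins: [
    {Oban.Plugins.Pruner, max_age: 60 * 60 * 24 * 7}
  ]
```

If you remap a queue name in host config, substitute the remapped name wherever this guide refers to the default.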

Stripe verification pattern

Use a two-layer mental model whenever Stripe is involved:

  1. Accrue layer (operational): local rows (accrue_* tables), telemetry and operation_id, foreign keys and Stripe ids stored by Accrue (cus_*, sub_*, pi_*, Connect account ids, etc.). This is application state for billing workflows: useful for triage, not a substitute for Stripe’s financial records. For customer billing portal failures, correlate [:accrue, :billing, :billing_portal, :create] :stop / :exception latency with accrue.customer.id and operation_id per telemetry.md; do not paste %Accrue.BillingPortal.Session{} inspect output into tickets. For Stripe Checkout sessions created via Accrue.Billing.create_checkout_session/2, use the same pattern on [:accrue, :billing, :checkout_session, :create], confirm whether the host runs Accrue.Processor.Fake vs live Stripe, and read the PII-safe metadata contract at telemetry.md#billing-checkout-session-create; do not paste session URLs or client_secret values into tickets.
  2. Stripe layer (verification): confirm each issue against the Stripe resource type + id using canonical documentation (e.g. Webhooks, Testing webhooks, Billing meter events) and functional Dashboard paths (e.g. Developers → Webhooks → event deliveries) rather than brittle deep links.

For finance and tax reporting, use Stripe Dashboard / reporting products as your source of truth; Accrue focuses on state, webhooks, and replay in your app.
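
For the Accrue-layer correlation described above, a host-side telemetry handler can log the outcome, latency, and identifiers without touching session structs. The sketch below is hypothetical: the module name, the handler id, and the allowlisted metadata keys (:operation_id, :customer_id) are assumptions, and it assumes the events carry a :duration measurement in native time units, as :telemetry.span/3 would emit. Check telemetry.md for the actual metadata contract.

```elixir
defmodule MyApp.BillingPortalTriage do
  @moduledoc """
  Hypothetical host-side telemetry handler for billing portal session
  creation. Logs outcome, latency, and allowlisted identifiers only.
  """

  require Logger

  @events [
    [:accrue, :billing, :billing_portal, :create, :stop],
    [:accrue, :billing, :billing_portal, :create, :exception]
  ]

  # Assumed identifier keys; the real contract lives in telemetry.md.
  @safe_keys [:operation_id, :customer_id]

  @doc "Attaches the handler; call once at application start."
  def attach do
    :telemetry.attach_many(
      "my-app-billing-portal-triage",
      @events,
      &__MODULE__.handle_event/4,
      nil
    )
  end

  @doc false
  def handle_event(event, measurements, metadata, _config) do
    Logger.info("billing_portal.create " <> inspect(summary(event, measurements, metadata)))
  end

  @doc "Builds a ticket-safe summary: allowlisted ids, outcome, latency in ms."
  def summary(event, measurements, metadata) do
    duration_ms =
      case measurements do
        %{duration: native} -> System.convert_time_unit(native, :native, :millisecond)
        _ -> nil
      end

    metadata
    |> Map.take(@safe_keys)
    |> Map.merge(%{outcome: List.last(event), duration_ms: duration_ms})
  end
end
```

Using an allowlist rather than a denylist means any new metadata key stays out of logs until someone deliberately adds it.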

Mini-playbook: [:accrue, :ops, :webhook_dlq, :dead_lettered]

  1. Confirm scope: identify event_id / processor_event_id from telemetry or admin (do not paste full webhook payloads or secrets into tickets).
  2. Inspect the accrue_webhook_events row and last error; decide fix vs replay before mutating data.
  3. Check Oban for Accrue.Webhook.DispatchWorker on :accrue_webhooks (see Oban queue topology); ensure the host queue is running and not wedged.
  4. If replay is required, prefer admin-gated or documented replay flows; use dry-run when available — avoid destructive deletes from this path.
  5. Cross-check the same event type in Stripe via Developers → Webhooks → recent deliveries (Webhook docs).
  6. After fix, enqueue or allow retry; watch [:accrue, :ops, :webhook_dlq, :replay] and related metrics for confirmation.
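
Step 3 can be sketched as a read-only console check. This assumes an IEx session attached to the host application, a host repo named MyApp.Repo (a placeholder), Oban running under its default instance name, and the default queue name. It uses only Oban's own job schema and Oban.check_queue/1, not any Accrue API.

```elixir
# Run in an IEx session attached to the host app.
# MyApp.Repo is a placeholder for the host's Ecto repo.
import Ecto.Query

# Dispatch-worker jobs that are retrying or have exhausted their retries.
# `args` and `errors` are deliberately not selected so payload data stays
# out of the console and out of any ticket the output is pasted into.
query =
  from(j in Oban.Job,
    where: j.queue == "accrue_webhooks",
    where: j.worker == "Accrue.Webhook.DispatchWorker",
    where: j.state in ["retryable", "discarded"],
    order_by: [desc: j.attempted_at],
    limit: 20,
    select: map(j, [:id, :state, :attempt, :max_attempts, :attempted_at])
  )

MyApp.Repo.all(query)

# Snapshot of the queue producer on this node: shows whether the queue
# is paused and which job ids are currently running.
Oban.check_queue(queue: :accrue_webhooks)
```

A paused queue, or a running list that never changes between checks, suggests a host-side problem that should be fixed before any replay.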

If the dead-lettered row is Braintree-sourced or tied to local portal checkout:

  1. Confirm whether the failed row should reduce into accrue.portal.checkout.completed or a normalized subscription/invoice event before replaying it.
  2. Fix host-local causes first: portal_base_url, portal_mount_path, auth/session continuity, or Hosted Fields readiness.
  3. Replay the persisted row only after the mounted path is healthy; Braintree recovery is local projection convergence, not an upstream hosted checkout retry.

Mini-playbook: [:accrue, :ops, :events_upcast_failed]

  1. Record event_id, type, and schema_version from the ops metadata (identifiers only).
  2. Determine whether a deployed upcaster is missing vs bad persisted data — do not replay until the schema path is understood.
  3. Inspect Accrue.Events / event storage per your host (see catalog row in telemetry.md); align with code version in the running release.
  4. Verify Oban or inline retry behavior will not amplify a bad version skew; pause automated replay if unsure.
  5. If downstream dispatch is involved, use Oban queue topology to identify the queue that carries the indirect jobs.
  6. Validate against Stripe only if the failing payload is a Stripe-sourced event; use Event object docs for shape, not as ledger truth.
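
For step 4, one way to pause automated retries while the schema path is investigated is to pause the relevant Oban queue. The sketch below assumes :accrue_webhooks is the queue carrying the affected jobs in your deployment; that is an assumption, so confirm it against the topology table and your host config first.

```elixir
# IEx on a node of the host application.
# :accrue_webhooks stands in for whichever queue carries the affected jobs.

# Stop the queue from fetching new jobs; jobs already executing are not interrupted.
:ok = Oban.pause_queue(queue: :accrue_webhooks)

# ... investigate the upcaster and schema_version mismatch ...

# Resume once the running release can handle the persisted schema version.
:ok = Oban.resume_queue(queue: :accrue_webhooks)
```

Pausing also holds back healthy jobs on the same queue, so keep the pause short and note it in the incident record.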

Mini-playbook: [:accrue, :ops, :meter_reporting_failed]

Always read the contract (when the tuple fires and what each source means) at telemetry.md#meter-reporting-semantics before changing alert thresholds—this runbook is procedure only.

  1. Read source (:sync, :webhook, :reconciler) plus meter_event_id / event_name from metadata (identifiers only—no raw payloads).
  2. Load the matching accrue_meter_events row and note stripe_status, stripe_error, and timestamps so you know whether the failure epoch is already terminal.

:sync (host request path)

  1. Correlate with the host request or job that called Accrue.Billing.report_usage/3 in the same transaction window; inspect logs around Accrue.Billing.MeterEventActions for processor errors surfaced synchronously.
  2. Fix configuration or upstream Stripe errors, then retry the host operation with a fresh operation_id only when the business case requires a new attempt—idempotent replays should converge on the stored terminal row.

:reconciler (Oban :accrue_meters)

  1. Inspect Oban jobs for Accrue.Jobs.MeterEventsReconciler on :accrue_meters (Oban queue topology); confirm the queue is running and not wedged behind retries.
  2. After correcting Stripe meter setup or credentials, allow the reconciler to dequeue; watch [:accrue, :ops, :meter_reporting_failed] and default metrics for confirmation.

:webhook (meter error report path)

  1. Trace the event through accrue_webhook_events into Accrue.Webhook.DefaultHandler and the async Accrue.Webhook.DispatchWorker path; verify signature + dispatch health before mutating rows (Oban queue topology).
  2. Resolve the upstream Stripe meter error, then replay or wait for the next reconciler pass; confirm the row leaves terminal failed only when business logic intentionally clears it.

Shared verification (all sources):

  1. Confirm API keys and Stripe meter configuration for the environment (no key material in logs).
  2. Cross-check Stripe usage reporting with Metered billing — operational alignment, not accounting close.
  3. After code or config fix, allow reconciler retry where applicable; watch ops counters and host metrics.
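
The per-source branching in this playbook can be captured in a small host-side helper, for example to tag alerts with the sub-procedure an operator should open. The module and function names below are hypothetical; only the three source values and the referenced modules and queue come from this guide.

```elixir
defmodule MyApp.MeterTriage do
  @moduledoc """
  Hypothetical helper that maps the `source` metadata of a
  meter_reporting_failed event to the matching sub-procedure.
  """

  @doc "Returns `{path, first_check}` for the given telemetry metadata."
  def route(%{source: :sync}),
    do: {:host_request_path, "Correlate with the caller of Accrue.Billing.report_usage/3"}

  def route(%{source: :reconciler}),
    do: {:oban_accrue_meters, "Inspect Accrue.Jobs.MeterEventsReconciler jobs on :accrue_meters"}

  def route(%{source: :webhook}),
    do: {:meter_error_report_path, "Trace the event through accrue_webhook_events"}

  # Anything else means the contract changed or the metadata is incomplete.
  def route(_other),
    do: {:unknown_source, "Re-read telemetry.md#meter-reporting-semantics"}
end
```

For example, MyApp.MeterTriage.route(%{source: :reconciler, meter_event_id: "me_1"}) selects the Oban path, while metadata without a recognized source falls through to the contract check.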

Mini-playbook: Braintree metered renewal and settlement recovery

These steps apply to the Braintree-local metering tuples documented in telemetry.md. The ordering matters because Accrue's local invoice ledger is canonical and Braintree is settlement-only in this flow.

[:accrue, :ops, :metered_renewal_stale_repaired]

  1. Confirm the affected metered_renewal_id maps to a subscription period that should already have advanced.
  2. Inspect the corresponding subscription in Braintree and verify the cycle actually renewed; the backstop should mirror webhook truth, not invent renewal windows.
  3. Check Accrue.Jobs.MeteredRenewalReconciler and Accrue.Jobs.ProcessMeteredRenewal on :accrue_meters (Oban queue topology) so the repaired window continues into local invoice authoring and settlement.
  4. If the renewal only became visible after webhook backlog or replay work, pair this tuple with [:accrue, :ops, :webhook_dlq, :replay] so the replay trail and the stale-window repair tell one story.

[:accrue, :ops, :metered_missing_definition]

  1. Inspect the renewal window and its unmatched meter events; identify which event_name rows lack a local meter definition.
  2. Add or repair the missing definition so future windows classify those events explicitly.
  3. Replay the same renewal window after the definition exists; do not create ad-hoc manual charges that bypass the local invoice decomposition.

[:accrue, :ops, :metered_charge_awaiting_payment_method]

  1. Repair or replace the customer's default vaulted payment method.
  2. Confirm the local invoice for that renewal window is still the correct settlement target.
  3. Replay the same renewal window so Accrue reuses the existing charge unit instead of creating a second Transaction.sale.
  4. If checkout completion is still ambiguous for the same customer, verify whether accrue.portal.checkout.completed already persisted locally before creating any manual recovery plan.
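
Step 3 relies on the replay being idempotent per renewal window. The following is a conceptual illustration only, not Accrue's implementation: the module, the in-memory map, and the field names are all hypothetical. It shows why replaying the same window yields the existing charge unit rather than a second one.

```elixir
defmodule MyApp.Sketch.RenewalCharge do
  @moduledoc """
  Conceptual sketch only, not Accrue's implementation: a replay keyed by
  the renewal window reuses the stored charge unit instead of adding one.
  """

  @doc """
  `charges` maps a renewal window id to its charge unit.
  Returns `{:reused | :created, charge, charges}`.
  """
  def ensure_charge(charges, window_id, amount_cents) when is_map(charges) do
    case Map.fetch(charges, window_id) do
      {:ok, existing} ->
        # Replay of a known window: hand back the stored unit unchanged.
        {:reused, existing, charges}

      :error ->
        # First time this window is seen: record exactly one charge unit.
        charge = %{window_id: window_id, amount_cents: amount_cents}
        {:created, charge, Map.put(charges, window_id, charge)}
    end
  end
end
```

Creating a manual charge outside this keyed path is what produces the duplicate settlement the step warns against.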

[:accrue, :ops, :metered_charge_failed_exhausted]

  1. Confirm the failure class and the current local invoice state before retrying anything.
  2. Decide whether to retry, write off, or pair the failed renewal with a later operator-approved recovery step.
  3. Preserve the original failed attempt trail; do not delete the renewal or charge-attempt rows to force a clean slate.

Mini-playbook: [:accrue, :ops, :revenue_loss]

  1. Capture reason, subject_type, subject_id, and currency amounts from telemetry (aggregates / IDs only — no customer narrative in shared logs).
  2. Triage Accrue rows (invoice, credit note, adjustment) that triggered the signal; avoid manual balance edits without a controlled procedure.
  3. Check related async work on :accrue_reconcilers and :accrue_webhooks if the loss correlates with webhook or fee reconciliation (Oban queue topology).
  4. In Stripe, locate the same business object (charge, refund, dispute) via Dashboard search or list filters; use Balance transactions categories as reference for classification, not as instructions to reproduce Sigma in-app.
  5. Document outcome in your ticketing system; escalate finance questions on Stripe’s side, not via Accrue as a ledger substitute.

RUN-01 coverage

  • Full ops tuple list and one-line first actions live under ## Operator runbooks (first actions) in telemetry.md — bookmark that table for every RUN-01 class, including :connect_account_deauthorized, :connect_payout_failed, :dunning_exhaustion, :charge_failed, :incomplete_expired, :pdf_adapter_unavailable, replay (:webhook_dlq, :replay), and prune (:webhook_dlq, :prune).
  • This file adds depth for the four classic mini-playbooks above plus the Braintree metered-billing recovery sequence.

See also

  • guides/telemetry.md — ops catalog SSOT and Operator runbooks (first actions) table
  • Accrue.Telemetry.Ops.emit/3 contract (lib/accrue/telemetry/ops.ex in the repo; published API on Hexdocs)
  • Hexdocs path pattern: https://hexdocs.pm/accrue/ (pin the version to your mix.lock)