Telemetry & LiveDashboard Guide
SlackBot emits Telemetry events across its internal systems (connection lifecycle, API calls, rate and tier limiters, handler execution, cache sync, and diagnostics), so you can monitor your bot's health without bolting custom instrumentation onto each handler. This guide shows how to listen to those events and surface them in Phoenix LiveDashboard (or any Telemetry consumer).
Available Events
| Event | Measurements | Metadata |
|---|---|---|
| `[:slackbot, :api, :request]` | `%{duration: native}` | `%{method: String.t(), status: :ok \| :error \| :exception \| :unknown}` |
| `[:slackbot, :api, :rate_limited]` | `%{retry_after_ms: integer, observed_at_ms: integer}` | `%{method: String.t(), key: term()}` |
| `[:slackbot, :connection, :state]` | `%{count: 1}` | `%{state: :connected \| :disconnected \| :terminated \| :down \| :error, reason: term()}` |
| `[:slackbot, :connection, :rate_limited]` | `%{delay_ms: integer}` | `%{}` |
| `[:slackbot, :healthcheck, :ping]` | `%{duration: native}` / `%{delay_ms: integer}` | `%{status: :ok \| :error \| :fatal \| :rate_limited \| :unknown, reason: term()}` |
| `[:slackbot, :healthcheck, :disabled]` | `%{count: 1}` | `%{}` |
| `[:slackbot, :cache, :sync]` | `%{duration: native, count: integer}` | `%{kind: :users \| :channels, status: :ok \| :error}` |
| `[:slackbot, :tier_limiter, :decision]` | `%{count: 1, queue_length: integer, tokens: float}` | `%{method: String.t(), scope_key: term(), decision: :allow \| :queue \| :other}` |
| `[:slackbot, :tier_limiter, :suspend]` | `%{delay_ms: integer}` | `%{method: String.t(), scope_key: term()}` |
| `[:slackbot, :tier_limiter, :resume]` | `%{queue_length: integer, tokens: float}` | `%{method: String.t() \| nil, scope_key: term(), bucket_id: term()}` |
| `[:slackbot, :rate_limiter, :decision]` | `%{queue_length: integer, in_flight: integer}` | `%{key: term(), method: String.t(), decision: :allow \| :queue \| :unknown}` |
| `[:slackbot, :rate_limiter, :blocked]` | `%{delay_ms: integer}` | `%{key: term(), method: String.t()}` |
| `[:slackbot, :rate_limiter, :drain]` | `%{drained: integer, delay_ms: integer \| nil}` | `%{key: term(), reason: term()}` |
| `[:slackbot, :handler, :ingress]` | `%{count: 1}` | `%{decision: :queue \| :duplicate, type: String.t(), envelope_id: String.t() \| nil}` |
| `[:slackbot, :handler, :dispatch, :start/:stop]` (span) | `%{system_time: native}` / `%{duration: native}` | `%{type: event_type, status: :ok \| :error \| :exception \| :halted, envelope_id: String.t() \| nil}` |
| `[:slackbot, :handler, :middleware, :halt]` | `%{count: 1}` | `%{type: String.t(), middleware: String.t(), response: term(), envelope_id: String.t() \| nil}` |
| `[:slackbot, :ack, :http]` | `%{duration: native}` | `%{status: :ok \| :error \| :unknown \| :exception}` |
| `[:slackbot, :diagnostics, :record]` | `%{count: 1}` | `%{direction: :inbound \| :outbound}` |
| `[:slackbot, :diagnostics, :replay]` | `%{count: integer}` | `%{filters: map()}` |
| `[:slackbot, :event_buffer, :record]` | `%{result: :ok \| :duplicate}` | `%{key: String.t() \| nil, result: :ok \| :duplicate}` |
| `[:slackbot, :event_buffer, :delete]` | `%{count: 0 \| 1}` | `%{key: String.t() \| nil, key_present?: boolean()}` |
| `[:slackbot, :event_buffer, :seen]` | `%{count: 1}` | `%{key: String.t() \| nil, seen?: boolean()}` |
| `[:slackbot, :event_buffer, :pending]` | `%{count: integer}` | `%{count: integer}` |
All event names are prefixed with your configured `telemetry_prefix` (`[:slackbot]` by default), so a handler will actually receive `[:slackbot, :connection, :state]`, and so on.
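A minimal sketch of overriding the prefix, assuming the option lives next to the rest of your bot configuration (the same place as the `telemetry_stats` settings shown later in this guide):

```elixir
# Assumed location: the same config block as the telemetry_stats settings below.
config :my_app, MyApp.SlackBot,
  telemetry_prefix: [:my_bot]
```

With that in place, the connection event above arrives as `[:my_bot, :connection, :state]` instead of `[:slackbot, :connection, :state]`.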
Concepts at a Glance
Rate limiter vs tier limiter
The rate limiter shapes individual Web API calls. It keeps a per-channel bucket for high-volume chat methods and a workspace bucket for everything else. A `:decision` event exposes the current `queue_length` (pending requests) and `in_flight` count (requests that already passed the gate). When Slack replies with `429 Retry-After`, the ETS-backed adapter stores a monotonic `blocked_until`, which surfaces as `[:rate_limiter, :blocked]` (with `delay_ms`) and, once the timer drains the queue, `[:rate_limiter, :drain]` with the number of releases plus the same delay. Track these to understand whether you are pushing against channel-specific chat limits (queue spikes) or Slack-imposed cooling-off periods (blocked/drain events).

The tier limiter enforces Slack's published per-method quotas (the "Tier 1-4" buckets). Each method (or group) gets a fractional token bucket. `queue_length` indicates how many callers are waiting for the window to refill, while `tokens` is the precise number of quota tokens remaining (it is a float because Slack quotas are averaged over time). When Slack returns `Retry-After` for a tiered method, the limiter suspends that bucket, emits a `:suspend` event with the delay, and later emits a corresponding `:resume` when tokens become available again. Use the suspend/resume telemetry to answer "which scope is starved right now?"
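If you just want these limiter signals in your logs, a plain Telemetry handler is enough. The sketch below assumes the default `[:slackbot]` prefix; the module name and handler id are illustrative, while the measurement and metadata keys come straight from the table above:

```elixir
defmodule MyApp.LimiterWatch do
  require Logger

  # Attach one handler to the limiter events documented above.
  def attach do
    :telemetry.attach_many(
      "my-app-limiter-watch",
      [
        [:slackbot, :rate_limiter, :blocked],
        [:slackbot, :rate_limiter, :drain],
        [:slackbot, :tier_limiter, :suspend],
        [:slackbot, :tier_limiter, :resume]
      ],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event([:slackbot, :rate_limiter, :blocked], %{delay_ms: delay}, meta, _cfg) do
    Logger.warning("rate limiter blocked #{meta.method} on #{inspect(meta.key)} for #{delay}ms")
  end

  def handle_event([:slackbot, :rate_limiter, :drain], %{drained: drained}, meta, _cfg) do
    Logger.info("rate limiter drained #{drained} request(s) for #{inspect(meta.key)}")
  end

  def handle_event([:slackbot, :tier_limiter, :suspend], %{delay_ms: delay}, meta, _cfg) do
    Logger.warning("tier limiter suspended #{meta.method} (#{inspect(meta.scope_key)}) for #{delay}ms")
  end

  def handle_event([:slackbot, :tier_limiter, :resume], %{tokens: tokens}, meta, _cfg) do
    Logger.info("tier limiter resumed #{inspect(meta.scope_key)} with #{tokens} tokens")
  end
end
```

Call `MyApp.LimiterWatch.attach/0` once at application start; queue spikes then show up next to the blocked/suspend delays in your logs.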
Handler pipeline anatomy
`[:handler, :ingress]` fires for every envelope before the router runs. A `decision` of `:queue` means the event entered the pipeline; `:duplicate` indicates the event buffer dropped a replay. Pair it with envelope IDs to reason about dedupe behaviour.

`[:handler, :dispatch, :stop]` is the Telemetry span that wraps your router. It now carries a `status` (`:ok`, `:error`, `:exception`, `:halted`) plus the envelope ID, so you can correlate slow handlers with specific payloads.

When a middleware halts the pipeline, SlackBot emits `[:handler, :middleware, :halt]` with the middleware module/function and the response returned. This makes it obvious when a safety middleware is short-circuiting bursts of traffic.
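For example, to flag slow or failed dispatches you can attach straight to the span's `:stop` event. This is only a sketch: the default `[:slackbot]` prefix is assumed, and the 500 ms threshold and handler id are arbitrary.

```elixir
require Logger

:telemetry.attach(
  "my-app-slow-handler-watch",
  [:slackbot, :handler, :dispatch, :stop],
  fn _event, %{duration: duration}, meta, _config ->
    # Span durations are emitted in native time units; convert before comparing.
    duration_ms = System.convert_time_unit(duration, :native, :millisecond)

    if duration_ms > 500 or meta.status in [:error, :exception] do
      Logger.warning(
        "handler type=#{meta.type} status=#{meta.status} " <>
          "envelope=#{inspect(meta.envelope_id)} took #{duration_ms}ms"
      )
    end
  end,
  nil
)
```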
Telemetry Stats Cache
SlackBot can maintain a rolling snapshot of the signals above without any external collector. Set:

```elixir
config :my_app, MyApp.SlackBot,
  telemetry_stats: [
    enabled: true,
    flush_interval_ms: 15_000,
    ttl_ms: 300_000
  ]
```

When enabled, `SlackBot.TelemetryStats` attaches to your Telemetry prefix, rolls up counters (API
throughput, handler statuses, rate/tier limiter queues, connection states, etc.), and periodically
persists the snapshot to the cache. Because it goes through the cache
adapter, the stats work regardless of whether you are using the default ETS backend or a Redis
adapter.
Read the latest snapshot with:

```elixir
%{
  generated_at_ms: generated,
  expires_at_ms: expires,
  stats: stats
} = SlackBot.TelemetryStats.snapshot(MyApp.SlackBot)

stats.api.total
stats.rate_limiter.last_block_delay_ms
stats.handler.status.halted
```

The map mirrors the structures described above (for example `stats.tier.tokens` keeps the most
recent fractional token value). LiveDashboard, PromEx, or any other Telemetry-aware tooling can
consume this snapshot directly or keep listening to the raw events listed earlier.
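If you want a quick look at those numbers without LiveDashboard, a small poller is enough. The following is a sketch only: the module name and interval are made up, it assumes `snapshot/1` returns the map shown above, and it does nothing when no snapshot has been flushed yet.

```elixir
defmodule MyApp.SlackStatsLogger do
  use GenServer
  require Logger

  @interval :timer.minutes(1)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    schedule()
    {:ok, %{}}
  end

  @impl true
  def handle_info(:report, state) do
    case SlackBot.TelemetryStats.snapshot(MyApp.SlackBot) do
      %{stats: stats} ->
        Logger.info(
          "slack stats: api_total=#{stats.api.total} " <>
            "halted_handlers=#{stats.handler.status.halted} " <>
            "last_block_delay_ms=#{inspect(stats.rate_limiter.last_block_delay_ms)}"
        )

      _other ->
        # No snapshot yet (e.g. before the first flush) -- nothing to report.
        :ok
    end

    schedule()
    {:noreply, state}
  end

  defp schedule, do: Process.send_after(self(), :report, @interval)
end
```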
Wiring LiveDashboard Metrics
If you already use Phoenix LiveDashboard, add the metrics below to the module where you define your dashboard metrics (often `MyAppWeb.Telemetry`). The only requirement is having `Telemetry.Metrics` in your deps (Phoenix generators include it by default).

```elixir
defmodule MyAppWeb.Telemetry do
  use Supervisor
  import Telemetry.Metrics

  @slackbot_prefix [:slackbot]

  def metrics do
    [
      counter(@slackbot_prefix ++ [:connection, :state, :count],
        tags: [:state],
        description: "Connection state transitions"
      ),
      counter(@slackbot_prefix ++ [:healthcheck, :ping, :duration],
        tags: [:status],
        description: "Slack healthcheck pings by status"
      ),
      last_value(@slackbot_prefix ++ [:connection, :rate_limited, :delay_ms],
        unit: :millisecond,
        description: "Slack backoff delay when rate limited"
      ),
      summary(@slackbot_prefix ++ [:diagnostics, :replay, :count],
        unit: :event,
        description: "Diagnostics replays issued"
      ),
      counter(@slackbot_prefix ++ [:tier_limiter, :decision, :count],
        tags: [:method, :decision],
        description: "Tier limiter decisions by API method"
      ),
      last_value(@slackbot_prefix ++ [:tier_limiter, :decision, :tokens],
        unit: :token,
        description: "Tier limiter tokens remaining"
      ),
      summary(@slackbot_prefix ++ [:handler, :dispatch, :stop, :duration],
        unit: {:native, :millisecond},
        description: "Handler execution time"
      ),
      counter(@slackbot_prefix ++ [:handler, :ingress, :count],
        tags: [:decision],
        description: "Ingress decisions (pipeline vs duplicate)"
      ),
      counter(@slackbot_prefix ++ [:handler, :middleware, :halt, :count],
        description: "Middleware short-circuits"
      ),
      last_value(@slackbot_prefix ++ [:rate_limiter, :blocked, :delay_ms],
        unit: :millisecond,
        description: "Current rate limiter block delay"
      ),
      counter(@slackbot_prefix ++ [:rate_limiter, :drain, :drained],
        description: "Retry-after drains"
      )
    ]
  end
end
```

The tier limiter metrics let you spot when Slack's published quotas are nearing their cap.
`queue_length` spikes mean requests are waiting for the bucket to refill, and you can drill
into the tagged decision counter to see which methods are being throttled.
Note: Telemetry spans emit `:start`/`:stop` events. Phoenix LiveDashboard expects a `summary` metric built from the span's `:stop` event with `measurement: :duration`; the `handler.dispatch.stop.duration` entry above follows that pattern. More generally, `Telemetry.Metrics` derives the event name by dropping the last segment of the metric name, which is why every metric name above ends with its measurement key (`:count`, `:delay_ms`, `:duration`, and so on).
Once the `metrics` function returns these entries, expose them in `router.ex` (if you have LiveDashboard enabled):

```elixir
live_dashboard "/dashboard",
  metrics: MyAppWeb.Telemetry
```

Consuming Events Without Phoenix
You can always attach your own handlers if you don’t have Phoenix at all:

```elixir
require Logger

:telemetry.attach(
  {:slackbot_logger, self()},
  [:slackbot, :connection, :state],
  fn _event, _measurements, %{state: state}, _config ->
    Logger.info("Slack connection state changed: #{state}")
  end,
  nil
)
```

Because SlackBot uses standard Telemetry primitives, any tool that understands Telemetry events (StatsD exporters, OpenTelemetry bridges, etc.) will work out of the box.
Sample Snapshot (/demo telemetry)
The sample router in examples/basic_bot/ includes a /demo telemetry command that ships with a
small telemetry probe module. The probe subscribes to the events above, rolls them up in-memory,
and renders the snapshot as a Block Kit card (cache health, API throughput, limiter queues,
connection state, and healthcheck status). It's a practical example of how to consume Telemetry
without Phoenix:
- The probe calls `:telemetry.attach_many/4` for the SlackBot prefix.
- It keeps lightweight counters/last-seen metadata in a GenServer.
- The slash command pulls a snapshot and formats it for Slack.
Feel free to lift that helper into your own bots if you want a prebuilt telemetry "dashboard" inside Slack itself.
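If you only want the shape of such a probe, the sketch below shows the same pattern in miniature. It is not the module shipped in `examples/basic_bot/`: the names are invented, it assumes the default `[:slackbot]` prefix, and it only counts events per name instead of rendering Block Kit.

```elixir
defmodule MyApp.TelemetryProbe do
  use GenServer

  @events [
    [:slackbot, :api, :request],
    [:slackbot, :connection, :state],
    [:slackbot, :handler, :dispatch, :stop],
    [:slackbot, :rate_limiter, :blocked]
  ]

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Called by a slash-command handler to render the current counts.
  def snapshot, do: GenServer.call(__MODULE__, :snapshot)

  @impl true
  def init(_opts) do
    :telemetry.attach_many("my-app-telemetry-probe", @events, &__MODULE__.handle_event/4, self())
    {:ok, %{}}
  end

  # Telemetry handlers run in the emitting process, so just forward a message to the probe.
  def handle_event(event, _measurements, _metadata, probe) do
    send(probe, {:event, event})
    :ok
  end

  @impl true
  def handle_info({:event, event}, counts) do
    {:noreply, Map.update(counts, event, 1, &(&1 + 1))}
  end

  @impl true
  def handle_call(:snapshot, _from, counts), do: {:reply, counts, counts}
end
```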
Exposing Diagnostics in LiveDashboard
Pair Telemetry metrics with diagnostics replay for a richer debugging workflow:
- Enable diagnostics in your config: `config :slack_bot_ws, SlackBot, diagnostics: [enabled: true, buffer_size: 300]`
- Add a custom LiveDashboard page or a Phoenix route that calls `SlackBot.Diagnostics.list/2` and renders the recent frames.
- Provide a button that hits an endpoint wired to `SlackBot.Diagnostics.replay/2`.
Because diagnostics stores payloads in ETS via a supervised GenServer, calling those functions does not block the main Slack connection.
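As a sketch of the second and third bullets, a plain Phoenix controller can sit behind those routes. The controller name, routes, and especially the argument shapes passed to `SlackBot.Diagnostics.list/2` and `replay/2` are assumptions here; check the Diagnostics guide for the actual options.

```elixir
defmodule MyAppWeb.SlackDiagnosticsController do
  use MyAppWeb, :controller

  # GET /slack/diagnostics -- render the most recent captured frames.
  def index(conn, _params) do
    # Options and the shape of the returned frames are assumptions; adjust to the real API.
    frames = SlackBot.Diagnostics.list(MyApp.SlackBot, limit: 50)
    text(conn, inspect(frames, pretty: true, limit: :infinity))
  end

  # POST /slack/diagnostics/replay -- re-inject captured frames through the pipeline.
  def replay(conn, params) do
    # The filter map mirrors the :filters metadata on [:slackbot, :diagnostics, :replay],
    # but the exact second argument to replay/2 is an assumption.
    SlackBot.Diagnostics.replay(MyApp.SlackBot, Map.take(params, ["type", "direction"]))
    send_resp(conn, 202, "replay scheduled")
  end
end
```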
Summary
- Telemetry events cover connection lifecycle, handler spans, and diagnostics activities.
- LiveDashboard can plot these metrics with a few lines of `Telemetry.Metrics`.
- Diagnostics replays + Telemetry metrics offer a full picture when debugging Slack bots in production.
Next Steps
- Getting Started — set up a Slack App and run your first handler
- Rate Limiting — understand how tier-aware limiting works
- Slash Grammar — build deterministic command parsers
- Diagnostics — capture and replay events for debugging