Adaptive delay tracker with token-bucket hedge throttling.
Maintains a rolling window of latency samples and computes a target percentile to use as the hedge delay. A token bucket limits the overall hedge rate: each request credits a small amount, each hedge costs more, so hedging naturally throttles under load.
How it works
The tracker is a GenServer that holds two pieces of mutable state: a
Resiliency.Hedged.Percentile circular buffer of recent latency samples,
and a floating-point token bucket.
Adaptive delay — After every completed request, the caller records the
observed latency via record/2. The sample is added to the circular buffer
(see Resiliency.Hedged.Percentile). When get_config/1 is called, the
tracker computes the configured percentile (e.g., p95) of the buffered
samples and clamps the result to [min_delay, max_delay]. Until at least
:min_samples observations have been recorded, the tracker returns
:initial_delay instead — a sensible default while the system warms up.
Token bucket — Each completed request credits :token_success_credit
tokens (default 0.1). Each hedge that fires costs :token_hedge_cost
tokens (default 1.0). Hedging is only allowed when the bucket contains at
least :token_threshold tokens. Because a hedge costs 10x what a success
earns, hedging naturally throttles to roughly 10% of traffic under
sustained load. If hedges consistently win (indicating a real latency
problem rather than a transient spike), the bucket refills quickly and
hedging continues. If hedges rarely help, the bucket drains and hedging
pauses — protecting the downstream service from unnecessary duplicate load.
Statistics — stats/1 returns a snapshot of counters (total requests,
hedged requests, hedge wins), percentiles (p50, p95, p99), the current
adaptive delay, and the token bucket level. This is useful for dashboards
and alerting.
Algorithm Complexity
| Function | Time | Space |
|---|---|---|
start_link/1 | O(1) | O(1) — empty buffer and initial token bucket |
get_config/1 | O(1) — percentile lookup is O(1) via tuple indexing | O(1) |
record/2 | O(n) where n = buffer_size — sorted insert/delete on the internal sorted list | O(n) — the circular buffer holds at most n samples |
stats/1 | O(1) — percentile lookups are O(1) | O(1) |
Usage
{:ok, _} = Resiliency.Hedged.Tracker.start_link(name: MyTracker)
# Query the current adaptive delay and whether hedging is allowed
{delay, allow?} = Resiliency.Hedged.Tracker.get_config(MyTracker)
# Record an observation after a request completes
Resiliency.Hedged.Tracker.record(MyTracker, %{latency_ms: 42, hedged?: false, hedge_won?: false})
# Inspect tracker state
Resiliency.Hedged.Tracker.stats(MyTracker)In most cases you won't call these functions directly — Resiliency.Hedged.run/3
does it automatically when you pass a tracker name.
Options
:name— required, the registered name for the tracker process:percentile— target percentile for adaptive delay (default:95):buffer_size— max latency samples to keep (default:1000):min_delay— floor for adaptive delay in ms (default:1):max_delay— ceiling for adaptive delay in ms (default:5_000):initial_delay— delay used before enough samples are collected (default:100):min_samples— samples needed before switching from:initial_delayto adaptive (default:10):token_max— token bucket capacity (default:10):token_success_credit— tokens earned per completed request (default:0.1):token_hedge_cost— tokens spent when a hedge fires (default:1.0):token_threshold— minimum tokens required to allow hedging (default:1.0)
Summary
Functions
Returns a specification to start this module under a supervisor.
Returns {delay_ms, allow_hedge?} based on current adaptive state.
Records an observation after a request completes.
Starts a tracker process linked to the caller.
Returns current stats including counters, percentiles, delay, and tokens.
Types
@type t() :: %Resiliency.Hedged.Tracker{ buffer: term(), initial_delay: term(), max_delay: term(), min_delay: term(), min_samples: term(), percentile_target: term(), stats: term(), token_hedge_cost: term(), token_max: term(), token_success_credit: term(), token_threshold: term(), tokens: term() }
Internal state of the tracker GenServer.
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
@spec get_config(GenServer.server()) :: {non_neg_integer(), boolean()}
Returns {delay_ms, allow_hedge?} based on current adaptive state.
The delay is the configured percentile of recent latency samples, clamped
to [min_delay, max_delay]. Before :min_samples observations are
recorded, :initial_delay is returned instead.
Hedging is allowed when the token bucket has at least :token_threshold
tokens remaining.
Parameters
server-- the name or PID of a runningResiliency.Hedged.Trackerprocess.
Returns
A tuple {delay_ms, allow_hedge?} where delay_ms is a non-negative integer representing the adaptive delay in milliseconds, and allow_hedge? is a boolean indicating whether the token bucket permits hedging.
@spec record(GenServer.server(), map()) :: :ok
Records an observation after a request completes.
Expects a map with the following keys:
:latency_ms— end-to-end latency of the winning response in milliseconds:hedged?— whether a hedge request was actually dispatched:hedge_won?— whether the hedge (not the original) produced the winning response
The latency sample feeds the percentile buffer, while :hedged? and
:hedge_won? update the token bucket and counters.
Parameters
server-- the name or PID of a runningResiliency.Hedged.Trackerprocess.observation-- a map containing:latency_ms(number),:hedged?(boolean), and:hedge_won?(boolean).
Returns
:ok. The observation is processed asynchronously via GenServer.cast/2.
@spec start_link(keyword()) :: GenServer.on_start()
Starts a tracker process linked to the caller.
Requires a :name option. See module documentation for all options.
Parameters
opts-- keyword list of options. See the module documentation for the full list. The:nameoption is required.
Returns
{:ok, pid} on success, or {:error, reason} if the process cannot be started.
Raises
Raises KeyError if the required :name option is not provided.
Examples
{:ok, _pid} = Resiliency.Hedged.Tracker.start_link(name: MyTracker)
Resiliency.Hedged.Tracker.start_link(name: MyTracker, percentile: 99, min_delay: 5)
@spec stats(GenServer.server()) :: map()
Returns current stats including counters, percentiles, delay, and tokens.
The returned map contains:
:total_requests— number of observations recorded:hedged_requests— number of observations where a hedge fired:hedge_won— number of times the hedge beat the original:p50,:p95,:p99— latency percentiles from the sample buffer:current_delay— adaptive delay that would be returned byget_config/1:tokens— current token bucket level
Parameters
server-- the name or PID of a runningResiliency.Hedged.Trackerprocess.
Returns
A map with keys :total_requests, :hedged_requests, :hedge_won, :p50, :p95, :p99, :current_delay, and :tokens.