Resiliency.Hedged.Tracker (Resiliency v0.6.0)

Adaptive delay tracker with token-bucket hedge throttling.

Maintains a rolling window of latency samples and computes a target percentile to use as the hedge delay. A token bucket limits the overall hedge rate: each request credits a small amount, each hedge costs more, so hedging naturally throttles under load.

How it works

The tracker is a GenServer that holds two pieces of mutable state: a Resiliency.Hedged.Percentile circular buffer of recent latency samples, and a floating-point token bucket.

Adaptive delay — After every completed request, the caller records the observed latency via record/2. The sample is added to the circular buffer (see Resiliency.Hedged.Percentile). When get_config/1 is called, the tracker computes the configured percentile (e.g., p95) of the buffered samples and clamps the result to [min_delay, max_delay]. Until at least :min_samples observations have been recorded, the tracker returns :initial_delay instead — a sensible default while the system warms up.
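As a rough illustration of the adaptive-delay rule, the sketch below recomputes it over a plain list. AdaptiveDelaySketch and its nearest-rank percentile are assumptions made for this example; the real tracker delegates to Resiliency.Hedged.Percentile, whose exact percentile method may differ. The option names and defaults come from the Options list.

```elixir
defmodule AdaptiveDelaySketch do
  # Hypothetical helper, not part of Resiliency.
  # Returns the delay in ms that the rule described above would produce.
  def delay(samples, opts \\ []) do
    percentile = Keyword.get(opts, :percentile, 95)
    min_delay = Keyword.get(opts, :min_delay, 1)
    max_delay = Keyword.get(opts, :max_delay, 5_000)
    initial_delay = Keyword.get(opts, :initial_delay, 100)
    min_samples = Keyword.get(opts, :min_samples, 10)

    if length(samples) < min_samples do
      # Warm-up: not enough observations to trust a percentile yet.
      initial_delay
    else
      samples
      |> nearest_rank(percentile)
      |> max(min_delay)
      |> min(max_delay)
    end
  end

  # Nearest-rank percentile (an assumption): the value at position
  # ceil(percentile / 100 * n) in the sorted samples, using integer math.
  defp nearest_rank(samples, percentile) do
    sorted = Enum.sort(samples)
    rank = div(percentile * length(sorted) + 99, 100)
    Enum.at(sorted, max(rank - 1, 0))
  end
end

AdaptiveDelaySketch.delay([5, 6, 7])              # fewer than 10 samples, so the initial delay
AdaptiveDelaySketch.delay(Enum.to_list(1..100))   # p95 of 1..100
AdaptiveDelaySketch.delay(List.duplicate(10_000, 20)) # clamped to max_delay
```

Sorting on every call keeps the sketch short; it does not reflect the O(1) lookup the tracker advertises.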

Token bucket — Each completed request credits :token_success_credit tokens (default 0.1). Each hedge that fires costs :token_hedge_cost tokens (default 1.0). Hedging is only allowed when the bucket contains at least :token_threshold tokens. Because a hedge costs 10x what a success earns, hedging naturally throttles to roughly 10% of traffic under sustained load. If hedges consistently win (indicating a real latency problem rather than a transient spike), the bucket refills quickly and hedging continues. If hedges rarely help, the bucket drains and hedging pauses — protecting the downstream service from unnecessary duplicate load.
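The throttling effect can be checked with a small simulation. TokenBucketSketch is a hypothetical model written for this example: the option names and defaults mirror the tracker's, but the exact update order (credit first, then charge, capped at :token_max and floored at zero) is an assumption rather than the library's confirmed behavior.

```elixir
defmodule TokenBucketSketch do
  # Hypothetical model of the bucket; defaults copied from the Options list.
  @defaults %{
    token_max: 10.0,
    token_success_credit: 0.1,
    token_hedge_cost: 1.0,
    token_threshold: 1.0
  }

  # Hedging is allowed only while the bucket holds at least the threshold.
  def allow_hedge?(tokens, cfg \\ @defaults), do: tokens >= cfg.token_threshold

  # Every completed request earns a credit; a fired hedge also pays the cost.
  def record(tokens, hedged?, cfg \\ @defaults) do
    credited = min(tokens + cfg.token_success_credit, cfg.token_max)
    if hedged?, do: max(credited - cfg.token_hedge_cost, 0.0), else: credited
  end

  # Worst case: every request is slow enough to hedge whenever it is allowed.
  # Returns how many of `requests` actually hedged.
  def simulate(requests, start_tokens \\ 10.0) do
    {_tokens, hedges} =
      Enum.reduce(1..requests, {start_tokens, 0}, fn _, {tokens, hedges} ->
        hedged? = allow_hedge?(tokens)
        {record(tokens, hedged?), if(hedged?, do: hedges + 1, else: hedges)}
      end)

    hedges
  end
end

hedges = TokenBucketSketch.simulate(1_000)
IO.puts("hedged #{hedges} of 1000 requests")
```

Under this model a full bucket permits a short initial burst of hedges, after which about one request in ten can hedge. Floating-point rounding near the threshold shifts the exact count slightly.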

Statistics: stats/1 returns a snapshot of counters (total requests, hedged requests, hedge wins), percentiles (p50, p95, p99), the current adaptive delay, and the token bucket level. This is useful for dashboards and alerting.

Algorithm Complexity

Function | Time | Space
start_link/1 | O(1) | O(1) (empty buffer and initial token bucket)
get_config/1 | O(1) (percentile lookup is O(1) via tuple indexing) | O(1)
record/2 | O(n), n = buffer_size (sorted insert/delete on the internal sorted list) | O(n) (the circular buffer holds at most n samples)
stats/1 | O(1) (percentile lookups are O(1)) | O(1)

Usage

{:ok, _} = Resiliency.Hedged.Tracker.start_link(name: MyTracker)

# Query the current adaptive delay and whether hedging is allowed
{delay, allow?} = Resiliency.Hedged.Tracker.get_config(MyTracker)

# Record an observation after a request completes
Resiliency.Hedged.Tracker.record(MyTracker, %{latency_ms: 42, hedged?: false, hedge_won?: false})

# Inspect tracker state
Resiliency.Hedged.Tracker.stats(MyTracker)

In most cases you won't call these functions directly — Resiliency.Hedged.run/3 does it automatically when you pass a tracker name.

Options

  • :name — required, the registered name for the tracker process
  • :percentile — target percentile for adaptive delay (default: 95)
  • :buffer_size — max latency samples to keep (default: 1000)
  • :min_delay — floor for adaptive delay in ms (default: 1)
  • :max_delay — ceiling for adaptive delay in ms (default: 5_000)
  • :initial_delay — delay used before enough samples are collected (default: 100)
  • :min_samples — samples needed before switching from :initial_delay to adaptive (default: 10)
  • :token_max — token bucket capacity (default: 10)
  • :token_success_credit — tokens earned per completed request (default: 0.1)
  • :token_hedge_cost — tokens spent when a hedge fires (default: 1.0)
  • :token_threshold — minimum tokens required to allow hedging (default: 1.0)

Summary

Types

t()

Internal state of the tracker GenServer.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

get_config(server)

Returns {delay_ms, allow_hedge?} based on current adaptive state.

record(server, observation)

Records an observation after a request completes.

start_link(opts)

Starts a tracker process linked to the caller.

stats(server)

Returns current stats including counters, percentiles, delay, and tokens.

Types

t()

@type t() :: %Resiliency.Hedged.Tracker{
  buffer: term(),
  initial_delay: term(),
  max_delay: term(),
  min_delay: term(),
  min_samples: term(),
  percentile_target: term(),
  stats: term(),
  token_hedge_cost: term(),
  token_max: term(),
  token_success_credit: term(),
  token_threshold: term(),
  tokens: term()
}

Internal state of the tracker GenServer.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.
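A tracker would typically be started from a supervision tree rather than by hand. The fragment below is a sketch that assumes child_spec/1 forwards its argument to start_link/1, as the default GenServer child spec does; MyApp.HedgeTracker is a made-up name.

```elixir
# Assumption: the {module, opts} tuple is passed through child_spec/1 to start_link/1.
children = [
  {Resiliency.Hedged.Tracker, name: MyApp.HedgeTracker, percentile: 99}
]

Supervisor.start_link(children, strategy: :one_for_one)
```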

get_config(server)

@spec get_config(GenServer.server()) :: {non_neg_integer(), boolean()}

Returns {delay_ms, allow_hedge?} based on current adaptive state.

The delay is the configured percentile of recent latency samples, clamped to [min_delay, max_delay]. Before :min_samples observations are recorded, :initial_delay is returned instead.

Hedging is allowed when the token bucket has at least :token_threshold tokens remaining.

Parameters

  • server -- the name or PID of a running Resiliency.Hedged.Tracker process.

Returns

A tuple {delay_ms, allow_hedge?} where delay_ms is a non-negative integer representing the adaptive delay in milliseconds, and allow_hedge? is a boolean indicating whether the token bucket permits hedging.
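To show how a caller might act on this tuple, here is a minimal hedged-request sketch. HedgeRaceSketch is hypothetical and is not how Resiliency.Hedged.run/3 is known to be implemented. It uses bare processes and messages, leaves the losing attempt's reply in the caller's mailbox, and does not handle crashes in fun.

```elixir
defmodule HedgeRaceSketch do
  # Hypothetical illustration of consuming {delay_ms, allow_hedge?}.
  def run(fun, {delay_ms, allow_hedge?}) do
    parent = self()
    ref = make_ref()

    # Each attempt runs in its own process and reports back tagged with its role.
    attempt = fn role -> spawn(fn -> send(parent, {ref, role, fun.()}) end) end

    attempt.(:original)

    receive do
      # The original answered within the adaptive delay: no hedge needed.
      {^ref, role, result} -> %{result: result, winner: role, hedged?: false}
    after
      delay_ms ->
        # The original is slow: fire a hedge only if the token bucket allows it.
        if allow_hedge?, do: attempt.(:hedge)

        # Take whichever attempt replies first.
        receive do
          {^ref, role, result} -> %{result: result, winner: role, hedged?: allow_hedge?}
        end
    end
  end
end

# A fast call finishes before the delay elapses, so no hedge is fired.
HedgeRaceSketch.run(fn -> :ok end, {50, true})
```

The returned :hedged? and :winner fields correspond to the :hedged? and :hedge_won? keys that record/2 expects.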

record(server, observation)

@spec record(GenServer.server(), map()) :: :ok

Records an observation after a request completes.

Expects a map with the following keys:

  • :latency_ms — end-to-end latency of the winning response in milliseconds
  • :hedged? — whether a hedge request was actually dispatched
  • :hedge_won? — whether the hedge (not the original) produced the winning response

The latency sample feeds the percentile buffer, while :hedged? and :hedge_won? update the token bucket and counters.

Parameters

  • server -- the name or PID of a running Resiliency.Hedged.Tracker process.
  • observation -- a map containing :latency_ms (number), :hedged? (boolean), and :hedge_won? (boolean).

Returns

:ok. The observation is processed asynchronously via GenServer.cast/2.
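For callers that record observations by hand, the sketch below times a function and builds a map in the shape described above. ObservationSketch and its timed/3 function are hypothetical helpers; only the three map keys come from this documentation.

```elixir
defmodule ObservationSketch do
  # Hypothetical helper: runs `fun`, measures wall-clock latency with the
  # monotonic clock, and returns {result, observation}.
  def timed(fun, hedged? \\ false, hedge_won? \\ false) do
    started = System.monotonic_time(:millisecond)
    result = fun.()
    latency_ms = System.monotonic_time(:millisecond) - started

    observation = %{
      latency_ms: latency_ms,
      hedged?: hedged?,
      # A hedge can only win if one was actually dispatched.
      hedge_won?: hedged? and hedge_won?
    }

    {result, observation}
  end
end

{_result, observation} = ObservationSketch.timed(fn -> Process.sleep(20) end)
# The map could then be passed on, for example:
# Resiliency.Hedged.Tracker.record(MyTracker, observation)
```

The monotonic clock is used so that system clock adjustments cannot produce negative latencies.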

start_link(opts)

@spec start_link(keyword()) :: GenServer.on_start()

Starts a tracker process linked to the caller.

Requires a :name option. See module documentation for all options.

Parameters

  • opts -- keyword list of options. See the module documentation for the full list. The :name option is required.

Returns

{:ok, pid} on success, or {:error, reason} if the process cannot be started.

Raises

Raises KeyError if the required :name option is not provided.

Examples

{:ok, _pid} = Resiliency.Hedged.Tracker.start_link(name: MyTracker)

Resiliency.Hedged.Tracker.start_link(name: MyTracker, percentile: 99, min_delay: 5)

stats(server)

@spec stats(GenServer.server()) :: map()

Returns current stats including counters, percentiles, delay, and tokens.

The returned map contains:

  • :total_requests — number of observations recorded
  • :hedged_requests — number of observations where a hedge fired
  • :hedge_won — number of times the hedge beat the original
  • :p50, :p95, :p99 — latency percentiles from the sample buffer
  • :current_delay — adaptive delay that would be returned by get_config/1
  • :tokens — current token bucket level

Parameters

  • server -- the name or PID of a running Resiliency.Hedged.Tracker process.

Returns

A map with keys :total_requests, :hedged_requests, :hedge_won, :p50, :p95, :p99, :current_delay, and :tokens.
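For dashboards, the raw counters are often more useful as ratios. The sketch below derives a hedge rate and a hedge win rate from a map with the keys listed above. StatsSketch is hypothetical and the snapshot values are made up for illustration, not real output.

```elixir
defmodule StatsSketch do
  # Hypothetical helper: turns counters from a stats map into ratios.
  def rates(%{total_requests: total, hedged_requests: hedged, hedge_won: won}) do
    %{
      # Share of requests that fired a hedge.
      hedge_rate: ratio(hedged, total),
      # Share of fired hedges that beat the original.
      hedge_win_rate: ratio(won, hedged)
    }
  end

  # Avoid dividing by zero before any traffic has been recorded.
  defp ratio(_num, 0), do: 0.0
  defp ratio(num, den), do: num / den
end

# Made-up snapshot with the same keys as the map described above.
snapshot = %{
  total_requests: 200,
  hedged_requests: 20,
  hedge_won: 5,
  p50: 12,
  p95: 40,
  p99: 90,
  current_delay: 40,
  tokens: 3.5
}

StatsSketch.rates(snapshot)
```

A persistently low win rate alongside a high hedge rate would suggest that hedges are adding load without improving latency.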