Resiliency.CircuitBreaker (Resiliency v0.6.0)

Copy Markdown View Source

A circuit breaker with sliding window failure-rate tracking and automatic recovery.

A circuit breaker wraps calls to a downstream service and monitors their outcomes. When the failure rate exceeds a threshold, the circuit "trips" to the :open state, rejecting calls immediately without contacting the downstream. After a configurable timeout, the circuit moves to :half_open and allows a small number of probe calls through. If the probes succeed, the circuit closes and traffic resumes. If they fail, the circuit reopens.

Inspired by Resilience4j and gobreaker, with an Elixir-idiomatic API that follows the conventions of this library.

When to use

  • Your downstream service experiences periodic outages and you want to stop calling it until it recovers -- avoiding wasted resources and cascading failures.
  • You need automatic recovery: after a cool-down period, probe calls verify the downstream is healthy before resuming full traffic.
  • You want failure-rate-based tripping (not just consecutive failures) with a sliding window that forgets old outcomes naturally.
  • You need slow call detection: calls that succeed but take too long can trip the circuit just like failures.

Quick start

# 1. Add to your supervision tree
children = [
  {Resiliency.CircuitBreaker, name: MyApp.Breaker, failure_rate_threshold: 0.5}
]
Supervisor.start_link(children, strategy: :one_for_one)

# 2. Use it
case Resiliency.CircuitBreaker.call(MyApp.Breaker, fn -> HttpClient.get(url) end) do
  {:ok, response} -> handle_response(response)
  {:error, :circuit_open} -> {:error, :service_degraded}
  {:error, reason} -> {:error, reason}
end

How it works

The circuit breaker runs as a GenServer that maintains state and failure rates. The protected function runs in the caller's process, not inside the GenServer. This means:

  • The GenServer is never blocked by slow downstream calls.
  • A crash in the protected function does not crash the GenServer.
  • Permission checks are synchronous (GenServer.call), but recording outcomes is asynchronous (GenServer.cast) for minimal overhead.

States

The circuit breaker has three states:

  • :closed -- Normal operation. Calls are allowed through. Outcomes are recorded in a count-based sliding window. When the failure rate (or slow call rate) exceeds the configured threshold and the minimum number of calls has been reached, the circuit transitions to :open.

  • :open -- Calls are rejected immediately with {:error, :circuit_open}. After open_timeout milliseconds, the circuit transitions to :half_open.

  • :half_open -- A limited number of probe calls are allowed through (controlled by permitted_calls_in_half_open). If the probes succeed (failure rate stays below threshold), the circuit transitions back to :closed. If any probe fails above the threshold, it transitions back to :open.

Return values

FunctionSuccessCircuit openfn crashes
call/2,3{:ok, result}{:error, :circuit_open}{:error, reason}
allow/1{:ok, record_fn}{:error, :circuit_open}N/A

Error handling

If the function raises, exits, or throws, the error is caught and returned as {:error, reason}. The outcome is classified using should_record and recorded to the sliding window.

{:error, {%RuntimeError{message: "boom"}, _stacktrace}} =
  Resiliency.CircuitBreaker.call(MyApp.Breaker, fn -> raise "boom" end)

Algorithm Complexity

FunctionTimeSpace
call/2,3O(1) GenServer call + O(1) GenServer cast + O(f) functionO(w) — sliding window of size w
allow/1O(1) GenServer callO(1)
get_state/1O(1)O(1)
get_stats/1O(1)O(1)
reset/1O(w) — reallocates windowO(w)
force_open/1O(1)O(1)
force_close/1O(w) — resets windowO(w)

Telemetry

Call events are emitted in the caller's process. The state_change event is emitted inside the GenServer process (state transitions happen asynchronously in handle_cast / handle_info callbacks). See Resiliency.Telemetry for the complete event catalogue.

[:resiliency, :circuit_breaker, :call, :start]

Emitted at the beginning of every call/2,3 invocation, before the permission check.

Measurements

KeyTypeDescription
system_timeintegerSystem.system_time() at emission time

Metadata

KeyTypeDescription
nametermThe circuit breaker name passed to call/2,3

[:resiliency, :circuit_breaker, :call, :rejected]

Emitted when the circuit is open and the call is rejected without executing the function. Always followed immediately by a :stop event.

Measurements

KeyTypeDescription
(none)

Metadata

KeyTypeDescription
nametermThe circuit breaker name

[:resiliency, :circuit_breaker, :call, :stop]

Emitted after every call/2,3 completes — whether permitted, rejected, successful, or failed.

Measurements

KeyTypeDescription
durationintegerElapsed native time units (System.monotonic_time/0 delta)

Metadata

KeyTypeDescription
nametermThe circuit breaker name
result`:ok:error`:ok on success, :error on failure or rejection
error`termnil`The error reason, :circuit_open if rejected, nil on success

[:resiliency, :circuit_breaker, :state_change]

Emitted inside the GenServer process when the circuit transitions between states.

GenServer process

This event is emitted from within the circuit breaker GenServer, not the caller's process. Slow telemetry handlers attached to this event will block the GenServer's message loop. Keep handlers fast.

Measurements

KeyTypeDescription
(none)

Metadata

KeyTypeDescription
nametermThe circuit breaker name
from`:closed:open:half_open`Previous state
to`:closed:open:half_open`New state

Summary

Types

A circuit breaker reference — a registered name, PID, or {:via, ...} tuple.

Functions

Two-step API: check permission and get a recording function.

Executes fun through the circuit breaker.

Returns a child specification for starting under a supervisor.

Forces the circuit to the :closed state.

Forces the circuit to the :open state.

Returns the current state of the circuit breaker.

Returns statistics about the circuit breaker.

Resets the circuit breaker to its initial state.

Starts a circuit breaker linked to the current process.

Types

name()

@type name() :: GenServer.server()

A circuit breaker reference — a registered name, PID, or {:via, ...} tuple.

Functions

allow(name)

@spec allow(name()) :: {:ok, (atom() -> :ok)} | {:error, :circuit_open}

Two-step API: check permission and get a recording function.

This is useful when you cannot wrap the operation in a zero-arity function (e.g., the work spans multiple process messages or external systems).

If the circuit allows the call, returns {:ok, record_fn} where record_fn is a function that accepts :success, :failure, or :ignore and records the outcome. The record function is one-shot — only the first call takes effect; subsequent calls are silent no-ops.

If the caller process dies before calling record_fn, the permit is automatically released as a :failure. This prevents permits from being permanently leaked when callers crash.

Parameters

  • name -- the name or PID of a running circuit breaker.

Returns

{:ok, record_fn} when the circuit allows the call, or {:error, :circuit_open} when the circuit is open.

Examples

iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :allow_cb)
iex> {:ok, record} = Resiliency.CircuitBreaker.allow(:allow_cb)
iex> record.(:success)
:ok

call(name, fun, opts \\ [])

@spec call(name(), (-> result), keyword()) :: {:ok, result} | {:error, term()}
when result: term()

Executes fun through the circuit breaker.

If the circuit is :closed or :half_open (with available permits), the function runs in the caller's process. The result is classified using the configured should_record function and recorded asynchronously.

If the circuit is :open, returns {:error, :circuit_open} immediately without executing fun.

Parameters

  • name -- the name or PID of a running circuit breaker.
  • fun -- a zero-arity function to execute.
  • opts -- optional keyword list. Currently unused, reserved for future options.

Returns

{:ok, result} on success, {:error, :circuit_open} when the circuit is open, or {:error, reason} if the function raises, exits, or throws.

Examples

iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :call_cb)
iex> Resiliency.CircuitBreaker.call(:call_cb, fn -> {:ok, 42} end)
{:ok, {:ok, 42}}

child_spec(opts)

Returns a child specification for starting under a supervisor.

Options

  • :name -- (required) the name to register the circuit breaker under.
  • :window_size -- number of call outcomes in the sliding window. Default 100.
  • :failure_rate_threshold -- failure rate (0.0–1.0) that trips the circuit. Default 0.5.
  • :slow_call_threshold -- duration in ms above which a call is "slow". Default :infinity (disabled).
  • :slow_call_rate_threshold -- slow call rate (0.0–1.0) that trips the circuit. Default 1.0.
  • :open_timeout -- ms before :open:half_open. Default 60_000.
  • :permitted_calls_in_half_open -- probe calls allowed in :half_open. Default 1.
  • :minimum_calls -- min recorded calls before evaluating failure rate. Default 10.
  • :should_record -- fn result -> :success | :failure | :ignore classification function.

  • :on_state_change -- fn name, from_state, to_state -> any callback.

Examples

children = [
  {Resiliency.CircuitBreaker, name: MyApp.Breaker, failure_rate_threshold: 0.5}
]
Supervisor.start_link(children, strategy: :one_for_one)

iex> spec = Resiliency.CircuitBreaker.child_spec(name: :my_cb)
iex> spec.id
{Resiliency.CircuitBreaker, :my_cb}

force_close(name)

@spec force_close(name()) :: :ok

Forces the circuit to the :closed state.

Resets the sliding window and clears any timers.

Examples

iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :fc_cb)
iex> Resiliency.CircuitBreaker.force_open(:fc_cb)
:ok
iex> Resiliency.CircuitBreaker.force_close(:fc_cb)
:ok
iex> Resiliency.CircuitBreaker.get_state(:fc_cb)
:closed

force_open(name)

@spec force_open(name()) :: :ok

Forces the circuit to the :open state.

The circuit stays open until reset/1 or force_close/1 is called. No automatic :open:half_open timer is started.

Examples

iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :fo_cb)
iex> Resiliency.CircuitBreaker.force_open(:fo_cb)
:ok
iex> Resiliency.CircuitBreaker.get_state(:fo_cb)
:open

get_state(name)

@spec get_state(name()) :: :closed | :open | :half_open

Returns the current state of the circuit breaker.

Returns

:closed, :open, or :half_open.

Examples

iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :state_cb)
iex> Resiliency.CircuitBreaker.get_state(:state_cb)
:closed

get_stats(name)

@spec get_stats(name()) :: map()

Returns statistics about the circuit breaker.

Returns

A map containing:

  • :state -- the current state (:closed, :open, or :half_open)
  • :total -- total recorded calls in the sliding window
  • :failures -- number of recorded failures
  • :slow_calls -- number of recorded slow calls
  • :failure_rate -- current failure rate (0.0–1.0)
  • :slow_call_rate -- current slow call rate (0.0–1.0)

Examples

iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :stats_cb)
iex> stats = Resiliency.CircuitBreaker.get_stats(:stats_cb)
iex> stats.state
:closed
iex> stats.total
0

reset(name)

@spec reset(name()) :: :ok

Resets the circuit breaker to its initial state.

Clears the sliding window, cancels any open timeout timer, and transitions to :closed.

Examples

iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :reset_cb)
iex> Resiliency.CircuitBreaker.reset(:reset_cb)
:ok

start_link(opts)

@spec start_link(keyword()) :: GenServer.on_start()

Starts a circuit breaker linked to the current process.

Typically you'd use child_spec/1 instead to start under a supervisor. See child_spec/1 for options.

Examples

{:ok, pid} = Resiliency.CircuitBreaker.start_link(name: MyApp.Breaker)