# `Resiliency.CircuitBreaker`
[🔗](https://github.com/yoavgeva/resiliency/blob/v0.6.0/lib/resiliency/circuit_breaker.ex#L1)

A circuit breaker with sliding window failure-rate tracking and automatic recovery.

A circuit breaker wraps calls to a downstream service and monitors their
outcomes. When the failure rate exceeds a threshold, the circuit "trips"
to the `:open` state, rejecting calls immediately without contacting the
downstream. After a configurable timeout, the circuit moves to `:half_open`
and allows a small number of probe calls through. If the probes succeed,
the circuit closes and traffic resumes. If they fail, the circuit reopens.

Inspired by [Resilience4j](https://resilience4j.readme.io/docs/circuitbreaker)
and [gobreaker](https://github.com/sony/gobreaker), with an Elixir-idiomatic
API that follows the conventions of this library.

## When to use

  * Your downstream service experiences periodic outages and you want to
    stop calling it until it recovers -- avoiding wasted resources and
    cascading failures.
  * You need automatic recovery: after a cool-down period, probe calls
    verify the downstream is healthy before resuming full traffic.
  * You want failure-rate-based tripping (not just consecutive failures)
    with a sliding window that forgets old outcomes naturally.
  * You need slow call detection: calls that succeed but take too long
    can trip the circuit just like failures.

## Quick start

    # 1. Add to your supervision tree
    children = [
      {Resiliency.CircuitBreaker, name: MyApp.Breaker, failure_rate_threshold: 0.5}
    ]
    Supervisor.start_link(children, strategy: :one_for_one)

    # 2. Use it
    case Resiliency.CircuitBreaker.call(MyApp.Breaker, fn -> HttpClient.get(url) end) do
      {:ok, response} -> handle_response(response)
      {:error, :circuit_open} -> {:error, :service_degraded}
      {:error, reason} -> {:error, reason}
    end

## How it works

The circuit breaker runs as a `GenServer` that maintains state and failure
rates. The protected function runs in the **caller's process**, not inside
the GenServer. This means:

  * The GenServer is never blocked by slow downstream calls.
  * A crash in the protected function does not crash the GenServer.
  * Permission checks are synchronous (`GenServer.call`), but recording
    outcomes is asynchronous (`GenServer.cast`) for minimal overhead.
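
The division of labour described above can be sketched with a toy GenServer. This is not the library's implementation: `TinyGate`, its messages, and its state shape are hypothetical names chosen for illustration, and the sketch records every permitted call as a success without tracking rates or catching crashes.

```elixir
defmodule TinyGate do
  # Hypothetical toy module, not part of Resiliency. It only shows where
  # the work happens, not how failure rates are tracked.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def call(server, fun) when is_function(fun, 0) do
    # Synchronous permission check: a quick round trip to the GenServer.
    case GenServer.call(server, :acquire) do
      :ok ->
        # The protected function runs here, in the caller's process.
        result = fun.()
        # Fire-and-forget outcome report; the caller does not wait for it.
        GenServer.cast(server, {:record, :success})
        {:ok, result}

      {:error, _} = rejected ->
        rejected
    end
  end

  # Returns the outcomes recorded so far, oldest first.
  def recorded(server), do: GenServer.call(server, :recorded)

  @impl true
  def init(:ok), do: {:ok, %{open?: false, outcomes: []}}

  @impl true
  def handle_call(:acquire, _from, %{open?: true} = state),
    do: {:reply, {:error, :circuit_open}, state}

  def handle_call(:acquire, _from, state), do: {:reply, :ok, state}

  def handle_call(:recorded, _from, state),
    do: {:reply, Enum.reverse(state.outcomes), state}

  @impl true
  def handle_cast({:record, outcome}, state),
    do: {:noreply, %{state | outcomes: [outcome | state.outcomes]}}
end
```

Because `fun.()` is evaluated between the `GenServer.call` and the `GenServer.cast`, the server is free to answer other callers while a slow downstream call is in flight.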

## States

The circuit breaker has three states:

  * **`:closed`** -- Normal operation. Calls are allowed through. Outcomes
    are recorded in a count-based sliding window. When the failure rate
    (or slow call rate) exceeds the configured threshold and the minimum
    number of calls has been reached, the circuit transitions to `:open`.

  * **`:open`** -- Calls are rejected immediately with `{:error, :circuit_open}`.
    After `open_timeout` milliseconds, the circuit transitions to `:half_open`.

  * **`:half_open`** -- A limited number of probe calls are allowed through
    (controlled by `permitted_calls_in_half_open`). If the probes succeed
    (failure rate stays below threshold), the circuit transitions back to
    `:closed`. If the failure rate among the probes does not stay below the
    threshold, it transitions back to `:open`.
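
The `:closed` to `:open` decision can be illustrated with a small pure module. This is a sketch under assumptions, not the library's code: `WindowSketch` is a hypothetical name, the window is a plain list (so it costs O(w) per check rather than the O(1) listed for the real functions), and the strict `>` comparison is inferred from the word "exceeds" above.

```elixir
defmodule WindowSketch do
  # Hypothetical helper, not the library's code: a count-based sliding window
  # kept as a plain list with the newest outcome first.

  # Push an outcome and drop the oldest one once the window is full.
  def record(window, outcome, window_size) when outcome in [:success, :failure] do
    Enum.take([outcome | window], window_size)
  end

  # Fraction of failures among the recorded outcomes (0.0 for an empty window).
  def failure_rate([]), do: 0.0

  def failure_rate(window) do
    Enum.count(window, &(&1 == :failure)) / length(window)
  end

  # Trip only once enough calls have been seen and the rate is over the threshold.
  # Assumption: "exceeds" means a strict comparison.
  def trip?(window, threshold, minimum_calls) do
    length(window) >= minimum_calls and failure_rate(window) > threshold
  end
end
```

With a window size of 3, recording `:failure, :failure, :success, :failure` leaves two failures out of three outcomes, so a threshold of `0.5` trips the circuit when `minimum_calls` is 3 but not when it is 10.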

## Return values

| Function | Success | Circuit open | Function raises, exits, or throws |
|---|---|---|---|
| `call/2,3` | `{:ok, result}` | `{:error, :circuit_open}` | `{:error, reason}` |
| `allow/1` | `{:ok, record_fn}` | `{:error, :circuit_open}` | N/A |

## Error handling

If the function raises, exits, or throws, the error is caught and returned
as `{:error, reason}`. The outcome is classified using `should_record` and
recorded to the sliding window.

    {:error, {%RuntimeError{message: "boom"}, _stacktrace}} =
      Resiliency.CircuitBreaker.call(MyApp.Breaker, fn -> raise "boom" end)
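
One way to turn raises, exits, and throws into `{:error, reason}` tuples is shown below. `SafeRun` is a hypothetical helper, not the library's code. The `{exception, stacktrace}` shape for raises matches the example above; the shapes used for exits and throws are assumptions made for this sketch.

```elixir
defmodule SafeRun do
  # Hypothetical helper illustrating one way to normalise crashes.
  def run(fun) when is_function(fun, 0) do
    {:ok, fun.()}
  rescue
    # Raised exceptions: same {exception, stacktrace} shape as the example above.
    exception -> {:error, {exception, __STACKTRACE__}}
  catch
    # Assumed shapes for exits and throws; the library may report them differently.
    :exit, reason -> {:error, {:exit, reason}}
    :throw, value -> {:error, {:throw, value}}
  end
end
```

Because every failure path produces a tuple, the caller's process keeps running and the outcome can still be classified and recorded.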

## Algorithm complexity

| Function | Time | Space |
|---|---|---|
| `call/2,3` | O(1) GenServer call + O(1) GenServer cast + O(f) function | O(w) — sliding window of size w |
| `allow/1` | O(1) GenServer call | O(1) |
| `get_state/1` | O(1) | O(1) |
| `get_stats/1` | O(1) | O(1) |
| `reset/1` | O(w) — reallocates window | O(w) |
| `force_open/1` | O(1) | O(1) |
| `force_close/1` | O(w) — resets window | O(w) |

## Telemetry

Call events are emitted in the **caller's process**. The `state_change` event is emitted
inside the **GenServer process** (state transitions happen asynchronously in `handle_cast`
/ `handle_info` callbacks). See `Resiliency.Telemetry` for the complete event catalogue.

### `[:resiliency, :circuit_breaker, :call, :start]`

Emitted at the beginning of every `call/2,3` invocation, before the permission check.

**Measurements**

| Key | Type | Description |
|-----|------|-------------|
| `system_time` | `integer` | `System.system_time()` at emission time |

**Metadata**

| Key | Type | Description |
|-----|------|-------------|
| `name` | `term` | The circuit breaker name passed to `call/2,3` |

### `[:resiliency, :circuit_breaker, :call, :rejected]`

Emitted when the circuit is open and the call is rejected without executing the function.
Always followed immediately by a `:stop` event.

**Measurements**

| Key | Type | Description |
|-----|------|-------------|
| _(none)_ | | |

**Metadata**

| Key | Type | Description |
|-----|------|-------------|
| `name` | `term` | The circuit breaker name |

### `[:resiliency, :circuit_breaker, :call, :stop]`

Emitted after every `call/2,3` completes — whether permitted, rejected, successful, or failed.

**Measurements**

| Key | Type | Description |
|-----|------|-------------|
| `duration` | `integer` | Elapsed time in native time units (`System.monotonic_time/0` delta) |

**Metadata**

| Key | Type | Description |
|-----|------|-------------|
| `name` | `term` | The circuit breaker name |
| `result` | `:ok \| :error` | `:ok` on success, `:error` on failure or rejection |
| `error` | `term \| nil` | The error reason, `:circuit_open` if rejected, `nil` on success |

### `[:resiliency, :circuit_breaker, :state_change]`

Emitted inside the GenServer process when the circuit transitions between states.

> #### GenServer process {: .warning}
>
> This event is emitted from within the circuit breaker GenServer, not the caller's process.
> Slow telemetry handlers attached to this event will block the GenServer's message loop.
> Keep handlers fast.

**Measurements**

| Key | Type | Description |
|-----|------|-------------|
| _(none)_ | | |

**Metadata**

| Key | Type | Description |
|-----|------|-------------|
| `name` | `term` | The circuit breaker name |
| `from` | `:closed \| :open \| :half_open` | Previous state |
| `to` | `:closed \| :open \| :half_open` | New state |
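
A handler for the `:stop` and `:state_change` events might look like the sketch below. `BreakerTelemetrySketch` and the handler id are hypothetical; only the event names and keys come from the tables in this section. `attach/0` assumes the `:telemetry` package is available, and logging is used here on the assumption that it is cheap enough to run inside the GenServer for `state_change`.

```elixir
defmodule BreakerTelemetrySketch do
  # Hypothetical handler module; only the event names and keys come from
  # the tables in this section.
  require Logger

  @events [
    [:resiliency, :circuit_breaker, :call, :stop],
    [:resiliency, :circuit_breaker, :state_change]
  ]

  # Call once at application start. Requires the :telemetry package.
  def attach do
    :telemetry.attach_many("breaker-sketch", @events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event(event, measurements, metadata, _config) do
    Logger.info(describe(event, measurements, metadata))
  end

  # Pure formatting, kept separate so the handler itself stays small.
  def describe([:resiliency, :circuit_breaker, :call, :stop], %{duration: d}, meta) do
    ms = System.convert_time_unit(d, :native, :millisecond)
    "#{inspect(meta.name)} call #{meta.result} in #{ms}ms"
  end

  def describe([:resiliency, :circuit_breaker, :state_change], _measurements, meta) do
    "#{inspect(meta.name)} #{meta.from} -> #{meta.to}"
  end
end
```

Anything heavier than formatting and logging (HTTP calls, database writes) is better handed off to another process so that `state_change` handlers do not stall the breaker.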

# `name`

```elixir
@type name() :: GenServer.server()
```

A circuit breaker reference — a registered name, PID, or `{:via, ...}` tuple.

# `allow`

```elixir
@spec allow(name()) :: {:ok, (atom() -> :ok)} | {:error, :circuit_open}
```

Two-step API: check permission and get a recording function.

This is useful when you cannot wrap the operation in a zero-arity function
(e.g., the work spans multiple process messages or external systems).

If the circuit allows the call, returns `{:ok, record_fn}` where `record_fn`
is a function that accepts `:success`, `:failure`, or `:ignore` and records
the outcome. The record function is **one-shot** — only the first call takes
effect; subsequent calls are silent no-ops.

If the caller process dies before calling `record_fn`, the permit is
automatically released as a `:failure`. This prevents permits from being
permanently leaked when callers crash.

## Parameters

* `name` -- the name or PID of a running circuit breaker.

## Returns

`{:ok, record_fn}` when the circuit allows the call, or
`{:error, :circuit_open}` when the circuit is open.

## Examples

    iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :allow_cb)
    iex> {:ok, record} = Resiliency.CircuitBreaker.allow(:allow_cb)
    iex> record.(:success)
    :ok
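
When the work is started now and its result arrives later as a message, the two steps can be separated as in the sketch below. `AsyncProbeSketch`, `start/2`, `finish/2`, and the `send_request` argument are hypothetical names; only `allow/1` and the `:success` / `:failure` outcomes come from this page. The sketch keeps both steps in the same process, since the documentation ties the permit to the caller.

```elixir
defmodule AsyncProbeSketch do
  # Hypothetical module: starts work now and reports the outcome later,
  # when the reply message arrives.
  alias Resiliency.CircuitBreaker

  # `send_request` is a caller-supplied function that kicks off the async work.
  def start(breaker, send_request) when is_function(send_request, 0) do
    case CircuitBreaker.allow(breaker) do
      {:ok, record} ->
        send_request.()
        # Keep `record` (for example in process state) until the reply arrives.
        {:ok, record}

      {:error, :circuit_open} = rejected ->
        rejected
    end
  end

  # Call when the reply (or a timeout) is observed, possibly much later.
  def finish(record, {:ok, _} = reply) do
    record.(:success)
    reply
  end

  def finish(record, {:error, _} = reply) do
    record.(:failure)
    reply
  end
end
```

Since the record function is one-shot, calling `finish/2` twice for the same permit would only record the first outcome.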

# `call`

```elixir
@spec call(name(), (-> result), keyword()) :: {:ok, result} | {:error, term()}
when result: term()
```

Executes `fun` through the circuit breaker.

If the circuit is `:closed` or `:half_open` (with available permits), the
function runs in the caller's process. The result is classified using the
configured `should_record` function and recorded asynchronously.

If the circuit is `:open`, returns `{:error, :circuit_open}` immediately
without executing `fun`.

## Parameters

* `name` -- the name or PID of a running circuit breaker.
* `fun` -- a zero-arity function to execute.
* `opts` -- optional keyword list. Currently unused, reserved for future options.

## Returns

`{:ok, result}` on success, `{:error, :circuit_open}` when the circuit is
open, or `{:error, reason}` if the function raises, exits, or throws.

## Examples

    iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :call_cb)
    iex> Resiliency.CircuitBreaker.call(:call_cb, fn -> {:ok, 42} end)
    {:ok, {:ok, 42}}

# `child_spec`

Returns a child specification for starting under a supervisor.

## Options

* `:name` -- (required) the name to register the circuit breaker under.
* `:window_size` -- number of call outcomes in the sliding window. Default `100`.
* `:failure_rate_threshold` -- failure rate (0.0–1.0) that trips the circuit. Default `0.5`.
* `:slow_call_threshold` -- duration in ms above which a call is "slow". Default `:infinity` (disabled).
* `:slow_call_rate_threshold` -- slow call rate (0.0–1.0) that trips the circuit. Default `1.0`.
* `:open_timeout` -- ms before `:open` → `:half_open`. Default `60_000`.
* `:permitted_calls_in_half_open` -- probe calls allowed in `:half_open`. Default `1`.
* `:minimum_calls` -- min recorded calls before evaluating failure rate. Default `10`.
* `:should_record` -- `fn result -> :success | :failure | :ignore` classification function.
* `:on_state_change` -- `fn name, from_state, to_state -> any` callback.

## Examples

    children = [
      {Resiliency.CircuitBreaker, name: MyApp.Breaker, failure_rate_threshold: 0.5}
    ]
    Supervisor.start_link(children, strategy: :one_for_one)

    iex> spec = Resiliency.CircuitBreaker.child_spec(name: :my_cb)
    iex> spec.id
    {Resiliency.CircuitBreaker, :my_cb}
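
A `:should_record` classifier could be written as a small module like the one below. `BreakerPolicySketch` and the `%{status: ...}` response shape are hypothetical, and this page does not specify the exact `result` passed for crashed calls, so the sketch only assumes that `{:error, _}` tuples should count as failures.

```elixir
defmodule BreakerPolicySketch do
  # Hypothetical classifier for the `:should_record` option.
  # Assumes an HTTP-like response map with a :status field.

  # Server errors indicate an unhealthy downstream.
  def classify({:ok, %{status: status}}) when status >= 500, do: :failure

  # Client errors say nothing about downstream health, so they are ignored.
  def classify({:ok, %{status: status}}) when status in 400..499, do: :ignore

  def classify({:ok, _}), do: :success
  def classify({:error, _}), do: :failure

  # Anything unrecognised is treated as a success in this sketch.
  def classify(_other), do: :success
end
```

It would then be passed as `should_record: &BreakerPolicySketch.classify/1` alongside the other options.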

# `force_close`

```elixir
@spec force_close(name()) :: :ok
```

Forces the circuit to the `:closed` state.

Resets the sliding window and clears any timers.

## Examples

    iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :fc_cb)
    iex> Resiliency.CircuitBreaker.force_open(:fc_cb)
    :ok
    iex> Resiliency.CircuitBreaker.force_close(:fc_cb)
    :ok
    iex> Resiliency.CircuitBreaker.get_state(:fc_cb)
    :closed

# `force_open`

```elixir
@spec force_open(name()) :: :ok
```

Forces the circuit to the `:open` state.

The circuit stays open until `reset/1` or `force_close/1` is called.
No automatic `:open` → `:half_open` timer is started.

## Examples

    iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :fo_cb)
    iex> Resiliency.CircuitBreaker.force_open(:fo_cb)
    :ok
    iex> Resiliency.CircuitBreaker.get_state(:fo_cb)
    :open

# `get_state`

```elixir
@spec get_state(name()) :: :closed | :open | :half_open
```

Returns the current state of the circuit breaker.

## Returns

`:closed`, `:open`, or `:half_open`.

## Examples

    iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :state_cb)
    iex> Resiliency.CircuitBreaker.get_state(:state_cb)
    :closed

# `get_stats`

```elixir
@spec get_stats(name()) :: map()
```

Returns statistics about the circuit breaker.

## Returns

A map containing:
* `:state` -- the current state (`:closed`, `:open`, or `:half_open`)
* `:total` -- total recorded calls in the sliding window
* `:failures` -- number of recorded failures
* `:slow_calls` -- number of recorded slow calls
* `:failure_rate` -- current failure rate (0.0–1.0)
* `:slow_call_rate` -- current slow call rate (0.0–1.0)

## Examples

    iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :stats_cb)
    iex> stats = Resiliency.CircuitBreaker.get_stats(:stats_cb)
    iex> stats.state
    :closed
    iex> stats.total
    0
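
The stats map lends itself to simple health reporting. The sketch below maps it to a coarse label; `BreakerHealthSketch`, the label names, and the `0.25` cut-off are all hypothetical choices, and only the `:state` and `:failure_rate` keys come from the list above.

```elixir
defmodule BreakerHealthSketch do
  # Hypothetical helper: turns a stats map into a coarse health label.
  def label(%{state: :open}), do: :down
  def label(%{state: :half_open}), do: :recovering

  # Arbitrary cut-off for this sketch: closed but failing noticeably.
  def label(%{state: :closed, failure_rate: rate}) when rate > 0.25, do: :degraded

  def label(%{state: :closed}), do: :healthy
end
```

A health endpoint could call `Resiliency.CircuitBreaker.get_stats/1` and pass the result straight to `label/1`.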

# `reset`

```elixir
@spec reset(name()) :: :ok
```

Resets the circuit breaker to its initial state.

Clears the sliding window, cancels any open timeout timer, and transitions
to `:closed`.

## Examples

    iex> {:ok, _pid} = Resiliency.CircuitBreaker.start_link(name: :reset_cb)
    iex> Resiliency.CircuitBreaker.reset(:reset_cb)
    :ok

# `start_link`

```elixir
@spec start_link(keyword()) :: GenServer.on_start()
```

Starts a circuit breaker linked to the current process.

Typically you'd use `child_spec/1` instead to start under a supervisor.
See `child_spec/1` for options.

## Examples

    {:ok, pid} = Resiliency.CircuitBreaker.start_link(name: MyApp.Breaker)

