# `Tinkex.Recovery.Monitor`
[🔗](https://github.com/North-Shore-AI/tinkex/blob/v0.4.0/lib/tinkex/recovery/monitor.ex#L1)

Polls training runs for corruption flags and dispatches recovery work.

This GenServer must be started explicitly and configured with a recovery
policy (disabled by default), a REST module for polling, and an executor pid.

Telemetry events:
  * `[:tinkex, :recovery, :detected]` - observed `corrupted: true` on a run
  * `[:tinkex, :recovery, :poll_error]` - REST poll failed (metadata includes `:error`)
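
The two events above can be observed with a standard `:telemetry` handler. The following is a minimal sketch, not part of Tinkex: the `RecoveryTelemetryLogger` module and its handler id are hypothetical names, and the code assumes the `:telemetry` package is available. Only the `:error` metadata key on `:poll_error` is documented here, so the handler inspects everything else generically.

```elixir
defmodule RecoveryTelemetryLogger do
  # Hypothetical helper module for illustration; not part of Tinkex.
  require Logger

  @events [
    [:tinkex, :recovery, :detected],
    [:tinkex, :recovery, :poll_error]
  ]

  # The event names this module listens for, as listed in the docs above.
  def events, do: @events

  # Attaches a single handler for both events via :telemetry.attach_many/4.
  def attach do
    :telemetry.attach_many(
      "recovery-telemetry-logger",
      @events,
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # :poll_error metadata is documented to include :error.
  def handle_event([:tinkex, :recovery, :poll_error], _measurements, metadata, _config) do
    Logger.warning("recovery poll failed: #{inspect(metadata[:error])}")
  end

  # The metadata shape for :detected is not documented here, so log it whole.
  def handle_event([:tinkex, :recovery, :detected], _measurements, metadata, _config) do
    Logger.warning("corrupted run detected: #{inspect(metadata)}")
  end
end
```

Calling `RecoveryTelemetryLogger.attach()` once at application start would be enough to begin logging both events.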

# `option`

```elixir
@type option() ::
  {:policy, Tinkex.Recovery.Policy.t() | map() | nil}
  | {:config, Tinkex.Config.t()}
  | {:rest_module, module()}
  | {:rest_client_fun,
     (pid() -> {:ok, %{config: Tinkex.Config.t()}} | {:error, term()})}
  | {:service_client_module, module()}
  | {:executor, pid()}
  | {:send_after, (term(), non_neg_integer() -> reference())}
```
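
The `:send_after` option takes a two-argument function that returns a timer reference, which suggests it exists so that tests can replace the poll timer. That purpose is an inference from the type and from the `poll_ref` field in `state`, not something stated on this page. Under that assumption, and assuming the monitor calls the function from its own process, a test double might look like this (`ImmediateTimer` is a hypothetical name):

```elixir
defmodule ImmediateTimer do
  # Hypothetical test helper; not part of Tinkex.
  #
  # Matches the documented shape (term(), non_neg_integer() -> reference()).
  # It ignores the requested delay and schedules the message for the calling
  # process with a zero delay, so a test does not wait for a poll interval.
  def send_after(message, _delay_ms) do
    Process.send_after(self(), message, 0)
  end
end
```

It could then be passed as `send_after: &ImmediateTimer.send_after/2` in the option list.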

# `state`

```elixir
@type state() :: %{
  policy: Tinkex.Recovery.Policy.t(),
  rest_module: module(),
  rest_client_fun: (pid() ->
                      {:ok, %{config: Tinkex.Config.t()}} | {:error, term()}),
  service_module: module(),
  executor: pid() | nil,
  runs: %{
    optional(String.t()) => %{
      service_pid: pid(),
      config: Tinkex.Config.t(),
      metadata: map()
    }
  },
  poll_ref: reference() | nil,
  send_after: (term(), non_neg_integer() -> reference())
}
```

# `child_spec`

Returns a specification to start this module under a supervisor.

See `Supervisor`.

# `monitor_run`

```elixir
@spec monitor_run(pid(), String.t(), pid(), map()) :: :ok | {:error, term()}
```

Begin monitoring a training run.

`service_pid` is the `Tinkex.ServiceClient` pid used to create recovery clients.
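
A minimal sketch of calling `monitor_run/4` and handling both documented return values. The spec does not name its arguments, so the order used here (monitor pid, run id, service client pid, metadata map) is an assumption based on the types and the note about `service_pid`. `RunWatcher` is a hypothetical wrapper, not part of Tinkex.

```elixir
defmodule RunWatcher do
  # Hypothetical wrapper for illustration; not part of Tinkex.
  require Logger

  # Assumed argument order: monitor pid, run id, ServiceClient pid, metadata.
  def watch(monitor, run_id, service_pid, metadata \\ %{}) do
    case Tinkex.Recovery.Monitor.monitor_run(monitor, run_id, service_pid, metadata) do
      :ok ->
        :ok

      {:error, reason} ->
        # Surface the failure to the caller after logging it.
        Logger.error("could not monitor run #{run_id}: #{inspect(reason)}")
        {:error, reason}
    end
  end
end
```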

# `start_link`

```elixir
@spec start_link([option()]) :: GenServer.on_start()
```

Start the monitor.
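
A minimal sketch of building the option list and starting the monitor. This page does not say which options are required or what their defaults are, so passing only `:policy` and `:executor` is an assumption. The policy is taken as an argument because the fields of `Tinkex.Recovery.Policy` are not shown here. `RecoverySetup` is a hypothetical helper, not part of Tinkex.

```elixir
defmodule RecoverySetup do
  # Hypothetical helper for illustration; not part of Tinkex.

  # Builds a keyword list using two of the documented option keys.
  # `policy` may be a Tinkex.Recovery.Policy struct, a map, or nil
  # according to the option() type.
  def monitor_opts(policy, executor) when is_pid(executor) do
    [policy: policy, executor: executor]
  end

  # Starts the monitor; the result is a GenServer.on_start() value.
  def start_monitor(policy, executor) do
    Tinkex.Recovery.Monitor.start_link(monitor_opts(policy, executor))
  end
end
```

Because the policy is described as disabled by default, the caller would need to supply one that enables recovery for the monitor to do any work.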

# `stop_monitoring`

```elixir
@spec stop_monitoring(pid(), String.t()) :: :ok
```

Stop monitoring a training run.
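
One way to pair `monitor_run/4` with `stop_monitoring/2` is to wrap the work in `try ... after`, so the run is unregistered even if the work raises. This is a sketch under the same assumed argument order as the specs suggest (monitor pid first, then run id). `MonitoredRun` is a hypothetical helper, not part of Tinkex.

```elixir
defmodule MonitoredRun do
  # Hypothetical helper for illustration; not part of Tinkex.
  alias Tinkex.Recovery.Monitor

  # Registers the run, executes `fun`, and always stops monitoring afterwards.
  # Returns the result of `fun`, or the {:error, reason} from monitor_run/4.
  def with_monitoring(monitor, run_id, service_pid, fun) when is_function(fun, 0) do
    case Monitor.monitor_run(monitor, run_id, service_pid, %{}) do
      :ok ->
        try do
          fun.()
        after
          Monitor.stop_monitoring(monitor, run_id)
        end

      {:error, _reason} = error ->
        error
    end
  end
end
```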

