# `Tinkex.Recovery.Executor`
Source: [`lib/tinkex/recovery/executor.ex`](https://github.com/North-Shore-AI/tinkex/blob/v0.4.0/lib/tinkex/recovery/executor.ex#L1) (v0.4.0)

GenServer that performs recovery attempts for corrupted training runs.

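Users must start this module explicitly (typically alongside
`Tinkex.Recovery.Monitor`) and feed it requests through `recover/5`. The number
of recovery attempts running at the same time is capped (default: 1) so that a
burst of failures cannot trigger unbounded restarts; raise the limit with the
`:max_concurrent` option to `start_link/1`.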
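A concurrency cap like this is a common GenServer pattern. The sketch below is a generic illustration of bounded concurrency with a FIFO queue, not Tinkex's implementation: `CappedRunner` and `submit/2` are hypothetical names, and jobs are plain zero-arity functions rather than recovery requests. Each job runs in a monitored process; when it exits, the `:DOWN` message frees a slot and the next queued job starts.

```elixir
defmodule CappedRunner do
  # Hypothetical illustration, not Tinkex code: runs at most `:max_concurrent`
  # jobs at once and keeps the rest in a FIFO queue.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Enqueue a zero-arity function; returns :ok immediately.
  def submit(server, fun) when is_function(fun, 0), do: GenServer.cast(server, {:submit, fun})

  @impl true
  def init(opts) do
    {:ok, %{max: Keyword.get(opts, :max_concurrent, 1), running: %{}, queue: :queue.new()}}
  end

  @impl true
  def handle_cast({:submit, fun}, state) do
    state = %{state | queue: :queue.in(fun, state.queue)}
    {:noreply, dispatch(state)}
  end

  # A job process exited (normally or not): free its slot, then start more work.
  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    {:noreply, dispatch(%{state | running: Map.delete(state.running, ref)})}
  end

  # Cap reached: leave the remaining jobs queued.
  defp dispatch(%{running: running, max: max} = state) when map_size(running) >= max, do: state

  # Below the cap: start the oldest queued job, then check again.
  defp dispatch(state) do
    case :queue.out(state.queue) do
      {{:value, fun}, rest} ->
        {pid, ref} = spawn_monitor(fun)
        dispatch(%{state | queue: rest, running: Map.put(state.running, ref, pid)})

      {:empty, _queue} ->
        state
    end
  end
end
```

With `max_concurrent: 1`, a second submitted job starts only after the first job's process exits.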

Telemetry events:
  * `[:tinkex, :recovery, :started]` - attempt began (measurements: `%{attempt: n}`)
  * `[:tinkex, :recovery, :checkpoint_selected]` - checkpoint chosen
  * `[:tinkex, :recovery, :client_created]` - training client successfully created
  * `[:tinkex, :recovery, :completed]` - recovery finished successfully
  * `[:tinkex, :recovery, :failed]` - attempt failed (metadata includes `:error`)
  * `[:tinkex, :recovery, :exhausted]` - maximum attempts reached without a successful recovery
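These events can be consumed with the `:telemetry` library's `attach_many/4`. The following is a minimal sketch written as a standalone script; `RecoveryTelemetryLogger` and the handler ID are hypothetical, and forwarding each event to a process is just one choice (handy in tests or for feeding a dashboard process). Only the event names come from the list above; the handler makes no assumptions about the payloads. In an application that already depends on Tinkex, drop the `Mix.install/1` line, since `:telemetry` is presumably already available there.

```elixir
Mix.install([{:telemetry, "~> 1.2"}])

defmodule RecoveryTelemetryLogger do
  # Hypothetical handler: forwards every recovery event, unchanged, to the
  # pid stored in the handler config.
  @events [
    [:tinkex, :recovery, :started],
    [:tinkex, :recovery, :checkpoint_selected],
    [:tinkex, :recovery, :client_created],
    [:tinkex, :recovery, :completed],
    [:tinkex, :recovery, :failed],
    [:tinkex, :recovery, :exhausted]
  ]

  # Returns :ok, or {:error, :already_exists} if the handler ID is taken.
  def attach(target_pid) do
    :telemetry.attach_many(
      "recovery-telemetry-logger",
      @events,
      &__MODULE__.handle_event/4,
      %{target: target_pid}
    )
  end

  def handle_event(event, measurements, metadata, %{target: target}) do
    send(target, {:recovery_event, event, measurements, metadata})
  end
end
```

Telemetry runs handlers synchronously in the emitting process, so keep them cheap. To exercise the handler without a real recovery, emit a fake event with `:telemetry.execute([:tinkex, :recovery, :started], %{attempt: 1}, %{})`.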

# `option`

```elixir
@type option() ::
  {:rest_module, module()}
  | {:service_client_module, module()}
  | {:max_concurrent, pos_integer()}
  | {:send_after, (term(), non_neg_integer() -> reference())}
```
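The `:send_after` type matches `Process.send_after/3` with the destination left out, which suggests it lets tests replace timer-based scheduling. That reading is an assumption: the source does not say what the executor schedules or what the default is. Under that assumption, here are two hypothetical test doubles: one delivers immediately, the other only records the requested delay. Either could be passed as `send_after: ...` to `start_link/1`.

```elixir
# Hypothetical test doubles for the `:send_after` option; both match the
# documented type `(term(), non_neg_integer() -> reference())`.

# Delivers the message at once, ignoring the delay. `self()` is evaluated when
# the function is called, so the message goes to the calling process
# (assumed to be the executor itself).
immediate_send_after = fn message, _delay_ms ->
  send(self(), message)
  make_ref()
end

# Does not deliver anything; instead reports what would have been scheduled
# to the test process so delays can be asserted on.
test_pid = self()

recording_send_after = fn message, delay_ms ->
  send(test_pid, {:scheduled, message, delay_ms})
  make_ref()
end
```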

# `child_spec`

Returns a specification to start this module under a supervisor.

See `Supervisor`.

# `recover`

```elixir
@spec recover(pid(), String.t(), pid(), Tinkex.Recovery.Policy.t() | map(), keyword()) ::
  :ok | {:error, term()}
```

Enqueue a recovery request.

Options:
  * `:config` - `Tinkex.Config.t()` used for REST lookups when no checkpoint is supplied via `:last_checkpoint`
  * `:metadata` - map propagated to telemetry events and callbacks (e.g., `%{training_pid: pid}`)
  * `:last_checkpoint` - a `Tinkex.Types.Checkpoint.t()`, map, or checkpoint path string; supplying it skips fetching the checkpoint again
  * `:run` - an already-fetched `Tinkex.Types.TrainingRun.t()` to reuse instead of fetching the run again

# `start_link`

```elixir
@spec start_link([option()]) :: GenServer.on_start()
```

Start the executor.

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*