# Architecture

Hephaestus is a workflow engine for Elixir built around a pure functional core
and pluggable runtime adapters. Workflows are directed acyclic graphs (DAGs) of
steps, validated at compile time and executed at runtime through adapter contracts.
This guide explains the internal architecture and the reasoning behind it.

## Design philosophy

Three principles shape the design:

1. **Pure core, effectful shell.** The engine that advances workflow state is a
   pure function — no GenServers, no storage calls, no side effects. Side effects
   live in runtime adapters that wrap the core.

2. **Fail at compile time.** The workflow macro extracts the step graph from
   `start/0` and `transit/3` at compilation, then validates it with `libgraph`.
   Cycles, unreachable steps, missing terminal nodes, non-converging fan-outs,
   event mismatches, and context key collisions are all compile errors, not
   runtime surprises.

3. **Adapter contracts, not implementations.** Storage and execution are
   behaviours. The core library ships ETS and local OTP defaults so you can
   develop and test without external dependencies. Production adapters (Ecto,
   Oban) live in separate packages and plug into the same contracts.

## Layer diagram

```
+---------------------------------------------------------------+
|                       Consumer Application                    |
|   use Hephaestus, storage: ..., runner: ...                   |
+---------------------------------------------------------------+
          |                                         |
          v                                         v
+-------------------+                     +-------------------+
|    Runner          |                     |    Storage         |
|    (behaviour)     |                     |    (behaviour)     |
+-------------------+                     +-------------------+
| Runner.Local       |                     | Storage.ETS       |
| (GenServer/OTP)    |                     | (GenServer/ETS)   |
+-------------------+                     +-------------------+
          |                                         |
          +------------------+----------------------+
                             |
                             v
               +---------------------------+
               |     Hephaestus.Core       |
               |  Engine  |  Instance      |
               |  Context |  Workflow      |
               |  ExecutionEntry           |
               +---------------------------+
                             |
                             v
               +---------------------------+
               |     Step / Connector      |
               |     (behaviours)          |
               +---------------------------+
```

The **Core** layer is pure Elixir structs and functions — no processes, no I/O.
The **Runtime** layer provides the OTP wiring and persistence. Consumer
applications compose them through `use Hephaestus`.

## Core modules

### Uniqueness (`Hephaestus.Uniqueness`)

Handles composite ID construction, validation, and uniqueness checking for
workflow instances. Every workflow declares a mandatory business key via the
`unique` option, and this module encapsulates the logic around it.

Key responsibilities:

- **`build_id/2`** — constructs a composite ID in `"key::value"` format from
  the workflow's `%Unique{}` config and the caller-provided value.
- **`build_id_with_suffix/2`** — same format with a random suffix, used when
  `scope: :none` (e.g., `"userid::abc123::a1b2c3d4"`).
- **`validate_value!/1`** — ensures the caller value is `[a-z0-9]+` or a valid UUID.
- **`check/5`** — queries storage to enforce uniqueness within the configured scope.
  Returns `:ok` or `{:error, :already_running}`.
- **`extract_value/1`** — extracts the raw business value from a composite ID.

The composite ID format uses `::` as separator: `"key::value"`. The key portion
is `[a-z0-9]+` (set at compile time), and the value is `[a-z0-9]+` or a UUID.
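
The ID rules above can be sketched in plain Elixir. This is a minimal illustration, not the library's implementation: the module name `CompositeIdSketch`, the function arities, and the UUID pattern are assumptions. Only the `"key::value"` format and the `[a-z0-9]+` rule come from this guide.

```elixir
defmodule CompositeIdSketch do
  # Hypothetical helper module; names and arities are illustrative only.

  # Raises unless the value is [a-z0-9]+ or a lowercase hyphenated UUID.
  def validate_value!(value) when is_binary(value) do
    simple? = Regex.match?(~r/\A[a-z0-9]+\z/, value)

    uuid? =
      Regex.match?(
        ~r/\A[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\z/,
        value
      )

    if simple? or uuid? do
      value
    else
      raise ArgumentError, "invalid business value: #{inspect(value)}"
    end
  end

  # Joins key and value with the "::" separator.
  def build_id(key, value), do: key <> "::" <> validate_value!(value)

  # Returns the value portion (second segment), ignoring any random suffix.
  def extract_value(id) do
    [_key, value | _suffix] = String.split(id, "::")
    value
  end
end

CompositeIdSketch.build_id("orderid", "abc123")
#=> "orderid::abc123"
CompositeIdSketch.extract_value("userid::abc123::a1b2c3d4")
#=> "abc123"
```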

### Instance (`Hephaestus.Core.Instance`)

The central data structure. An Instance is a snapshot of a workflow execution at
a point in time:

| Field               | Type                  | Purpose                                     |
|---------------------|-----------------------|---------------------------------------------|
| `id`                | string                | Composite business ID (`"key::value"`)       |
| `workflow`          | module                | The workflow module being executed            |
| `workflow_version`  | positive integer      | Resolved workflow definition version          |
| `status`            | atom                  | Lifecycle state (see below)                  |
| `current_step`      | module or nil         | The step currently being processed            |
| `context`           | `Context.t()`         | Initial data + accumulated step results       |
| `step_configs`      | map                   | Per-step config overrides                     |
| `active_steps`      | MapSet                | Steps currently executing (supports parallel) |
| `completed_steps`   | MapSet                | Steps that have finished                      |
| `runtime_metadata`  | map                   | Dynamic metadata emitted by steps             |
| `telemetry_metadata` | map                  | Caller metadata merged into telemetry events  |
| `telemetry_start_time` | integer or nil     | Monotonic start time for duration telemetry   |
| `execution_history` | list of ExecutionEntry| Audit trail                                  |

Instances are created via `Instance.new/4`, which requires the workflow module,
resolved version, initial context, and an explicit composite ID (built by
`Hephaestus.Uniqueness`). Auto-generated UUIDs are no longer used. Instances are
plain structs: the Instance module starts no processes and performs no side
effects.

### Context (`Hephaestus.Core.Context`)

The execution context has two namespaced maps:

- **`initial`** — immutable data provided at workflow start (e.g., `%{order_id: 123}`).
- **`steps`** — results accumulated from completed steps, keyed by step ref.

Namespacing matters for fan-in: when parallel branches converge, each step
writes to its own key, avoiding conflicts.

```elixir
context.initial.order_id       #=> 123
context.steps.validate.valid   #=> true
context.steps.charge.amount    #=> 4999
```
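
A small sketch shows why per-step namespacing avoids collisions at fan-in. `ContextSketch` is a hypothetical stand-in, not the real `Hephaestus.Core.Context`, and its function names are assumptions.

```elixir
defmodule ContextSketch do
  # Hypothetical stand-in for the context: two namespaced maps.
  defstruct initial: %{}, steps: %{}

  def new(initial) when is_map(initial), do: %__MODULE__{initial: initial}

  # Stores a step's results under its own key and leaves `initial` untouched.
  def put_step_result(%__MODULE__{} = ctx, step_key, updates) when is_atom(step_key) do
    %{ctx | steps: Map.put(ctx.steps, step_key, updates)}
  end
end

ctx =
  ContextSketch.new(%{order_id: 123})
  |> ContextSketch.put_step_result(:charge, %{amount: 4999})
  |> ContextSketch.put_step_result(:reserve, %{amount: 2})

# Both branches wrote an `amount` field without overwriting each other.
ctx.steps.charge.amount   #=> 4999
ctx.steps.reserve.amount  #=> 2
```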

### Engine (`Hephaestus.Core.Engine`)

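The pure functional heart. State-changing functions take an Instance struct and
return an updated Instance struct; `execute_step/2` is the exception and returns
the step's result tuple. The engine uses no GenServer and no storage, and
produces no side effects of its own.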

Key operations:

- **`advance/1`** — moves the instance forward. If `:pending`, activates the
  start step. If active steps remain, returns as-is. If no active steps and not
  waiting, marks `:completed`.

- **`execute_step/2`** — calls `step_module.execute/3` with the instance,
  config, and context. Returns the step's result tuple.

- **`complete_step/4`** — moves a step from `active_steps` to `completed_steps`,
  merges context updates, cleans up config.

- **`activate_transitions/3`** — resolves the workflow's `transit/3` for the
  completed step and event, then activates target steps. Supports single targets,
  `{target, config}` tuples, and lists (fan-out). Applies **join semantics**:
  a step is only activated if all its predecessors have completed.

- **`resume_step/3`** — completes a waiting async step, activates its
  transitions, and sets status back to `:running`.

Because the engine is pure, testing it is straightforward: build an Instance,
call engine functions, and assert on the returned struct. No mocks or running
processes are needed.
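
The testing style this enables can be illustrated with a cut-down engine. The sketch below is an assumption for illustration only: `EngineSketch`, its fields, and its function arities are hypothetical and much simpler than the real `Hephaestus.Core.Engine`.

```elixir
defmodule EngineSketch do
  # Hypothetical, reduced instance: the real struct carries more fields.
  defstruct status: :pending,
            start_step: nil,
            active_steps: MapSet.new(),
            completed_steps: MapSet.new()

  def new(start_step), do: %__MODULE__{start_step: start_step}

  # Pending: activate the start step and move to :running.
  def advance(%__MODULE__{status: :pending, start_step: step} = inst) do
    %{inst | status: :running, active_steps: MapSet.put(inst.active_steps, step)}
  end

  # Running: complete when nothing is active, otherwise return as-is.
  def advance(%__MODULE__{status: :running} = inst) do
    if MapSet.size(inst.active_steps) == 0 do
      %{inst | status: :completed}
    else
      inst
    end
  end

  # Any other status (for example :waiting) is returned unchanged.
  def advance(%__MODULE__{} = inst), do: inst

  # Moves a step from the active set to the completed set.
  def complete_step(%__MODULE__{} = inst, step) do
    %{
      inst
      | active_steps: MapSet.delete(inst.active_steps, step),
        completed_steps: MapSet.put(inst.completed_steps, step)
    }
  end
end

# A "test" is just data in, data out.
running = EngineSketch.new(:validate) |> EngineSketch.advance()
done = running |> EngineSketch.complete_step(:validate) |> EngineSketch.advance()
done.status
#=> :completed
```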

### Workflow (`Hephaestus.Core.Workflow` + `Hephaestus.Workflow`)

The behaviour module defines the contract (`start/0`, `transit/3`). The macro
module (`Hephaestus.Workflow`) handles compile-time extraction and validation.

When you write:

```elixir
defmodule MyApp.Workflows.OrderFlow do
  use Hephaestus.Workflow,
    unique: [key: "orderid"]

  def start, do: ValidateOrder

  def transit(ValidateOrder, :valid, _ctx), do: ChargePayment
  def transit(ValidateOrder, :invalid, _ctx), do: Hephaestus.Steps.Done
  def transit(ChargePayment, :charged, _ctx), do: Hephaestus.Steps.Done
end
```

At compile time, the macro:

1. Extracts `start/0` to find the entry point.
2. Walks all `transit/3` clauses to build an edge list (static clauses are
   extracted from the AST; dynamic clauses use `@targets` annotations).
3. Reads `:tags` and `:metadata` options (if provided), validates them (string
   keys, JSON-safe values), stores as module attributes.
4. Calls `Hephaestus.Core.Workflow.validate!/4` which builds a `libgraph`
   digraph and runs six validations.
5. Validates the mandatory `unique` option and stores the `%Unique{}` struct.
6. Generates `__tags__/0`, `__metadata__/0`, `__unique__/0`,
   `__predecessors__/1`, `__graph__/0`, `__edges__/0`, `__version__/0`,
   `__versioned__?/0`, and `resolve_version/1` into standard workflow modules
   for runtime use. Umbrella version-dispatcher modules additionally generate
   `__versions__/0`, `current_version/0`, `version_for/2`, and facade
   functions (`start/2`, `resume/2`, `get/1`, `list/1`, `cancel/1`).
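
Step 2 can be illustrated with a small AST walk. This sketch shows the general technique under stated assumptions and is not the macro's actual code: `EdgeExtractSketch` is a hypothetical module, and it handles only static clauses with a single module target (no `{target, config}` tuples, lists, or `@targets` annotations).

```elixir
defmodule EdgeExtractSketch do
  # Hypothetical helper: collects {from, event, to} triples from quoted
  # `def transit(From, :event, _ctx), do: To` clauses.
  def edges(ast) do
    {_ast, found} =
      Macro.prewalk(ast, [], fn
        {:def, _, [{:transit, _, [from, event, _ctx]}, [do: to]]} = node, acc
        when is_atom(event) ->
          {node, [{to_module(from), event, to_module(to)} | acc]}

        node, acc ->
          {node, acc}
      end)

    Enum.reverse(found)
  end

  # Aliases appear in the AST as {:__aliases__, meta, [:Seg1, :Seg2, ...]}.
  defp to_module({:__aliases__, _meta, parts}), do: Module.concat(parts)
end

ast =
  quote do
    def transit(ValidateOrder, :valid, _ctx), do: ChargePayment
    def transit(ChargePayment, :charged, _ctx), do: Hephaestus.Steps.Done
  end

EdgeExtractSketch.edges(ast)
#=> [{ValidateOrder, :valid, ChargePayment}, {ChargePayment, :charged, Hephaestus.Steps.Done}]
```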

### ExecutionEntry (`Hephaestus.Core.ExecutionEntry`)

An immutable record appended to the instance's `execution_history` as steps
complete. Contains `step_ref`, `event`, `timestamp`, and optional
`context_updates`. Useful for audit trails and debugging.

## Workflow lifecycle

```
                      Instance.new/4
                            |
                            v
                       +---------+
                       | pending |
                       +----+----+
                            |  Engine.advance/1
                            v
                       +---------+
              +------->| running |<------+
              |        +----+----+       |
              |             |            |
         resume_step   execute steps   activate
         (async done)       |         transitions
              |             v            |
              |     +-------+-------+    |
              |     |               |    |
              |     v               v    |
         +---------+         +----------+---+
         | waiting |         | step returns |
         +---------+         | {:ok, event} |
                             +--------------+
                                    |
                         no active steps remain
                                    |
                                    v
                             +-----------+
                             | completed |
                             +-----------+

         (any step returns {:error, _})
                       |
                       v
                  +--------+
                  | failed |
                  +--------+
```

**Status transitions:**

| From      | To        | Trigger                                    |
|-----------|-----------|--------------------------------------------|
| pending   | running   | `Engine.advance/1` activates the start step |
| running   | running   | Step completes, transitions activate more   |
| running   | waiting   | Step returns `{:async}`                     |
| running   | completed | No active steps remain                      |
| running   | failed    | Step returns `{:error, reason}`             |
| waiting   | running   | `Engine.resume_step/3` with external event  |

## Compile-time DAG validation

`Hephaestus.Core.Workflow.validate!/4` runs six checks at compile time:

1. **Acyclic** — the graph must be a DAG. Cycles are always a compile error.
2. **Reachable** — every step must be reachable from `start/0`. Orphaned steps
   are a compile error.
3. **Leaf termination** — every leaf node (step with no outgoing edges) must be
   `Hephaestus.Steps.Done`. Paths that dead-end elsewhere are rejected.
4. **Fan-out convergence** — when a step fans out to multiple parallel branches,
   those branches must converge at a common join point before `Done`.
5. **Context key uniqueness** — no two steps may resolve to the same context key
   (derived from the module name or `step_key/0`). Collisions would silently
   overwrite data.
6. **Event consistency** — every event declared in a step's `events/0` must have
   a matching `transit/3` clause, and every event used in `transit/3` must be
   declared in the step's `events/0`. No orphaned events, no undeclared
   transitions.

These checks turn structural bugs (cycles, dead ends, key collisions, unhandled
events) into build failures rather than production incidents.
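
The first three checks can be sketched with OTP's built-in `:digraph` and `:digraph_utils` modules. This deliberately swaps in `:digraph` for `libgraph` so the example has no dependencies; `DagCheckSketch`, its arguments, and its error tuples are hypothetical, and the convergence, key, and event checks are not covered.

```elixir
defmodule DagCheckSketch do
  # Hypothetical validator: `edges` is a list of {from, to} pairs and `done`
  # is the only vertex allowed to have no outgoing edges.
  def validate(start, edges, done) do
    graph = :digraph.new()

    try do
      :digraph.add_vertex(graph, start)

      for {from, to} <- edges do
        :digraph.add_vertex(graph, from)
        :digraph.add_vertex(graph, to)
        :digraph.add_edge(graph, from, to)
      end

      vertices = :digraph.vertices(graph)
      # reachable/2 includes the start vertex itself (paths of length zero).
      unreachable = vertices -- :digraph_utils.reachable([start], graph)

      bad_leaves =
        for v <- vertices, :digraph.out_degree(graph, v) == 0, v != done, do: v

      cond do
        not :digraph_utils.is_acyclic(graph) -> {:error, :cycle}
        unreachable != [] -> {:error, {:unreachable, Enum.sort(unreachable)}}
        bad_leaves != [] -> {:error, {:bad_leaves, Enum.sort(bad_leaves)}}
        true -> :ok
      end
    after
      # :digraph is backed by ETS tables, so release them explicitly.
      :digraph.delete(graph)
    end
  end
end

edges = [{:validate, :charge}, {:validate, :done}, {:charge, :done}]
DagCheckSketch.validate(:validate, edges, :done)
#=> :ok
```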

## Runner adapter pattern

The `Hephaestus.Runtime.Runner` behaviour defines three callbacks:

- `start_instance/3` — create and begin executing a workflow.
- `resume/2` — deliver an event to a waiting instance.
- `schedule_resume/3` — schedule a delayed `:timeout` resume for a step.

### Local runner (`Hephaestus.Runtime.Runner.Local`)

The built-in implementation. One GenServer per workflow instance, started as a
transient child under a `DynamicSupervisor`. The execution loop:

1. `init/1` — recovers the latest persisted state from storage (crash recovery).
   If the instance is already completed or failed, the process stops immediately.
2. `{:continue, :advance}` — calls `Engine.advance/1`. Based on the result:
   - **completed** → persist and stop.
   - **waiting** → persist and park (no further continues).
   - **running with active steps** → persist and continue to `:execute_active`.
3. `{:continue, :execute_active}` — fans out all active steps via
   `Task.Supervisor.async_nolink/2`, awaits results, then reduces them through
   `Engine.complete_step/4` and `Engine.activate_transitions/3`. Steps execute
   concurrently within a single advance cycle.
4. **Resume.** A `{:resume, event}` message sent via `GenServer.cast/2` calls
   `Engine.resume_step/3` and re-enters the advance loop.

The Local runner is suitable for development, testing, and single-node
deployments. Timers from `schedule_resume/3` are process-local and do not survive
crashes.
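
The fan-out in the `:execute_active` phase might look roughly like the sketch below. It is an illustration, not the runner's code: `FanOutSketch` is a hypothetical module, steps are modelled as zero-arity functions, and the error tuples are assumptions. Only the use of `Task.Supervisor.async_nolink` comes from this guide.

```elixir
defmodule FanOutSketch do
  # Hypothetical helper: `steps` maps a step key to a zero-arity function.
  def run(supervisor, steps, timeout \\ 5_000) do
    # Start every step first so they run concurrently.
    tasks =
      for {key, fun} <- steps do
        {key, Task.Supervisor.async_nolink(supervisor, fun)}
      end

    # Then collect results; a crashed or slow step becomes an error tuple
    # instead of taking the caller down.
    Map.new(tasks, fn {key, task} ->
      case Task.yield(task, timeout) || Task.shutdown(task) do
        {:ok, result} -> {key, result}
        {:exit, reason} -> {key, {:error, {:crashed, reason}}}
        nil -> {key, {:error, :timeout}}
      end
    end)
  end
end

{:ok, supervisor} = Task.Supervisor.start_link()

FanOutSketch.run(supervisor, %{
  charge: fn -> {:ok, :charged, %{amount: 4999}} end,
  reserve: fn -> {:ok, :reserved} end
})
#=> %{charge: {:ok, :charged, %{amount: 4999}}, reserve: {:ok, :reserved}}
```

Because the tasks are not linked to the caller, a step that raises shows up as `{:exit, reason}` in the result rather than crashing the process that owns the instance.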

## Storage adapter pattern

The `Hephaestus.Runtime.Storage` behaviour defines four callbacks:

- `get/1` — retrieve an instance by ID.
- `put/1` — persist (upsert) an instance.
- `delete/1` — remove an instance.
- `query/1` — filter instances by status, workflow, etc.

### ETS storage (`Hephaestus.Runtime.Storage.ETS`)

The built-in implementation. A GenServer that owns a named ETS table (`:set`,
`:protected`). All operations go through `GenServer.call/2` to serialize writes
and ensure consistency. Queries do a full table scan with in-memory filtering —
fine for development, not for production scale.
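
A GenServer-owned table of this kind can be sketched as follows. `EtsStorageSketch`, the table name, the return shapes, and the use of plain maps as instances are assumptions for illustration; the real adapter's callbacks may differ.

```elixir
defmodule EtsStorageSketch do
  use GenServer

  # Hypothetical client API; instances are plain maps with an :id key here.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  def get(id), do: GenServer.call(__MODULE__, {:get, id})
  def put(instance), do: GenServer.call(__MODULE__, {:put, instance})
  def delete(id), do: GenServer.call(__MODULE__, {:delete, id})
  def query(filters), do: GenServer.call(__MODULE__, {:query, filters})

  @impl true
  def init(_opts) do
    # The GenServer owns the table, so the table dies with the process.
    {:ok, :ets.new(:instances_sketch, [:set, :protected, :named_table])}
  end

  @impl true
  def handle_call({:get, id}, _from, table) do
    reply =
      case :ets.lookup(table, id) do
        [{^id, instance}] -> {:ok, instance}
        [] -> {:error, :not_found}
      end

    {:reply, reply, table}
  end

  def handle_call({:put, %{id: id} = instance}, _from, table) do
    # Inserting into a :set table overwrites any existing row (upsert).
    true = :ets.insert(table, {id, instance})
    {:reply, :ok, table}
  end

  def handle_call({:delete, id}, _from, table) do
    true = :ets.delete(table, id)
    {:reply, :ok, table}
  end

  def handle_call({:query, filters}, _from, table) do
    # Full scan, then filter in memory: every filter pair must match.
    matches =
      for {_id, instance} <- :ets.tab2list(table),
          Enum.all?(filters, fn {field, value} -> Map.get(instance, field) == value end),
          do: instance

    {:reply, matches, table}
  end
end

{:ok, _pid} = EtsStorageSketch.start_link()
:ok = EtsStorageSketch.put(%{id: "orderid::abc123", status: :running})
EtsStorageSketch.query(status: :running)
#=> [%{id: "orderid::abc123", status: :running}]
```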

## Step behaviour

Every step implements `Hephaestus.Steps.Step`:

```elixir
@callback events() :: [atom()]
@callback execute(Instance.t(), config(), Context.t()) :: result()

# Optional
@callback step_key() :: atom()
@callback retry_config() :: retry_config()
```

### Return values from `execute/3`

| Return                          | Meaning                                            |
|---------------------------------|----------------------------------------------------|
| `{:ok, event}`                  | Synchronous completion, emit event                 |
| `{:ok, event, context_updates}` | Synchronous completion with data                   |
| `{:async}`                      | Step will complete later (instance enters waiting) |
| `{:error, reason}`              | Step failed                                        |

### Async pattern

When a step returns `{:async}`, the instance enters `:waiting` status and the
runner parks. Some external trigger (webhook, user action, timer) later calls
`resume/2` with an event atom, which flows through `Engine.resume_step/3` to
unblock the workflow.

The `schedule_resume/3` callback supports timer-based async: the runner
automatically delivers a `:timeout` event after a delay.
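
A step that sometimes completes synchronously and sometimes parks might be shaped like this. `WaitForApprovalSketch`, its events, and the `amount` threshold are hypothetical; a real step would also declare `@behaviour Hephaestus.Steps.Step`, which is left out so the sketch runs without the library.

```elixir
defmodule WaitForApprovalSketch do
  # Hypothetical step module following the callback shapes described above.
  def events, do: [:approved, :rejected, :timeout]

  # Small orders are approved immediately, with data added to the context.
  def execute(_instance, _config, %{initial: %{amount: amount}}) when amount < 100 do
    {:ok, :approved, %{auto: true}}
  end

  # Everything else waits for an external resume (webhook, user, or timer).
  def execute(_instance, _config, _context), do: {:async}
end

WaitForApprovalSketch.execute(nil, %{}, %{initial: %{amount: 50}})
#=> {:ok, :approved, %{auto: true}}
WaitForApprovalSketch.execute(nil, %{}, %{initial: %{amount: 5_000}})
#=> {:async}
```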

### Step key resolution

Each step's context results are stored under a key derived from:

1. `step_key/0` if the step implements it (explicit override).
2. Otherwise, the last segment of the module name, underscored
   (e.g., `MyApp.Steps.ValidateOrder` → `:validate_order`).

The compile-time validator ensures no two steps in a workflow share the same key.
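
The two resolution rules can be sketched with standard library calls. `StepKeySketch` and `MyApp.Steps.ChargeCard` are hypothetical names used only for illustration; the library's own resolver may be implemented differently.

```elixir
defmodule StepKeySketch do
  # Hypothetical resolver mirroring the two rules above.
  def resolve(step_module) when is_atom(step_module) do
    if Code.ensure_loaded?(step_module) and function_exported?(step_module, :step_key, 0) do
      # Rule 1: explicit override.
      step_module.step_key()
    else
      # Rule 2: last module segment, underscored.
      step_module
      |> Module.split()
      |> List.last()
      |> Macro.underscore()
      |> String.to_atom()
    end
  end
end

defmodule MyApp.Steps.ChargeCard do
  # Explicit override via the optional callback.
  def step_key, do: :charge
end

StepKeySketch.resolve(MyApp.Steps.ValidateOrder)
#=> :validate_order
StepKeySketch.resolve(MyApp.Steps.ChargeCard)
#=> :charge
```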

## Fan-out and fan-in

Hephaestus supports parallel execution through its transition system:

**Fan-out** — a `transit/3` clause returns a list of targets:

```elixir
def transit(PrepareOrder, :ready, _ctx), do: [ChargePayment, ReserveInventory]
```

Both `ChargePayment` and `ReserveInventory` become active simultaneously. The
Local runner executes them concurrently via `Task.Supervisor`.

**Fan-in (join semantics)** — `Engine.activate_transitions/3` checks
`__predecessors__/1` before activating a step. A step is only activated when
**all** of its predecessors have completed:

```
  PrepareOrder
      |
  :ready (fan-out)
     / \
    v   v
 Charge  Reserve
    \   /
  :charged + :reserved (fan-in)
     \ /
      v
  ShipOrder      <-- only activates when BOTH predecessors complete
```

The predecessor map is computed at compile time from the DAG, so the runtime
join check only calls `__predecessors__/1` and compares the result against
`completed_steps`; it never traverses the graph.
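
The join check itself reduces to a subset test. In this sketch `JoinSketch` and both of its functions are hypothetical; the predecessor map is built from an edge list here, whereas the library generates it at compile time.

```elixir
defmodule JoinSketch do
  # Builds %{step => MapSet of predecessors} from {from, to} edges.
  def predecessors(edges) do
    edges
    |> Enum.group_by(fn {_from, to} -> to end, fn {from, _to} -> from end)
    |> Map.new(fn {step, preds} -> {step, MapSet.new(preds)} end)
  end

  # A target is ready once every predecessor is in the completed set.
  def ready?(predecessors, target, completed) do
    predecessors
    |> Map.get(target, MapSet.new())
    |> MapSet.subset?(completed)
  end
end

preds =
  JoinSketch.predecessors([
    {:prepare, :charge},
    {:prepare, :reserve},
    {:charge, :ship},
    {:reserve, :ship}
  ])

JoinSketch.ready?(preds, :ship, MapSet.new([:prepare, :charge]))
#=> false
JoinSketch.ready?(preds, :ship, MapSet.new([:prepare, :charge, :reserve]))
#=> true
```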

## Connector pattern

The `Hephaestus.Connectors.Connector` behaviour provides a contract for
external service integrations:

```elixir
@callback execute(action(), params(), config()) :: {:ok, result()} | {:error, reason()}
@callback supported_actions() :: [action()]
```

Connectors are not called by the engine directly — steps use them internally.
This separation keeps the engine pure and makes external dependencies explicit
and testable. A step can inject a connector module via its config, making it
straightforward to swap a real connector for a test double.
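
Config-based injection could look like the sketch below. Both modules are hypothetical, the `:connector` config key is an assumption, and neither module declares the library behaviours so the example stays self-contained.

```elixir
defmodule FakePaymentsConnector do
  # Hypothetical test double following the connector callback shapes above.
  def supported_actions, do: [:charge]

  def execute(:charge, %{amount: amount}, _config), do: {:ok, %{charged: amount}}
  def execute(action, _params, _config), do: {:error, {:unsupported_action, action}}
end

defmodule ChargeStepSketch do
  # Hypothetical step: the connector module arrives through the step config,
  # so tests can pass a double and production can pass the real connector.
  def events, do: [:charged, :declined]

  def execute(_instance, %{connector: connector}, %{initial: %{amount: amount}}) do
    case connector.execute(:charge, %{amount: amount}, %{}) do
      {:ok, result} -> {:ok, :charged, result}
      {:error, _reason} -> {:ok, :declined}
    end
  end
end

ChargeStepSketch.execute(nil, %{connector: FakePaymentsConnector}, %{initial: %{amount: 4999}})
#=> {:ok, :charged, %{charged: 4999}}
```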

## Workflow facade

Umbrella workflow modules (and standalone workflows with `unique`) get
auto-generated facade functions that hide ID construction and instance lookup
from the caller:

| Function | Signature | Description |
|---|---|---|
| `start/2` | `start(value, context)` | Builds the composite ID, checks uniqueness, delegates to `start_instance` |
| `resume/2` | `resume(value, event)` | Builds the composite ID, delegates to `resume` |
| `get/1` | `get(value)` | Builds the composite ID, fetches from storage |
| `list/1` | `list(filters \\ [])` | Queries storage filtered to this workflow |
| `cancel/1` | `cancel(value)` | Builds the composite ID, cancels the instance |

With `scope: :none`, only `start/2` and `list/1` are generated (the others
would be ambiguous with multiple instances sharing the same value).

Facade functions discover the running `MyApp.Hephaestus` module automatically
via `Hephaestus.Instances.lookup!/0`. In multi-instance setups, pass
`hephaestus: MyApp.Hephaestus` in the workflow's `use` options.

## Instance registry (`Hephaestus.Instances`)

`Hephaestus.Instances` is an auto-discovery registry that lets workflow facade
functions find the running Hephaestus module without explicit configuration. It
uses Elixir's `Registry` (same pattern as Oban).

- **`Hephaestus.Instances`** — starts a global `Registry` as part of the
  `hephaestus_core` application. Provides `register/1` and `lookup!/0`.
- **`Hephaestus.Instances.Tracker`** — a GenServer started as a child of each
  `MyApp.Hephaestus` supervision tree. On init, it calls
  `Hephaestus.Instances.register/1` to register the Hephaestus module. When the
  process stops, the Registry cleans up automatically.

`lookup!/0` returns the single registered module. If zero are registered it
raises. If multiple are registered it raises, instructing the caller to pass the
`hephaestus:` option explicitly.
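
The discovery behaviour can be approximated with a plain `Registry`. This is a sketch under assumptions, not the library's code: `InstancesSketch`, the registry name, the `:duplicate` key mode, the shared key, and the error messages are all hypothetical.

```elixir
defmodule InstancesSketch do
  # Hypothetical registry wrapper; every module registers under one shared key.
  @registry __MODULE__.Registry
  @key :hephaestus_modules

  def start_link, do: Registry.start_link(keys: :duplicate, name: @registry)

  # Registers on behalf of the calling process; the Registry drops the entry
  # when that process exits.
  def register(module) when is_atom(module) do
    {:ok, _owner} = Registry.register(@registry, @key, module)
    :ok
  end

  # Returns the single registered module, raising on zero or many.
  def lookup! do
    case Registry.lookup(@registry, @key) do
      [{_pid, module}] -> module
      [] -> raise "no Hephaestus module is registered"
      _many -> raise "multiple Hephaestus modules registered; pass hephaestus: explicitly"
    end
  end
end

{:ok, _registry} = InstancesSketch.start_link()
:ok = InstancesSketch.register(MyApp.Hephaestus)
InstancesSketch.lookup!()
#=> MyApp.Hephaestus
```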

## Supervision tree

When you `use Hephaestus` and add the module to your application supervisor,
it starts:

```
MyApp.Hephaestus (Supervisor, :one_for_one)
  |-- MyApp.Hephaestus.Registry (Registry, :unique)
  |-- MyApp.Hephaestus.DynamicSupervisor (DynamicSupervisor)
  |-- MyApp.Hephaestus.TaskSupervisor (Task.Supervisor)
  |-- MyApp.Hephaestus.Storage (Storage adapter, e.g., ETS)
  |-- Hephaestus.Instances.Tracker (GenServer — registers this module)
```

Each workflow instance is a transient child under the `DynamicSupervisor`,
registered by instance ID in the `Registry`. The `TaskSupervisor` runs
concurrent step executions. The `Instances.Tracker` registers this Hephaestus
module in the global `Hephaestus.Instances` registry for facade discovery.
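
The OTP portion of this tree can be reproduced with standard child specs. `SupervisionSketch` is a hypothetical module; the storage adapter and tracker children are omitted because they depend on the library.

```elixir
defmodule SupervisionSketch do
  use Supervisor

  # Hypothetical supervisor mirroring the first three children of the tree above.
  def start_link(opts \\ []), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    children = [
      # Unique keys: one process per instance ID.
      {Registry, keys: :unique, name: __MODULE__.Registry},
      # Parent of the per-instance processes.
      {DynamicSupervisor, name: __MODULE__.DynamicSupervisor, strategy: :one_for_one},
      # Runs concurrent step executions.
      {Task.Supervisor, name: __MODULE__.TaskSupervisor}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

{:ok, _pid} = SupervisionSketch.start_link()
length(Supervisor.which_children(SupervisionSketch))
#=> 3
```

A per-instance process would then be started under the `DynamicSupervisor` with `restart: :transient` and a `{:via, Registry, {registry, instance_id}}` name, which is how lookup by instance ID works in this arrangement.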

## Extension adapters

The adapter pattern enables production-grade alternatives that plug into the
same contracts:

- **`hephaestus_ecto`** — a `Storage` adapter that persists instances as JSONB
  in PostgreSQL via Ecto. Enables durable workflows that survive node restarts.
- **`hephaestus_oban`** — a `Runner` adapter that uses Oban workers instead of
  GenServer processes. Brings distributed execution, persistent job queues,
  and advisory-lock-based concurrency control. Uses workflow `__tags__/0` and
  `__metadata__/0` to populate Oban job meta and tags for filtering in Oban Web.

These packages implement the same `Storage` and `Runner` behaviours. Workflow
definitions and step implementations need no changes: swap the adapter in
`use Hephaestus` and the runtime changes underneath. See the
[Extensions guide](extensions.md) for setup details.
