Jido Composer provides two human-in-the-loop (HITL) mechanisms: HumanNode for workflow gates, and tool approval gates for orchestrator tools. Both use the same ApprovalRequest/ApprovalResponse protocol and the generalized suspension system.

HumanNode in Workflows

HumanNode pauses a workflow at a specific state for a human decision. It always returns {:ok, context, :suspend}; suspension is not an error but a normal step in the flow.

nodes: %{
  process: ProcessAction,
  approval: %Jido.Composer.Node.HumanNode{
    name: "deploy_approval",
    description: "Approve production deployment",
    prompt: "Deploy to production?",
    allowed_responses: [:approved, :rejected],
    timeout: 300_000
  },
  deploy: DeployAction
},
transitions: %{
  {:process, :ok}        => :approval,
  {:approval, :approved} => :deploy,
  {:approval, :rejected} => :failed,
  {:approval, :timeout}  => :failed,
  {:deploy, :ok}         => :done,
  {:_, :error}           => :failed
}

When the workflow reaches the approval state:

  1. HumanNode evaluates the prompt and builds an ApprovalRequest
  2. Returns {:ok, context, :suspend} — the strategy recognizes :suspend as reserved
  3. The strategy emits a Suspend directive with the embedded ApprovalRequest
  4. Your runtime delivers the request to the human (via your notification system)

HumanNode Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | required | Node identifier |
| description | string | required | What this approval is for |
| prompt | string or function | required | Question for the human |
| allowed_responses | [atom] | [:approved, :rejected] | Valid response options |
| response_schema | keyword | nil | Schema for structured response data |
| context_keys | [atom] or nil | nil (all) | Which context keys to show the human |
| timeout | ms or :infinity | :infinity | Decision deadline |
| timeout_outcome | atom | :timeout | Outcome when timeout expires |
| metadata | map | %{} | Arbitrary metadata for notifications |

Dynamic Prompts

The prompt field can be a function that receives the current context, enabling context-aware questions:

%Jido.Composer.Node.HumanNode{
  name: "deploy_approval",
  description: "Approve deployment",
  prompt: fn context ->
    version = get_in(context, [:build, :version])
    env = get_in(context, [:config, :environment])
    "Deploy version #{version} to #{env}?"
  end,
  allowed_responses: [:approved, :rejected]
}
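
As a rough illustration of the string-or-function contract, the sketch below resolves either form of prompt against a context. `PromptSketch` is a hypothetical helper written for this guide, not part of Jido Composer, and the library's own resolution logic may differ.

```elixir
defmodule PromptSketch do
  # Hypothetical helper, not a Jido Composer module.
  # A static prompt is returned unchanged; a function prompt is called with the context.
  def resolve(prompt, _context) when is_binary(prompt), do: prompt
  def resolve(prompt, context) when is_function(prompt, 1), do: prompt.(context)
end

# Sample context; the keys mirror the example above.
context = %{build: %{version: "1.4.2"}, config: %{environment: "staging"}}

dynamic_prompt = fn ctx ->
  version = get_in(ctx, [:build, :version])
  env = get_in(ctx, [:config, :environment])
  "Deploy version #{version} to #{env}?"
end

IO.puts(PromptSketch.resolve("Deploy to production?", context))
IO.puts(PromptSketch.resolve(dynamic_prompt, context))
```

Keeping the function to a single context argument means the same node definition can be reused across runs with different builds or environments.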

Tool Approval Gates in Orchestrators

Mark individual tools as requiring human approval before execution:

use Jido.Composer.Orchestrator,
  nodes: [
    SearchAction,
    {DeployAction, requires_approval: true},
    {DeleteAction, requires_approval: true}
  ]

When the LLM calls a gated tool, the orchestrator:

  1. Partitions tool calls into gated and ungated
  2. Executes ungated tools immediately
  3. Suspends with an ApprovalRequest for each gated tool
  4. Waits for human approval before executing

Ungated sibling tools execute concurrently while the gated tool waits for approval.
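
The partitioning step can be pictured with a small, self-contained sketch. `GateSketch` and the shape of the tool-call maps (`id`, `name`, `arguments`) are assumptions made for illustration; this is not the orchestrator's actual implementation.

```elixir
defmodule GateSketch do
  # Hypothetical sketch, not the orchestrator's real code.
  # `gated_names` holds the names of tools marked `requires_approval: true`.
  def partition(tool_calls, gated_names) do
    {gated, ungated} =
      Enum.split_with(tool_calls, &MapSet.member?(gated_names, &1.name))

    %{gated: gated, ungated: ungated}
  end
end

# Assumed tool-call shape for the example.
calls = [
  %{id: "call-1", name: "search", arguments: %{q: "release notes"}},
  %{id: "call-2", name: "deploy", arguments: %{env: "prod"}}
]

%{gated: gated, ungated: ungated} =
  GateSketch.partition(calls, MapSet.new(["deploy", "delete"]))

# `ungated` would run immediately; each entry in `gated` would wait for approval.
IO.inspect(Enum.map(ungated, & &1.name), label: "run now")
IO.inspect(Enum.map(gated, & &1.name), label: "needs approval")
```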

Dynamic Approval Policy

Beyond static requires_approval, you can provide a dynamic policy function:

use Jido.Composer.Orchestrator,
  nodes: [SearchAction, DeployAction, DeleteAction],
  approval_policy: fn tool_call, context ->
    cond do
      tool_call.name == "deploy" and context[:env] == :prod ->
        {:require_approval, %{reason: "Production deployment"}}
      tool_call.name == "delete" ->
        {:require_approval, %{reason: "Destructive operation"}}
      true ->
        :proceed
    end
  end

Advisory vs Enforcement

| Mechanism | Who triggers | Enforcement | Purpose |
| --- | --- | --- | --- |
| HumanNode as orchestrator tool | LLM decides to call it | Advisory | LLM asks for help when uncertain |
| Approval gate (requires_approval) | Strategy enforces | Mandatory | Dangerous tools require pre-execution approval |

HumanNode as an orchestrator tool is advisory — the LLM chooses when to ask. Approval gates are enforcement — the strategy always gates execution regardless of what the LLM wants.

Rejection Handling

When a human rejects a gated tool, the orchestrator injects a synthetic rejection result into the LLM conversation:

Tool result for "deploy": REJECTED by human reviewer. Reason: "Too risky."
Choose a different approach.

The LLM then adapts its strategy based on the rejection.

Rejection policy controls what happens to sibling tool calls:

  • :continue_siblings (default) — Other tools finish normally; all results (including rejection) go to the LLM
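
To make the :continue_siblings behaviour concrete, here is a minimal sketch of how a rejection could be turned into a synthetic tool result and returned alongside the sibling results. `RejectionSketch` and the result-map shape are hypothetical, and the exact wording the library injects may differ from the example text above.

```elixir
defmodule RejectionSketch do
  # Hypothetical formatter modelled on the example message in this guide.
  def rejection_text(tool_name, reason) do
    "Tool result for \"#{tool_name}\": REJECTED by human reviewer. Reason: \"#{reason}\"\n" <>
      "Choose a different approach."
  end

  # Under :continue_siblings, sibling results are kept and the rejection
  # is appended, so the LLM sees every outcome in one turn.
  def results_for_llm(sibling_results, rejected_tool, reason) do
    sibling_results ++
      [%{tool: rejected_tool, content: rejection_text(rejected_tool, reason)}]
  end
end

siblings = [%{tool: "search", content: "3 matching documents"}]

siblings
|> RejectionSketch.results_for_llm("deploy", "Too risky.")
|> Enum.each(&IO.puts("#{&1.tool}: #{&1.content}"))
```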

ApprovalRequest & ApprovalResponse

These serializable structs correlate pending decisions by a unique id.

ApprovalRequest

Built by HumanNode (or the strategy for gated tools), enriched by the strategy with agent context:

| Field | Source | Description |
| --- | --- | --- |
| id | HumanNode | Unique request ID for correlation |
| prompt | HumanNode | Human-readable question |
| visible_context | HumanNode | Context subset the human sees |
| allowed_responses | HumanNode | Valid outcome atoms |
| response_schema | HumanNode | Schema for structured input |
| timeout | HumanNode | Max wait time (ms) |
| timeout_outcome | HumanNode | Outcome when timeout fires |
| metadata | HumanNode | Arbitrary notification metadata |
| agent_id | Strategy | Suspended agent's ID |
| agent_module | Strategy | Suspended agent's module |
| workflow_state | Strategy | FSM state name (workflows only) |
| tool_call | Strategy | Triggering tool call (orchestrators only) |
| node_name | Strategy | HumanNode or gated node name |

ApprovalResponse

Submitted by external code to resume the flow:

| Field | Description |
| --- | --- |
| request_id | Must match ApprovalRequest.id |
| decision | One of allowed_responses atoms |
| data | Structured input matching response_schema |
| respondent | Who responded (opaque: email, user ID, etc.) |
| comment | Optional free-text comment |

Lifecycle

sequenceDiagram
    participant Strategy
    participant HumanNode
    participant Runtime
    participant Human

    Strategy->>HumanNode: dispatch
    HumanNode-->>Strategy: {:ok, ctx, :suspend}
    Strategy->>Runtime: Suspend directive + ApprovalRequest
    Runtime->>Human: Notification (your delivery system)
    Human->>Runtime: Decision + optional data
    Runtime->>Strategy: cmd(:suspend_resume, response_data)
    Strategy->>Strategy: Validate, merge, transition

Validation on Resume

When a response arrives, the strategy validates:

  1. request_id matches the pending ApprovalRequest.id
  2. decision is in allowed_responses
  3. data conforms to response_schema (if defined)

The decision atom becomes the transition outcome (e.g., :approved triggers {:approval, :approved} => :deploy).
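
The first two validation checks and the decision-to-transition step can be sketched as follows. `ResumeSketch` is hypothetical and uses plain maps in place of the ApprovalRequest and ApprovalResponse structs; the response_schema check is omitted, and the fallback to a `{:_, outcome}` wildcard entry is an assumption about how the transition table is consulted.

```elixir
defmodule ResumeSketch do
  # Hypothetical sketch of the checks listed above; not the strategy's real code.
  def validate(request, response) do
    cond do
      response.request_id != request.id ->
        {:error, :request_id_mismatch}

      response.decision not in request.allowed_responses ->
        {:error, :invalid_decision}

      true ->
        {:ok, response.decision}
    end
  end

  # Looks up {state, outcome} first. Falling back to {:_, outcome} is an
  # assumption made for this sketch.
  def next_state(transitions, state, outcome) do
    case Map.fetch(transitions, {state, outcome}) do
      {:ok, next} -> {:ok, next}
      :error -> Map.fetch(transitions, {:_, outcome})
    end
  end
end

request = %{id: "req-1", allowed_responses: [:approved, :rejected]}
response = %{request_id: "req-1", decision: :approved}

transitions = %{
  {:approval, :approved} => :deploy,
  {:approval, :rejected} => :failed,
  {:_, :error} => :failed
}

result =
  with {:ok, outcome} <- ResumeSketch.validate(request, response) do
    ResumeSketch.next_state(transitions, :approval, outcome)
  end

IO.inspect(result)
```

Because the decision is validated against allowed_responses before the lookup, an unexpected atom is reported as a validation error rather than as a missing transition.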

Resuming a Suspended Flow

{:ok, response} = Jido.Composer.HITL.ApprovalResponse.new(
  request_id: approval_request.id,
  decision: :approved,
  respondent: "admin@company.com",
  comment: "Ship it!"
)

{agent, directives} = MyWorkflow.cmd(agent, {
  :suspend_resume,
  %{suspension_id: suspension.id, response_data: Map.from_struct(response)}
})

Generalized Suspension

Suspension extends beyond HITL. A flow can pause for any of five reason types:

| Reason | Use Case |
| --- | --- |
| :human_input | Approval gates, manual review |
| :rate_limit | API throttling, backoff |
| :async_completion | Waiting for external async result |
| :external_job | Long-running batch job |
| :custom | Application-specific reasons |

%Jido.Composer.Suspension{
  id: "susp-123",
  reason: :rate_limit,
  timeout: 60_000,
  timeout_outcome: :timeout,
  resume_signal: "rate_limit_cleared",
  metadata: %{retry_after: 60}
}

Resume API

Jido.Composer.Resume.resume(agent, suspension_id, resume_data,
  deliver_fn: &MyApp.deliver_signal/2
)

Options:

  • deliver_fn (required) — (agent, signal) -> {agent, directives}
  • thaw_fn (optional) — (agent_id) -> {:ok, agent} for restoring from storage
  • storage (optional) — Performs CAS (compare-and-swap) on checkpoint status for idempotency

Persistence

Long-running flows can be checkpointed, hibernated, and resumed across process restarts.

stateDiagram-v2
    [*] --> Running
    Running --> Suspended : suspend directive
    Suspended --> Hibernated : checkpoint + stop
    Hibernated --> Resumed : thaw + resume
    Resumed --> Running : replay in-flight ops
    Running --> [*] : completed

Resource Management Tiers

| Tier | Trigger | Process alive | Resume latency |
| --- | --- | --- | --- |
| Live wait | Suspension starts | Yes | Instant |
| Hibernate intent | Suspend.hibernate = true | Yes | Instant |
| Full checkpoint | Suspension timeout >= hibernate_after | No | Thaw + start |
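
One way to read the three triggers is as a small decision function. The sketch below is an illustration only: `TierSketch` is hypothetical, the precedence between the rules is an assumption, and so is treating an :infinity timeout as exceeding hibernate_after.

```elixir
defmodule TierSketch do
  # Hypothetical decision helper, not part of Jido Composer.
  # `suspension` is a plain map with :timeout (ms or :infinity) and :hibernate (boolean).
  def tier(%{timeout: timeout, hibernate: hibernate?}, hibernate_after) do
    cond do
      exceeds?(timeout, hibernate_after) -> :full_checkpoint
      hibernate? -> :hibernate_intent
      true -> :live_wait
    end
  end

  # Assumption: an unbounded wait always qualifies for a full checkpoint.
  defp exceeds?(:infinity, _hibernate_after), do: true
  defp exceeds?(timeout, hibernate_after), do: timeout >= hibernate_after
end

hibernate_after = 60_000

IO.inspect(TierSketch.tier(%{timeout: 5_000, hibernate: false}, hibernate_after))
IO.inspect(TierSketch.tier(%{timeout: 5_000, hibernate: true}, hibernate_after))
IO.inspect(TierSketch.tier(%{timeout: 300_000, hibernate: false}, hibernate_after))
```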

Checkpoint

Jido.Composer.Checkpoint.prepare_for_checkpoint/1 strips non-serializable data (closures, PIDs) and marks the state as :hibernated:

checkpoint_state = Jido.Composer.Checkpoint.prepare_for_checkpoint(strategy_state)
# Serialize and store checkpoint_state
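
To give a feel for what stripping non-serializable data involves, the following sketch removes closures, PIDs, ports and references from plain maps and lists, then marks the state as hibernated. `StripSketch` and the `:status` key are assumptions for illustration; the real prepare_for_checkpoint/1 may work differently.

```elixir
defmodule StripSketch do
  # Hypothetical sketch, not Jido.Composer.Checkpoint's implementation.
  # Structs are passed through untouched; only plain maps and lists are walked.
  def strip(%_{} = struct), do: struct

  def strip(map) when is_map(map) do
    map
    |> Enum.reject(fn {_key, value} -> runtime_only?(value) end)
    |> Map.new(fn {key, value} -> {key, strip(value)} end)
  end

  def strip(list) when is_list(list) do
    list
    |> Enum.reject(&runtime_only?/1)
    |> Enum.map(&strip/1)
  end

  def strip(other), do: other

  # The :status key is an assumed name for the field that records hibernation.
  def prepare(state), do: state |> strip() |> Map.put(:status, :hibernated)

  defp runtime_only?(value) do
    is_function(value) or is_pid(value) or is_port(value) or is_reference(value)
  end
end

state = %{
  status: :suspended,
  context: %{build: %{version: "1.4.2"}},
  prompt_fn: fn _ctx -> "Deploy?" end,
  child_pid: self()
}

IO.inspect(StripSketch.prepare(state))
```

Dropping runtime-only values at checkpoint time is what makes the later thaw step necessary: anything removed here has to be reattached from configuration.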

Thaw

Restore runtime configuration from the DSL definition:

restored = Jido.Composer.Checkpoint.reattach_runtime_config(checkpoint_state, strategy_opts)

Resume

Replay in-flight operations and respawn paused children:

# `restored` is the state returned by reattach_runtime_config/2 in the Thaw step
directives = Jido.Composer.Checkpoint.replay_directives(restored)
child_spawns = Jido.Composer.Checkpoint.pending_child_respawns(restored)

ChildRef

Jido.Composer.ChildRef is a serializable reference to a child process (no PIDs). It tracks the child's module, ID, phase, and status for reliable checkpoint/resume.

| Field | Description |
| --- | --- |
| agent_module | Child's module (for re-spawning) |
| agent_id | Child's unique ID |
| tag | Tag for parent-child tracking |
| checkpoint_key | Storage key for child's checkpoint |
| suspension_id | Links to Suspension causing pause |
| status | :running, :paused, :hibernated, :completed, :failed |

Idempotent Resume

Checkpoints track status to prevent duplicate resumes:

| Status | Meaning | Transition |
| --- | --- | --- |
| :hibernated | Available for resume | -> :resuming |
| :resuming | Currently being restored | -> :resumed |
| :resumed | Already restored | Reject further resume |
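
The status transitions above amount to a compare-and-swap on the checkpoint status. The sketch below shows the idea with an in-memory Agent; `CheckpointStatusSketch` is hypothetical, and a real storage backend would perform the same check-then-update atomically in its own way.

```elixir
defmodule CheckpointStatusSketch do
  # Hypothetical in-memory store keyed by checkpoint ID; not a Jido Composer module.
  def start_link(initial), do: Agent.start_link(fn -> initial end)

  # Moves `key` from `expected` to `new` in one atomic step.
  # Fails if the current status is anything other than `expected`.
  def compare_and_swap(store, key, expected, new) do
    Agent.get_and_update(store, fn statuses ->
      case Map.fetch(statuses, key) do
        {:ok, ^expected} -> {:ok, Map.put(statuses, key, new)}
        {:ok, other} -> {{:error, {:unexpected_status, other}}, statuses}
        :error -> {{:error, :not_found}, statuses}
      end
    end)
  end
end

{:ok, store} = CheckpointStatusSketch.start_link(%{"ckpt-1" => :hibernated})

# The first resume attempt claims the checkpoint.
:ok = CheckpointStatusSketch.compare_and_swap(store, "ckpt-1", :hibernated, :resuming)

# A duplicate attempt is rejected because the status is no longer :hibernated.
{:error, {:unexpected_status, :resuming}} =
  CheckpointStatusSketch.compare_and_swap(store, "ckpt-1", :hibernated, :resuming)

# Once restoration finishes, the claimant marks the checkpoint as resumed.
:ok = CheckpointStatusSketch.compare_and_swap(store, "ckpt-1", :resuming, :resumed)
```

Since the check and the update happen inside a single Agent call, two concurrent resume attempts cannot both observe :hibernated.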

Schema Migration

Checkpoints include a schema version (current: :composer_v1). Checkpoint.migrate/2 handles upgrades from older versions.

Nesting and Persistence

When nested agents are checkpointed:

  1. Inside-out: Child hits hibernate threshold -> checkpoints -> emits composer.child.hibernated -> parent marks child as :paused -> parent checkpoints when its own threshold fires
  2. Top-down resume: Thaw outermost agent -> re-spawn children from checkpoint_key -> children get fresh PIDs -> deliver resume signal to innermost suspended agent -> results propagate up

Parent agents don't know their children are suspended (isolation property). Each level independently manages its own checkpoint/resume cycle.