Worker Pools

After: You can run concurrent work safely without melting the BEAM.

# Before: spawn a new agent per request (expensive initialization)
{:ok, pid} = Jido.start_agent(jido, SearchAgent)
result = AgentServer.call(pid, signal)
Jido.stop_agent(jido, agent_id)  # Teardown overhead

# After: checkout from pre-warmed pool (sub-millisecond)
{:ok, result} = Jido.Agent.WorkerPool.call(MyApp.Jido, :search, signal)

When to Use Pools

Use worker pools when:

  • Agent initialization is expensive (loading models, establishing connections)
  • You need bounded concurrency for resource-limited operations
  • You want consistent latency under load (no cold starts)

Use spawn-per-request when:

  • Agents need per-request state isolation
  • Initialization is cheap
  • Request volume is unpredictable and bursty

Configuration

Configure pools in your Jido instance:

# lib/my_app/application.ex
children = [
  {Jido,
   name: MyApp.Jido,
   agent_pools: [
     {:fast_search, MyApp.Agents.SearchAgent, size: 8, max_overflow: 4},
     {:planner, MyApp.Agents.PlannerAgent, size: 4, strategy: :fifo}
   ]}
]

Or via config:

# config/config.exs
config :my_app, MyApp.Jido,
  agent_pools: [
    {:fast_search, MyApp.Agents.SearchAgent, size: 8, max_overflow: 4},
    {:planner, MyApp.Agents.PlannerAgent, size: 4, strategy: :fifo}
  ]

Pool Options

| Option | Default | Description |
|---|---|---|
| :size | 5 | Fixed number of pre-warmed agents |
| :max_overflow | 0 | Maximum temporary workers when pool exhausted |
| :strategy | :lifo | Checkout order: :lifo or :fifo |
| :worker_opts | [] | Options passed to Jido.AgentServer.start_link/1 |

Strategy choice:

  • :lifo (default) — Most recently used agent. Better cache locality, agents stay "warm"
  • :fifo — Round-robin. Even load distribution across workers
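
The difference between the two strategies can be illustrated without a pool library. The sketch below models the idle workers as a plain list. `StrategyDemo` and its functions are hypothetical names used only for illustration; they are not part of Jido or poolboy, and the real pool may order workers differently in edge cases.

```elixir
defmodule StrategyDemo do
  @moduledoc "Toy model of idle-worker ordering; not Jido's implementation."

  # Take the worker at the head of the idle list.
  def checkout([worker | rest]), do: {worker, rest}

  # :lifo puts a returned worker at the front, so it is reused next.
  def checkin(idle, worker, :lifo), do: [worker | idle]

  # :fifo puts a returned worker at the back, so the others get a turn first.
  def checkin(idle, worker, :fifo), do: idle ++ [worker]

  # Run `n` sequential checkout/checkin cycles and record which worker served each.
  def served(idle, strategy, n) do
    {order, _idle} =
      Enum.reduce(1..n, {[], idle}, fn _, {acc, current} ->
        {worker, rest} = checkout(current)
        {[worker | acc], checkin(rest, worker, strategy)}
      end)

    Enum.reverse(order)
  end
end

StrategyDemo.served([:a, :b, :c], :lifo, 4)
# => [:a, :a, :a, :a]
StrategyDemo.served([:a, :b, :c], :fifo, 4)
# => [:a, :b, :c, :a]
```

Under sequential load, :lifo keeps hitting the same worker while :fifo rotates through all of them, which is the trade-off between warmth and even distribution described above.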

API Reference

with_agent/4

Transaction-style checkout/checkin. This is the safest way to use pooled agents:

Jido.Agent.WorkerPool.with_agent(MyApp.Jido, :fast_search, fn pid ->
  signal = Signal.new!("search", %{query: "elixir pools"}, source: "/api")
  {:ok, agent} = Jido.AgentServer.call(pid, signal)
  agent.state.results
end)

Multiple operations on the same agent:

Jido.Agent.WorkerPool.with_agent(MyApp.Jido, :planner, fn pid ->
  {:ok, _} = Jido.AgentServer.call(pid, setup_signal)
  {:ok, agent} = Jido.AgentServer.call(pid, execute_signal)
  agent.state.plan
end, timeout: 10_000)

call/4

Send a single signal and wait for result:

signal = Signal.new!("search", %{query: "poolboy"}, source: "/api")
{:ok, agent} = Jido.Agent.WorkerPool.call(MyApp.Jido, :fast_search, signal)
agent.state.results

Options:

| Option | Default | Description |
|---|---|---|
| :timeout | 5000 | Pool checkout timeout (ms) |
| :call_timeout | 5000 | Signal processing timeout (ms) |

cast/4

Fire-and-forget signal (agent checked in immediately):

signal = Signal.new!("index", %{doc: document}, source: "/worker")
:ok = Jido.Agent.WorkerPool.cast(MyApp.Jido, :indexer, signal)

Warning: The agent is returned to the pool before processing completes. Use call/4 if you need the result.

status/2

Inspect pool status for monitoring:

status = Jido.Agent.WorkerPool.status(MyApp.Jido, :fast_search)
# => %{state: :ready, available: 5, overflow: 0, checked_out: 3}

| Field | Description |
|---|---|
| :state | Pool state (:ready, :full, :overflow) |
| :available | Workers waiting for checkout |
| :overflow | Currently active overflow workers |
| :checked_out | Workers currently in use |
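
For alerting, the status map can be reduced to a single utilization figure. The following is a minimal sketch, assuming the map has the shape shown above and that :available plus :checked_out together account for all workers. `PoolHealth` and the 0.8 threshold are hypothetical choices, not part of Jido.

```elixir
defmodule PoolHealth do
  # Fraction of workers currently in use, given a status map shaped like the one above.
  def utilization(%{available: available, checked_out: checked_out}) do
    total = available + checked_out
    if total == 0, do: 0.0, else: checked_out / total
  end

  # Coarse classification for dashboards or alerts; the 0.8 threshold is arbitrary.
  def level(status) do
    cond do
      status.overflow > 0 -> :overflowing
      utilization(status) >= 0.8 -> :busy
      true -> :ok
    end
  end
end

status = %{state: :ready, available: 5, overflow: 0, checked_out: 3}
PoolHealth.utilization(status)
# => 0.375
PoolHealth.level(status)
# => :ok
```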

checkout/3 and checkin/3 (Low-Level)

Manual checkout/checkin. Not recommended—use with_agent/4 instead.

pid = Jido.Agent.WorkerPool.checkout(MyApp.Jido, :fast_search)
try do
  Jido.AgentServer.call(pid, signal)
after
  Jido.Agent.WorkerPool.checkin(MyApp.Jido, :fast_search, pid)
end

If you forget to check the worker back in, it stays checked out (leaked) until the process that checked it out exits.

State Semantics Warning

Pooled agents are long-lived stateful workers. State persists across checkouts:

# First checkout: counter = 0 → 1
Jido.Agent.WorkerPool.with_agent(jido, :counter_pool, fn pid ->
  Jido.AgentServer.call(pid, increment_signal)
end)

# Second checkout (same worker): counter = 1 → 2
Jido.Agent.WorkerPool.with_agent(jido, :counter_pool, fn pid ->
  {:ok, agent} = Jido.AgentServer.call(pid, increment_signal)
  agent.state.counter  # => 2, not 1!
end)

Design patterns for per-request isolation:

  1. Stateless agents: Store only cached/shared data in agent state; pass request data via signal
  2. Reset action: Call a "reset" signal at the start of each transaction
  3. Request-scoped state: Use worker_opts to configure how state resets

# Pattern 1: Stateless design - pass everything via signal
defmodule SearchAction do
  use Jido.Action, name: "search", schema: [query: [type: :string, required: true]]

  def run(%{query: query}, context) do
    # Use cached connection from agent state
    conn = context.state.connection
    results = do_search(conn, query)
    {:ok, %{last_results: results}}  # Only store for debugging
  end
end
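
Pattern 2 (reset action) can be sketched with Elixir's built-in Agent standing in for a pooled worker. `ResettableWorker` and its functions are hypothetical. In a Jido agent the reset would more likely be a signal routed to an action, which this guide does not show, so treat this only as an illustration of the idea.

```elixir
defmodule ResettableWorker do
  # Long-lived process holding a cached resource plus per-request scratch state.
  def start_link(connection) do
    Agent.start_link(fn -> %{connection: connection, counter: 0} end)
  end

  # Clear request-scoped fields but keep the expensive cached resource.
  def reset(pid) do
    Agent.update(pid, fn state -> %{state | counter: 0} end)
  end

  # Bump the counter and return its new value.
  def increment(pid) do
    Agent.get_and_update(pid, fn state ->
      new_state = %{state | counter: state.counter + 1}
      {new_state.counter, new_state}
    end)
  end

  # Every "transaction" starts from a clean counter.
  def with_clean_state(pid, fun) do
    reset(pid)
    fun.(pid)
  end
end

{:ok, pid} = ResettableWorker.start_link(:fake_conn)

# Without a reset, state carries over between uses of the same worker.
1 = ResettableWorker.increment(pid)
2 = ResettableWorker.increment(pid)

# With a reset at the start of each transaction, every request sees counter = 1.
1 = ResettableWorker.with_clean_state(pid, &ResettableWorker.increment/1)
1 = ResettableWorker.with_clean_state(pid, &ResettableWorker.increment/1)
```

The cached `connection` survives every reset, so the worker stays warm while request data stays isolated.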

Pool Sizing Guidelines

Size Calculation

Start with:

size = requests_per_second × average_request_duration_ms / 1000

Example: 100 req/sec with a 50 ms average → 100 × 50 / 1000 = 5 workers minimum.
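
The calculation above (arrival rate multiplied by how long each request holds a worker) can be wrapped in a small helper. This is a sketch only; `PoolSizing` is a hypothetical module, and rounding up is an assumption so that fractional results never undersize the pool.

```elixir
defmodule PoolSizing do
  # Minimum workers needed to keep up with a steady request rate.
  # requests_per_second: average arrival rate
  # avg_duration_ms: average time one request holds a worker, in milliseconds
  def min_size(requests_per_second, avg_duration_ms) do
    ceil(requests_per_second * avg_duration_ms / 1000)
  end
end

PoolSizing.min_size(100, 50)
# => 5
PoolSizing.min_size(250, 30)
# => 8
```

The result is a floor, not a target: real traffic is uneven, so measure under load and add headroom or overflow as needed.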

Overflow Strategy

| Pattern | max_overflow | Use Case |
|---|---|---|
| Strict limit | 0 | Rate limiting, resource protection |
| Burst buffer | size × 0.5 | Handle traffic spikes |
| Elastic | size × 2 | Unknown load, prioritize availability |
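
The three overflow patterns can be expressed as a helper that derives max_overflow from the pool size. `OverflowPolicy` and the pattern atoms are hypothetical names, and integer division is an assumption for odd pool sizes.

```elixir
defmodule OverflowPolicy do
  # Strict limit: never start temporary workers.
  def max_overflow(:strict, _size), do: 0

  # Burst buffer: allow half the pool size again, rounded down.
  def max_overflow(:burst, size), do: div(size, 2)

  # Elastic: allow up to twice the pool size in temporary workers.
  def max_overflow(:elastic, size), do: size * 2

  # Build the keyword options used in an agent_pools entry.
  def pool_opts(pattern, size) do
    [size: size, max_overflow: max_overflow(pattern, size)]
  end
end

OverflowPolicy.pool_opts(:burst, 8)
# => [size: 8, max_overflow: 4]
```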

Environment-Based Sizing

# config/runtime.exs
import Config

pool_size = 
  case config_env() do
    :prod -> String.to_integer(System.get_env("SEARCH_POOL_SIZE", "16"))
    :test -> 2
    :dev -> 4
  end

config :my_app, MyApp.Jido,
  agent_pools: [
    {:search, MyApp.SearchAgent, size: pool_size, max_overflow: div(pool_size, 2)}
  ]

Timeout Configuration

Two timeout boundaries, which can be set separately or together:

# 1. Pool checkout timeout: waiting for available worker
Jido.Agent.WorkerPool.call(jido, :pool, signal, timeout: 5_000)

# 2. Call timeout: signal processing within agent
Jido.Agent.WorkerPool.call(jido, :pool, signal, call_timeout: 30_000)

# 3. Combined example
Jido.Agent.WorkerPool.call(jido, :pool, signal,
  timeout: 2_000,       # Fast fail if pool exhausted
  call_timeout: 60_000  # Long timeout for expensive operation
)

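When checkout times out, the underlying poolboy checkout call exits with a {:timeout, _} reason. A {:noproc, _} exit means something different: the pool process is not running.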
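
A timed-out GenServer.call, which is what poolboy uses for checkout, surfaces as a process exit rather than a return value. The sketch below uses only the standard library to show how such exits can be converted into tagged errors. `SlowServer` and `SafeCall` are hypothetical, and whether WorkerPool.call/4 already performs this conversion is not stated in this guide.

```elixir
defmodule SlowServer do
  use GenServer

  def start_link(delay_ms), do: GenServer.start_link(__MODULE__, delay_ms)

  @impl true
  def init(delay_ms), do: {:ok, delay_ms}

  @impl true
  def handle_call(:work, _from, delay_ms) do
    # Simulate a busy worker that takes delay_ms to answer.
    Process.sleep(delay_ms)
    {:reply, :done, delay_ms}
  end
end

defmodule SafeCall do
  # Convert exits from GenServer.call into tagged errors instead of crashing the caller.
  def call(pid, request, timeout) do
    {:ok, GenServer.call(pid, request, timeout)}
  catch
    :exit, {:timeout, _} -> {:error, :timeout}
    :exit, {:noproc, _} -> {:error, :not_running}
  end
end

{:ok, pid} = SlowServer.start_link(200)

SafeCall.call(pid, :work, 50)
# => {:error, :timeout}
SafeCall.call(pid, :work, 1_000)
# => {:ok, :done}
```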

Instrumentation

Telemetry Integration

Attach handlers for pool metrics:

defmodule MyApp.PoolMetrics do
  def setup do
    :telemetry.attach_many(
      "pool-metrics",
      [
        [:jido, :agent, :call, :start],
        [:jido, :agent, :call, :stop],
        [:jido, :agent, :call, :exception]
      ],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event([:jido, :agent, :call, :stop], measurements, metadata, _config) do
    duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
    
    :telemetry.execute(
      [:my_app, :pool, :call],
      %{duration_ms: duration_ms},
      %{pool: metadata.pool_name, success: metadata.success}
    )
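  end

  # Ignore :start and :exception events. Without this clause the handler would
  # raise on them, and :telemetry detaches handlers that raise.
  def handle_event(_event, _measurements, _metadata, _config), do: :ok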
end

Status Polling

Periodic health checks:

defmodule MyApp.PoolMonitor do
  use GenServer
  require Logger

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def init(opts) do
    jido = Keyword.fetch!(opts, :jido)
    pools = Keyword.fetch!(opts, :pools)
    schedule_check()
    {:ok, %{jido: jido, pools: pools}}
  end

  def handle_info(:check, state) do
    for pool <- state.pools do
      status = Jido.Agent.WorkerPool.status(state.jido, pool)
      
      if status.available == 0 and status.overflow > 0 do
        Logger.warning("Pool #{pool} exhausted, using overflow workers",
          pool: pool,
          overflow: status.overflow
        )
      end
    end
    
    schedule_check()
    {:noreply, state}
  end

  defp schedule_check, do: Process.send_after(self(), :check, 5_000)
end

Example: Pool-Backed URL Fetcher

Complete example with a pool for HTTP requests:

defmodule MyApp.FetchAction do
  use Jido.Action,
    name: "fetch",
    schema: [
      url: [type: :string, required: true],
      timeout: [type: :integer, default: 5000]
    ]

  def run(%{url: url, timeout: timeout}, context) do
    # Use persistent HTTP client from agent state
    client = context.state.http_client
    
    case Req.get(client, url: url, receive_timeout: timeout) do
      {:ok, %{status: 200, body: body}} ->
        {:ok, %{last_fetch: %{url: url, body: body, fetched_at: DateTime.utc_now()}}}
      
      {:ok, %{status: status}} ->
        {:error, {:http_error, status}}
      
      {:error, reason} ->
        {:error, {:fetch_failed, reason}}
    end
  end
end

defmodule MyApp.FetcherAgent do
  use Jido.Agent,
    name: "fetcher",
    schema: [
      http_client: [type: :any, required: true],
      last_fetch: [type: :map, default: nil]
    ]

  def signal_routes do
    [{"fetch", MyApp.FetchAction}]
  end
end

# Configuration
children = [
  {Jido,
   name: MyApp.Jido,
   agent_pools: [
     {:fetcher, MyApp.FetcherAgent,
      size: 10,
      max_overflow: 5,
      worker_opts: [
        initial_state: %{http_client: Req.new(retry: false)}
      ]}
   ]}
]

# Usage
defmodule MyApp.Crawler do
  alias Jido.Agent.WorkerPool
  alias Jido.Signal

  def fetch_urls(urls) do
    urls
    |> Task.async_stream(fn url ->
      signal = Signal.new!("fetch", %{url: url}, source: "/crawler")
      
      case WorkerPool.call(MyApp.Jido, :fetcher, signal, call_timeout: 10_000) do
        {:ok, agent} -> {:ok, url, agent.state.last_fetch}
        {:error, reason} -> {:error, url, reason}
      end
    end, max_concurrency: 20, timeout: 15_000)
    |> Enum.to_list()
  end
end
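
Task.async_stream wraps each result in {:ok, value}, and with its default on_timeout: :exit a single slow task takes down the caller. The sketch below shows one way to collect results while tolerating timeouts, using only the standard library. `StreamResults` is a hypothetical helper, and the inner tuples mirror the {:ok, url, value} and {:error, url, reason} shapes returned by fetch_urls/1 above.

```elixir
defmodule StreamResults do
  # Split Task.async_stream output into successes and failures.
  # Each list is built by prepending, so items appear in reverse stream order.
  def partition(stream) do
    Enum.reduce(stream, %{ok: [], failed: []}, fn
      {:ok, {:ok, url, value}}, acc ->
        %{acc | ok: [{url, value} | acc.ok]}

      {:ok, {:error, url, reason}}, acc ->
        %{acc | failed: [{url, reason} | acc.failed]}

      # With on_timeout: :kill_task, a timed-out task yields {:exit, :timeout}.
      {:exit, reason}, acc ->
        %{acc | failed: [{:unknown, reason} | acc.failed]}
    end)
  end
end

fetch = fn
  "slow" ->
    Process.sleep(500)
    {:ok, "slow", :never_returned}

  url ->
    {:ok, url, String.upcase(url)}
end

["a", "slow", "b"]
|> Task.async_stream(fetch, timeout: 100, on_timeout: :kill_task)
|> StreamResults.partition()
# => %{ok: [{"b", "B"}, {"a", "A"}], failed: [{:unknown, :timeout}]}
```

The timed-out entry loses its URL because the task was killed before returning. Zipping the input list with the stream output is one way to recover it.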

Common Patterns

Bounded Concurrency

Limit concurrent access to a scarce resource:

# Pool size = number of database connections
agent_pools: [
  {:db_writer, MyApp.DbWriterAgent, size: 5, max_overflow: 0}
]

# Callers block when all 5 connections busy
WorkerPool.call(jido, :db_writer, write_signal, timeout: 30_000)

Backpressure

Fail fast when pool exhausted instead of queueing:

case WorkerPool.call(jido, :processor, signal, timeout: 100) do
  {:ok, result} -> 
    {:ok, result}
  
  {:error, {:timeout, _}} -> 
    {:error, :service_overloaded}
end

Or use non-blocking checkout:

case Jido.Agent.WorkerPool.checkout(jido, :processor, block: false) do
  :full -> 
    {:error, :pool_exhausted}
  
  pid ->
    try do
      Jido.AgentServer.call(pid, signal)
    after
      Jido.Agent.WorkerPool.checkin(jido, :processor, pid)
    end
end

Warm Pool Pattern

Pre-warm agents with expensive initialization:

defmodule MyApp.MLAgent do
  use Jido.Agent,
    name: "ml_agent",
    schema: [
      model: [type: :any, required: true]
    ]

  # Model loaded once at pool startup, reused across requests
end

agent_pools: [
  {:ml, MyApp.MLAgent,
   size: 4,
   max_overflow: 0,  # Never cold-start; all workers pre-warmed
   worker_opts: [
     initial_state: %{model: MyApp.ML.load_model!()}
   ]}
]

Pools vs Spawn-Per-Request

| Aspect | Worker Pool | Spawn-Per-Request |
|---|---|---|
| Latency | Consistent (no cold start) | Variable (init overhead) |
| State | Shared across requests | Isolated per request |
| Memory | Fixed (pool size) | Scales with load |
| Failure | Worker restarted, pool recovers | Isolated failure |
| Concurrency | Bounded by pool size | Unbounded (dangerous) |

Choose pools for:

  • Database connections
  • HTTP clients with keep-alive
  • ML model inference
  • Rate-limited external APIs

Choose spawn-per-request for:

  • User-specific agents with personalized state
  • One-shot workflows
  • Testing (isolated state)