Scaling Open Responses


Single node (current default)

Out of the box, Open Responses runs on a single node and handles hundreds of concurrent users without any configuration changes. The BEAM scheduler distributes work across all CPU cores, and each request runs as an isolated GenServer process under LoopSupervisor. Thousands of concurrent streaming SSE connections are a normal workload for a single Bandit/Cowboy node.

The practical limit on a single node is memory. Each active response holds its output in ETS and in the Loop process heap. A well-specced node (8 cores, 16 GB RAM) comfortably handles 1,000–5,000 simultaneous active streams, with the true bottleneck being LLM provider latency rather than BEAM throughput.

What does not survive a single-node setup:

  • Node restart drops all in-flight responses (ETS is in-memory)
  • No horizontal scaling — a second node has no access to responses created on the first

For development and small production deployments, this is fine.
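The per-request process model described above can be sketched as a plain DynamicSupervisor. This is an illustrative assumption about the project's internals, not its exact API — the `:response` option and `start_loop/1` name are invented for the sketch:

```elixir
# Sketch of the per-request isolation model: one Loop process per request,
# supervised so that a crash in one stream cannot affect any other.
defmodule OpenResponses.LoopSupervisor do
  use DynamicSupervisor

  def start_link(init_arg),
    do: DynamicSupervisor.start_link(__MODULE__, init_arg, name: __MODULE__)

  @impl true
  def init(_init_arg), do: DynamicSupervisor.init(strategy: :one_for_one)

  # Hypothetical entry point: start an isolated Loop for one request.
  def start_loop(response),
    do: DynamicSupervisor.start_child(__MODULE__, {OpenResponses.Loop, response: response})
end
```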


Stage 1: Durable storage (AshPostgres)

The first production upgrade is swapping the Response resource from ETS to Postgres. This is a one-line change:

# lib/open_responses/responses/response.ex
use Ash.Resource,
  domain: OpenResponses.Responses,
  data_layer: AshPostgres.DataLayer,   # replaces Ash.DataLayer.Ets
  extensions: [AshStateMachine]

Add to mix.exs:

{:ash_postgres, "~> 2.0"}

Generate and run the migration:

mix ash_postgres.generate_migrations
mix ecto.migrate

Nothing else changes. All actions, state machine transitions, and streaming behaviour are identical. Responses now survive restarts and are visible to every node.
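Durability is easy to verify: a response created before a restart is readable afterwards, from any node. A minimal sketch, assuming Ash 3's `Ash.get!/2` (on Ash 2 you would read through the domain's code interface instead):

```elixir
# Hypothetical check: fetch a response persisted before a node restart.
response = Ash.get!(OpenResponses.Responses.Response, response_id)

# State machine state is stored in Postgres, so it survives restarts
# and is visible cluster-wide.
response.state
```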


Stage 2: Multi-node clustering

With Postgres in place, add libcluster to form an Erlang cluster across nodes. Phoenix.PubSub (which broadcasts SSE events) uses the pg adapter and works across a cluster automatically — no changes needed.

# mix.exs
{:libcluster, "~> 3.3"}

# config/runtime.exs
config :libcluster,
  topologies: [
    open_responses: [
      strategy: Cluster.Strategy.Gossip   # or Kubernetes, DNS, EPMD
    ]
  ]

# lib/open_responses/application.ex
{Cluster.Supervisor,
 [Application.get_env(:libcluster, :topologies),
  [name: OpenResponses.ClusterSupervisor]]}

With clustering active, PubSub events broadcast cluster-wide. A response created on node A fires SSE events that node B's connected client receives correctly.
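The cross-node broadcast uses the standard Phoenix.PubSub API; the PubSub server name and `"response:<id>"` topic scheme below are illustrative assumptions:

```elixir
# Node B: the SSE controller subscribes to the response's topic.
Phoenix.PubSub.subscribe(OpenResponses.PubSub, "response:#{response_id}")

# Node A: the Loop broadcasts each event; the pg adapter relays it
# to subscribers on every connected node.
Phoenix.PubSub.broadcast(
  OpenResponses.PubSub,
  "response:#{response_id}",
  {:sse_event, %{type: "response.output_text.delta", delta: "Hello"}}
)
```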

The remaining gap is previous_response_id context: when a multi-turn request lands on node B, it queries ResponseCache (Cachex), which lives only on the node that handled the first turn.
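Concretely, the miss looks like this (the `:response_cache` name is an assumption for illustration — Cachex caches are per-node, not replicated):

```elixir
# Node A handled turn 1 and warmed its local cache:
Cachex.put(:response_cache, previous_response_id, context)

# Node B, receiving turn 2, finds nothing — the entry exists only on node A:
Cachex.get(:response_cache, previous_response_id)
# => {:ok, nil}
```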


Stage 3: Distributed Loop registry (Horde)

This is the approach that eliminates sticky sessions entirely, using the same strategy deployed in large-scale Elixir medical and telco systems.

The idea: instead of routing requests to nodes, register Loop processes in a cluster-wide registry. Any node can look up or start a process for a given response_id, regardless of which node originally created it. Horde implements this registry using CRDTs (Conflict-free Replicated Data Types) — the registry state converges automatically across nodes with no single point of coordination.

# mix.exs
{:horde, "~> 0.9"}

Replace LoopSupervisor with a Horde supervisor and registry:

# lib/open_responses/application.ex
children = [
  {Horde.Registry, [name: OpenResponses.LoopRegistry, keys: :unique, members: :auto]},
  {Horde.DynamicSupervisor, [name: OpenResponses.LoopSupervisor, strategy: :one_for_one, members: :auto]},
  ...
]

Register each Loop process under its response_id:

# lib/open_responses/loop.ex
def init(opts) do
  response = Keyword.fetch!(opts, :response)
  {:ok, _} = Horde.Registry.register(OpenResponses.LoopRegistry, response.id, self())
  ...
end

Start loops via the Horde supervisor:

Horde.DynamicSupervisor.start_child(
  OpenResponses.LoopSupervisor,
  {OpenResponses.Loop, opts}
)

Look up a running loop from any node:

case Horde.Registry.lookup(OpenResponses.LoopRegistry, response_id) do
  [{pid, _}] -> send(pid, :cancel)
  [] -> :not_found
end

With Horde in place:

  • A streaming client connected to node A receives events from a Loop running on node B transparently, via cluster-wide PubSub
  • A previous_response_id request landing on any node reconstructs context from Postgres (Stage 1), so Cachex is no longer the bottleneck
  • No sticky sessions required at the load balancer — round-robin or least-connections works correctly

Stage 4: Consistent hashing (alternative to Horde)

If you prefer to keep process affinity simple without a distributed registry, consistent hashing routes requests deterministically to the same node based on a key — typically the response_id or a user_id. The hash ring recalculates minimally when nodes join or leave, unlike naive modulo hashing.

# mix.exs
{:libring, "~> 1.6"}

ring =
  HashRing.new()
  |> HashRing.add_node("node1@host")
  |> HashRing.add_node("node2@host")
  |> HashRing.add_node("node3@host")

target_node = HashRing.key_to_node(ring, response_id)

Requests for the same response_id always hash to the same node, so that node's Cachex cache always has the relevant context. When a new node joins, only ~1/N of keys rehash — existing sessions are unaffected.

The tradeoff versus Horde: simpler mental model, but requires either a smart load balancer that respects the hash ring or an application-layer proxy hop when the receiving node isn't the target node. Horde's registry eliminates that proxy hop by making every process findable from everywhere.
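The application-layer proxy hop can be sketched with OTP's `:erpc`: if the local node isn't the ring's target, forward the call. `handle_locally/1` is a hypothetical stand-in for the real request handler:

```elixir
# Sketch of hash-ring routing with a forwarding hop.
def route(ring, response_id, request) do
  target = HashRing.key_to_node(ring, response_id)

  if target == Node.self() do
    # This node owns the key; its Cachex cache has the context.
    handle_locally(request)
  else
    # One extra network hop when the LB picked the wrong node.
    :erpc.call(target, __MODULE__, :handle_locally, [request])
  end
end
```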


Comparison

| Approach | Sticky sessions needed | Survives restarts | Works multi-node | Complexity |
| --- | --- | --- | --- | --- |
| ETS (default) | N/A (single node) | No | No | None |
| AshPostgres only | Yes | Yes | Partial | Low |
| AshPostgres + libcluster | Yes | Yes | Yes (with stickiness) | Low |
| AshPostgres + Horde | No | Yes | Yes | Medium |
| AshPostgres + consistent hashing | At LB only | Yes | Yes | Medium |

For most deployments, Stage 1 (Postgres) + Stage 2 (libcluster) + sticky sessions at the load balancer is the pragmatic path. It requires no application code changes beyond the data layer swap and a cluster topology config.

Horde becomes worth the added complexity when you need zero-downtime rolling deploys, automatic process migration on node failure, or truly stateless load balancing.


Kubernetes deployment note

On Kubernetes, the DNS clustering strategy in libcluster discovers pods automatically:

config :libcluster,
  topologies: [
    open_responses: [
      strategy: Cluster.Strategy.Kubernetes.DNS,
      config: [
        service: "open-responses-headless",
        application_name: "open_responses"
      ]
    ]
  ]

Sticky sessions are available via sessionAffinity: ClientIP on a Kubernetes Service, or via a cookie-based affinity annotation in ingress-nginx. With Horde, skip the affinity entirely and let the scheduler round-robin freely.