Scaling Open Responses


Single node (current default)

Out of the box, Open Responses runs on a single node and handles hundreds of concurrent users without any configuration changes. The BEAM scheduler distributes work across all CPU cores, and each request runs as an isolated GenServer process under LoopSupervisor. Thousands of concurrent streaming SSE connections are a normal workload for a single Bandit/Cowboy node.

The practical limit on a single node is memory. Each active response holds its output in ETS and in the Loop process heap. A well-specced node (8 cores, 16 GB RAM) comfortably handles 1,000–5,000 simultaneous active streams, with the true bottleneck being LLM provider latency rather than BEAM throughput.

What does not survive a single-node setup:

  • Node restart drops all in-flight responses (ETS is in-memory)
  • No horizontal scaling — a second node has no access to responses created on the first

For development and small production deployments, this is fine.
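The per-request process model described above can be sketched as a plain DynamicSupervisor. This is an illustrative assumption about the project's internals, not its exact API — the `:response` option and `start_loop/1` name are invented for the sketch:

```elixir
# Sketch of the per-request isolation model: one Loop process per request,
# supervised so that a crash in one stream cannot affect any other.
defmodule OpenResponses.LoopSupervisor do
  use DynamicSupervisor

  def start_link(init_arg),
    do: DynamicSupervisor.start_link(__MODULE__, init_arg, name: __MODULE__)

  @impl true
  def init(_init_arg), do: DynamicSupervisor.init(strategy: :one_for_one)

  # Hypothetical entry point: start an isolated Loop for one request.
  def start_loop(response),
    do: DynamicSupervisor.start_child(__MODULE__, {OpenResponses.Loop, response: response})
end
```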


Stage 1: Durable storage (AshPostgres)

The first production upgrade is swapping the Response resource from ETS to Postgres. This is a one-line change:

# lib/open_responses/responses/response.ex
use Ash.Resource,
  domain: OpenResponses.Responses,
  data_layer: AshPostgres.DataLayer,   # replaces Ash.DataLayer.Ets
  extensions: [AshStateMachine]

Add to mix.exs:

{:ash_postgres, "~> 2.0"}

Generate and run the migration:

mix ash_postgres.generate_migrations
mix ecto.migrate

Nothing else changes. All actions, state machine transitions, and streaming behaviour are identical. Responses now survive restarts and are visible to every node.
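Durability is easy to verify: a response created before a restart is readable afterwards, from any node. A minimal sketch, assuming Ash 3's `Ash.get!/2` (on Ash 2 you would read through the domain's code interface instead):

```elixir
# Hypothetical check: fetch a response persisted before a node restart.
response = Ash.get!(OpenResponses.Responses.Response, response_id)

# State machine state is stored in Postgres, so it survives restarts
# and is visible cluster-wide.
response.state
```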


Stage 2: Multi-node clustering

With Postgres in place, add libcluster to form an Erlang cluster across nodes. Phoenix.PubSub (which broadcasts SSE events) uses the pg adapter and works across a cluster automatically — no changes needed.

# mix.exs
{:libcluster, "~> 3.3"}

# config/runtime.exs
config :libcluster,
  topologies: [
    open_responses: [
      strategy: Cluster.Strategy.Gossip   # or Kubernetes, DNS, EPMD
    ]
  ]

# lib/open_responses/application.ex
{Cluster.Supervisor,
 [Application.get_env(:libcluster, :topologies),
  [name: OpenResponses.ClusterSupervisor]]}

With clustering active, PubSub events broadcast cluster-wide. A response created on node A fires SSE events that node B's connected client receives correctly.
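The cross-node broadcast uses the standard Phoenix.PubSub API; the PubSub server name and `"response:<id>"` topic scheme below are illustrative assumptions:

```elixir
# Node B: the SSE controller subscribes to the response's topic.
Phoenix.PubSub.subscribe(OpenResponses.PubSub, "response:#{response_id}")

# Node A: the Loop broadcasts each event; the pg adapter relays it
# to subscribers on every connected node.
Phoenix.PubSub.broadcast(
  OpenResponses.PubSub,
  "response:#{response_id}",
  {:sse_event, %{type: "response.output_text.delta", delta: "Hello"}}
)
```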

The remaining gap is previous_response_id context: when a multi-turn request lands on node B, it queries ResponseCache (Cachex), which lives only on the node that handled the first turn.
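Concretely, the miss looks like this (the `:response_cache` name is an assumption for illustration — Cachex caches are per-node, not replicated):

```elixir
# Node A handled turn 1 and warmed its local cache:
Cachex.put(:response_cache, previous_response_id, context)

# Node B, receiving turn 2, finds nothing — the entry exists only on node A:
Cachex.get(:response_cache, previous_response_id)
# => {:ok, nil}
```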


Stage 3: Distributed Loop registry (Horde)

This is the approach that eliminates sticky sessions entirely, using the same strategy deployed in large-scale Elixir medical and telco systems.

The idea: instead of routing requests to nodes, register Loop processes in a cluster-wide registry. Any node can look up or start a process for a given response_id, regardless of which node originally created it. Horde implements this registry using CRDTs (Conflict-free Replicated Data Types) — the registry state converges automatically across nodes with no single point of coordination.

# mix.exs
{:horde, "~> 0.9"}

Replace LoopSupervisor with a Horde supervisor and registry:

# lib/open_responses/application.ex
children = [
  {Horde.Registry, [name: OpenResponses.LoopRegistry, keys: :unique, members: :auto]},
  {Horde.DynamicSupervisor, [name: OpenResponses.LoopSupervisor, strategy: :one_for_one, members: :auto]},
  ...
]

Register each Loop process under its response_id:

# lib/open_responses/loop.ex
def init(opts) do
  response = Keyword.fetch!(opts, :response)
  {:ok, _} = Horde.Registry.register(OpenResponses.LoopRegistry, response.id, self())
  ...
end

Start loops via the Horde supervisor:

Horde.DynamicSupervisor.start_child(
  OpenResponses.LoopSupervisor,
  {OpenResponses.Loop, opts}
)

Look up a running loop from any node:

case Horde.Registry.lookup(OpenResponses.LoopRegistry, response_id) do
  [{pid, _}] -> send(pid, :cancel)
  [] -> :not_found
end

With Horde in place:

  • A streaming client connected to node A receives events from a Loop running on node B transparently, via cluster-wide PubSub
  • A previous_response_id request landing on any node reconstructs context from Postgres (Stage 1), so Cachex is no longer the bottleneck
  • No sticky sessions required at the load balancer — round-robin or least-connections works correctly

Stage 4: Consistent hashing (alternative to Horde)

If you prefer to keep process affinity simple without a distributed registry, consistent hashing routes requests deterministically to the same node based on a key — typically the response_id or a user_id. The hash ring recalculates minimally when nodes join or leave, unlike naive modulo hashing.

# mix.exs
{:libring, "~> 1.6"}

ring =
  HashRing.new()
  |> HashRing.add_node("node1@host")
  |> HashRing.add_node("node2@host")
  |> HashRing.add_node("node3@host")

target_node = HashRing.key_to_node(ring, response_id)

Requests for the same response_id always hash to the same node, so that node's Cachex cache always has the relevant context. When a new node joins, only ~1/N of keys rehash — existing sessions are unaffected.

The tradeoff versus Horde: simpler mental model, but requires either a smart load balancer that respects the hash ring or an application-layer proxy hop when the receiving node isn't the target node. Horde's registry eliminates that proxy hop by making every process findable from everywhere.
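The application-layer proxy hop can be sketched with OTP's `:erpc`: if the local node isn't the ring's target, forward the call. `handle_locally/1` is a hypothetical stand-in for the real request handler:

```elixir
# Sketch of hash-ring routing with a forwarding hop.
def route(ring, response_id, request) do
  target = HashRing.key_to_node(ring, response_id)

  if target == Node.self() do
    # This node owns the key; its Cachex cache has the context.
    handle_locally(request)
  else
    # One extra network hop when the LB picked the wrong node.
    :erpc.call(target, __MODULE__, :handle_locally, [request])
  end
end
```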


Comparison

| Approach | Sticky sessions needed | Survives restarts | Works multi-node | Complexity |
| --- | --- | --- | --- | --- |
| ETS (default) | N/A (single node) | No | No | None |
| AshPostgres only | Yes | Yes | Partial | Low |
| AshPostgres + libcluster | Yes | Yes | Yes (with stickiness) | Low |
| AshPostgres + Horde | No | Yes | Yes | Medium |
| AshPostgres + consistent hashing | At LB only | Yes | Yes | Medium |

For most deployments, Stage 1 (Postgres) + Stage 2 (libcluster) + sticky sessions at the load balancer is the pragmatic path. It requires no application code changes beyond the data layer swap and a cluster topology config.

Horde becomes worth the added complexity when you need zero-downtime rolling deploys, automatic process migration on node failure, or truly stateless load balancing.


Kubernetes deployment note

On Kubernetes, the DNS clustering strategy in libcluster discovers pods automatically:

config :libcluster,
  topologies: [
    open_responses: [
      strategy: Cluster.Strategy.Kubernetes.DNS,
      config: [
        service: "open-responses-headless",
        application_name: "open_responses"
      ]
    ]
  ]

Sticky sessions are available via sessionAffinity: ClientIP on a Kubernetes Service, or via a cookie-based affinity annotation in ingress-nginx. With Horde, skip the affinity entirely and let the scheduler round-robin freely.