Conversation History

OpenResponses maintains conversation history automatically using previous_response_id. You never need to replay prior messages — just reference the last response ID and OpenResponses reconstructs the full context.

How it works

When a response completes, OpenResponses stores it in ResponseCache (backed by Cachex). On the next request, if previous_response_id is present, the loop loads the prior response and prepends its input and output to the new request's input before sending to the provider.

Request 2: previous_response_id = "resp_01"
                    │
                    ▼
      ResponseCache.get("resp_01")
                    │
                    ▼
  prev.input + prev.output + new_input
                    │
                    ▼
           sent to provider
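The reconstruction step can be sketched in a few lines of Python. This is a minimal sketch, not the actual implementation: a plain dict stands in for the Cachex-backed ResponseCache, and `build_provider_input` is a hypothetical name for the prepending step described above.

```python
# Sketch of context reconstruction. The cache maps response IDs to
# entries holding the prior turn's "input" and "output" lists.
response_cache = {
    "resp_01": {
        "input": [{"role": "user", "content": "My favourite language is Elixir."}],
        "output": [{"role": "assistant", "content": "Great choice!"}],
    }
}

def build_provider_input(new_input, previous_response_id=None):
    """Prepend the prior response's input and output to the new input."""
    if previous_response_id is None:
        return list(new_input)
    prev = response_cache[previous_response_id]
    return prev["input"] + prev["output"] + list(new_input)

full = build_provider_input(
    [{"role": "user", "content": "What is my favourite language?"}],
    previous_response_id="resp_01",
)
# full now holds three messages: prior input, prior output, new input
```

The provider receives `full` as if the client had replayed the whole conversation itself.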

Basic usage

# Turn 1
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "input": [{"role": "user", "content": "My favourite language is Elixir."}]
  }'
# → {"id": "resp_abc", "status": "completed", ...}

# Turn 2 — no history needed in the request
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "previous_response_id": "resp_abc",
    "input": [{"role": "user", "content": "What is my favourite language?"}]
  }'
# → Model knows it's Elixir

Cache configuration

By default, responses are cached in memory for 24 hours. The cache is backed by Cachex, which you can configure at application startup, for example to change the TTL or cap the cache size:

# application.ex (import Cachex.Spec for the expiration record)
{Cachex,
 name: :response_cache,
 limit: 10_000,
 expiration: expiration(default: :timer.hours(24))}

For cross-node or cross-restart persistence (Phase 3), add AshPostgres as a data layer and responses will be stored durably.

What gets cached

For each completed response, the cache stores:

  • id — the response ID
  • model — the model used
  • status — terminal state (completed, failed, or incomplete)
  • input — the original input sent by the client
  • output — all output items produced by the model
  • usage — token counts
  • created_at — timestamp

Responses in failed or incomplete states are cached but their output may be partial.
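As an illustration, a cached entry with the fields above might look like the following Python dict (the values are hypothetical; the real store holds Elixir maps, and the exact key spelling is an assumption):

```python
# Hypothetical cached entry mirroring the field list above.
cached = {
    "id": "resp_abc",
    "model": "claude-sonnet-4-6",
    "status": "completed",
    "input": [{"role": "user", "content": "My favourite language is Elixir."}],
    "output": [{"role": "assistant", "content": "Noted!"}],
    "usage": {"input_tokens": 12, "output_tokens": 4},
    "created_at": "2025-01-01T00:00:00Z",
}
```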

Chaining multiple turns

Each turn only needs to reference the immediately preceding response — not the entire chain. OpenResponses handles the reconstruction:

resp_001 → resp_002 → resp_003 → resp_004 (current)

When processing resp_004, OpenResponses loads resp_003 from cache. resp_003's own context was already reconstructed when it was created, so its input field contains the full accumulated history up to that point.
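This accumulation can be demonstrated with a small Python simulation (a sketch under the same dict-as-cache assumption as before; `respond` is a hypothetical helper, not part of OpenResponses):

```python
# Each stored response's "input" already contains the full reconstructed
# history, so a request only needs the immediately preceding ID.
cache = {}

def respond(resp_id, new_input, previous_response_id=None, reply="ok"):
    prev = cache.get(previous_response_id)
    history = prev["input"] + prev["output"] if prev else []
    cache[resp_id] = {
        "input": history + new_input,
        "output": [{"role": "assistant", "content": reply}],
    }
    return resp_id

respond("resp_001", [{"role": "user", "content": "turn 1"}])
respond("resp_002", [{"role": "user", "content": "turn 2"}], "resp_001")
respond("resp_003", [{"role": "user", "content": "turn 3"}], "resp_002")

# resp_003's stored input holds turns 1 and 2 (and the assistant replies)
# even though its request referenced only resp_002.
```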

Branching conversations

Because previous_response_id is just a reference, you can branch at any point:

resp_001
├── resp_002a (branch A)
│   └── resp_003a
└── resp_002b (branch B)
    └── resp_003b

Both branches reference resp_001 but diverge from there. This is useful for showing users alternative continuations or implementing undo.
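Branching falls out of the same reconstruction rule: two requests can name the same previous_response_id and each gets its own copy of the shared prefix. A minimal Python sketch (again with a dict standing in for the cache; `branch` is a hypothetical helper):

```python
# Two continuations of the same parent share its prefix, then diverge.
cache = {
    "resp_001": {
        "input": [{"role": "user", "content": "Tell me a story."}],
        "output": [{"role": "assistant", "content": "Once upon a time..."}],
    }
}

def branch(resp_id, parent_id, new_input, reply):
    prev = cache[parent_id]
    cache[resp_id] = {
        "input": prev["input"] + prev["output"] + new_input,
        "output": [{"role": "assistant", "content": reply}],
    }

branch("resp_002a", "resp_001",
       [{"role": "user", "content": "Make it funny."}], "Branch A reply")
branch("resp_002b", "resp_001",
       [{"role": "user", "content": "Make it scary."}], "Branch B reply")

# Both branches contain the resp_001 prefix but diverge at the third message.
```

Neither branch mutates the parent entry, which is what makes undo or side-by-side continuations cheap.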