Conversation History

OpenResponses maintains conversation history automatically using previous_response_id. You never need to replay prior messages — just reference the last response ID and OpenResponses reconstructs the full context.

How it works

When a response completes, OpenResponses stores it in ResponseCache (backed by Cachex). On the next request, if previous_response_id is present, the loop loads the prior response and prepends its input and output to the new request's input before sending to the provider.

Request 2: previous_response_id = "resp_01"
                    │
                    ▼
      ResponseCache.get("resp_01")
                    │
                    ▼
  prev.input + prev.output + new_input
                    │
                    ▼
           sent to provider
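The reconstruction step can be sketched in a few lines of Python. This is a minimal sketch, not the actual implementation: a plain dict stands in for the Cachex-backed ResponseCache, and `build_provider_input` is a hypothetical name for the prepending step described above.

```python
# Sketch of context reconstruction. The cache maps response IDs to
# entries holding the prior turn's "input" and "output" lists.
response_cache = {
    "resp_01": {
        "input": [{"role": "user", "content": "My favourite language is Elixir."}],
        "output": [{"role": "assistant", "content": "Great choice!"}],
    }
}

def build_provider_input(new_input, previous_response_id=None):
    """Prepend the prior response's input and output to the new input."""
    if previous_response_id is None:
        return list(new_input)
    prev = response_cache[previous_response_id]
    return prev["input"] + prev["output"] + list(new_input)

full = build_provider_input(
    [{"role": "user", "content": "What is my favourite language?"}],
    previous_response_id="resp_01",
)
# full now holds three messages: prior input, prior output, new input
```

The provider receives `full` as if the client had replayed the whole conversation itself.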

Basic usage

# Turn 1
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "input": [{"role": "user", "content": "My favourite language is Elixir."}]
  }'
# → {"id": "resp_abc", "status": "completed", ...}

# Turn 2 — no history needed in the request
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "previous_response_id": "resp_abc",
    "input": [{"role": "user", "content": "What is my favourite language?"}]
  }'
# → Model knows it's Elixir

Cache configuration

By default, responses are cached in memory for 24 hours. The cache is backed by Cachex, which you can configure at application startup, for example to change the TTL or cap the cache size:

# application.ex (import Cachex.Spec for the expiration record)
{Cachex,
 name: :response_cache,
 limit: 10_000,
 expiration: expiration(default: :timer.hours(24))}

For cross-node or cross-restart persistence (Phase 3), add AshPostgres as a data layer and responses will be stored durably.

What gets cached

For each completed response, the cache stores:

  • id — the response ID
  • model — the model used
  • status — terminal state (completed, failed, or incomplete)
  • input — the original input sent by the client
  • output — all output items produced by the model
  • usage — token counts
  • created_at — timestamp

Responses in failed or incomplete states are cached but their output may be partial.
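As an illustration, a cached entry with the fields above might look like the following Python dict (the values are hypothetical; the real store holds Elixir maps, and the exact key spelling is an assumption):

```python
# Hypothetical cached entry mirroring the field list above.
cached = {
    "id": "resp_abc",
    "model": "claude-sonnet-4-6",
    "status": "completed",
    "input": [{"role": "user", "content": "My favourite language is Elixir."}],
    "output": [{"role": "assistant", "content": "Noted!"}],
    "usage": {"input_tokens": 12, "output_tokens": 4},
    "created_at": "2025-01-01T00:00:00Z",
}
```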

Chaining multiple turns

Each turn only needs to reference the immediately preceding response — not the entire chain. OpenResponses handles the reconstruction:

resp_001 → resp_002 → resp_003 → resp_004 (current)

When processing resp_004, OpenResponses loads resp_003 from cache. resp_003's own context was already reconstructed when it was created, so its input field contains the full accumulated history up to that point.
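This accumulation can be demonstrated with a small Python simulation (a sketch under the same dict-as-cache assumption as before; `respond` is a hypothetical helper, not part of OpenResponses):

```python
# Each stored response's "input" already contains the full reconstructed
# history, so a request only needs the immediately preceding ID.
cache = {}

def respond(resp_id, new_input, previous_response_id=None, reply="ok"):
    prev = cache.get(previous_response_id)
    history = prev["input"] + prev["output"] if prev else []
    cache[resp_id] = {
        "input": history + new_input,
        "output": [{"role": "assistant", "content": reply}],
    }
    return resp_id

respond("resp_001", [{"role": "user", "content": "turn 1"}])
respond("resp_002", [{"role": "user", "content": "turn 2"}], "resp_001")
respond("resp_003", [{"role": "user", "content": "turn 3"}], "resp_002")

# resp_003's stored input holds turns 1 and 2 (and the assistant replies)
# even though its request referenced only resp_002.
```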

Branching conversations

Because previous_response_id is just a reference, you can branch at any point:

resp_001
├── resp_002a (branch A)
│   └── resp_003a
└── resp_002b (branch B)
    └── resp_003b

Both branches reference resp_001 but diverge from there. This is useful for showing users alternative continuations or implementing undo.
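Branching falls out of the same reconstruction rule: two requests can name the same previous_response_id and each gets its own copy of the shared prefix. A minimal Python sketch (again with a dict standing in for the cache; `branch` is a hypothetical helper):

```python
# Two continuations of the same parent share its prefix, then diverge.
cache = {
    "resp_001": {
        "input": [{"role": "user", "content": "Tell me a story."}],
        "output": [{"role": "assistant", "content": "Once upon a time..."}],
    }
}

def branch(resp_id, parent_id, new_input, reply):
    prev = cache[parent_id]
    cache[resp_id] = {
        "input": prev["input"] + prev["output"] + new_input,
        "output": [{"role": "assistant", "content": reply}],
    }

branch("resp_002a", "resp_001",
       [{"role": "user", "content": "Make it funny."}], "Branch A reply")
branch("resp_002b", "resp_001",
       [{"role": "user", "content": "Make it scary."}], "Branch B reply")

# Both branches contain the resp_001 prefix but diverge at the third message.
```

Neither branch mutates the parent entry, which is what makes undo or side-by-side continuations cheap.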