Getting Started

This guide walks through making your first requests after completing Installation.

Your first request

Send a non-streaming request to any configured provider:

curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": [
      {"role": "user", "content": "Explain the BEAM in one sentence."}
    ]
  }'

Response:

{
  "id": "01950000-0000-0000-0000-000000000000",
  "object": "response",
  "model": "gpt-4o",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "The BEAM is..."}],
      "status": "completed"
    }
  ],
  "usage": {}
}
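The generated text lives inside the output items rather than at the top level. A minimal sketch (Python, standard library only) of pulling the text out of a response shaped like the one above:

```python
import json

# Example payload shaped like the response above (text shortened).
raw = '''
{
  "id": "01950000-0000-0000-0000-000000000000",
  "object": "response",
  "model": "gpt-4o",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "The BEAM is..."}],
      "status": "completed"
    }
  ],
  "usage": {}
}
'''

def output_text(response: dict) -> str:
    """Concatenate every output_text part across all message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts)

print(output_text(json.loads(raw)))  # → The BEAM is...
```

Iterating over all message items (rather than grabbing output[0]) matters once tool calls appear, since the output array can then contain more than one item type.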

Streaming responses

Add "stream": true to receive Server-Sent Events as the model generates:

curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "input": [
      {"role": "user", "content": "Write a haiku about Elixir."}
    ]
  }'

You'll receive a stream of events:

event: response.created
data: {"id":"01950000...","status":"queued",...}

event: response.in_progress
data: {"type":"response.in_progress","sequence_number":0}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Concurrent","sequence_number":3}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":" streams flow","sequence_number":4}

event: response.completed
data: {"id":"01950000...","status":"completed",...}

data: [DONE]
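The event/data framing above is standard Server-Sent Events: each event is an optional event: line plus one or more data: lines, terminated by a blank line. A minimal consumer sketch (Python, standard library only, fed the delta events shown above) that accumulates the streamed text:

```python
import json

# A captured fragment of the stream shown above.
stream = """event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Concurrent","sequence_number":3}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":" streams flow","sequence_number":4}

data: [DONE]
"""

def iter_events(lines):
    """Yield (event, data) pairs from SSE lines; events end at a blank line."""
    event, data = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and (event or data):
            yield event, "\n".join(data)
            event, data = None, []
    if event or data:
        yield event, "\n".join(data)

text = ""
for event, data in iter_events(stream.splitlines(keepends=True)):
    if data == "[DONE]":
        break
    if event == "response.output_text.delta":
        text += json.loads(data)["delta"]

print(text)  # → Concurrent streams flow
```

In real use you would feed iter_events the response body line by line as it arrives; with the requests library, for example, resp.iter_lines(decode_unicode=True) produces a suitable iterator.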

See Streaming for the full event catalogue and client examples.

Using tools

Define tools in the request and the model will call them when appropriate:

curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "input": [
      {"role": "user", "content": "What time is it in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "name": "get_time",
        "description": "Get the current time in a given timezone",
        "parameters": {
          "type": "object",
          "properties": {
            "timezone": {"type": "string", "description": "IANA timezone name"}
          },
          "required": ["timezone"]
        }
      }
    ]
  }'

When the model decides to call get_time, OpenResponses emits a function_call item in the output. You then submit the result in a follow-up request using previous_response_id. See Tool Dispatch for the full flow.
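The dispatch loop can be sketched as below. Note the shapes here are assumptions following the OpenAI Responses convention (a function_call item carrying call_id, name, and JSON-encoded arguments, answered by a function_call_output item); check Tool Dispatch for the exact fields OpenResponses uses.

```python
import json

# Hypothetical function_call item, shaped per the OpenAI Responses convention.
function_call = {
    "type": "function_call",
    "call_id": "call_123",
    "name": "get_time",
    "arguments": '{"timezone": "Asia/Tokyo"}',
}

# Local tool implementations keyed by name (get_time here is a stub).
HANDLERS = {"get_time": lambda timezone: f"09:00 in {timezone}"}

def resolve_function_call(item: dict) -> dict:
    """Run the named handler and wrap its result as a function_call_output item.

    The function_call_output shape is an assumption borrowed from the OpenAI
    Responses API; see Tool Dispatch for the exact fields expected here."""
    args = json.loads(item["arguments"])
    result = HANDLERS[item["name"]](**args)
    return {
        "type": "function_call_output",
        "call_id": item["call_id"],
        "output": result,
    }

# Follow-up request body: thread previous_response_id and submit the result.
follow_up = {
    "model": "claude-opus-4-6",
    "previous_response_id": "resp_abc",  # id of the response with the call
    "input": [resolve_function_call(function_call)],
}
print(json.dumps(follow_up, indent=2))
```

POSTing follow_up to /v1/responses lets the model incorporate the tool result and produce its final answer.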

Multi-turn conversations

Use previous_response_id to continue a conversation. OpenResponses automatically reconstructs the full context from the cache:

# First turn
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": [{"role": "user", "content": "My name is Alice."}]
  }'
# → {"id": "resp_001", ...}

# Second turn — no need to repeat the history
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "previous_response_id": "resp_001",
    "input": [{"role": "user", "content": "What is my name?"}]
  }'

See Conversation History for caching behaviour and TTL configuration.
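The two turns above can be scripted by threading each response's id into the next request. A sketch (Python) of a small wrapper that does the threading; the transport callable is a placeholder you would back with a real HTTP POST to /v1/responses, and the fake one below only exists so the example is self-contained:

```python
class Conversation:
    """Threads previous_response_id through successive /v1/responses payloads.

    `transport` is any callable taking a request-body dict and returning the
    decoded response dict -- in real use, an HTTP POST to /v1/responses."""

    def __init__(self, model: str, transport):
        self.model = model
        self.transport = transport
        self.last_id = None

    def say(self, text: str) -> dict:
        body = {
            "model": self.model,
            "input": [{"role": "user", "content": text}],
        }
        if self.last_id is not None:
            body["previous_response_id"] = self.last_id
        response = self.transport(body)
        self.last_id = response["id"]
        return response

# Fake transport for illustration; it echoes the request so we can inspect it.
counter = iter(range(1, 1000))
def fake_transport(body):
    return {"id": f"resp_{next(counter):03d}", "status": "completed",
            "echo": body}

chat = Conversation("gpt-4o", fake_transport)
first = chat.say("My name is Alice.")
second = chat.say("What is my name?")
# The second request carries the first response's id:
assert second["echo"]["previous_response_id"] == first["id"]
```

The first request carries no previous_response_id at all, matching the first curl call above; every later turn sends only the new user message plus the previous id.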

Choosing a model

The model name determines which provider adapter is used:

Model prefix                    Provider
gpt-*                           OpenAI
claude-*                        Anthropic
gemini-*                        Google Gemini
llama*, mistral*, phi*, qwen*   Ollama (local)
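Prefix routing like this amounts to a first-match lookup. A sketch (Python) mirroring the table; the adapter names are just the provider labels above, not real module names:

```python
# Prefix → provider, mirroring the routing table above.
# Order matters: the first matching prefix wins.
ROUTES = [
    ("gpt-", "OpenAI"),
    ("claude-", "Anthropic"),
    ("gemini-", "Google Gemini"),
    ("llama", "Ollama"),
    ("mistral", "Ollama"),
    ("phi", "Ollama"),
    ("qwen", "Ollama"),
]

def provider_for(model: str) -> str:
    """Return the provider label for a model name, per the table above."""
    for prefix, provider in ROUTES:
        if model.startswith(prefix):
            return provider
    raise ValueError(f"no provider configured for model {model!r}")

print(provider_for("gpt-4o"))           # → OpenAI
print(provider_for("claude-opus-4-6"))  # → Anthropic
print(provider_for("llama3.1"))         # → Ollama
```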

See Providers to add API keys and customise routing.