Getting Started
This guide walks through making your first requests after completing Installation.
Your first request
Send a non-streaming request to any configured provider:
curl -X POST http://localhost:4000/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": [
{"role": "user", "content": "Explain the BEAM in one sentence."}
]
}'
Response:
{
"id": "01950000-0000-0000-0000-000000000000",
"object": "response",
"model": "gpt-4o",
"status": "completed",
"output": [
{
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": "The BEAM is..."}],
"status": "completed"
}
],
"usage": {}
}Streaming responses
Add "stream": true to receive Server-Sent Events as the model generates:
curl -X POST http://localhost:4000/v1/responses \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"model": "gpt-4o",
"stream": true,
"input": [
{"role": "user", "content": "Write a haiku about Elixir."}
]
}'
You'll receive a stream of events:
event: response.created
data: {"id":"01950000...","status":"queued",...}
event: response.in_progress
data: {"type":"response.in_progress","sequence_number":0}
event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Concurrent","sequence_number":3}
event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":" streams flow","sequence_number":4}
event: response.completed
data: {"id":"01950000...","status":"completed",...}
data: [DONE]
See Streaming for the full event catalogue and client examples.
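A minimal sketch of consuming this stream, assuming you already have the raw SSE body as a string (a real client would read it incrementally from the HTTP response). It splits the body into (event, data) pairs on blank lines and concatenates the `response.output_text.delta` payloads:

```python
import json

def parse_sse(raw: str):
    """Split a raw Server-Sent Events body into (event, data) pairs."""
    events = []
    event, data = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line terminates one event.
            if data:
                events.append((event or "message", "\n".join(data)))
            event, data = None, []
    if data:
        events.append((event or "message", "\n".join(data)))
    return events

raw = (
    "event: response.output_text.delta\n"
    'data: {"type":"response.output_text.delta","delta":"Concurrent"}\n'
    "\n"
    "event: response.output_text.delta\n"
    'data: {"type":"response.output_text.delta","delta":" streams flow"}\n'
    "\n"
    "data: [DONE]\n"
)

# Reassemble the generated text from the delta events.
text = "".join(
    json.loads(data)["delta"]
    for event, data in parse_sse(raw)
    if event == "response.output_text.delta"
)
# text == "Concurrent streams flow"
```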
Using tools
Define tools in the request and the model will call them when appropriate:
curl -X POST http://localhost:4000/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-6",
"input": [
{"role": "user", "content": "What time is it in Tokyo?"}
],
"tools": [
{
"type": "function",
"name": "get_time",
"description": "Get the current time in a given timezone",
"parameters": {
"type": "object",
"properties": {
"timezone": {"type": "string", "description": "IANA timezone name"}
},
"required": ["timezone"]
}
}
]
}'
When the model decides to call get_time, OpenResponses emits a function_call item in the output. You then submit the result in a follow-up request using previous_response_id. See Tool Dispatch for the full flow.
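The follow-up step can be sketched as below. This is an illustrative helper, not part of OpenResponses: it assumes the `function_call` item carries `call_id` and `name` fields and that tool results are submitted as `function_call_output` items, following the OpenAI Responses API shape — check Tool Dispatch for the exact item format.

```python
def build_tool_followup(response: dict, results: dict) -> dict:
    """Build the follow-up request body after running tools locally.

    `results` maps each call_id to the string output of that tool call.
    """
    items = []
    for item in response["output"]:
        if item.get("type") == "function_call":
            items.append({
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": results[item["call_id"]],
            })
    return {
        "model": response["model"],
        "previous_response_id": response["id"],
        "input": items,
    }

# Example: the model asked for get_time, we ran it locally.
response = {
    "id": "resp_abc",
    "model": "claude-opus-4-6",
    "output": [{
        "type": "function_call",
        "call_id": "call_1",
        "name": "get_time",
        "arguments": '{"timezone": "Asia/Tokyo"}',
    }],
}
followup = build_tool_followup(response, {"call_1": "09:00 JST"})
```

POSTing `followup` to `/v1/responses` lets the model continue with the tool result in context.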
Multi-turn conversations
Use previous_response_id to continue a conversation. OpenResponses automatically reconstructs the full context from the cache:
# First turn
curl -X POST http://localhost:4000/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": [{"role": "user", "content": "My name is Alice."}]
}'
# → {"id": "resp_001", ...}
# Second turn — no need to repeat the history
curl -X POST http://localhost:4000/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"previous_response_id": "resp_001",
"input": [{"role": "user", "content": "What is my name?"}]
}'
See Conversation History for caching behaviour and TTL configuration.
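The two-turn flow above can be wrapped in a small client-side helper. This is a hypothetical sketch: `send` stands in for whatever HTTP call you use to POST to `/v1/responses`, and the helper simply threads `previous_response_id` from each response into the next request:

```python
class Conversation:
    """Thread previous_response_id across turns of one conversation."""

    def __init__(self, model, send):
        self.model = model
        self.send = send  # callable: request payload dict -> response dict
        self.last_id = None

    def ask(self, text):
        payload = {
            "model": self.model,
            "input": [{"role": "user", "content": text}],
        }
        if self.last_id:
            # Let the server reconstruct the history from its cache.
            payload["previous_response_id"] = self.last_id
        response = self.send(payload)
        self.last_id = response["id"]
        return response

# Demo with a fake transport instead of a live server.
sent = []
def fake_send(payload):
    sent.append(payload)
    return {"id": f"resp_{len(sent):03d}", "status": "completed"}

conv = Conversation("gpt-4o", fake_send)
conv.ask("My name is Alice.")
conv.ask("What is my name?")
```

The first request carries no `previous_response_id`; the second carries the id returned by the first, so the client never resends the history.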
Choosing a model
The model name determines which provider adapter is used:
| Model prefix | Provider |
|---|---|
| gpt-* | OpenAI |
| claude-* | Anthropic |
| gemini-* | Google Gemini |
| llama*, mistral*, phi*, qwen* | Ollama (local) |
See Providers to add API keys and customise routing.
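The prefix routing in the table can be sketched as a first-match lookup. The provider identifiers below are illustrative, not OpenResponses' internal names:

```python
import fnmatch

# Patterns mirror the routing table; first match wins.
ROUTES = [
    ("gpt-*", "openai"),
    ("claude-*", "anthropic"),
    ("gemini-*", "google_gemini"),
    ("llama*", "ollama"),
    ("mistral*", "ollama"),
    ("phi*", "ollama"),
    ("qwen*", "ollama"),
]

def provider_for(model: str) -> str:
    """Return the provider whose pattern matches the model name."""
    for pattern, provider in ROUTES:
        if fnmatch.fnmatch(model, pattern):
            return provider
    raise ValueError(f"no provider configured for {model!r}")
```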