# Streaming
OpenResponses supports Server-Sent Events (SSE) streaming out of the box. When `"stream": true` is included in a request, the response is delivered as a sequence of events rather than a single JSON object.
## Enabling streaming
```json
{
  "model": "gpt-4o",
  "stream": true,
  "input": [{"role": "user", "content": "Tell me about the BEAM."}]
}
```

The response is delivered as `Content-Type: text/event-stream`. Each event is a pair of lines (`event:` and `data:`), separated from the next event by a blank line.
## Event format
Each event follows the SSE format:
```
event: <event-type>
data: <json-payload>
```
Every event payload includes a `sequence_number`, a monotonically increasing integer you can use to detect gaps or re-order events that arrive out of order.
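For example, a single text delta event (taken from the full session shown later) looks like:

```
event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_01","delta":"The BEAM","sequence_number":4}
```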
## Event catalogue

### Lifecycle events

| Event | When |
|---|---|
| `response.created` | Immediately after the request is accepted. Contains the initial response object. |
| `response.in_progress` | The provider has begun generating. |
| `response.completed` | All output items are complete and the response has reached a terminal state. Contains the final response object. |
| `response.failed` | An error occurred. Contains an error object. |
| `response.incomplete` | The token budget (`max_output_tokens`) was exhausted before the model finished. |
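As a sketch of reacting to these, a client might dispatch on the event name taken from the SSE `event:` line. The module, return values, and the assumption that the failure payload carries an `"error"` key are illustrative, not an OpenResponses API:

```elixir
defmodule LifecycleHandler do
  # Dispatch on the name from the SSE `event:` line.
  # Terminal events halt the stream; everything else continues.
  def handle("response.completed", data), do: {:halt, {:ok, data}}
  def handle("response.failed", %{"error" => error}), do: {:halt, {:error, error}}
  def handle("response.incomplete", data), do: {:halt, {:incomplete, data}}
  def handle(_event_name, _data), do: :cont
end
```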
### Output item events

| Event | When |
|---|---|
| `response.output_item.added` | A new output item (message, function call, reasoning) begins. |
| `response.output_item.done` | An output item is complete. |
### Text delta events

| Event | When |
|---|---|
| `response.content_part.added` | A new content part within a message begins. |
| `response.output_text.delta` | A chunk of text from the model. The `delta` field contains the new text. |
| `response.output_text.done` | A text content part is complete. The `text` field contains the full assembled text. |
| `response.content_part.done` | A content part is complete. |
### Tool call events

| Event | When |
|---|---|
| `response.function_call_arguments.delta` | A chunk of JSON arguments for a function call. |
| `response.function_call_arguments.done` | The function call arguments are complete. |
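Argument deltas are fragments of a JSON string, so they can only be decoded once the `done` event arrives. A minimal accumulation sketch (the module name and accumulator shape are illustrative):

```elixir
defmodule ArgsAssembler do
  # Concatenate argument fragments in arrival order; the partial string
  # is not valid JSON until the `done` event.
  def handle(%{"type" => "response.function_call_arguments.delta", "delta" => delta}, acc) do
    Map.update(acc, :args, delta, &(&1 <> delta))
  end

  # Decode the completed argument string into a map.
  def handle(%{"type" => "response.function_call_arguments.done"}, acc) do
    {:call_ready, Jason.decode!(Map.get(acc, :args, "{}"))}
  end

  def handle(_other_event, acc), do: acc
end
```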
## Sequence numbers

Every event includes `"sequence_number": N`. Numbers are assigned by the loop process and increment by one per event. You can use them to:

- Detect dropped events (a gap in the sequence; see the sketch below)
- Re-order events if your client receives them out of order
- Resume a stream by requesting events after a known sequence number (future feature)
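A minimal gap-check sketch, assuming the client tracks the last sequence number it has seen (`check_sequence/2` is a hypothetical helper, not part of OpenResponses):

```elixir
defmodule StreamGuard do
  # Compare an incoming sequence number against the last one seen.
  # Returns {:ok, n} to continue, {:duplicate, last} for a replay,
  # or {:gap, missing, n} listing the numbers that were skipped.
  def check_sequence(n, nil), do: {:ok, n}
  def check_sequence(n, last) when n == last + 1, do: {:ok, n}
  def check_sequence(n, last) when n <= last, do: {:duplicate, last}
  def check_sequence(n, last), do: {:gap, Enum.to_list((last + 1)..(n - 1)), n}
end
```

Feed each decoded event's `sequence_number` through a check like this before processing; on a gap the client can buffer the event until the missing ones arrive, or treat the stream as failed.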
## A complete streaming session
```
event: response.created
data: {"id":"resp_01","object":"response","model":"gpt-4o","status":"queued","sequence_number":0}

event: response.in_progress
data: {"type":"response.in_progress","sequence_number":1}

event: response.output_item.added
data: {"type":"response.output_item.added","item":{"id":"msg_01","type":"message","role":"assistant","content":[],"status":"in_progress"},"sequence_number":2}

event: response.content_part.added
data: {"type":"response.content_part.added","item_id":"msg_01","part":{"type":"output_text","text":""},"sequence_number":3}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_01","delta":"The BEAM","sequence_number":4}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_01","delta":" is a virtual machine","sequence_number":5}

event: response.output_text.done
data: {"type":"response.output_text.done","item_id":"msg_01","text":"The BEAM is a virtual machine","sequence_number":6}

event: response.content_part.done
data: {"type":"response.content_part.done","item_id":"msg_01","sequence_number":7}

event: response.output_item.done
data: {"type":"response.output_item.done","item":{"id":"msg_01","status":"completed"},"sequence_number":8}

event: response.completed
data: {"id":"resp_01","status":"completed","output":[...],"sequence_number":9}

data: [DONE]
```

## Client examples
### JavaScript (browser)
```javascript
const response = await fetch('/v1/responses', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({
    model: 'gpt-4o',
    stream: true,
    input: [{role: 'user', content: 'Hello'}]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const {value, done} = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, {stream: true});
  // Events are separated by a blank line; keep any trailing partial event in the buffer.
  const events = buffer.split('\n\n');
  buffer = events.pop();
  for (const chunk of events) {
    const dataLine = chunk.split('\n').find(l => l.startsWith('data: '));
    if (!dataLine || dataLine === 'data: [DONE]') continue;
    const event = JSON.parse(dataLine.slice(6));
    if (event.type === 'response.output_text.delta') {
      // process.stdout is Node-only; in the browser, append the delta to the
      // page instead (this assumes an element with id "output").
      document.getElementById('output').textContent += event.delta;
    }
  }
}
```

### Elixir (server-to-server)
```elixir
{:ok, response} =
  Req.post("http://localhost:4000/v1/responses",
    json: %{model: "gpt-4o", stream: true, input: [%{role: "user", content: "Hello"}]},
    into: fn {:data, chunk}, acc ->
      # Assumes each chunk holds whole events; a production client should
      # buffer partial events across chunk boundaries.
      chunk
      |> String.split("\n\n", trim: true)
      |> Enum.each(fn event_str ->
        case String.split(event_str, "data: ", parts: 2) do
          [_, "[DONE]"] ->
            :ok

          [_, json] ->
            event = Jason.decode!(json)

            if event["type"] == "response.output_text.delta" do
              IO.write(event["delta"])
            end

          _ ->
            :ok
        end
      end)

      {:cont, acc}
    end
  )
```

## Non-streaming mode
Without `"stream": true`, OpenResponses waits for the loop to complete and returns the full response object in one HTTP response. This is simpler for short interactions but adds latency for long generations.
```json
{
  "model": "gpt-4o",
  "input": [{"role": "user", "content": "What is 2+2?"}]
}
```

The default timeout is 30 seconds. Long-running loops should use streaming.
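As a sketch, the same request with Req and no `stream` flag, assuming the local endpoint from the streaming example; Req's `receive_timeout` option is shown only as one way to allow for longer generations:

```elixir
# Non-streaming: Req buffers the entire response and returns it at once.
{:ok, response} =
  Req.post("http://localhost:4000/v1/responses",
    json: %{model: "gpt-4o", input: [%{role: "user", content: "What is 2+2?"}]},
    # Raise the client-side receive timeout for longer generations.
    receive_timeout: 60_000
  )

response.body["output"]
```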