
Streaming Responses

Both the Chat Completions and Responses API endpoints support stream: true. Both use Server-Sent Events (SSE) and conform to their respective OpenAI streaming specs. On top of that baseline, the platform injects additional fields that expose multi-agent context, session identifiers, and platform errors in a way that degrades safely for consumers that don't know about them.

Enabling streaming

Set stream: true in the request body for either endpoint:

{ "model": "agent", "input": "Hello", "stream": true }

The response will have Content-Type: text/event-stream and Transfer-Encoding: chunked. The connection stays open until the agent run terminates or the client disconnects.

Keep-alive

When no event has been emitted for 15 seconds, the platform sends an SSE comment line to prevent the connection from timing out:

:keep-alive

The SSE specification requires clients to ignore comment lines (lines starting with :), so the OpenAI SDKs and EventSource implementations skip them. Keep-alive comments do not advance sequence numbers.

Chat Completions streaming

The stream is a sequence of data: frames, each containing a JSON-encoded ChatCompletionChunk, terminated by data: [DONE]:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"agent","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"agent","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"agent","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Every chunk validates against openai.types.chat.ChatCompletionChunk.
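For consumers that read the raw stream instead of using an SDK, the framing above can be handled in a few lines. The following is a minimal sketch using only the standard library; the helper names iter_chat_chunks and collect_text are illustrative and not part of the platform or the OpenAI SDK.

```python
import json
from typing import Iterable, Iterator


def iter_chat_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chunk dicts from raw SSE lines until `data: [DONE]`."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Blank lines separate frames; lines starting with ":" are SSE
        # comments such as the ":keep-alive" heartbeat.
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue  # ignore other SSE fields (event:, id:, retry:)
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate every non-empty delta.content in the stream."""
    parts = []
    for chunk in iter_chat_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Any iterable of text lines works as input, for example the line iterator of an HTTP client's streaming response.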

x_alien extension — Chat Completions

Each chunk may carry an x_alien top-level field. Consumers that don't know about it ignore it safely — the OpenAI SDK models accept unknown top-level keys.

{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "choices": [{ "index": 0, "delta": { "content": "Hello" }, "finish_reason": null }],
  "x_alien": {
    "conversation_id": "550e8400-e29b-41d4-a716-446655440000",
    "agent_id": "MAIN",
    "agent_register": {
      "id": "MAIN",
      "kind": "main",
      "name": "main",
      "parent_id": null
    },
    "kind": "text"
  }
}

x_alien fields:

Field | When present | Description
conversation_id | First chunk only | The session_id for this turn. Capture it and pass it as conversation_id on the next turn to maintain the session.
agent_id | Always | ID of the agent that produced this chunk ("MAIN" for the root agent, the node id for subagents).
agent_register | First chunk per agent | Full identity of this agent: id, kind ("main" / "subagent" / "tool"), name, parent_id, and (for subagents) dispatched_by_tool_call_id. Emitted once per agent per run.
kind | Always | "text" or "reasoning". Reasoning chunks contain chain-of-thought traces; use this to render them separately or hide them.
lifecycle | At agent boundaries | "agent_end" (root or subagent finished) or "subagent_dispatched" (root dispatched a subtask).
error | On failure only | { code: string, message: string } on the closing chunk of a failed run.
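An extension-aware consumer typically folds these fields into per-run state as chunks arrive. The sketch below works on chunks already decoded to plain dicts; AlienState and apply_chunk are hypothetical names chosen for illustration, and the handling assumes the field semantics listed in the table above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AlienState:
    conversation_id: Optional[str] = None
    agents: dict = field(default_factory=dict)      # agent_id -> agent_register
    text: list = field(default_factory=list)        # (agent_id, content) pairs
    reasoning: list = field(default_factory=list)   # (agent_id, content) pairs
    lifecycle: list = field(default_factory=list)   # (agent_id, lifecycle) pairs


def apply_chunk(state: AlienState, chunk: dict) -> None:
    """Fold one decoded chunk into the state; chunks without x_alien still work."""
    ext = chunk.get("x_alien") or {}
    if "conversation_id" in ext:
        # Sent on the first chunk only; keep it for the next turn.
        state.conversation_id = ext["conversation_id"]
    register = ext.get("agent_register")
    if register:
        state.agents[register["id"]] = register
    if "lifecycle" in ext:
        state.lifecycle.append((ext.get("agent_id"), ext["lifecycle"]))
    for choice in chunk.get("choices", []):
        content = choice.get("delta", {}).get("content")
        if not content:
            continue
        # Route reasoning traces away from user-visible text.
        target = state.reasoning if ext.get("kind") == "reasoning" else state.text
        target.append((ext.get("agent_id"), content))
```

Keeping reasoning in a separate list makes it straightforward to hide it or render it in a collapsed panel.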

Mid-stream errors — Chat Completions

If the agent fails after chunks have been emitted, the platform closes with a synthetic final chunk where finish_reason: "stop" and x_alien.error carries the error details, followed by [DONE]. Standard consumers see a clean stop. Extension-aware consumers detect the failure via x_alien.error.

The platform never emits a non-standard finish_reason value — doing so would fail SDK validation against the closed Literal type.
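Because a failed run looks like a clean stop to standard consumers, extension-aware code has to check x_alien.error explicitly. A minimal sketch, assuming chunks are decoded to dicts; AgentRunFailed and raise_on_alien_error are hypothetical names, not platform or SDK identifiers.

```python
class AgentRunFailed(Exception):
    """Raised when a chunk carries x_alien.error."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_on_alien_error(chunk: dict) -> None:
    """Raise AgentRunFailed if the chunk reports a platform error."""
    error = (chunk.get("x_alien") or {}).get("error")
    if error:
        raise AgentRunFailed(error.get("code", "unknown"), error.get("message", ""))
```

Call it on every chunk before rendering; it is a no-op for chunks without the extension.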

Responses API streaming

The Responses API uses a two-line SSE frame per event: an event: line naming the type, and a data: line with the JSON payload.

Note: The text tokens arrive in response.output_text.delta events, not in response.created. A typical run emits seven distinct event types before the terminal response.completed (eight including it). Consumers that only listen for response.created will receive metadata but no text content.

A typical run emits this event sequence:

event: response.created          ← metadata only, output[] is empty
event: response.output_item.added
event: response.content_part.added
event: response.output_text.delta ← text token (one per chunk)
event: response.output_text.delta ← ...
event: response.output_text.done
event: response.content_part.done
event: response.output_item.done
event: response.completed ← full response object, usage populated

Each frame:

event: response.created
data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_...","status":"in_progress",...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":3,"item_id":"...","output_index":0,"content_index":0,"delta":"Hello","logprobs":[]}

event: response.completed
data: {"type":"response.completed","sequence_number":12,"response":{"id":"resp_...","status":"completed",...}}

There is no [DONE] terminator. The stream closes after exactly one terminal event: response.completed, response.failed, or response.incomplete.

Every event validates against openai.types.responses.ResponseStreamEvent.
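The two-line framing can be parsed without an SDK. The sketch below is one way to do it with the standard library, assuming each frame ends with a blank line as in standard SSE; iter_response_events and collect_output_text are illustrative names.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

TERMINAL_EVENTS = {"response.completed", "response.failed", "response.incomplete"}


def iter_response_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs and stop after the terminal event."""
    event_name: Optional[str] = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, e.g. ":keep-alive"
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line closes the current frame.
            if data_lines:
                payload = json.loads("\n".join(data_lines))
                name = event_name or payload.get("type")
                yield name, payload
                if name in TERMINAL_EVENTS:
                    return
            event_name, data_lines = None, []
    if data_lines:
        # Flush a final frame that was not followed by a blank line.
        payload = json.loads("\n".join(data_lines))
        yield (event_name or payload.get("type")), payload


def collect_output_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return the concatenated text deltas and the terminal event name, if any."""
    parts, terminal = [], None
    for name, payload in iter_response_events(lines):
        if name == "response.output_text.delta":
            parts.append(payload["delta"])
        elif name in TERMINAL_EVENTS:
            terminal = name
    return "".join(parts), terminal
```

Checking the returned terminal name lets the caller distinguish a completed run from a failed or incomplete one.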

Sequence numbers

Every event carries a monotonically increasing sequence_number starting at 0. This is the resume cursor. Track the last sequence_number you successfully processed; you need it to reconnect after a network drop.
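Tracking the cursor amounts to remembering the highest sequence_number handled so far. A small sketch follows; SequenceCursor is a hypothetical helper that also drops events already seen, which can be useful if a replay overlaps with what the client has processed.

```python
from typing import Optional


class SequenceCursor:
    """Remember the last processed sequence_number and reject duplicates."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None

    def accept(self, event: dict) -> bool:
        """Return True if the event is new and advance the cursor."""
        seq = event["sequence_number"]
        if self.last_seq is not None and seq <= self.last_seq:
            return False  # already processed
        self.last_seq = seq
        return True
```

Advance the cursor only after the event has been fully handled, so that last_seq always names an event the application has actually processed.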

metadata.x_alien_* extension — Responses API

Platform context is carried in Response.metadata on the response.created event and on the terminal event (response.completed, or response.failed for the error keys). The OpenAI Responses spec documents metadata as an open key/value map, which makes it an explicit extension point.

{
  "type": "response.created",
  "sequence_number": 0,
  "response": {
    "id": "resp_...",
    "status": "in_progress",
    "metadata": {
      "x_alien_root_agent_id": "MAIN",
      "x_alien_agent_registry": "[{\"id\":\"MAIN\",\"kind\":\"main\",\"name\":\"main\",\"parent_id\":null},{\"id\":\"subagent-6\",\"kind\":\"subagent\",\"name\":\"Légifrance researcher\",\"parent_id\":\"MAIN\"}]"
    }
  }
}

metadata keys injected by the platform:

Key | Description
x_alien_root_agent_id | Node id of the root agent in this run (typically "MAIN").
x_alien_agent_registry | JSON-encoded array of agent identity objects for every agent in the run. Each entry: { id, kind, name, parent_id }.
x_alien_registry_truncated | "true" if the registry was truncated due to the 512-character metadata value limit. Fall back to parsing per-item id prefixes in this case.
x_alien_error_code | Set on response.failed when the underlying error code is platform-specific and cannot be expressed via OpenAI's closed Response.error.code Literal.
x_alien_error_message | Human-readable companion to x_alien_error_code.
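Since metadata values are strings, the registry needs a second JSON decode. The sketch below shows one way to read it; parse_agent_registry is an illustrative name, and treating an undecodable value as truncated is an assumption, because the exact shape of a truncated registry is not specified here.

```python
import json
from typing import Dict, Tuple


def parse_agent_registry(metadata: dict) -> Tuple[Dict[str, dict], bool]:
    """Return ({agent_id: entry}, truncated) from Response.metadata."""
    truncated = metadata.get("x_alien_registry_truncated") == "true"
    raw = metadata.get("x_alien_agent_registry")
    if raw is None:
        return {}, truncated
    try:
        entries = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: a cut-off value may not be valid JSON; report it as
        # truncated so the caller falls back to per-item id prefixes.
        return {}, True
    return {entry["id"]: entry for entry in entries}, truncated
```

When the second return value is True, the registry may be incomplete and agent attribution should rely on the item id prefixes instead.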

Per-item agent identity

In multi-agent runs, each output item's id field carries a structured prefix that identifies the agent that produced it:

agent:<agent_id>::msg_<random>      ← message item
agent:<agent_id>::fc_<random> ← function call item

Standard consumers treat the id as opaque. Extension-aware consumers parse the prefix to attribute individual output items to specific subagents and render per-agent affordances (collapsed subagent panels, per-agent citations, etc.).
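Parsing the prefix is a small regular-expression match. The sketch below covers only the two documented item kinds (msg and fc) and returns None for anything else, so ids without the prefix stay opaque; parse_item_id is an illustrative name.

```python
import re
from typing import Optional, Tuple

# agent:<agent_id>::msg_<random> or agent:<agent_id>::fc_<random>
_ITEM_ID = re.compile(r"^agent:(?P<agent_id>.+?)::(?P<kind>msg|fc)_(?P<suffix>.+)$")


def parse_item_id(item_id: str) -> Optional[Tuple[str, str]]:
    """Return (agent_id, kind) for a prefixed item id, else None."""
    match = _ITEM_ID.match(item_id)
    if match is None:
        return None
    return match.group("agent_id"), match.group("kind")
```

Returning None rather than raising keeps single-agent runs and unknown item kinds on the standard code path.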

Mid-stream errors — Responses API

If the agent fails after events have been emitted, the platform synthesises a response.failed terminal event:

{
  "type": "response.failed",
  "sequence_number": 8,
  "response": {
    "id": "resp_...",
    "status": "failed",
    "error": { "code": "server_error", "message": "Worker disconnected" },
    "metadata": {
      "x_alien_error_code": "worker_disconnected",
      "x_alien_error_message": "The worker processing this job disconnected unexpectedly."
    }
  }
}

response.error.code is constrained to OpenAI's documented Literal values. Platform-specific error codes appear in metadata.x_alien_error_code.
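An extension-aware consumer will usually want the most specific code available. A minimal sketch, assuming the event is decoded to a dict: it prefers the platform-specific metadata keys and falls back to the standard error object. effective_error is a hypothetical helper name.

```python
from typing import Optional, Tuple


def effective_error(event: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (code, message) for a response.failed event."""
    response = event.get("response") or {}
    error = response.get("error") or {}
    metadata = response.get("metadata") or {}
    # Prefer the platform-specific values when the platform supplied them.
    code = metadata.get("x_alien_error_code") or error.get("code")
    message = metadata.get("x_alien_error_message") or error.get("message")
    return code, message
```

For the example event above this yields the worker_disconnected code and its companion message rather than the generic server_error.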

Resume after a network drop

This capability is exclusive to the Responses API.

When the SSE connection drops, reconnect by GETting the response with starting_after set to the last sequence number you successfully processed:

GET /agent/:id/responses/<resp_id>?starting_after=<last_seq>

The server replays all events with sequence_number > last_seq. If the run is still in flight, the GET continues live as new events arrive. If it has already terminated, the GET replays the tail and closes with the original terminal event.
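For clients that issue the GET themselves, the resume URL and the replay rule can be sketched as follows. The helper names are illustrative; the base URL is whatever host serves the API, and the agent id used in the usage note is a placeholder.

```python
from typing import Iterable, List
from urllib.parse import quote, urlencode


def build_resume_url(base_url: str, agent_id: str, response_id: str, last_seq: int) -> str:
    """Build GET /agent/:id/responses/<resp_id>?starting_after=<last_seq>."""
    path = f"/agent/{quote(agent_id, safe='')}/responses/{quote(response_id, safe='')}"
    query = urlencode({"starting_after": last_seq})
    return f"{base_url.rstrip('/')}{path}?{query}"


def events_after(events: Iterable[dict], last_seq: int) -> List[dict]:
    """Mirror the replay rule: keep events with sequence_number > last_seq."""
    return [event for event in events if event["sequence_number"] > last_seq]
```

For instance, build_resume_url("https://api.alien.club", "my-agent", "resp_abc", 4) produces a URL ending in /agent/my-agent/responses/resp_abc?starting_after=4.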

Python — resume example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.alien.club/agent/<agent_id>",
    api_key="<your-access-token>",
)

response_id = "resp_abc"
last_seq = 4  # last sequence_number successfully processed before the drop

# Resume: retrieve() with stream=True and starting_after issues a GET on
# .../responses/resp_abc carrying starting_after=4 and returns an event stream.
# These parameters exist in recent openai-python releases; check your version.
stream = client.responses.retrieve(
    response_id,
    stream=True,
    starting_after=last_seq,
)

for event in stream:
    # Only text deltas are printed; other event types are skipped.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
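A production client usually wraps the resume call in a reconnect loop. The sketch below keeps the transport abstract: open_stream is a hypothetical callable supplied by the caller that opens the initial stream when given None and the resume stream when given a sequence number, and it is assumed to raise ConnectionError on a network drop.

```python
from typing import Callable, Iterable, Optional

TERMINAL = {"response.completed", "response.failed", "response.incomplete"}


def consume_with_resume(
    open_stream: Callable[[Optional[int]], Iterable[dict]],
    on_event: Callable[[dict], None],
    max_reconnects: int = 3,
) -> str:
    """Process events until a terminal event, resuming after drops."""
    last_seq: Optional[int] = None
    reconnects = 0
    while True:
        try:
            for event in open_stream(last_seq):
                seq = event["sequence_number"]
                if last_seq is not None and seq <= last_seq:
                    continue  # already processed before the drop
                on_event(event)
                last_seq = seq
                if event["type"] in TERMINAL:
                    return event["type"]
            # The stream ended without a terminal event: treat it as a drop.
            raise ConnectionError("stream closed before a terminal event")
        except ConnectionError:
            reconnects += 1
            if reconnects > max_reconnects:
                raise
```

The cursor advances only after on_event returns, so an exception inside the handler never skips an event on the next attempt.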

Storage TTL

Responses are kept in Redis for 24 hours after creation. After expiry, GET returns 410 Gone and resume is not possible. If you need replay beyond 24 hours, store the events client-side as you receive them.
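Client-side storage can be as simple as an append-only JSON Lines file per response. The sketch below is one possible layout, not a platform feature; EventLog and the file name are illustrative, and the usage at the bottom writes to a temporary directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator, Optional


class EventLog:
    """Append-only JSON Lines log of stream events for one response."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def append(self, event: dict) -> None:
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event, ensure_ascii=False) + "\n")

    def replay(self, starting_after: Optional[int] = None) -> Iterator[dict]:
        """Yield stored events, optionally only those after a sequence number."""
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                event = json.loads(line)
                if starting_after is None or event["sequence_number"] > starting_after:
                    yield event


# Usage: store two events, then replay everything after sequence number 0.
with tempfile.TemporaryDirectory() as tmp:
    log = EventLog(Path(tmp) / "resp_abc.jsonl")
    log.append({"type": "response.created", "sequence_number": 0})
    log.append({"type": "response.output_text.delta", "sequence_number": 1, "delta": "Hello"})
    tail = list(log.replay(starting_after=0))
```

The replay method uses the same greater-than rule as the server-side resume, so local and remote replays can share one code path.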

Event types — Responses API

Event | When
response.created | First event. Contains the Response object with status: "in_progress" and initial metadata.
response.in_progress | Optional progress signal during long runs.
response.output_item.added | A new top-level output item begins (message, function_call, or reasoning).
response.content_part.added | A new content part begins within a message (output_text, refusal, reasoning_text).
response.output_text.delta | Text token delta.
response.output_text.done | Closes an output_text part.
response.function_call_arguments.delta | Streaming JSON fragment for a function call's arguments.
response.function_call_arguments.done | Closes a function call item.
response.reasoning_summary_part.added | A new reasoning summary part begins.
response.reasoning_summary_text.delta | Text delta inside a reasoning summary.
response.reasoning_summary_text.done | Closes a reasoning summary part.
response.content_part.done | Closes a content part.
response.output_item.done | Closes a top-level output item.
response.completed | Terminal success. Response.usage populated.
response.failed | Terminal failure. Response.error populated.
response.incomplete | Terminal partial result. Response.incomplete_details.reason populated.

Choosing an API

Need | Use
Broadest client compatibility (LangChain, any OpenAI-compatible proxy) | Chat Completions
Resume after network drop | Responses API
Typed per-event schema | Responses API
Stateless single-turn queries | Either
Agentic multi-turn conversations in production | Responses API
