Streaming Responses
Both the Chat Completions and Responses API endpoints support stream: true. Both use Server-Sent Events (SSE) and conform to their respective OpenAI streaming specs. On top of that baseline, the platform injects additional fields that expose multi-agent context, session identifiers, and platform errors in a way that degrades safely for consumers that don't know about them.
Enabling streaming
Set stream: true in the request body for either endpoint. For example, a Responses API request:
{ "model": "agent", "input": "Hello", "stream": true }
The response will have Content-Type: text/event-stream and Transfer-Encoding: chunked. The connection stays open until the agent run terminates or the client disconnects.
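As a rough sketch of what an SDK-free client might send, the snippet below builds (but does not send) a streaming request with the standard library. The `/responses` path suffix, the Bearer authorization scheme, and the helper name `build_stream_request` are assumptions made for illustration; `AGENT_ID` and `TOKEN` are placeholders.

```python
import json
import urllib.request


def build_stream_request(base_url: str, token: str, text: str) -> urllib.request.Request:
    """Build a POST request whose body asks for a streamed response."""
    body = {"model": "agent", "input": text, "stream": True}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/responses",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,  # assumed auth scheme
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )


req = build_stream_request("https://api.alien.club/agent/AGENT_ID", "TOKEN", "Hello")
```

Passing `req` to `urllib.request.urlopen` would open the connection; the response body can then be read line by line as SSE.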
Keep-alive
When no event has been emitted for 15 seconds, the platform sends an SSE comment line to prevent the connection from timing out:
:keep-alive
SSE comments (lines starting with :) are ignored by every OpenAI SDK and EventSource implementation. They do not advance sequence numbers.
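If you read the stream by hand rather than through an SDK, comment lines have to be skipped explicitly. A minimal sketch, with `strip_sse_comments` as an illustrative helper name:

```python
from typing import Iterable, Iterator


def strip_sse_comments(lines: Iterable[str]) -> Iterator[str]:
    """Yield SSE lines, skipping comment lines such as ':keep-alive'."""
    for line in lines:
        # In the SSE format, any line that starts with ':' is a comment.
        if line.startswith(":"):
            continue
        yield line


raw = [
    'data: {"n": 1}',
    "",
    ":keep-alive",
    "",
    'data: {"n": 2}',
    "",
]
kept = list(strip_sse_comments(raw))
```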
Chat Completions streaming
The stream is a sequence of data: frames, each containing a JSON-encoded ChatCompletionChunk, terminated by data: [DONE]:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"agent","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"agent","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"agent","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Every chunk validates against openai.types.chat.ChatCompletionChunk.
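The frame format above can be consumed without an SDK. The following sketch concatenates `delta.content` until the `[DONE]` sentinel; the helper name `collect_chat_text` and the id `chatcmpl-example` are placeholders, not platform values.

```python
import json
from typing import Iterable, List


def collect_chat_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from data: frames until data: [DONE]."""
    parts: List[str] = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


frames = [
    'data: {"id":"chatcmpl-example","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-example","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-example","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-example","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text = collect_chat_text(frames)
```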
x_alien extension — Chat Completions
Each chunk may carry an x_alien top-level field. Consumers that don't know about it ignore it safely — the OpenAI SDK models accept unknown top-level keys.
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "choices": [{ "index": 0, "delta": { "content": "Hello" }, "finish_reason": null }],
  "x_alien": {
    "conversation_id": "550e8400-e29b-41d4-a716-446655440000",
    "agent_id": "MAIN",
    "agent_register": {
      "id": "MAIN",
      "kind": "main",
      "name": "main",
      "parent_id": null
    },
    "kind": "text"
  }
}
x_alien fields:
| Field | When present | Description |
|---|---|---|
| conversation_id | First chunk only | The session_id for this turn. Capture this and pass it as conversation_id on the next turn to maintain the session. |
| agent_id | Always | ID of the agent that produced this chunk ("MAIN" for the root agent, node id for subagents). |
| agent_register | First chunk per agent | Full identity of this agent: id, kind ("main" / "subagent" / "tool"), name, parent_id, and (for subagents) dispatched_by_tool_call_id. Emitted once per agent per run. |
| kind | Always | "text" or "reasoning". Reasoning chunks contain chain-of-thought traces; use this to render them separately or hide them. |
| lifecycle | At agent boundaries | "agent_end" (root or subagent finished) or "subagent_dispatched" (root dispatched a subtask). |
| error | On failure only | { code: string, message: string } on the closing chunk of a failed run. |
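One way an extension-aware consumer might use these fields is sketched below: it captures conversation_id once, builds an agent table from agent_register, and separates reasoning from text. `AlienContext` is a hypothetical helper, and the chunks are plain dicts as they would come out of `json.loads`.

```python
from typing import Any, Dict, List, Optional


class AlienContext:
    """Accumulates x_alien data across the chunks of one run (illustrative helper)."""

    def __init__(self) -> None:
        self.conversation_id: Optional[str] = None
        self.agents: Dict[str, Dict[str, Any]] = {}
        self.text: List[str] = []
        self.reasoning: List[str] = []

    def feed(self, chunk: Dict[str, Any]) -> None:
        ext = chunk.get("x_alien") or {}
        # conversation_id is sent on the first chunk only; keep the first value seen.
        if self.conversation_id is None and ext.get("conversation_id"):
            self.conversation_id = ext["conversation_id"]
        # agent_register is sent once per agent; index it by agent id.
        register = ext.get("agent_register")
        if register:
            self.agents[register["id"]] = register
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if not content:
                continue
            bucket = self.reasoning if ext.get("kind") == "reasoning" else self.text
            bucket.append(content)


ctx = AlienContext()
ctx.feed({
    "choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}],
    "x_alien": {
        "conversation_id": "550e8400-e29b-41d4-a716-446655440000",
        "agent_id": "MAIN",
        "agent_register": {"id": "MAIN", "kind": "main", "name": "main", "parent_id": None},
        "kind": "text",
    },
})
ctx.feed({
    "choices": [{"index": 0, "delta": {"content": "checking sources"}, "finish_reason": None}],
    "x_alien": {"agent_id": "MAIN", "kind": "reasoning"},
})
```

Chunks without an x_alien field pass through unchanged and are treated as ordinary text.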
Mid-stream errors — Chat Completions
If the agent fails after chunks have been emitted, the platform closes the stream with a synthetic final chunk whose finish_reason is "stop" and whose x_alien.error carries the error details, followed by data: [DONE]. Standard consumers see a clean stop; extension-aware consumers detect the failure via x_alien.error.
The platform never emits a non-standard finish_reason value — doing so would fail SDK validation against the closed Literal type.
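Because a failed run still ends with finish_reason "stop", the only reliable failure signal is the extension field. A minimal check might look like this; `run_error` is an illustrative name, and the error code and message in the sample are made-up values.

```python
from typing import Any, Dict, Optional


def run_error(final_chunk: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return x_alien.error from a closing chunk, or None for a clean stop."""
    return (final_chunk.get("x_alien") or {}).get("error")


clean = {
    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    "x_alien": {"agent_id": "MAIN", "kind": "text"},
}
failed = {
    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    "x_alien": {
        "agent_id": "MAIN",
        "kind": "text",
        # Sample values only; real codes come from the platform.
        "error": {"code": "worker_disconnected", "message": "Worker disconnected"},
    },
}
```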
Responses API streaming
The Responses API uses a two-line SSE frame per event: an event: line naming the type, and a data: line with the JSON payload.
Text tokens arrive in response.output_text.delta events, not in response.created. A typical text-only run emits seven distinct event types before the terminal response.completed, so a consumer that listens only for response.created receives metadata but no text content.
A typical run emits this event sequence:
event: response.created ← metadata only, output[] is empty
event: response.output_item.added
event: response.content_part.added
event: response.output_text.delta ← text token (one per chunk)
event: response.output_text.delta ← ...
event: response.output_text.done
event: response.content_part.done
event: response.output_item.done
event: response.completed ← full response object, usage populated
Each frame:
event: response.created
data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_...","status":"in_progress",...}}
event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":3,"item_id":"...","output_index":0,"content_index":0,"delta":"Hello","logprobs":[]}
event: response.completed
data: {"type":"response.completed","sequence_number":12,"response":{"id":"resp_...","status":"completed",...}}
There is no [DONE] terminator. The stream closes after exactly one terminal event: response.completed, response.failed, or response.incomplete.
Every event validates against openai.types.responses.ResponseStreamEvent.
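For consumers that parse the stream themselves, the two-line framing and the terminal-event rule can be sketched as follows. The helper names `iter_events` and `read_text` and the id `resp_example` are illustrative.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

TERMINAL = {"response.completed", "response.failed", "response.incomplete"}


def iter_events(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_name, payload) for each event:/data: line pair."""
    event_name: Optional[str] = None
    for line in lines:
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:") and event_name is not None:
            yield event_name, json.loads(line[len("data:"):].strip())
            event_name = None


def read_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Collect text deltas and stop at the first terminal event."""
    parts: List[str] = []
    status: Optional[str] = None
    for name, payload in iter_events(lines):
        if name == "response.output_text.delta":
            parts.append(payload["delta"])
        elif name in TERMINAL:
            status = payload["response"]["status"]
            break  # there is no [DONE]; the terminal event ends the stream
    return "".join(parts), status


sample = [
    "event: response.created",
    'data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_example","status":"in_progress"}}',
    "",
    "event: response.output_text.delta",
    'data: {"type":"response.output_text.delta","sequence_number":1,"delta":"Hel"}',
    "",
    "event: response.output_text.delta",
    'data: {"type":"response.output_text.delta","sequence_number":2,"delta":"lo"}',
    "",
    "event: response.completed",
    'data: {"type":"response.completed","sequence_number":3,"response":{"id":"resp_example","status":"completed"}}',
    "",
]
text, status = read_text(sample)
```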
Sequence numbers
Every event carries a monotonically increasing sequence_number starting at 0. This is the resume cursor. Track the last sequence_number you successfully processed; you need it to reconnect after a network drop.
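Tracking the cursor takes only a few lines. The sketch below also drops events at or below the cursor, which is a defensive choice rather than documented platform behaviour; `SequenceCursor` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


class SequenceCursor:
    """Remembers the last processed sequence_number for use as a resume cursor."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None

    def accept(self, event: Dict[str, Any]) -> bool:
        """Return True if the event is new, and advance the cursor."""
        seq = event["sequence_number"]
        if self.last_seq is not None and seq <= self.last_seq:
            return False  # already processed; skip it
        self.last_seq = seq
        return True


cursor = SequenceCursor()
accepted = [
    n for n in (0, 1, 2, 2, 1, 3)
    if cursor.accept({"sequence_number": n})
]
```

Call `accept` only after the event has been fully handled, so that `last_seq` never points past work that was lost.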
metadata.x_alien_* extension — Responses API
Platform context is carried in Response.metadata on the response.created and response.completed events. The OpenAI Responses spec documents metadata as an open key/value map — this is an explicit extension point.
{
  "type": "response.created",
  "sequence_number": 0,
  "response": {
    "id": "resp_...",
    "status": "in_progress",
    "metadata": {
      "x_alien_root_agent_id": "MAIN",
      "x_alien_agent_registry": "[{\"id\":\"MAIN\",\"kind\":\"main\",\"name\":\"main\",\"parent_id\":null},{\"id\":\"subagent-6\",\"kind\":\"subagent\",\"name\":\"Légifrance researcher\",\"parent_id\":\"MAIN\"}]"
    }
  }
}
metadata keys injected by the platform:
| Key | Description |
|---|---|
| x_alien_root_agent_id | Node id of the root agent in this run (typically "MAIN"). |
| x_alien_agent_registry | JSON-encoded array of agent identity objects for every agent in the run. Each entry: { id, kind, name, parent_id }. |
| x_alien_registry_truncated | "true" if the registry was truncated due to the 512-character metadata value limit. Fall back to parsing per-item id prefixes in this case. |
| x_alien_error_code | Set on response.failed when the underlying error code is platform-specific and cannot be expressed via OpenAI's closed Response.error.code Literal. |
| x_alien_error_message | Human-readable companion to x_alien_error_code. |
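Since metadata values are strings, the registry needs a second JSON decode. A possible sketch follows; `load_registry` is an illustrative name, and treating an undecodable value as an empty registry is an assumption about how truncation might surface, not documented behaviour.

```python
import json
from typing import Any, Dict, Tuple


def load_registry(metadata: Dict[str, str]) -> Tuple[Dict[str, Dict[str, Any]], bool]:
    """Return (agents indexed by id, truncated flag) from Response.metadata."""
    truncated = metadata.get("x_alien_registry_truncated") == "true"
    raw = metadata.get("x_alien_agent_registry")
    agents: Dict[str, Dict[str, Any]] = {}
    if raw:
        try:
            entries = json.loads(raw)
        except json.JSONDecodeError:
            entries = []  # assumption: a cut-off value may not be valid JSON
        agents = {entry["id"]: entry for entry in entries}
    return agents, truncated


metadata = {
    "x_alien_root_agent_id": "MAIN",
    "x_alien_agent_registry": json.dumps([
        {"id": "MAIN", "kind": "main", "name": "main", "parent_id": None},
        {"id": "subagent-6", "kind": "subagent",
         "name": "Légifrance researcher", "parent_id": "MAIN"},
    ]),
}
agents, truncated = load_registry(metadata)
```

When `truncated` is true, the returned table may be incomplete and per-item id prefixes are the fallback.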
Per-item agent identity
In multi-agent runs, each output item's id field carries a structured prefix that identifies the agent that produced it:
agent:<agent_id>::msg_<random> ← message item
agent:<agent_id>::fc_<random> ← function call item
Standard consumers treat the id as opaque. Extension-aware consumers parse the prefix to attribute individual output items to specific subagents and render per-agent affordances (collapsed subagent panels, per-agent citations, etc.).
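Parsing the prefix is a small regular-expression job. The sketch assumes agent ids never contain the `::` separator; `parse_item_id` and the sample suffixes are illustrative.

```python
import re
from typing import Optional, Tuple

# agent:<agent_id>::msg_<random>  or  agent:<agent_id>::fc_<random>
_ITEM_ID = re.compile(r"^agent:(?P<agent_id>.+?)::(?P<kind>msg|fc)_(?P<suffix>.+)$")


def parse_item_id(item_id: str) -> Optional[Tuple[str, str]]:
    """Return (agent_id, item_kind), or None when the id has no agent prefix."""
    match = _ITEM_ID.match(item_id)
    if match is None:
        return None  # plain id from a single-agent run; treat as opaque
    return match.group("agent_id"), match.group("kind")


message_owner = parse_item_id("agent:subagent-6::msg_abc123")
call_owner = parse_item_id("agent:MAIN::fc_xyz789")
opaque = parse_item_id("msg_abc123")
```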
Mid-stream errors — Responses API
If the agent fails after events have been emitted, the platform synthesises a response.failed terminal event:
{
  "type": "response.failed",
  "sequence_number": 8,
  "response": {
    "id": "resp_...",
    "status": "failed",
    "error": { "code": "server_error", "message": "Worker disconnected" },
    "metadata": {
      "x_alien_error_code": "worker_disconnected",
      "x_alien_error_message": "The worker processing this job disconnected unexpectedly."
    }
  }
}
response.error.code is constrained to OpenAI's documented Literal values. Platform-specific error codes appear in metadata.x_alien_error_code.
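An extension-aware client will usually want the more specific platform code when it is present and the standard one otherwise. A minimal sketch, with `effective_error` as a hypothetical helper operating on the decoded response object:

```python
from typing import Any, Dict, Optional, Tuple


def effective_error(response: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (code, message), preferring the platform-specific metadata values."""
    error = response.get("error") or {}
    metadata = response.get("metadata") or {}
    code = metadata.get("x_alien_error_code") or error.get("code")
    message = metadata.get("x_alien_error_message") or error.get("message")
    return code, message


failed_response = {
    "id": "resp_example",
    "status": "failed",
    "error": {"code": "server_error", "message": "Worker disconnected"},
    "metadata": {
        "x_alien_error_code": "worker_disconnected",
        "x_alien_error_message": "The worker processing this job disconnected unexpectedly.",
    },
}
specific = effective_error(failed_response)
generic = effective_error({"status": "failed", "error": failed_response["error"]})
```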
Resume after a network drop
This capability is exclusive to the Responses API.
When the SSE connection drops, reconnect by GETting the response with starting_after set to the last sequence number you successfully processed:
GET /agent/:id/responses/<resp_id>?starting_after=<last_seq>
The server replays all events with sequence_number > last_seq. If the run is still in flight, the GET continues live as new events arrive. If it has already terminated, the GET replays the tail and closes with the original terminal event.
Python — resume example
from openai import OpenAI

client = OpenAI(
    base_url="https://api.alien.club/agent/<agent_id>",
    api_key="<your-access-token>",
)

response_id = "resp_abc"
last_seq = 4  # last sequence_number successfully processed before the drop

# Resume: the SDK issues GET /responses/<response_id> with starting_after=4.
# Assumes an openai-python release whose responses.retrieve() accepts
# stream=True and starting_after.
stream = client.responses.retrieve(
    response_id,
    stream=True,
    starting_after=last_seq,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
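A single resume is rarely enough in practice, so clients often wrap it in a loop. The sketch below is transport-agnostic: `open_stream` is a hypothetical callable that opens the initial stream when given None and a resumed stream when given a sequence number. Catching the built-in `ConnectionError` is also an assumption; substitute whatever your HTTP client or SDK raises on a dropped connection.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

TERMINAL = {"response.completed", "response.failed", "response.incomplete"}


def stream_with_resume(
    open_stream: Callable[[Optional[int]], Iterable[Dict[str, Any]]],
    max_reconnects: int = 3,
) -> Iterator[Dict[str, Any]]:
    """Yield each event once, reconnecting from the last processed sequence_number."""
    last_seq: Optional[int] = None
    reconnects = 0
    while True:
        try:
            for event in open_stream(last_seq):
                seq = event["sequence_number"]
                if last_seq is not None and seq <= last_seq:
                    continue  # replayed event that was already handled
                last_seq = seq
                yield event
                if event["type"] in TERMINAL:
                    return
        except ConnectionError:
            pass  # fall through and reconnect
        reconnects += 1
        if reconnects > max_reconnects:
            raise ConnectionError("stream ended without a terminal event")


# Simulated server: drops the first connection after sequence_number 1.
EVENTS = [
    {"type": "response.created", "sequence_number": 0},
    {"type": "response.output_text.delta", "sequence_number": 1, "delta": "Hel"},
    {"type": "response.output_text.delta", "sequence_number": 2, "delta": "lo"},
    {"type": "response.completed", "sequence_number": 3},
]
calls = []


def fake_open(starting_after: Optional[int]) -> Iterator[Dict[str, Any]]:
    calls.append(starting_after)
    for event in EVENTS:
        if starting_after is not None and event["sequence_number"] <= starting_after:
            continue
        yield event
        if starting_after is None and event["sequence_number"] == 1:
            raise ConnectionError("simulated drop")


received = [e["sequence_number"] for e in stream_with_resume(fake_open)]
```

The loop treats a stream that closes without a terminal event the same way as a dropped connection, since neither one indicates that the run has finished.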
Storage TTL
Responses are kept in Redis for 24 hours after creation. After expiry, GET returns 410 Gone and resume is not possible. If you need replay beyond 24 hours, store the events client-side as you receive them.
Event types — Responses API
| Event | When |
|---|---|
| response.created | First event. Contains the Response object with status: "in_progress" and initial metadata. |
| response.in_progress | Optional progress signal during long runs. |
| response.output_item.added | A new top-level output item begins (message, function_call, or reasoning). |
| response.content_part.added | A new content part begins within a message (output_text, refusal, reasoning_text). |
| response.output_text.delta | Text token delta. |
| response.output_text.done | Closes an output_text part. |
| response.function_call_arguments.delta | Streaming JSON fragment for a function call's arguments. |
| response.function_call_arguments.done | Closes a function call's argument stream and carries the complete arguments string. The item itself closes with response.output_item.done. |
| response.reasoning_summary_part.added | A new reasoning summary part begins. |
| response.reasoning_summary_text.delta | Text delta inside a reasoning summary. |
| response.reasoning_summary_text.done | Closes a reasoning summary part. |
| response.content_part.done | Closes a content part. |
| response.output_item.done | Closes a top-level output item. |
| response.completed | Terminal success. Response.usage populated. |
| response.failed | Terminal failure. Response.error populated. |
| response.incomplete | Terminal partial result. Response.incomplete_details.reason populated. |
Choosing an API
| Need | Use |
|---|---|
| Broadest client compatibility (LangChain, any OpenAI-compatible proxy) | Chat Completions |
| Resume after network drop | Responses API |
| Typed per-event schema | Responses API |
| Stateless single-turn queries | Either |
| Agentic multi-turn conversations in production | Responses API |
See also
- Configure Your Workflow: wire the httpRequest and httpResponse nodes.
- Chat Completions API: request format and x_alien extension.
- Responses API: previous_response_id and session management.
- Streaming overview (platform docs): comparison table and quick-start links.