Streaming Responses

Both the Chat Completions and Responses API endpoints support stream: true. Both use Server-Sent Events (SSE) and conform to their respective OpenAI streaming specs. On top of that baseline, the platform injects additional fields that expose multi-agent context, session identifiers, and platform errors in a way that degrades safely for consumers that don't know about them.

Enabling streaming

Set stream: true in the request body for either endpoint:

{ "model": "agent", "input": "Hello", "stream": true }

The response will have Content-Type: text/event-stream and Transfer-Encoding: chunked. The connection stays open until the agent run terminates or the client disconnects.

Keep-alive

When no event has been emitted for 15 seconds, the platform sends an SSE comment line to prevent the connection from timing out:

:keep-alive

SSE comments (lines starting with :) are ignored by spec-compliant SSE parsers, including the OpenAI SDKs and browser EventSource. They do not advance sequence numbers.
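If you parse the stream by hand rather than through an SDK, the comment lines need to be skipped explicitly. The following is a minimal sketch, assuming the lines have already been decoded to text; `iter_sse_data` is a hypothetical helper name, not part of the platform or the OpenAI SDK.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each `data:` line, skipping comments and blank lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            # Blank separators and comments such as ":keep-alive" carry no data.
            continue
        if line.startswith("data:"):
            payload = line[len("data:"):]
            # The SSE format strips a single optional space after the colon.
            yield payload[1:] if payload.startswith(" ") else payload
```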

Chat Completions streaming

The stream is a sequence of data: frames, each containing a JSON-encoded ChatCompletionChunk, terminated by data: [DONE]:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"agent","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"agent","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"agent","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Every chunk validates against openai.types.chat.ChatCompletionChunk.
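As an illustration of the frame layout above, a consumer that is not using the OpenAI SDK could accumulate the assistant text roughly as follows. This is a sketch under the assumption that each frame arrives as one decoded line; `collect_chat_text` is a hypothetical name.

```python
import json
from typing import Iterable


def collect_chat_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from `data:` frames until `data: [DONE]`."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and ":keep-alive" comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```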

`x_alien` extension — Chat Completions

Each chunk may carry an x_alien top-level field. Consumers that don't know about it ignore it safely — the OpenAI SDK models accept unknown top-level keys.

{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "choices": [{ "index": 0, "delta": { "content": "Hello" }, "finish_reason": null }],
  "x_alien": {
    "conversation_id": "550e8400-e29b-41d4-a716-446655440000",
    "agent_id": "MAIN",
    "agent_register": {
      "id": "MAIN",
      "kind": "main",
      "name": "main",
      "parent_id": null
    },
    "kind": "text"
  }
}

x_alien fields:

Field	When present	Description
`conversation_id`	First chunk only	The `session_id` for this turn. Capture this and pass it as `conversation_id` on the next turn to maintain the session.
`agent_id`	Always	ID of the agent that produced this chunk (`"MAIN"` for the root agent, node id for subagents).
`agent_register`	First chunk per agent	Full identity of this agent: `id`, `kind` (`"main"` / `"subagent"` / `"tool"`), `name`, `parent_id`, and (for subagents) `dispatched_by_tool_call_id`. Emitted once per agent per run.
`kind`	Always	`"text"` or `"reasoning"`. Reasoning chunks contain chain-of-thought traces; use this to render them separately or hide them.
`lifecycle`	At agent boundaries	`"agent_end"` (root or subagent finished) or `"subagent_dispatched"` (root dispatched a subtask).
`error`	On failure only	`{ code: string, message: string }` on the closing chunk of a failed run.
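
An extension-aware consumer might fold these fields into a small state object as chunks arrive. The sketch below works on chunks already parsed into dictionaries. `AlienState` and `apply_chunk` are hypothetical names, and the sketch assumes that reasoning chunks carry their text in `delta.content` like ordinary text chunks, which the table above does not specify.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class AlienState:
    conversation_id: Optional[str] = None
    agents: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    text: List[str] = field(default_factory=list)
    reasoning: List[str] = field(default_factory=list)


def apply_chunk(state: AlienState, chunk: Dict[str, Any]) -> None:
    """Update state from one parsed chunk; chunks without x_alien count as text."""
    x = chunk.get("x_alien") or {}
    if "conversation_id" in x:
        # Sent on the first chunk only; keep it for the next turn.
        state.conversation_id = x["conversation_id"]
    register = x.get("agent_register")
    if register:
        # Sent once per agent per run.
        state.agents[register["id"]] = register
    bucket = state.reasoning if x.get("kind") == "reasoning" else state.text
    for choice in chunk.get("choices", []):
        content = choice.get("delta", {}).get("content")
        if content:
            bucket.append(content)
```

Keeping reasoning and text in separate buckets lets a UI render or hide chain-of-thought traces without re-parsing the stream.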

Mid-stream errors — Chat Completions

If the agent fails after chunks have been emitted, the platform closes with a synthetic final chunk where finish_reason: "stop" and x_alien.error carries the error details, followed by [DONE]. Standard consumers see a clean stop. Extension-aware consumers detect the failure via x_alien.error.

The platform never emits a non-standard finish_reason value — doing so would fail SDK validation against the closed Literal type.
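
Because a failed run still ends with finish_reason: "stop", an extension-aware consumer has to check x_alien.error on each chunk itself. One possible shape, with `AgentRunFailed` and `raise_on_alien_error` as hypothetical names:

```python
from typing import Any, Dict


class AgentRunFailed(RuntimeError):
    """Raised when the closing chunk of a run carries x_alien.error."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_on_alien_error(chunk: Dict[str, Any]) -> None:
    """Raise AgentRunFailed if the parsed chunk reports a platform error."""
    error = (chunk.get("x_alien") or {}).get("error")
    if error:
        raise AgentRunFailed(error.get("code", "unknown"), error.get("message", ""))
```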

Responses API streaming

The Responses API uses a two-line SSE frame per event: an event: line naming the type, and a data: line with the JSON payload.

Note

The actual text tokens arrive in response.output_text.delta events, not in response.created. A typical run emits seven distinct event types before the terminal response.completed. Consumers that only listen for response.created will receive metadata but no text content.

A typical run emits this event sequence:

event: response.created          ← metadata only, output[] is empty
event: response.output_item.added
event: response.content_part.added
event: response.output_text.delta  ← text token (one per chunk)
event: response.output_text.delta  ← ...
event: response.output_text.done
event: response.content_part.done
event: response.output_item.done
event: response.completed          ← full response object, usage populated

Each frame:

event: response.created
data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_...","status":"in_progress",...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":3,"item_id":"...","output_index":0,"content_index":0,"delta":"Hello","logprobs":[]}

event: response.completed
data: {"type":"response.completed","sequence_number":12,"response":{"id":"resp_...","status":"completed",...}}

There is no [DONE] terminator. The stream closes after exactly one terminal event: response.completed, response.failed, or response.incomplete.

Every event validates against openai.types.responses.ResponseStreamEvent.
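
To make the two-line framing and the terminal-event rule concrete, here is a minimal hand-rolled reader. It is a sketch that assumes decoded lines and one data: line per event, as in the frames shown above; `iter_response_events` and `collect_output_text` are hypothetical names.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple

TERMINAL_EVENTS = {"response.completed", "response.failed", "response.incomplete"}


def iter_response_events(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_name, payload) pairs and stop after the terminal event."""
    event_name = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # keep-alive comment
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            name = event_name or payload.get("type", "")
            yield name, payload
            if name in TERMINAL_EVENTS:
                return  # there is no [DONE] frame to wait for
            event_name = None


def collect_output_text(lines: Iterable[str]) -> str:
    """Join the delta of every response.output_text.delta event."""
    return "".join(
        payload["delta"]
        for name, payload in iter_response_events(lines)
        if name == "response.output_text.delta"
    )
```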

Sequence numbers

Every event carries a monotonically increasing sequence_number starting at 0. This is the resume cursor. Track the last sequence_number you successfully processed; you need it to reconnect after a network drop.
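
Tracking the cursor can be as small as the sketch below. `SequenceCursor` is a hypothetical helper; it only relies on sequence numbers increasing, and it drops events at or below the cursor so that replayed events are not processed twice.

```python
from typing import Any, Dict, Optional


class SequenceCursor:
    """Remember the last processed sequence_number and reject replays."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None

    def accept(self, event: Dict[str, Any]) -> bool:
        """Return True and advance the cursor if the event is new."""
        seq = event["sequence_number"]
        if self.last_seq is not None and seq <= self.last_seq:
            return False  # already seen, for example after a reconnect
        self.last_seq = seq
        return True
```

Call accept() only after the event has been fully handled, so that last_seq always names an event you actually processed.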

`metadata.x_alien_*` extension — Responses API

Platform context is carried in Response.metadata on the response.created and response.completed events. The OpenAI Responses spec documents metadata as an open key/value map — this is an explicit extension point.

{
  "type": "response.created",
  "sequence_number": 0,
  "response": {
    "id": "resp_...",
    "status": "in_progress",
    "metadata": {
      "x_alien_root_agent_id": "MAIN",
      "x_alien_agent_registry": "[{\"id\":\"MAIN\",\"kind\":\"main\",\"name\":\"main\",\"parent_id\":null},{\"id\":\"subagent-6\",\"kind\":\"subagent\",\"name\":\"Légifrance researcher\",\"parent_id\":\"MAIN\"}]"
    }
  }
}

metadata keys injected by the platform:

Key	Description
`x_alien_root_agent_id`	Node id of the root agent in this run (typically `"MAIN"`).
`x_alien_agent_registry`	JSON-encoded array of agent identity objects for every agent in the run. Each entry: `{ id, kind, name, parent_id }`.
`x_alien_registry_truncated`	`"true"` if the registry was truncated due to the 512-character metadata value limit. Fall back to parsing per-item id prefixes in this case.
`x_alien_error_code`	Set on `response.failed` when the underlying error code is platform-specific and cannot be expressed via OpenAI's closed `Response.error.code` Literal.
`x_alien_error_message`	Human-readable companion to `x_alien_error_code`.
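
Since x_alien_agent_registry is a JSON string inside a string-valued map, it needs a second decode step. The sketch below shows one way to do that; `read_agent_registry` is a hypothetical name, and because the exact shape of a truncated value is not specified here, the sketch treats any value that fails to parse as truncated.

```python
import json
from typing import Any, Dict, Tuple


def read_agent_registry(metadata: Dict[str, str]) -> Tuple[Dict[str, Dict[str, Any]], bool]:
    """Return (agents_by_id, complete) decoded from Response.metadata.

    `complete` is False when the registry is missing, flagged as truncated,
    or not valid JSON; callers should then fall back to per-item id prefixes.
    """
    raw = metadata.get("x_alien_agent_registry")
    truncated = metadata.get("x_alien_registry_truncated") == "true"
    agents: Dict[str, Dict[str, Any]] = {}
    if raw is not None:
        try:
            entries = json.loads(raw)
        except json.JSONDecodeError:
            entries = []
            truncated = True
        for entry in entries:
            if isinstance(entry, dict) and "id" in entry:
                agents[entry["id"]] = entry
    return agents, raw is not None and not truncated
```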

Per-item agent identity

In multi-agent runs, each output item's id field carries a structured prefix that identifies the agent that produced it:

agent:<agent_id>::msg_<random>      ← message item
agent:<agent_id>::fc_<random>       ← function call item

Standard consumers treat the id as opaque. Extension-aware consumers parse the prefix to attribute individual output items to specific subagents and render per-agent affordances (collapsed subagent panels, per-agent citations, etc.).
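
Parsing the prefix is a plain string split. The following sketch assumes agent ids never contain the `::` separator; `split_item_id` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def split_item_id(item_id: str) -> Tuple[Optional[str], str]:
    """Return (agent_id, local_id); agent_id is None for ids without a prefix."""
    if item_id.startswith("agent:"):
        agent_id, sep, local_id = item_id[len("agent:"):].partition("::")
        if sep and agent_id and local_id:
            return agent_id, local_id
    # Not a prefixed id: treat it as opaque, like a standard consumer would.
    return None, item_id
```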

Mid-stream errors — Responses API

If the agent fails after events have been emitted, the platform synthesises a response.failed terminal event:

{
  "type": "response.failed",
  "sequence_number": 8,
  "response": {
    "id": "resp_...",
    "status": "failed",
    "error": { "code": "server_error", "message": "Worker disconnected" },
    "metadata": {
      "x_alien_error_code": "worker_disconnected",
      "x_alien_error_message": "The worker processing this job disconnected unexpectedly."
    }
  }
}

response.error.code is constrained to OpenAI's documented Literal values. Platform-specific error codes appear in metadata.x_alien_error_code.
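
A consumer that wants both the standard and the platform-specific error can read them side by side from the terminal event. A minimal sketch, with `failure_details` as a hypothetical helper operating on the parsed event payload:

```python
from typing import Any, Dict, Optional


def failure_details(event: Dict[str, Any]) -> Optional[Dict[str, Optional[str]]]:
    """Return error fields from a response.failed event, or None for other events."""
    if event.get("type") != "response.failed":
        return None
    response = event.get("response") or {}
    error = response.get("error") or {}
    metadata = response.get("metadata") or {}
    return {
        "code": error.get("code"),  # OpenAI-documented value
        "message": error.get("message"),
        "platform_code": metadata.get("x_alien_error_code"),  # may be absent
        "platform_message": metadata.get("x_alien_error_message"),
    }
```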

Resume after a network drop

This capability is exclusive to the Responses API.

When the SSE connection drops, reconnect by GETting the response with starting_after set to the last sequence number you successfully processed:

GET /agent/:id/responses/<resp_id>?starting_after=<last_seq>

The server replays all events with sequence_number > last_seq. If the run is still in flight, the GET continues live as new events arrive. If it has already terminated, the GET replays the tail and closes with the original terminal event.

Python — resume example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.alien.club/agent/<agent_id>",
    api_key="<your-access-token>",
)

response_id = "resp_abc"
last_seq = 4  # last sequence_number successfully processed before the drop

# Resume: the SDK issues GET .../responses/resp_abc?starting_after=4 with streaming enabled
stream = client.responses.retrieve(
    response_id,
    stream=True,
    starting_after=last_seq,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
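
The example above resumes once from a known cursor. A longer-lived client might wrap that in a reconnect loop, sketched below with the transport injected as a callable so the logic stays independent of any particular HTTP client. `stream_with_resume` and `open_stream` are hypothetical names: `open_stream(None)` is assumed to start reading from the beginning and `open_stream(n)` to resume with starting_after=n, each yielding parsed event dictionaries.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

TERMINAL_EVENTS = {"response.completed", "response.failed", "response.incomplete"}


def stream_with_resume(
    open_stream: Callable[[Optional[int]], Iterable[Dict[str, Any]]],
    max_reconnects: int = 3,
) -> Iterator[Dict[str, Any]]:
    """Yield each event once, reconnecting from the last processed sequence number."""
    last_seq: Optional[int] = None
    reconnects = 0
    while True:
        try:
            for event in open_stream(last_seq):
                seq = event["sequence_number"]
                if last_seq is not None and seq <= last_seq:
                    continue  # replayed event we already handled
                last_seq = seq
                yield event
                if event["type"] in TERMINAL_EVENTS:
                    return
        except ConnectionError:
            pass  # fall through and reconnect from last_seq
        # Reaching this point means the stream ended without a terminal event.
        reconnects += 1
        if reconnects > max_reconnects:
            raise ConnectionError("stream ended without a terminal event")
```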

Storage TTL

Responses are kept in Redis for 24 hours after creation. After expiry, GET returns 410 Gone and resume is not possible. If you need replay beyond 24 hours, store the events client-side as you receive them.
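
For client-side storage, an append-only log with one JSON event per line is usually enough. The sketch below is one possible layout, not a platform feature; `append_event` and `replay_events` are hypothetical names, and the log can be any text file object.

```python
import json
from typing import Any, Dict, Iterator, TextIO


def append_event(log: TextIO, event: Dict[str, Any]) -> None:
    """Write one event as a single JSON line."""
    log.write(json.dumps(event) + "\n")


def replay_events(log: TextIO, starting_after: int = -1) -> Iterator[Dict[str, Any]]:
    """Yield stored events whose sequence_number is greater than starting_after."""
    for line in log:
        if not line.strip():
            continue
        event = json.loads(line)
        if event["sequence_number"] > starting_after:
            yield event
```

Mirroring the server's starting_after semantics means the same consumer code can read from the live stream or from the local log.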

Event types — Responses API

Event	When
`response.created`	First event. Contains the `Response` object with `status: "in_progress"` and initial metadata.
`response.in_progress`	Optional progress signal during long runs.
`response.output_item.added`	A new top-level output item begins (message, function_call, or reasoning).
`response.content_part.added`	A new content part begins within a message (`output_text`, `refusal`, `reasoning_text`).
`response.output_text.delta`	Text token delta.
`response.output_text.done`	Closes an `output_text` part.
`response.function_call_arguments.delta`	Streaming JSON fragment for a function call's arguments.
`response.function_call_arguments.done`	Finalizes a function call's arguments. The item itself closes with `response.output_item.done`.
`response.reasoning_summary_part.added`	A new reasoning summary part begins.
`response.reasoning_summary_text.delta`	Text delta inside a reasoning summary.
`response.reasoning_summary_text.done`	Closes a reasoning summary part.
`response.content_part.done`	Closes a content part.
`response.output_item.done`	Closes a top-level output item.
`response.completed`	Terminal success. `Response.usage` populated.
`response.failed`	Terminal failure. `Response.error` populated.
`response.incomplete`	Terminal partial result. `Response.incomplete_details.reason` populated.

Choosing an API

Need	Use
Broadest client compatibility (LangChain, any OpenAI-compatible proxy)	Chat Completions
Resume after network drop	Responses API
Typed per-event schema	Responses API
Stateless single-turn queries	Either
Agentic multi-turn conversations in production	Responses API
