Responses API

The Responses API is OpenAI's modern streaming surface. It uses a richer, strongly typed event taxonomy and supports native resume, which lets production UIs recover from network drops without losing in-flight runs.

The Alien platform exposes two endpoints:

  • POST /agent/:id/responses — start a new response, optionally streaming.
  • GET /agent/:id/responses/:respId?starting_after=<seq> — resume an existing response from a sequence number.

Both are drop-in compatible with client.responses.create(...) from the official OpenAI SDKs (Python ≥ 1.50, Node ≥ 4.50, AI SDK 5+).

Quick start

Python (openai SDK)

from openai import OpenAI

# Point the official SDK at the agent's base URL; the SDK appends /responses.
client = OpenAI(
    base_url="https://api.alien.club/agent/<agent_id>",
    api_key="<your-access-token>",
)

# Stream the run and print text deltas as they arrive.
with client.responses.stream(
    model="agent",
    input="What is the duration of the trial period for a SYNTEC executive?",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)

TypeScript (openai SDK)

import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://api.alien.club/agent/<agent_id>",
  apiKey: "<your-access-token>",
})

const stream = await client.responses.create({
  model: "agent",
  input: "What is the duration of the trial period for a SYNTEC executive?",
  stream: true,
})

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta)
  }
}

cURL — initial request

curl -N \
  -H "Authorization: Bearer <your-access-token>" \
  -H "Content-Type: application/json" \
  -d '{"model":"agent","input":"Hello","stream":true}' \
  "https://api.alien.club/agent/<agent_id>/responses"

cURL — resume after a network drop

# Read the last sequence_number you saw, then resume from it.
# The URL is quoted so the shell does not interpret "?" as a glob character.
curl -N \
  -H "Authorization: Bearer <your-access-token>" \
  "https://api.alien.club/agent/<agent_id>/responses/<resp_id>?starting_after=<last_seq>"

Request body

The request body matches OpenAI's Responses create request:

| Field | Required | Notes |
|---|---|---|
| model | yes | Free-form string. The platform routes to the configured upstream model for this agent. |
| input | yes | Either a free-form string (treated as a user message) or an array of input items per OpenAI's input taxonomy. |
| instructions | no | System or developer-style instructions. |
| stream | no, defaults false | Set to true for streaming. |
| metadata | no | Free-form key/value map (≤16 keys, 64-char keys, 512-char values). Forwarded to Response.metadata. |
| tools, tool_choice | no | Currently advisory. |
| temperature, top_p, max_output_tokens, parallel_tool_calls, previous_response_id, conversation | no | Accepted, forwarded to the runtime where supported. |
| background: true | rejected | Background responses are out of scope for v1. Use stream: true and resume via GET instead. |
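
As a rough illustration of the constraints in the table, a client could check a request body before sending it. This is only a sketch: `validate_request_body` is a hypothetical helper, not part of any SDK, and the server's actual validation may differ in detail.

```python
def validate_request_body(body: dict) -> list[str]:
    """Return a list of problems found; an empty list means the body looks acceptable.

    Hypothetical client-side check that mirrors the documented table only.
    """
    problems = []
    # model and input are the two required fields.
    for field in ("model", "input"):
        if field not in body:
            problems.append(f"missing required field: {field}")
    # background: true is rejected in v1.
    if body.get("background") is True:
        problems.append("background: true is rejected; use stream: true and resume via GET")
    # Metadata limits: at most 16 keys, 64-char keys, 512-char values.
    metadata = body.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("metadata has more than 16 keys")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append(f"metadata key longer than 64 chars: {key[:16]}...")
        if len(str(value)) > 512:
            problems.append(f"metadata value longer than 512 chars for key: {key[:16]}")
    return problems
```

Running the check locally turns an avoidable 4xx round trip into an immediate, readable error.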

Event format

Each event is a Server-Sent Events frame carrying both the SSE event: line (the event type discriminator) and a data: line (the JSON payload):

event: response.created
data: {"type":"response.created","sequence_number":0,"response":{...}}

event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":1,"output_index":0,"item":{...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":2,"item_id":"...","output_index":0,"content_index":0,"delta":"Hello","logprobs":[]}

event: response.completed
data: {"type":"response.completed","sequence_number":N,"response":{...}}

Every event validates against the OpenAI Python SDK's openai.types.responses.ResponseStreamEvent discriminated union.
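
For consumers that do not use an SDK, one frame can be decoded by hand. The sketch below assumes a frame has already been split off at its blank-line boundary; `parse_sse_frame` is a hypothetical helper name, and it checks that the `event:` line agrees with the payload's `type`.

```python
import json


def parse_sse_frame(frame: str) -> tuple[str, dict]:
    """Decode one SSE frame into (event_type, payload)."""
    event_type = None
    data_lines = []
    for line in frame.splitlines():
        if line.startswith(":"):
            continue  # SSE comment line, carries no event data
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips a single leading space after the colon
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    payload = json.loads("\n".join(data_lines))
    # The event: line and the JSON "type" field carry the same discriminator.
    if event_type != payload.get("type"):
        raise ValueError(f"event line {event_type!r} does not match payload type")
    return event_type, payload
```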

There is no [DONE] terminator. The stream closes after exactly one terminal event:

  • response.completed — successful run.
  • response.failed — failure. Carries Response.error with code and message.
  • response.incomplete — truncated by max_output_tokens or content filter.
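
Because there is no [DONE] sentinel, a consumer has to detect the end of a run from the event type. A minimal sketch, assuming events are already decoded into dicts; `final_status` is a hypothetical name:

```python
from typing import Iterable, Optional

TERMINAL_EVENT_TYPES = frozenset(
    {"response.completed", "response.failed", "response.incomplete"}
)


def final_status(events: Iterable[dict]) -> Optional[str]:
    """Consume events until the first terminal one and return its type.

    Returns None if the stream ended without a terminal event, which
    suggests the connection dropped and the run may be resumable.
    """
    for event in events:
        if event["type"] in TERMINAL_EVENT_TYPES:
            return event["type"]
    return None
```

Treating "stream closed without a terminal event" as a distinct outcome is what lets the caller decide to resume rather than report a failure.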

Event types

This endpoint emits a subset of OpenAI's streaming event types. The complete list:

| Event type | When |
|---|---|
| response.created | First event. Carries the Response object with status: "in_progress". |
| response.in_progress | Optional progress signal during long runs. |
| response.output_item.added | A new top-level output item appears (message, function_call, or reasoning item). |
| response.content_part.added | Within a message, a new content part starts (output_text, refusal, or reasoning_text). |
| response.output_text.delta | Text token delta within an output_text part. |
| response.output_text.done | Closes an output_text part. |
| response.function_call_arguments.delta | Streaming JSON-string fragments of a function call's arguments. |
| response.function_call_arguments.done | Closes a function-call item. |
| response.reasoning_summary_part.added | A new reasoning summary part appears. |
| response.reasoning_summary_text.delta | Text delta inside a reasoning summary. |
| response.reasoning_summary_text.done | Closes a reasoning summary part. |
| response.content_part.done | Closes a content part. |
| response.output_item.done | Closes a top-level output item. |
| response.completed | Terminal success. Response.usage populated. |
| response.failed | Terminal failure. Response.error populated. |
| response.incomplete | Terminal partial result. Response.incomplete_details.reason populated. |

The full schema for each event is documented at https://platform.openai.com/docs/api-reference/responses-streaming.
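
To show how the delta events compose, the sketch below rebuilds output text and function-call arguments from a decoded event list. `assemble_outputs` is a hypothetical helper, and keying text by (item_id, content_index) and arguments by item_id is an assumption based on the fields shown in the event examples.

```python
from collections import defaultdict
from typing import Iterable


def assemble_outputs(events: Iterable[dict]) -> tuple[dict, dict]:
    """Concatenate delta events into full text parts and argument strings."""
    text_parts = defaultdict(str)  # (item_id, content_index) -> accumulated text
    call_args = defaultdict(str)   # item_id -> accumulated JSON argument string
    for event in events:
        if event["type"] == "response.output_text.delta":
            text_parts[(event["item_id"], event["content_index"])] += event["delta"]
        elif event["type"] == "response.function_call_arguments.delta":
            call_args[event["item_id"]] += event["delta"]
    return dict(text_parts), dict(call_args)
```

The accumulated argument string is only valid JSON once the matching done event has arrived, so parse it after that point rather than on every delta.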

Sequence numbers and resume

Every event carries a monotonically increasing sequence_number, starting at 0. This is the resume cursor.

When a streaming connection drops mid-run, reconnect by GETting the response id with starting_after set to the last sequence number you successfully processed:

GET /agent/:id/responses/<resp_id>?starting_after=<last_seq>

The server replays all events with sequence_number > <last_seq>. If the run is still in flight, the GET continues live as new events arrive. If the run has already terminated, the GET replays the tail and closes with the original terminal event.
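
On the client side, resume comes down to remembering the last processed sequence number and building the GET URL from it. The following is a sketch only; `ResumeCursor` and its method names are hypothetical, and the duplicate check is a defensive assumption rather than documented server behavior.

```python
from typing import Optional
from urllib.parse import quote, urlencode


class ResumeCursor:
    """Tracks the last processed sequence_number for one response."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None

    def accept(self, event: dict) -> bool:
        """Record the event; return False if it was already processed."""
        seq = event["sequence_number"]
        if self.last_seq is not None and seq <= self.last_seq:
            return False  # duplicate or replayed event, skip it
        self.last_seq = seq
        return True

    def resume_url(self, base_url: str, resp_id: str) -> str:
        """Build the GET URL that replays everything after last_seq."""
        if self.last_seq is None:
            raise ValueError("no event processed yet; nothing to resume from")
        query = urlencode({"starting_after": self.last_seq})
        return f"{base_url}/responses/{quote(resp_id, safe='')}?{query}"
```

Advance the cursor only after an event has been fully handled, so that a crash mid-handler replays that event instead of skipping it.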

Storage and TTL

Streamed responses are persisted server-side in Redis for 24 hours after creation. After expiry:

  • GET /agent/:id/responses/<resp_id> returns HTTP 410 Gone.
  • The response cannot be resumed and must be re-issued via POST.

24 hours covers any realistic network-recovery window. If you need durable replay beyond a day, store the events client-side as you receive them.
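
One simple way to keep events beyond the server-side TTL is an append-only JSON Lines file per response. This is a sketch under that assumption; the file layout and the helper names `append_event` and `replay_events` are illustrative, not a platform feature.

```python
import json
from pathlib import Path
from typing import Iterator


def append_event(log_path: Path, event: dict) -> None:
    """Append one decoded event to the local log as a single JSON line."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")


def replay_events(log_path: Path, starting_after: int = -1) -> Iterator[dict]:
    """Yield stored events whose sequence_number is greater than starting_after."""
    with log_path.open(encoding="utf-8") as log:
        for line in log:
            event = json.loads(line)
            if event["sequence_number"] > starting_after:
                yield event
```

The local replay uses the same "greater than the cursor" rule as the server-side resume, so UI code can treat both sources alike.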

Failure modes for resume

| Status | Reason |
|---|---|
| 200 | Normal: replay or live tail begins. |
| 400 | starting_after is invalid (not a non-negative integer, or beyond the response's last sequence number). |
| 404 | Response unknown to this agent: wrong id or wrong agent. |
| 410 | Response existed but its TTL expired. |
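
A client might translate these statuses into a next step roughly as follows. The action labels and the fallback of retrying on any other status are assumptions made for illustration; `resume_action` is a hypothetical helper.

```python
def resume_action(status: int) -> str:
    """Map the HTTP status of a resume GET to a suggested client action."""
    if status == 200:
        return "consume"     # replay or live tail begins, read the stream
    if status == 400:
        return "fix_cursor"  # starting_after is invalid, check the stored cursor
    if status == 404:
        return "check_ids"   # wrong response id or wrong agent
    if status == 410:
        return "reissue"     # TTL expired, start a new response via POST
    # Assumption: anything else (for example a 5xx) is treated as transient.
    return "retry"
```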

Subagent context via metadata.x_alien_*

Multi-agent runs (where the main agent dispatches subagents via tools) carry the agent registry in the Response.metadata field. The Responses API permits arbitrary metadata per response — this is a documented extension point, not a standards violation.

Example Response.metadata carrying subagent context:

{
  "x_alien_root_agent_id": "MAIN",
  "x_alien_agent_registry": "[{\"id\":\"MAIN\",\"kind\":\"main\",\"name\":\"main\",\"parent_id\":null},{\"id\":\"sub-legifrance\",\"kind\":\"subagent\",\"name\":\"Légifrance researcher\",\"parent_id\":\"MAIN\"}]"
}

Per-item agent identity is encoded in the item's id using a structured prefix: agent:<agent_id>::msg_<random> for messages, agent:<agent_id>::fc_<random> for function calls. Standard consumers treat the prefix as opaque (which is how the SDK treats every item id); extension-aware consumers parse the prefix to render per-subagent affordances.

The 512-character cap on a single metadata value means very large registries (≥50 agents) may be truncated. If truncation occurs, metadata.x_alien_registry_truncated is set to "true" and consumers should fall back to parsing per-item id prefixes.
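
An extension-aware consumer could read the registry and the per-item prefix along these lines. This is a sketch assuming the metadata keys and id format described above; `parse_registry` and `agent_id_from_item_id` are hypothetical helper names.

```python
import json
from typing import Optional


def parse_registry(metadata: dict) -> Optional[list]:
    """Return the agent registry, or None if it is absent or flagged as truncated."""
    if metadata.get("x_alien_registry_truncated") == "true":
        return None  # caller should fall back to per-item id prefixes
    raw = metadata.get("x_alien_agent_registry")
    return json.loads(raw) if raw else None


def agent_id_from_item_id(item_id: str) -> Optional[str]:
    """Extract <agent_id> from 'agent:<agent_id>::msg_...' or '...::fc_...' ids."""
    prefix = "agent:"
    if not item_id.startswith(prefix):
        return None  # plain id without the extension prefix
    agent_id, separator, _ = item_id[len(prefix):].partition("::")
    return agent_id if separator else None
```

Returning None for ids without the prefix keeps the helper safe to call on items from standard, single-agent runs.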

Errors

Pre-stream errors

Failures before the first event return HTTP 4xx/5xx with a JSON body matching OpenAI's standard error envelope:

{ "error": { "message": "...", "type": "invalid_request_error", "code": "..." } }

Mid-stream errors

Failures after the first event are surfaced via the response.failed terminal event:

{
  "type": "response.failed",
  "sequence_number": <int>,
  "response": {
    "id": "resp_...",
    "status": "failed",
    "error": { "code": "server_error", "message": "..." },
    "metadata": {
      "x_alien_error_code": "upstream_timeout",
      "x_alien_error_message": "upstream model timed out after 30s"
    },
    ...
  }
}

Response.error.code is constrained to OpenAI's documented Literal codes (server_error, rate_limit_exceeded, invalid_prompt, etc.) — Alien-specific codes are surfaced via metadata.x_alien_error_code.
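
A consumer that wants both the standard code and the Alien-specific detail could read them together. A minimal sketch, assuming a decoded response.failed event shaped like the example above; `describe_failure` is a hypothetical helper.

```python
def describe_failure(event: dict) -> dict:
    """Collect standard and Alien-specific error details from a response.failed event."""
    response = event["response"]
    error = response.get("error") or {}
    metadata = response.get("metadata") or {}
    return {
        # Standard OpenAI-style code and message from Response.error.
        "code": error.get("code"),
        "message": error.get("message"),
        # Alien-specific detail, present only when the platform sets it.
        "alien_code": metadata.get("x_alien_error_code"),
        "alien_message": metadata.get("x_alien_error_message"),
    }
```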

Heartbeat

The server emits SSE comment lines (:keep-alive\n\n) every 15 seconds during quiet periods. Comments do not parse as events and do not advance sequence_number.
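
A hand-rolled reader therefore needs to drop comment lines before grouping lines into frames. The sketch below assumes the stream is available as an iterable of text lines; `iter_frames` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def iter_frames(lines: Iterable[str]) -> Iterator[list]:
    """Group SSE lines into frames, skipping heartbeat comment lines."""
    frame = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # heartbeat or other SSE comment, not an event
        if line == "":
            # A blank line ends the current frame, if it has any content.
            if frame:
                yield frame
                frame = []
            continue
        frame.append(line)
    # A trailing frame without its closing blank line is incomplete and is dropped.
```

Heartbeats still prove the connection is alive, so a read timeout somewhat longer than 15 seconds is a reasonable signal to reconnect and resume.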

Compatibility checklist

The streams produced by these endpoints are validated against:

  • The OpenAI Python SDK's openai.types.responses.ResponseStreamEvent discriminated union (ResponseCreatedEvent, ResponseOutputItemAddedEvent, ResponseTextDeltaEvent, etc.).
  • Reference fixtures derived from OpenAI's Responses streaming reference and the OpenAI Python SDK source.

If a standard OpenAI Responses consumer breaks against this endpoint, that's a bug — please report it.

See also