Responses API
The Responses API is OpenAI's modern streaming surface. Compared with Chat Completions, it uses a richer, strongly typed event taxonomy and supports native resume, which lets production UIs recover from network drops without losing in-flight runs.
The Alien platform exposes two endpoints:
- POST /agent/:id/responses starts a new response, optionally streaming.
- GET /agent/:id/responses/:respId?starting_after=<seq> resumes an existing response from a sequence number.
Both are drop-in compatible with client.responses.create(...) from the official OpenAI SDKs (Python ≥ 1.50, Node ≥ 4.50, AI SDK 5+).
Quick start
Python (openai SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.alien.club/agent/<agent_id>",
    api_key="<your-access-token>",
)

# Stream typed events and print text deltas as they arrive.
with client.responses.stream(
    model="agent",
    input="What is the duration of the trial period for a SYNTEC executive?",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
```
TypeScript (openai SDK)
```typescript
import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://api.alien.club/agent/<agent_id>",
  apiKey: "<your-access-token>",
})

const stream = await client.responses.create({
  model: "agent",
  input: "What is the duration of the trial period for a SYNTEC executive?",
  stream: true,
})

// Print text deltas as they arrive.
for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta)
  }
}
```
cURL — initial request
```shell
curl -N \
  -H "Authorization: Bearer <your-access-token>" \
  -H "Content-Type: application/json" \
  -d '{"model":"agent","input":"Hello","stream":true}' \
  "https://api.alien.club/agent/<agent_id>/responses"
```
cURL — resume after a network drop
```shell
# Read the last sequence_number you saw, then (quote the URL so the shell
# does not interpret "?" as a glob):
curl -N \
  -H "Authorization: Bearer <your-access-token>" \
  "https://api.alien.club/agent/<agent_id>/responses/<resp_id>?starting_after=<last_seq>"
```
Request body
The request body matches OpenAI's Responses create request:
| Field | Required | Notes |
|---|---|---|
| model | yes | Free-form string. The platform routes to the configured upstream model for this agent. |
| input | yes | Either a free-form string (treated as a user message) or an array of input items per OpenAI's input taxonomy. |
| instructions | no | System or developer-style instructions. |
| stream | no (default false) | Set to true for streaming. |
| metadata | no | Free-form key/value map (at most 16 keys; keys up to 64 characters, values up to 512 characters). Forwarded to Response.metadata. |
| tools, tool_choice | no | Currently advisory. |
| temperature, top_p, max_output_tokens, parallel_tool_calls, previous_response_id, conversation | no | Accepted and forwarded to the runtime where supported. |
| background: true | rejected | Background responses are out of scope for v1. Use stream: true and resume via GET instead. |
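To catch the documented constraints before a request leaves the client, a body can be pre-validated. The sketch below uses a hypothetical helper, validate_request_body (not part of any SDK), that checks only the rules stated in the table: required model and input, the metadata limits, and the rejection of background: true. It assumes metadata values are strings, as in OpenAI's metadata definition.

```python
from typing import Any

# Limits taken from the request-body table above.
MAX_METADATA_KEYS = 16
MAX_KEY_LEN = 64
MAX_VALUE_LEN = 512


def validate_request_body(body: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty means OK)."""
    problems: list[str] = []
    for field in ("model", "input"):
        if field not in body:
            problems.append(f"missing required field: {field}")
    if body.get("background") is True:
        problems.append("background: true is rejected; use stream: true and resume via GET")
    metadata = body.get("metadata", {})
    if len(metadata) > MAX_METADATA_KEYS:
        problems.append(f"metadata has {len(metadata)} keys (max {MAX_METADATA_KEYS})")
    for key, value in metadata.items():
        if len(key) > MAX_KEY_LEN:
            problems.append(f"metadata key longer than {MAX_KEY_LEN} chars: {key[:20]}...")
        # Assumption: values must be strings, as in OpenAI's metadata definition.
        if not isinstance(value, str) or len(value) > MAX_VALUE_LEN:
            problems.append(f"metadata value for {key!r} must be a string of at most {MAX_VALUE_LEN} chars")
    return problems
```

For example, {"model": "agent", "input": "Hello", "stream": True} passes with no problems reported.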
Event format
Each event is a Server-Sent Events frame carrying both the SSE event: line (the event type discriminator) and a data: line (the JSON payload):
```text
event: response.created
data: {"type":"response.created","sequence_number":0,"response":{...}}

event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":1,"output_index":0,"item":{...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":2,"item_id":"...","output_index":0,"content_index":0,"delta":"Hello","logprobs":[]}

event: response.completed
data: {"type":"response.completed","sequence_number":N,"response":{...}}
```
Every event validates against the OpenAI Python SDK's openai.types.responses.ResponseStreamEvent discriminated union.
There is no [DONE] terminator. The stream closes after exactly one terminal event:
- response.completed: successful run.
- response.failed: failure. Carries Response.error with code and message.
- response.incomplete: truncated by max_output_tokens or a content filter.
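The framing above can be consumed without an SDK. The following is a minimal sketch of an SSE reader, assuming the stream arrives as text lines; parse_sse is a hypothetical name. It skips comment lines, joins the data: lines of each frame, checks that the event: line matches the JSON type field, and stops after the first terminal event. As in the SSE specification, a frame is dispatched only once its terminating blank line arrives.

```python
import json
from typing import Iterable, Iterator

TERMINAL_EVENTS = {"response.completed", "response.failed", "response.incomplete"}


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each event's JSON payload; stop after the terminal event."""
    event_type = None
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment (e.g. a heartbeat): never an event
        if line == "":
            # A blank line ends the current frame.
            if data_lines:
                payload = json.loads("\n".join(data_lines))
                # The event: line and the JSON "type" carry the same discriminator.
                if event_type is not None and event_type != payload["type"]:
                    raise ValueError(f"event line {event_type!r} != type {payload['type']!r}")
                yield payload
                if payload["type"] in TERMINAL_EVENTS:
                    return  # no [DONE] marker: the terminal event ends the stream
            event_type, data_lines = None, []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one optional space after the colon
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
```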
Event types
The complete set of event types emitted by this endpoint (a subset of OpenAI's Responses streaming events):
| Event type | When |
|---|---|
| response.created | First event. Carries the Response object with status: "in_progress". |
| response.in_progress | Optional progress signal during long runs. |
| response.output_item.added | A new top-level output item appears (message, function_call, or reasoning item). |
| response.content_part.added | Within a message, a new content part starts (output_text, refusal, or reasoning_text). |
| response.output_text.delta | Text token delta within an output_text part. |
| response.output_text.done | The output_text part's text is finalized; carries the full text. |
| response.function_call_arguments.delta | Streaming JSON-string fragments of a function call's arguments. |
| response.function_call_arguments.done | A function call's arguments are finalized; carries the complete arguments string. |
| response.reasoning_summary_part.added | A new reasoning summary part appears. |
| response.reasoning_summary_text.delta | Text delta inside a reasoning summary. |
| response.reasoning_summary_text.done | Closes a reasoning summary part. |
| response.content_part.done | Closes a content part. |
| response.output_item.done | Closes a top-level output item. |
| response.completed | Terminal success. Response.usage populated. |
| response.failed | Terminal failure. Response.error populated. |
| response.incomplete | Terminal partial result. Response.incomplete_details.reason populated. |
The full schema for each event is documented at https://platform.openai.com/docs/api-reference/responses-streaming.
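Most consumers only need the final text and tool-call arguments. A minimal sketch of folding the deltas per item follows; it assumes the payload field names from OpenAI's streaming schema (item_id, delta, text, arguments), and accumulate is a hypothetical helper. The *.done events overwrite the accumulated value because they carry the complete string.

```python
from collections import defaultdict

TERMINAL_EVENTS = ("response.completed", "response.failed", "response.incomplete")


def accumulate(events):
    """Fold streamed deltas into final text and function-call arguments, keyed by item_id.

    Returns (terminal event type or None, text by item_id, arguments by item_id).
    """
    text = defaultdict(str)
    args = defaultdict(str)
    for event in events:
        kind = event["type"]
        if kind == "response.output_text.delta":
            text[event["item_id"]] += event["delta"]
        elif kind == "response.output_text.done":
            text[event["item_id"]] = event["text"]  # authoritative full text
        elif kind == "response.function_call_arguments.delta":
            args[event["item_id"]] += event["delta"]
        elif kind == "response.function_call_arguments.done":
            args[event["item_id"]] = event["arguments"]  # complete JSON string
        elif kind in TERMINAL_EVENTS:
            return kind, dict(text), dict(args)
    return None, dict(text), dict(args)  # stream ended without a terminal event
```

A None status signals that the connection dropped before a terminal event, which is the cue to resume.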
Sequence numbers and resume
Every event carries a monotonically increasing sequence_number, starting at 0. This is the resume cursor.
When a streaming connection drops mid-run, reconnect by sending a GET for the response id, with starting_after set to the last sequence number you successfully processed:
GET /agent/:id/responses/<resp_id>?starting_after=<last_seq>
The server replays all events with sequence_number > <last_seq>. If the run is still in flight, the GET continues live as new events arrive. If the run has already terminated, the GET replays the tail and closes with the original terminal event.
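A client needs to remember only one number per response to resume. The sketch below keeps that cursor, drops any replayed event it has already processed, and builds the resume URL; ResumeCursor is a hypothetical helper and base_url is assumed to be the API host. Because the response id first arrives in response.created (sequence_number 0), the cursor always has a value by the time a resume is possible.

```python
from typing import Optional
from urllib.parse import urlencode


class ResumeCursor:
    """Tracks the last processed sequence_number for one response (hypothetical helper)."""

    def __init__(self, base_url: str, agent_id: str, response_id: str):
        self.base_url = base_url.rstrip("/")
        self.agent_id = agent_id
        self.response_id = response_id
        self.last_seq: Optional[int] = None  # nothing processed yet

    def accept(self, event: dict) -> bool:
        """Call after handling an event; returns False for duplicates seen again in a replay."""
        seq = event["sequence_number"]
        if self.last_seq is not None and seq <= self.last_seq:
            return False
        self.last_seq = seq
        return True

    def resume_url(self) -> str:
        if self.last_seq is None:
            # starting_after must be a non-negative integer (see the failure modes below).
            raise RuntimeError("no event processed yet; nothing to resume from")
        path = f"{self.base_url}/agent/{self.agent_id}/responses/{self.response_id}"
        return f"{path}?{urlencode({'starting_after': self.last_seq})}"
```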
Storage and TTL
Streamed responses are persisted server-side in Redis for 24 hours after creation. After expiry:
- GET /agent/:id/responses/<resp_id> returns HTTP 410 Gone.
- The response cannot be resumed and must be re-issued via POST.
24 hours covers any realistic network-recovery window. If you need durable replay beyond a day, store the events client-side as you receive them.
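Client-side storage for replay beyond a day can be as simple as a log file. A minimal sketch, assuming one JSON Lines file per response id (append_event and replay_events are hypothetical helpers); replay_events mirrors the server's starting_after filter when reading back.

```python
import json
from pathlib import Path


def append_event(path: Path, event: dict) -> None:
    """Append one event as a JSON line as soon as it is received."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")


def replay_events(path: Path, starting_after: int = -1) -> list:
    """Return stored events with sequence_number > starting_after, in file order."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        events = [json.loads(line) for line in f if line.strip()]
    return [e for e in events if e["sequence_number"] > starting_after]
```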
Failure modes for resume
| Status | Reason |
|---|---|
| 200 | Normal — replay or live tail begins. |
| 400 | starting_after is invalid (not a non-negative integer, or beyond the response's last sequence number). |
| 404 | Response unknown to this agent — wrong id or wrong agent. |
| 410 | Response existed but its TTL expired. |
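The table translates directly into client behavior. A sketch of one possible policy follows; ResumeAction and action_for_resume_status are hypothetical, and any status outside the table is treated as unexpected.

```python
from enum import Enum


class ResumeAction(Enum):
    CONSUME = "consume the replayed or live stream"
    FIX_CURSOR = "send a valid non-negative starting_after"
    CHECK_IDS = "check the response id and the agent id"
    REISSUE = "re-issue the request via POST"


def action_for_resume_status(status: int) -> ResumeAction:
    """Map the documented resume status codes to a client action (hypothetical policy)."""
    mapping = {
        200: ResumeAction.CONSUME,
        400: ResumeAction.FIX_CURSOR,
        404: ResumeAction.CHECK_IDS,
        410: ResumeAction.REISSUE,  # TTL expired: resume is impossible
    }
    try:
        return mapping[status]
    except KeyError:
        raise ValueError(f"unexpected resume status: {status}") from None
```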
Subagent context via metadata.x_alien_*
Multi-agent runs (where the main agent dispatches subagents via tools) carry the agent registry in the Response.metadata field. The Responses API permits arbitrary key/value metadata on every response, so this is a documented extension point, not a standards violation.
Example Response.metadata carrying subagent context:
```json
{
  "x_alien_root_agent_id": "MAIN",
  "x_alien_agent_registry": "[{\"id\":\"MAIN\",\"kind\":\"main\",\"name\":\"main\",\"parent_id\":null},{\"id\":\"sub-legifrance\",\"kind\":\"subagent\",\"name\":\"Légifrance researcher\",\"parent_id\":\"MAIN\"}]"
}
```
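Because metadata values are strings, the registry arrives JSON-encoded and has to be decoded a second time. A minimal sketch (parse_registry is a hypothetical helper) that turns it into an id-to-name map and a parent-to-children map:

```python
import json


def parse_registry(metadata: dict) -> tuple:
    """Return (names by agent id, child ids by parent id) from Response.metadata."""
    raw = metadata.get("x_alien_agent_registry")
    if raw is None:
        return {}, {}  # single-agent run: no registry
    registry = json.loads(raw)  # second decode: the value is itself a JSON string
    names = {agent["id"]: agent["name"] for agent in registry}
    children = {}
    for agent in registry:
        if agent["parent_id"] is not None:
            children.setdefault(agent["parent_id"], []).append(agent["id"])
    return names, children
```

For the example metadata above, the child map is {"MAIN": ["sub-legifrance"]}.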
Per-item agent identity is encoded in the item's id using a structured prefix: agent:<agent_id>::msg_<random> for messages, agent:<agent_id>::fc_<random> for function calls. Standard consumers treat the prefix as opaque (which is how the SDK treats every item id); extension-aware consumers parse the prefix to render per-subagent affordances.
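Extension-aware consumers can recover the agent from an item id with a single pattern match. The sketch below assumes exactly the two documented shapes (msg_ or fc_ after the agent:<agent_id>:: prefix); agent_for_item is a hypothetical helper, and any other id is left opaque.

```python
import re
from typing import Optional, Tuple

# agent:<agent_id>::msg_<random> or agent:<agent_id>::fc_<random>
_ITEM_ID = re.compile(r"^agent:(?P<agent_id>.+?)::(?P<kind>msg|fc)_(?P<rest>.+)$")


def agent_for_item(item_id: str) -> Optional[Tuple[str, str]]:
    """Return (agent_id, kind) for a prefixed item id, or None for plain ids."""
    match = _ITEM_ID.match(item_id)
    if match is None:
        return None  # treat the id as opaque, like a standard consumer
    return match.group("agent_id"), match.group("kind")
```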
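The 512-character cap on a single metadata value limits how large a registry can be. With the encoding shown above, each subagent entry takes roughly 90 characters, so with ids and names of similar length only about five agents fit in one value. If truncation occurs, metadata.x_alien_registry_truncated is set to "true" and consumers should fall back to parsing per-item id prefixes.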
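The fallback can be sketched as follows. This section does not specify how a truncated value is cut, so the sketch assumes it may no longer be valid JSON and tolerates a decode error; known_agent_ids is a hypothetical helper that unions whatever the registry yields with the agent ids found in item-id prefixes.

```python
import json
import re

_PREFIX = re.compile(r"^agent:(.+?)::(?:msg|fc)_")


def known_agent_ids(metadata: dict, item_ids: list) -> set:
    """Agent ids from the registry, plus item-id prefixes when the registry was truncated."""
    ids = set()
    try:
        registry = json.loads(metadata.get("x_alien_agent_registry", "[]"))
        ids.update(agent["id"] for agent in registry)
    except json.JSONDecodeError:
        pass  # assumption: a cut-off registry may not parse at all
    if metadata.get("x_alien_registry_truncated") == "true":
        for item_id in item_ids:
            match = _PREFIX.match(item_id)
            if match:
                ids.add(match.group(1))
    return ids
```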
Errors
Pre-stream errors
Failures before the first event return HTTP 4xx/5xx with a JSON body matching OpenAI's standard error envelope:
```json
{ "error": { "message": "...", "type": "invalid_request_error", "code": "..." } }
```
Mid-stream errors
Failures after the first event are surfaced via the response.failed terminal event:
```json
{
  "type": "response.failed",
  "sequence_number": <int>,
  "response": {
    "id": "resp_...",
    "status": "failed",
    "error": { "code": "server_error", "message": "..." },
    "metadata": {
      "x_alien_error_code": "upstream_timeout",
      "x_alien_error_message": "upstream model timed out after 30s"
    },
    ...
  }
}
```
Response.error.code is constrained to OpenAI's documented Literal codes (server_error, rate_limit_exceeded, invalid_prompt, etc.); Alien-specific codes are surfaced via metadata.x_alien_error_code.
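A consumer that wants both codes can read them from either error shape. A minimal sketch (error_codes is a hypothetical helper) that returns the OpenAI code and, for mid-stream failures, the Alien-specific code from metadata:

```python
from typing import Optional, Tuple


def error_codes(payload: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (OpenAI error code, Alien-specific code) from either error shape."""
    if payload.get("type") == "response.failed":
        # Mid-stream: terminal event with Response.error and Alien metadata.
        response = payload.get("response") or {}
        error = response.get("error") or {}
        metadata = response.get("metadata") or {}
        return error.get("code"), metadata.get("x_alien_error_code")
    if "error" in payload:
        # Pre-stream: standard HTTP error envelope, no Alien metadata.
        return (payload["error"] or {}).get("code"), None
    return None, None  # not an error payload
```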
Heartbeat
The server emits SSE comment lines (:keep-alive\n\n) every 15 seconds during quiet periods. Comments do not parse as events and do not advance sequence_number.
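Since the server sends something at least every 15 seconds, a client can treat prolonged silence as a dead connection and resume. A minimal sketch of such a watchdog; IdleWatchdog is hypothetical, the 45-second threshold (three missed heartbeats) is an assumption rather than a documented value, and the clock is injectable for testing.

```python
import time
from typing import Callable


class IdleWatchdog:
    """Flags a connection as stale when nothing (events or heartbeats) arrives for too long."""

    def __init__(self, timeout_s: float = 45.0, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s  # assumption: three missed 15 s heartbeats
        self.clock = clock
        self.last_activity = clock()

    def on_line(self, line: str) -> None:
        """Call for every line read, including ':keep-alive' comments."""
        self.last_activity = self.clock()

    def is_stale(self) -> bool:
        return self.clock() - self.last_activity > self.timeout_s
```

Heartbeats reset the watchdog even though they never advance sequence_number, so the resume cursor stays unchanged.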
Compatibility checklist
The streams produced by these endpoints are validated against:
- The OpenAI Python SDK's openai.types.responses.ResponseStreamEvent discriminated union (ResponseCreatedEvent, ResponseOutputItemAddedEvent, ResponseTextDeltaEvent, etc.).
- Reference fixtures derived from OpenAI's Responses streaming reference and the OpenAI Python SDK source.
If a standard OpenAI Responses consumer breaks against this endpoint, that's a bug — please report it.
See also
- Streaming overview — when to use Chat Completions vs Responses API.
- Chat Completions API — broader tooling reach, no resume.
- OpenAI reference: https://platform.openai.com/docs/api-reference/responses-streaming.