Responses API

POST /agent/:id/responses is the preferred way to drive an agentic workflow. It builds on OpenAI's modern Responses surface, which is purpose-designed for multi-turn agents: session state lives on the server, multi-turn continuity is a first-class field (previous_response_id), and every in-flight run is resumable after a network drop.

The platform also exposes:

  • GET /agent/:id/responses/:response_id: retrieve a response, or stream the events of a completed or in-flight run starting after a given sequence number.
  • POST /agent/:id/responses/:response_id/cancel: cancel an in-flight run.

All three endpoints are drop-in compatible with the official OpenAI SDKs (Python ≥ 1.50, Node ≥ 4.50) and with the Vercel AI SDK (5+).

How the endpoint maps to the workflow

  1. Resolve input to a prompt string: if a plain string, use it directly; if an array of input items, walk in reverse and use the content of the last user-role item.
  2. Resolve previous_response_id to a session_id by querying the server-side response store (Redis). On first turn, generate a fresh session_id.
  3. Generate a response_id (format: resp_<base64url-uuid>).
  4. Inject { user_prompt, session_id } as the httpRequest node's input body.
  5. Write the response metadata to the store so the next turn can chain via previous_response_id.

The workflow receives exactly the same user_prompt and session_id fields as the Chat Completions path. The difference is that here the session linkage is managed entirely by the platform — callers never need to maintain a conversation_id themselves.
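
Steps 1 and 3 can be sketched in a few lines of Python. This is an illustration of the described behaviour, not the platform's actual code: the function names are hypothetical, the handling of list-shaped content (parts carrying a "text" key) is an assumption, and so is the stripping of base64 padding from the id.

```python
import base64
import uuid


def resolve_prompt(input_value):
    """Return the prompt string for a Responses-style `input` value."""
    if isinstance(input_value, str):
        return input_value
    # Walk the items newest-first and stop at the last user-role item.
    for item in reversed(input_value):
        if item.get("role") != "user":
            continue
        content = item.get("content")
        if isinstance(content, str):
            return content
        # Assumption: list content is a sequence of parts carrying a "text" key.
        return "".join(part.get("text", "") for part in content or [])
    raise ValueError("input contains no user-role item")


def new_response_id():
    """Build an id of the form resp_<base64url-uuid>."""
    raw = uuid.uuid4().bytes  # 16 random bytes
    # Assumption: '=' padding is stripped, leaving a 22-character suffix.
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"resp_{encoded}"
```

Only the last user-role item is used, so earlier items in the array do not change the prompt the workflow receives.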

Request body

| Field | Required | Notes |
| --- | --- | --- |
| model | yes | Injected into the workflow as httpRequest-0.model. Wire it through agentInput → deepAgent to let callers select the upstream LLM per request. |
| input | yes | Plain string or array of OpenAI input items. |
| stream | no | true for SSE streaming, false for a blocking response. |
| previous_response_id | no | Links this turn to a prior run. The platform resolves it to the matching session_id. |
| instructions | no | Developer/system-level instructions. Forwarded to the runtime where supported. |
| metadata | no | Key/value map (≤ 16 keys, 64-char keys, 512-char values). |
| tools, tool_choice | no | Currently advisory. |
| temperature, top_p, max_output_tokens, parallel_tool_calls | no | Forwarded to the upstream model where supported. |
| background: true | rejected | Returns 422. Use stream: true with the GET resume endpoint instead. |
| conversation | rejected | Returns 422. Use previous_response_id for multi-turn linkage. |
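
A client can catch most of these constraints before sending a request. The checker below is a hypothetical client-side helper, not part of any SDK. It encodes only the limits listed in the table, and measuring value length via str() is an assumption.

```python
def check_request_body(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for field in ("model", "input"):
        if field not in body:
            problems.append(f"missing required field: {field}")
    # Both of these are rejected by the endpoint with a 422.
    if body.get("background") is True:
        problems.append("background: true is rejected")
    if "conversation" in body:
        problems.append("conversation is rejected")
    metadata = body.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("metadata has more than 16 keys")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append(f"metadata key longer than 64 chars: {key[:20]}")
        # Assumption: values are compared by their string length.
        if len(str(value)) > 512:
            problems.append(f"metadata value longer than 512 chars: {key[:20]}")
    return problems
```

An empty list means the body passes these local checks. The server remains the authority on what is accepted.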

Multi-turn continuity

On each turn the response carries an id field. Pass that id as previous_response_id on the next turn — the platform resolves it to the correct session_id and the agent runtime picks up where it left off.

Turn 1: POST /agent/:id/responses { input: "Hello" }
→ response.id = "resp_abc"

Turn 2: POST /agent/:id/responses { input: "Who are you?", previous_response_id: "resp_abc" }
→ response.id = "resp_def"

Turn 3: POST /agent/:id/responses { input: "Thanks", previous_response_id: "resp_def" }

You do not need to resend message history. The agent's session_id-keyed memory handles continuity. If you lose the previous_response_id (e.g. the client crashes), start a new session — the old session remains valid in Redis for 24 hours and can be linked from another client if you stored the response_id.
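
One way to survive a client crash is to write the latest response id to disk after every turn. The sketch below assumes a local JSON file is an acceptable store. The helper names and the file layout are hypothetical, and build_turn only assembles the request body shown in the turns above.

```python
import json
from pathlib import Path


def save_last_response_id(path, response_id):
    """Persist the most recent response id so a restarted client can chain."""
    Path(path).write_text(json.dumps({"previous_response_id": response_id}))


def load_last_response_id(path):
    """Return the stored id, or None if nothing usable was saved."""
    try:
        data = json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data.get("previous_response_id") if isinstance(data, dict) else None


def build_turn(prompt, previous_response_id=None):
    """Assemble a request body; the first turn carries no previous_response_id."""
    body = {"model": "agent", "input": prompt, "stream": True}
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    return body
```

On restart, build_turn(prompt, load_last_response_id(path)) continues the old session if the stored id is still within its 24-hour window.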

Quick start

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.alien.club/agent/<agent_id>",
    api_key="<your-access-token>",
)

# First turn
with client.responses.stream(
    model="agent",
    input="What is the SYNTEC collective agreement?",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
    first_response = stream.get_final_response()

print()

# Second turn: chain via previous_response_id
with client.responses.stream(
    model="agent",
    input="What is the trial period for an executive?",
    previous_response_id=first_response.id,
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)

TypeScript (openai SDK)

import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://api.alien.club/agent/<agent_id>",
  apiKey: "<your-access-token>",
})

// First turn
const firstStream = await client.responses.create({
  model: "agent",
  input: "What is the SYNTEC collective agreement?",
  stream: true,
})

let previousResponseId: string | undefined

for await (const event of firstStream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta)
  }
  if (event.type === "response.completed") {
    previousResponseId = event.response.id
  }
}

// Second turn
const secondStream = await client.responses.create({
  model: "agent",
  input: "What is the trial period for an executive?",
  previous_response_id: previousResponseId,
  stream: true,
})

for await (const event of secondStream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta)
  }
}

TypeScript (Vercel AI SDK)

import { createOpenAI } from "@ai-sdk/openai"
import { streamText } from "ai"

const provider = createOpenAI({
  baseURL: "https://api.alien.club/agent/<agent_id>",
  apiKey: "<your-access-token>",
})

// First turn
const { fullStream, experimental_providerMetadata } = await streamText({
  model: provider.responses("agent"),
  prompt: "What is the SYNTEC collective agreement?",
})

let previousResponseId: string | undefined

for await (const part of fullStream) {
  if (part.type === "text-delta") process.stdout.write(part.textDelta)
  if (part.type === "finish") {
    previousResponseId = (await experimental_providerMetadata)?.openai?.responseId as string
  }
}

// Second turn
const { fullStream: secondStream } = await streamText({
  model: provider.responses("agent"),
  prompt: "What is the trial period for an executive?",
  providerOptions: {
    openai: { previousResponseId },
  },
})

for await (const part of secondStream) {
  if (part.type === "text-delta") process.stdout.write(part.textDelta)
}

cURL

# First turn
curl -N \
-H "Authorization: Bearer <your-access-token>" \
-H "Content-Type: application/json" \
-d '{"model":"agent","input":"Hello","stream":true}' \
https://api.alien.club/agent/<agent_id>/responses

# Second turn — set previous_response_id to the id from the response.completed event
curl -N \
-H "Authorization: Bearer <your-access-token>" \
-H "Content-Type: application/json" \
-d '{"model":"agent","input":"Who are you?","previous_response_id":"resp_abc","stream":true}' \
https://api.alien.club/agent/<agent_id>/responses

Non-streaming mode

Omit stream for a blocking JSON response in the standard Response shape. The platform waits for the full agent run. As with Chat Completions, avoid this for long-running agentic workflows.

response = client.responses.create(
    model="agent",
    input="Summarise this document in one sentence.",
    previous_response_id=previous_response_id,
)
print(response.output_text)

Response object gotchas

Timestamp units are inconsistent

In the Response object, created_at is a Unix timestamp in seconds but completed_at is in milliseconds. Do not subtract them directly to compute duration:

# Wrong — produces a nonsensical result
duration = response.completed_at - response.created_at

# Correct
duration_seconds = (response.completed_at / 1000) - response.created_at

x_alien_agent_registry is a JSON-encoded string

metadata.x_alien_agent_registry is a JSON string nested inside the metadata map — it requires a second JSON.parse / json.loads call:

Python:

import json
registry = json.loads(response.metadata["x_alien_agent_registry"])
# registry is now a list of agent identity dicts

TypeScript:

const registry = JSON.parse(response.metadata["x_alien_agent_registry"] as string)

Retrieve a response

GET /agent/:id/responses/<resp_id>

Returns the full Response object in its current state. If the run has already completed, the response includes the final output. If the run is still in flight and you pass ?starting_after=<seq>, the GET streams remaining events as SSE (see Streaming — resume).

Resume after a network drop

Every streaming event carries a sequence_number. If the connection drops, reconnect by GETting the response:

GET /agent/:id/responses/<resp_id>?starting_after=<last_sequence_number>

The server replays all events with a higher sequence number. If the run is still in flight the GET continues live. If it has already completed the replay ends with the original terminal event. See Streaming Responses for full semantics.
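
The reconnect logic can be sketched as a generator that remembers the last sequence number it delivered. The transport is deliberately left out: open_stream is a hypothetical callable supplied by the caller, where None means the initial POST and an integer means the GET with ?starting_after=<n>. The set of terminal event types is an assumption based on the standard Responses event names.

```python
# Assumption: these event types mark the end of a run.
TERMINAL_TYPES = {"response.completed", "response.failed", "response.incomplete"}


def consume_with_resume(open_stream, max_reconnects=5):
    """Yield each event once, reconnecting after the last sequence number seen."""
    last_seq = None
    reconnects = 0
    while True:
        try:
            for event in open_stream(last_seq):
                seq = event["sequence_number"]
                if last_seq is not None and seq <= last_seq:
                    continue  # defensive: skip anything already delivered
                last_seq = seq
                yield event
                if event["type"] in TERMINAL_TYPES:
                    return
            return  # stream closed without a terminal event
        except ConnectionError:
            # Nothing to resume from if no event ever arrived.
            if last_seq is None or reconnects >= max_reconnects:
                raise
            reconnects += 1
```

A drop before the first event is re-raised rather than retried, because retrying the POST would start a second run instead of resuming the first.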

Cancel a run

POST /agent/:id/responses/<resp_id>/cancel

Returns a standard envelope with cancelled in the data object:

{ "success": true, "data": { "cancelled": true } }

cancelled is true if the run was in flight, and false if it had already completed or failed. Cancellation is best-effort: events already enqueued may still arrive before the worker acknowledges the cancel signal.
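
A caller usually wants to tell these outcomes apart. The helper below is a hypothetical sketch that reads only the fields shown in the envelope above. Treating a missing or false success flag as an error is an assumption, since the failure shape is not shown here.

```python
def cancel_outcome(envelope):
    """Classify a cancel response envelope into a short status string."""
    # Assumption: anything other than success == True is an error envelope.
    if envelope.get("success") is not True:
        return "error"
    data = envelope.get("data") or {}
    if data.get("cancelled") is True:
        return "cancelled"
    # The run had already completed or failed before the cancel arrived.
    return "already_finished"
```

Even after a "cancelled" outcome, a stream consumer should keep reading until the stream closes, because late events may still arrive.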

Session storage and TTL

Response metadata is stored in Redis for 24 hours after creation. After expiry:

  • GET /agent/:id/responses/<resp_id> returns 410 Gone.
  • previous_response_id referencing an expired response generates a new session instead of returning an error — the chain is broken but the new turn succeeds.
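
The expiry behaviour can be modelled with a small in-memory stand-in for the Redis store. This is a simulation for reasoning about the rules above, not the platform's implementation: the class name, the sess_ id format, and the treatment of unknown ids as expired are all assumptions.

```python
import time
import uuid

TTL_SECONDS = 24 * 60 * 60  # 24 hours after creation


class ResponseStore:
    """In-memory stand-in for the Redis-backed response store."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # response_id -> (session_id, expires_at)

    def record(self, response_id, session_id):
        """Store the linkage written at the end of each turn."""
        self._entries[response_id] = (session_id, self._clock() + TTL_SECONDS)

    def resolve_session(self, previous_response_id=None):
        """Return the linked session_id, or a fresh one if the link is gone."""
        entry = self._entries.get(previous_response_id)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        # First turn, unknown id, or expired id: start a new session.
        return f"sess_{uuid.uuid4().hex}"
```

Injecting the clock makes the 24-hour boundary easy to exercise in a test without waiting.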

Responses API vs Chat Completions

| Capability | Responses API | Chat Completions |
| --- | --- | --- |
| Multi-turn session | previous_response_id (server-managed) | conversation_id + full messages history (client-managed) |
| Resume after network drop | Yes, via GET ?starting_after=<seq> | No |
| Typed event stream | Yes, response.* discriminated union | No, ChatCompletionChunk |
| Standard OpenAI consumer | Python ≥ 1.50, Node ≥ 4.50, AI SDK 5+ | All OpenAI-compatible clients |
| Recommended for | Agentic workflows, production UIs | Quick integrations, broad tooling reach |

See also