Data API Overview

The Data API runs on each data cluster and provides access to datasets, entries, file storage, keyword search, and vector search for the data stored there. Each data cluster runs its own instance, isolated per tenant.

Accessing the Data API

Via the Platform Proxy (Recommended)

For Alien Hosted deployments, the Data API is accessed through the Platform API's cluster proxy. Data clusters are not directly exposed to the internet — all requests pass through the authenticated backend proxy.

Base URL pattern:

https://api.alien.club/clusters/{cluster_id}/proxy

For example, to list datasets on cluster 5:

curl "https://api.alien.club/clusters/5/proxy/api/v1/datasets" \
  -H "Authorization: Bearer oat_YOUR_API_TOKEN"
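
The same call can be made without an SDK. The following is a minimal Python sketch using only the standard library. The helper names `proxy_request` and `list_datasets` are illustrative, not part of any SDK, and because the response schema is not described on this page, the decoded JSON is returned unchanged.

```python
import json
import urllib.request

PLATFORM_URL = "https://api.alien.club"


def proxy_request(cluster_id: int, path: str, token: str) -> urllib.request.Request:
    """Build a GET request for a Data API path behind the platform proxy."""
    url = f"{PLATFORM_URL}/clusters/{cluster_id}/proxy{path}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def list_datasets(cluster_id: int, token: str):
    """Send the request and return the decoded JSON body as-is."""
    req = proxy_request(cluster_id, "/api/v1/datasets", token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling `list_datasets(5, "oat_YOUR_API_TOKEN")` sends the same request as the curl command above.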

The proxy handles:

Authentication — Validates your API token or OAuth JWT.
Authorization — Verifies you belong to the organization that owns the cluster.
Cluster availability — Returns 503 if the cluster is offline or suspended.
Audit logging — Logs every request with user, timestamp, and source type.
Response streaming — Streams data directly from the cluster without caching.

Tip: This is the access method used by the Python SDK and TypeScript SDK. Point the SDK's host / basePath to https://api.alien.club/clusters/{cluster_id}/proxy.

Direct Access (On-Premise)

For on-premise deployments where you have network access to the data cluster, you can call the Data API directly using the cluster's internal URL and service API key.

Base URL:

https://your-data-cluster.internal.example.com

Authentication:

curl "https://your-data-cluster.internal.example.com/api/v1/datasets" \
  -H "Authorization: Bearer YOUR_SERVICE_API_KEY"

The service API key is generated during cluster registration and stored securely on the cluster. It provides full access to all data on that cluster.

Caution: Direct access bypasses the platform's proxy authentication and audit logging. Use it only for on-premise deployments where the cluster is behind your own network security controls.

Authentication

When accessing through the platform proxy, the proxy injects the following headers into the request to the Data API:

Header	Description
`x-user-id`	Authenticated user's ID
`x-organization-id`	Active organization ID (tenant scoping)
`x-request-id`	Correlation ID for tracing
`x-cluster-id`	Target data cluster ID

You do not need to set these headers yourself when using the proxy — they are added automatically based on your authentication context.

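When accessing directly (on-premise), include your service API key as a Bearer token in the Authorization header. The proxy is not in the path to add the headers above, so the x-organization-id header that scopes the request (see Organization Scoping) has to be set by the caller.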

API Versioning

All Data API endpoints are prefixed with /api/v1:

/api/v1/datasets
/api/v1/entries
/api/v1/search
/api/v1/vector/chunks
/api/v1/vector/entries
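
Because the prefix is the same for proxy and direct access, a client can keep the base URL and the versioned path separate. A small sketch, where `API_PREFIX` and `endpoint_url` are hypothetical names chosen for illustration:

```python
API_PREFIX = "/api/v1"


def endpoint_url(base_url: str, resource: str) -> str:
    """Join a base URL (proxy or direct) with a versioned Data API resource."""
    return f"{base_url.rstrip('/')}{API_PREFIX}/{resource.lstrip('/')}"
```

Keeping the prefix in one place means a future version bump, if one is introduced, would touch a single constant.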

Key Endpoints

Datasets

Method	Endpoint	Description
GET	`/api/v1/datasets`	List all datasets
POST	`/api/v1/datasets`	Create a new dataset
GET	`/api/v1/datasets/{id}`	Get dataset details
PATCH	`/api/v1/datasets/{id}`	Update a dataset
DELETE	`/api/v1/datasets/{id}`	Delete a dataset

Entries

Method	Endpoint	Description
POST	`/api/v1/entries`	Create an entry (with file upload)
POST	`/api/v1/entries/batch`	Retrieve multiple entries in one request
POST	`/api/v1/entries/batch/content`	Retrieve the content of multiple entries in one request
GET	`/api/v1/entries/{id}`	Get entry details
GET	`/api/v1/entries/{id}/download`	Download an entry file
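
Entry files can be large, so it is usually safer to stream a download to disk than to read it into memory. The sketch below does that with the Python standard library. The function names are illustrative, and the integer entry ID is an assumption based on the IDs used elsewhere on this page.

```python
import shutil
import urllib.request


def download_url(base_url: str, entry_id: int) -> str:
    """URL of the download endpoint for one entry."""
    return f"{base_url.rstrip('/')}/api/v1/entries/{entry_id}/download"


def download_entry(base_url: str, entry_id: int, token: str, dest_path: str) -> None:
    """Stream the entry's file to dest_path in chunks."""
    req = urllib.request.Request(
        download_url(base_url, entry_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

`base_url` can be either the proxy base URL or a direct cluster URL.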

Search

Method	Endpoint	Description
POST	`/api/v1/search`	Keyword search (Meilisearch)
POST	`/api/v1/vector/chunks`	Semantic chunk search (Qdrant)
POST	`/api/v1/vector/entries`	Semantic entry search (full documents)

Health

Method	Endpoint	Description
GET	`/api/v1/health`	Health check (database, storage, search engine status)

Organization Scoping

Every request to the Data API is scoped to a single organization (tenant). When accessing through the proxy, the organization is determined from your authentication context. When accessing directly, the organization is determined from the x-organization-id header.

Data isolation is enforced at every layer:

Database queries are filtered by organization ID.
File storage uses organization-scoped bucket paths.
Search indexes are isolated per organization.
Vector collections are separate per organization.

There is no way to access another organization's data through the Data API, regardless of authentication method.
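
For direct access, where the organization comes from the x-organization-id header, a request might be built as in the sketch below. The function name is hypothetical, and the organization ID is treated as an opaque string because its format is not specified here.

```python
import urllib.request


def direct_request(
    cluster_url: str, path: str, service_api_key: str, organization_id: str
) -> urllib.request.Request:
    """Build a direct (on-premise) GET request scoped to one organization."""
    return urllib.request.Request(
        f"{cluster_url.rstrip('/')}{path}",
        headers={
            "Authorization": f"Bearer {service_api_key}",
            # Assumption: the ID is passed through as an opaque string.
            "x-organization-id": organization_id,
        },
    )
```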

Request and Response Conventions

Content Type

Request bodies use JSON, except for file uploads:

Content-Type: application/json

File uploads use multipart form data:

Content-Type: multipart/form-data
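
To illustrate what a multipart body looks like on the wire, here is a standard-library sketch that encodes text fields and one file. The form field names used in the usage note (`dataset_id`, `file`) are placeholders, not confirmed parameter names; check the endpoint reference for the fields that POST /api/v1/entries actually expects.

```python
import uuid


def encode_multipart(fields: dict, file_field: str, filename: str, content: bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns a (content_type, body) pair; the content type carries the boundary.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)
```

For example, `encode_multipart({"dataset_id": 1}, "file", "report.pdf", data)` returns the value for the Content-Type header and the bytes to send as the request body. HTTP client libraries and the SDKs normally do this encoding for you.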

Batch Operations

The Data API is optimized for batch access. Prefer batch endpoints over individual requests when working with multiple entries:

# Batch get 50 entries (single request, ~250ms)
curl -X POST "https://api.alien.club/clusters/5/proxy/api/v1/entries/batch" \
  -H "Authorization: Bearer oat_YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dataset_id": 1, "limit": 50}'
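
A Python equivalent of the batch request above, as a minimal sketch with the standard library. The function name is illustrative, and the body fields are exactly those from the curl example.

```python
import json
import urllib.request


def batch_entries_request(
    base_url: str, token: str, dataset_id: int, limit: int
) -> urllib.request.Request:
    """Build the POST /api/v1/entries/batch request from the curl example."""
    body = json.dumps({"dataset_id": dataset_id, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/entries/batch",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Pass the returned object to `urllib.request.urlopen` to send it.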

Error Responses

Errors return a JSON object:

{
  "detail": "Entry with id 999 not found"
}

Status Code	Meaning
200	Success
201	Created
400	Bad request
404	Not found
422	Validation error
500	Internal server error
503	Service unavailable: the cluster is offline or suspended (returned by the platform proxy)
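
A client can turn these responses into readable messages by reading the `detail` field. The sketch below is one way to do that; the function name is illustrative, and the fallback branch assumes that some errors (for example, ones produced by an intermediary) may not carry a JSON body.

```python
import json


def error_detail(status: int, body: bytes) -> str:
    """Return a readable message for a non-2xx response.

    Falls back to the raw text when the body is not a JSON object
    with a "detail" key.
    """
    text = body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return f"HTTP {status}: {text}"
    if isinstance(payload, dict) and "detail" in payload:
        return f"HTTP {status}: {payload['detail']}"
    return f"HTTP {status}: {text}"
```

With `urllib`, non-2xx responses raise `urllib.error.HTTPError`; its `code` attribute and `read()` method supply the two arguments.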

Endpoint Reference

The full endpoint reference is auto-generated from the Data API's OpenAPI specification:

Data API Endpoints — Browse all endpoints with request/response schemas, status codes, and examples.

Next Steps

Platform API Overview — The central platform API and proxy
SDK Overview — Type-safe client libraries
Python SDK Quickstart — Python examples
TypeScript SDK Quickstart — TypeScript examples
