
Data API Overview

The Data API runs on each data cluster and provides access to the datasets, entries, file storage, keyword search, and vector search for the data stored on that cluster. Each data cluster runs its own instance of the Data API, and data within it is isolated per tenant.

Accessing the Data API

For Alien Hosted deployments, the Data API is accessed through the Platform API's cluster proxy. Data clusters are not directly exposed to the internet — all requests pass through the authenticated backend proxy.

Base URL pattern:

https://api.alien.club/clusters/{cluster_id}/proxy

For example, to list datasets on cluster 5:

curl "https://api.alien.club/clusters/5/proxy/api/v1/datasets" \
-H "Authorization: Bearer oat_YOUR_API_TOKEN"
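
The same request can be prepared from Python. This is a minimal sketch using only the standard library, not the official SDK; `proxy_url` and `build_request` are hypothetical helper names, and the host and token placeholder are copied from the curl example above.

```python
import urllib.request

PLATFORM_HOST = "https://api.alien.club"  # host taken from the curl example


def proxy_url(cluster_id: int, path: str) -> str:
    """Join the proxy base URL for a cluster with a Data API path."""
    base = f"{PLATFORM_HOST}/clusters/{cluster_id}/proxy"
    return f"{base}/{path.lstrip('/')}"


def build_request(cluster_id: int, path: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        proxy_url(cluster_id, path),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_request(5, "/api/v1/datasets", "oat_YOUR_API_TOKEN")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without network access.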

The proxy handles:

  • Authentication — Validates your API token or OAuth JWT.
  • Authorization — Verifies you belong to the organization that owns the cluster.
  • Cluster availability — Returns 503 if the cluster is offline or suspended.
  • Audit logging — Logs every request with user, timestamp, and source type.
  • Response streaming — Streams data directly from the cluster without caching.
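
Because the proxy answers 503 when a cluster is unavailable, a client may want to retry with a delay. The sketch below is one possible approach, not documented platform behavior: `call_with_retry` is a hypothetical helper, `send` is any callable you supply that performs the request and returns a status code and body, and the backoff values are arbitrary. Retrying only helps for a cluster that is temporarily offline; a suspended cluster will keep returning 503.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, raw body)


def call_with_retry(
    send: Callable[[], Response],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-503 status or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    # Every attempt returned 503; hand the last response back to the caller.
    return status, body
```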

Tip: This is the access method used by the Python SDK and TypeScript SDK. Point the SDK's host / basePath to https://api.alien.club/clusters/{cluster_id}/proxy.

Direct Access (On-Premise)

For on-premise deployments where you have network access to the data cluster, you can call the Data API directly using the cluster's internal URL and service API key.

Base URL:

https://your-data-cluster.internal.example.com

Authentication:

curl "https://your-data-cluster.internal.example.com/api/v1/datasets" \
-H "Authorization: Bearer YOUR_SERVICE_API_KEY"

The service API key is generated during cluster registration and stored securely on the cluster. It provides full access to all data on that cluster.

Caution: Direct access bypasses the platform's proxy authentication and audit logging. Use it only for on-premise deployments where the cluster is behind your own network security controls.

Authentication

When accessing through the platform proxy, the proxy injects the following headers into the request to the Data API:

| Header | Description |
| --- | --- |
| x-user-id | Authenticated user's ID |
| x-organization-id | Active organization ID (tenant scoping) |
| x-request-id | Correlation ID for tracing |
| x-cluster-id | Target data cluster ID |

You do not need to set these headers yourself when using the proxy — they are added automatically based on your authentication context.

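When accessing directly (on-premise), no proxy is present to inject these headers. Include your service API key as a Bearer token in the Authorization header, and set x-organization-id yourself to select the organization the request is scoped to.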
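
The two access modes differ only in which headers the client sends. A minimal sketch, assuming the header names listed in the table above and the Organization Scoping rule that direct requests carry x-organization-id; `auth_headers` is a hypothetical helper, not part of any SDK.

```python
from typing import Dict, Optional


def auth_headers(token: str, organization_id: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for proxy access or direct (on-premise) access.

    Proxy access: pass only the API token; the proxy adds the x-* headers.
    Direct access: pass the service API key and the organization ID.
    """
    headers = {"Authorization": f"Bearer {token}"}
    if organization_id is not None:
        # Only needed when no proxy is present to inject the header.
        headers["x-organization-id"] = organization_id
    return headers


proxy_headers = auth_headers("oat_YOUR_API_TOKEN")
direct_headers = auth_headers("YOUR_SERVICE_API_KEY", organization_id="42")
```

The organization ID "42" is a placeholder value for illustration.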

API Versioning

All Data API endpoints are prefixed with /api/v1:

/api/v1/datasets
/api/v1/entries
/api/v1/search
/api/v1/vector/chunks
/api/v1/vector/entries

Key Endpoints

Datasets

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/datasets | List all datasets |
| POST | /api/v1/datasets | Create a new dataset |
| GET | /api/v1/datasets/{id} | Get dataset details |
| PATCH | /api/v1/datasets/{id} | Update a dataset |
| DELETE | /api/v1/datasets/{id} | Delete a dataset |

Entries

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/entries | Create an entry (with file upload) |
| POST | /api/v1/entries/batch | Batch get entries |
| POST | /api/v1/entries/batch/content | Batch get entry content |
| GET | /api/v1/entries/{id} | Get entry details |
| GET | /api/v1/entries/{id}/download | Download an entry file |

Search

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/search | Keyword search (Meilisearch) |
| POST | /api/v1/vector/chunks | Semantic chunk search (Qdrant) |
| POST | /api/v1/vector/entries | Semantic entry search (full documents) |

Health

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/health | Health check (database, storage, search engine status) |

Organization Scoping

Every request to the Data API is scoped to a single organization (tenant). When accessing through the proxy, the organization is determined from your authentication context. When accessing directly, the organization is determined from the x-organization-id header.

Data isolation is enforced at every layer:

  • Database queries are filtered by organization ID.
  • File storage uses organization-scoped bucket paths.
  • Search indexes are isolated per organization.
  • Vector collections are separate per organization.

There is no way to access another organization's data through the Data API, regardless of authentication method.
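
To make the idea of organization-scoped storage paths concrete, here is an illustrative sketch. It does not describe the Data API's actual implementation: the `org-{id}` prefix layout and the `scoped_key` function are assumptions invented for this example. It shows one common way to keep every object key under a tenant prefix and to reject paths that try to escape it.

```python
from pathlib import PurePosixPath


def scoped_key(organization_id: int, relative_path: str) -> str:
    """Return an object key under the organization's prefix, rejecting escapes."""
    path = PurePosixPath(relative_path)
    # Absolute paths and ".." segments could point outside the tenant prefix.
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"path escapes organization scope: {relative_path!r}")
    # Hypothetical layout: every key starts with "org-<id>/".
    return str(PurePosixPath(f"org-{organization_id}") / path)


key = scoped_key(7, "datasets/1/report.pdf")
```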

Request and Response Conventions

Content Type

Request bodies use JSON unless the endpoint accepts a file upload:

Content-Type: application/json

File uploads use multipart form data:

Content-Type: multipart/form-data
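
For readers unfamiliar with multipart encoding, the sketch below builds such a body with the Python standard library. The form field names (`dataset_id`, `file`) are assumptions for illustration; check the endpoint reference for the fields that POST /api/v1/entries actually expects. `encode_multipart` is a hypothetical helper.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(
    fields: Dict[str, str], file_field: str, filename: str, content: bytes
) -> Tuple[bytes, str]:
    """Encode text fields plus one file; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),  # closing boundary
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


# Field names below are placeholders, not confirmed by the API reference.
body, content_type = encode_multipart({"dataset_id": "1"}, "file", "notes.txt", b"hello")
```

Send `body` as the request data and `content_type` as the Content-Type header. In practice an HTTP client library or the SDK would normally handle this encoding.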

Batch Operations

The Data API is optimized for batch access. Prefer batch endpoints over individual requests when working with multiple entries:

# Batch get 50 entries (single request, ~250ms)
curl -X POST "https://api.alien.club/clusters/5/proxy/api/v1/entries/batch" \
-H "Authorization: Bearer oat_YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"dataset_id": 1, "limit": 50}'
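
A Python equivalent of the batch request above, as a minimal standard-library sketch rather than the SDK's own method. `batch_entries_request` is a hypothetical helper; the body fields `dataset_id` and `limit` are the ones shown in the curl example.

```python
import json
import urllib.request


def batch_entries_request(
    base_url: str, token: str, dataset_id: int, limit: int
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the batch entries endpoint."""
    payload = json.dumps({"dataset_id": dataset_id, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/entries/batch",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


batch_req = batch_entries_request(
    "https://api.alien.club/clusters/5/proxy", "oat_YOUR_API_TOKEN", dataset_id=1, limit=50
)
```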

Error Responses

Errors return a JSON object:

{
"detail": "Entry with id 999 not found"
}

| Status Code | Meaning |
| --- | --- |
| 200 | Success |
| 201 | Created |
| 400 | Bad request |
| 404 | Not found |
| 422 | Validation error |
| 500 | Internal server error |
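
A client can turn these responses into exceptions by reading the `detail` field. The sketch below assumes only the error shape shown above; `DataApiError` and `raise_for_error` are hypothetical names, and the fallback for a non-string or missing `detail` is a defensive guess rather than documented behavior.

```python
import json


class DataApiError(Exception):
    """Raised for any response with a 4xx or 5xx status code."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def raise_for_error(status: int, body: bytes) -> None:
    """Do nothing for success codes; raise DataApiError otherwise."""
    if status < 400:
        return
    detail = None
    try:
        parsed = json.loads(body)
        if isinstance(parsed, dict):
            detail = parsed.get("detail")
    except ValueError:
        pass  # body was not valid JSON; fall back to the raw text below
    if detail is None:
        detail = body.decode("utf-8", errors="replace")
    elif not isinstance(detail, str):
        detail = json.dumps(detail)  # keep structured details readable
    raise DataApiError(status, detail)
```

For example, calling `raise_for_error(404, b'{"detail": "Entry with id 999 not found"}')` raises a `DataApiError` whose `detail` attribute holds the message from the response.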

Endpoint Reference

The full endpoint reference is auto-generated from the Data API's OpenAPI specification:

  • Data API Endpoints — Browse all endpoints with request/response schemas, status codes, and examples.

Next Steps