Data API Overview
The Data API runs on each data cluster and provides access to datasets, entries, file storage, keyword search, and vector search for the data stored on that cluster. Each data cluster runs its own instance of the Data API, and data is isolated per tenant.
Accessing the Data API
Via the Platform Proxy (Recommended)
For Alien Hosted deployments, the Data API is accessed through the Platform API's cluster proxy. Data clusters are not directly exposed to the internet — all requests pass through the authenticated backend proxy.
Base URL pattern:
https://api.alien.club/clusters/{cluster_id}/proxy
For example, to list datasets on cluster 5:
curl "https://api.alien.club/clusters/5/proxy/api/v1/datasets" \
  -H "Authorization: Bearer oat_YOUR_API_TOKEN"
The proxy handles:
- Authentication — Validates your API token or OAuth JWT.
- Authorization — Verifies you belong to the organization that owns the cluster.
- Cluster availability — Returns 503 if the cluster is offline or suspended.
- Audit logging — Logs every request with user, timestamp, and source type.
- Response streaming — Streams data directly from the cluster without caching.
This is the access method used by the Python SDK and TypeScript SDK. Point the SDK's host / basePath to https://api.alien.club/clusters/{cluster_id}/proxy.
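The curl call above can also be assembled with Python's standard library. This is a minimal sketch, not the SDK's implementation: the helper names `proxy_base_url`, `build_list_datasets_request`, and `list_datasets` are hypothetical, the token is a placeholder, and no particular response shape is assumed.

```python
import json
import urllib.request

PLATFORM_HOST = "https://api.alien.club"  # host taken from the base URL pattern above


def proxy_base_url(cluster_id: int) -> str:
    """Return the proxy base URL for one cluster (hypothetical helper)."""
    return f"{PLATFORM_HOST}/clusters/{cluster_id}/proxy"


def build_list_datasets_request(cluster_id: int, token: str) -> urllib.request.Request:
    """Build, but do not send, a GET request for /api/v1/datasets."""
    url = f"{proxy_base_url(cluster_id)}/api/v1/datasets"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_datasets(cluster_id: int, token: str):
    """Send the request and decode the JSON body as-is."""
    req = build_list_datasets_request(cluster_id, token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling `list_datasets(5, "oat_YOUR_API_TOKEN")` would issue the same request as the curl example. Keeping request construction separate from sending makes the URL and headers easy to inspect without network access.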
Direct Access (On-Premise)
For on-premise deployments where you have network access to the data cluster, you can call the Data API directly using the cluster's internal URL and service API key.
Base URL:
https://your-data-cluster.internal.example.com
Authentication:
curl "https://your-data-cluster.internal.example.com/api/v1/datasets" \
  -H "Authorization: Bearer YOUR_SERVICE_API_KEY"
The service API key is generated during cluster registration and stored securely on the cluster. It provides full access to all data on that cluster.
Direct access bypasses the platform's proxy authentication and audit logging. Use it only for on-premise deployments where the cluster is behind your own network security controls.
Authentication
When accessing through the platform proxy, the proxy injects the following headers into the request to the Data API:
| Header | Description |
|---|---|
| x-user-id | Authenticated user's ID |
| x-organization-id | Active organization ID (tenant scoping) |
| x-request-id | Correlation ID for tracing |
| x-cluster-id | Target data cluster ID |
You do not need to set these headers yourself when using the proxy — they are added automatically based on your authentication context.
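When accessing directly (on-premise), include your service API key as a Bearer token in the Authorization header. No proxy is present to inject the headers above, so the organization is taken from the x-organization-id header you send (see Organization Scoping).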
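The two access modes differ only in which headers the client has to supply. The sketch below contrasts them. The function names are hypothetical, and the format of the organization ID value is an assumption (it is simply converted to a string here). The x-organization-id header for direct access follows the Organization Scoping section of this page.

```python
def proxy_headers(api_token: str) -> dict:
    """Headers a client sends through the platform proxy.

    The proxy adds x-user-id, x-organization-id, x-request-id and
    x-cluster-id itself, so only the Authorization header is needed.
    """
    return {"Authorization": f"Bearer {api_token}"}


def direct_headers(service_api_key: str, organization_id) -> dict:
    """Headers a client sends when calling the cluster directly (on-premise).

    No proxy is involved, so the organization is passed explicitly.
    The ID format is an assumption; it is stringified as-is.
    """
    return {
        "Authorization": f"Bearer {service_api_key}",
        "x-organization-id": str(organization_id),
    }
```

Either dictionary can be passed as the `headers` argument of `urllib.request.Request` or of any other HTTP client.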
API Versioning
All Data API endpoints are prefixed with /api/v1:
/api/v1/datasets
/api/v1/entries
/api/v1/search
/api/v1/vector/chunks
/api/v1/vector/entries
Key Endpoints
Datasets
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/datasets | List all datasets |
| POST | /api/v1/datasets | Create a new dataset |
| GET | /api/v1/datasets/{id} | Get dataset details |
| PATCH | /api/v1/datasets/{id} | Update a dataset |
| DELETE | /api/v1/datasets/{id} | Delete a dataset |
Entries
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/entries | Create an entry (with file upload) |
| POST | /api/v1/entries/batch | Batch get entries |
| POST | /api/v1/entries/batch/content | Batch get entry content |
| GET | /api/v1/entries/{id} | Get entry details |
| GET | /api/v1/entries/{id}/download | Download an entry file |
Search
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/search | Keyword search (Meilisearch) |
| POST | /api/v1/vector/chunks | Semantic chunk search (Qdrant) |
| POST | /api/v1/vector/entries | Semantic entry search (full documents) |
Health
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/health | Health check (database, storage, search engine status) |
Organization Scoping
Every request to the Data API is scoped to a single organization (tenant). When accessing through the proxy, the organization is determined from your authentication context. When accessing directly, the organization is determined from the x-organization-id header.
Data isolation is enforced at every layer:
- Database queries are filtered by organization ID.
- File storage uses organization-scoped bucket paths.
- Search indexes are isolated per organization.
- Vector collections are separate per organization.
There is no way to access another organization's data through the Data API, regardless of authentication method.
Request and Response Conventions
Content Type
Request bodies use JSON by default:
Content-Type: application/json
File uploads, such as creating an entry, use multipart form data instead:
Content-Type: multipart/form-data
Batch Operations
The Data API is optimized for batch access. Prefer batch endpoints over individual requests when working with multiple entries:
# Batch get 50 entries (single request, ~250ms)
curl -X POST "https://api.alien.club/clusters/5/proxy/api/v1/entries/batch" \
  -H "Authorization: Bearer oat_YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dataset_id": 1, "limit": 50}'
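For reference, the same batch request can be built in Python with the standard library. This is a sketch under the same assumptions as the curl example: the body fields `dataset_id` and `limit` are copied from it, the helper name `build_batch_entries_request` is hypothetical, and the response schema is not assumed.

```python
import json
import urllib.request


def build_batch_entries_request(
    base_url: str, token: str, dataset_id: int, limit: int
) -> urllib.request.Request:
    """Build, but do not send, a POST to /api/v1/entries/batch."""
    # JSON body mirroring the curl example above.
    body = json.dumps({"dataset_id": dataset_id, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/entries/batch",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_json(req: urllib.request.Request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

A single call such as `send_json(build_batch_entries_request(base, token, 1, 50))` replaces fifty individual entry requests with one round trip.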
Error Responses
Errors return a JSON object:
{
  "detail": "Entry with id 999 not found"
}
| Status Code | Meaning |
|---|---|
| 200 | Success |
| 201 | Created |
| 400 | Bad request |
| 404 | Not found |
| 422 | Validation error |
| 500 | Internal server error |
| 503 | Cluster offline or suspended (returned by the platform proxy) |
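A client can surface the `detail` message from an error response as sketched below. The names `DataApiError`, `extract_detail`, and `fetch_json` are hypothetical. The fallback to raw text covers responses that do not follow the JSON shape shown above, which is an assumption about what intermediaries might return, not documented behavior.

```python
import json
import urllib.error
import urllib.request


class DataApiError(Exception):
    """Raised for non-2xx responses; carries the status code and detail text."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def extract_detail(body: bytes) -> str:
    """Return the 'detail' field of an error body, or the raw text as a fallback."""
    text = body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return text
    if isinstance(payload, dict) and "detail" in payload:
        detail = payload["detail"]
        # Serialize non-string details instead of guessing their structure.
        return detail if isinstance(detail, str) else json.dumps(detail)
    return text


def fetch_json(req: urllib.request.Request):
    """Send a request; return decoded JSON or raise DataApiError."""
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        raise DataApiError(err.code, extract_detail(err.read())) from err
```

Callers can then branch on `DataApiError.status`, for example treating 404 as a missing entry and retrying later on 503.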
Endpoint Reference
The full endpoint reference is auto-generated from the Data API's OpenAPI specification:
- Data API Endpoints — Browse all endpoints with request/response schemas, status codes, and examples.
Next Steps
- Platform API Overview — The central platform API and proxy
- SDK Overview — Type-safe client libraries
- Python SDK Quickstart — Python examples
- TypeScript SDK Quickstart — TypeScript examples