Architecture Overview
The Alien Intelligence platform is split into two architectural layers: the platform (orchestration) and data clusters (data storage and processing). This separation is the foundation of the platform's data isolation guarantee — each tenant's data lives in its own isolated cluster with dedicated databases, storage, and search engines.
Platform and Data Cluster Model
[Diagram: the platform and data cluster model]
By default, both the platform and data clusters are hosted and managed by Alien. For enterprise clients with strict data sovereignty requirements, data clusters can be deployed on your own infrastructure, either on-premise or in your own cloud account. In on-premise deployments, data clusters initiate outbound-only connections to the platform via encrypted mTLS tunnels; no inbound firewall rules are required on your side.
Platform Topology
[Diagram: platform services, data cluster infrastructure, and the connectivity between them. Each data cluster provides isolated, per-tenant infrastructure.]
Component Map
Platform
| Component | Technology | Purpose |
|---|---|---|
| Backend API | AdonisJS (TypeScript) | User auth, dataset catalog, job dispatch, cluster management, billing |
| Workers | Python (async DAG engine) | AI workflow execution: LLM calls, vector search fan-out |
| Frontend | Next.js (React) | User dashboard, workflow editor, cluster management UI |
| Skupper Gateway | FastAPI (Python) | Management of cross-cluster mTLS tunnels |
| MCP Servers | FastMCP (Python) | AI agent tools for data access, research intelligence, and more |
| Identity Provider | OIDC Provider | OIDC single sign-on for users and MCP OAuth |
Data Cluster
| Component | Technology | Purpose |
|---|---|---|
| Data API | FastAPI (Python) | REST API for all data operations on customer data |
| PostgreSQL | CloudNativePG (HA) | Entry metadata, dataset configuration, manifests |
| MinIO | MinIO Operator | S3-compatible object storage for documents and processed files |
| Qdrant | Replicated StatefulSet | Vector database for semantic search |
| Meilisearch | Single instance | Full-text keyword search with typo tolerance |
| Argo Workflows | Controller + executor | Document processing pipeline orchestration |
| Operator | Kopf (Python) | Automated tenant provisioning and lifecycle management |
| Skupper | Site connector | Outbound-only mTLS tunnel to platform |
Data Isolation Enforcement
The platform enforces data isolation through multiple independent mechanisms. These apply to all data clusters — whether Alien-hosted or on-premise.
1. Namespace Isolation
Each tenant gets a dedicated Kubernetes namespace with its own:
- PostgreSQL database (separate database, scoped credentials)
- MinIO storage bucket (bucket-level IAM policies)
- Qdrant vector collection (JWT-scoped access)
- Meilisearch indexes (API key-scoped)
- Data API deployment and network connector
There is no cross-tenant namespace access. Credentials are scoped per tenant, and network policies enforce boundaries.
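As a rough illustration of the per-tenant scoping listed above, the sketch below derives one set of resource names per tenant and checks that two tenants never share a name for the same kind of resource. The `TenantResources` type, the `resources_for` helper, and the `tenant-<id>` naming scheme are all hypothetical; they are not taken from the platform's operator.

```python
import re
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TenantResources:
    """Hypothetical per-tenant resource names; the real scheme may differ."""

    namespace: str
    database: str
    bucket: str
    vector_collection: str
    search_index_prefix: str


def resources_for(tenant_id: str) -> TenantResources:
    # Restrict ids to lowercase DNS-label characters so the derived names are
    # usable as Kubernetes namespace and S3-style bucket names.
    if not re.fullmatch(r"[a-z0-9]([a-z0-9-]{0,38}[a-z0-9])?", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return TenantResources(
        namespace=f"tenant-{tenant_id}",
        database=f"tenant_{tenant_id.replace('-', '_')}",
        bucket=f"tenant-{tenant_id}-documents",
        vector_collection=f"tenant-{tenant_id}",
        search_index_prefix=f"tenant-{tenant_id}-",
    )


def shares_any_resource(a: TenantResources, b: TenantResources) -> bool:
    # Compare field by field: a collision only matters within one resource kind.
    left, right = asdict(a), asdict(b)
    return any(left[name] == right[name] for name in left)
```

Naming alone does not isolate anything; in the model described here, the scoped credentials and network policies are what enforce the boundary. The sketch only shows that every resource can be addressed by tenant without overlap.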
2. Proxy Architecture
Platform workers and services never connect to data clusters directly. All data access goes through an authenticated proxy endpoint, which forwards requests to the cluster's Data API using per-cluster service credentials. The platform never holds a direct network route to your storage systems.
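The forwarding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's proxy: `ClusterRoute`, `build_forwarded_request`, the bearer-token scheme, and the `X-Forwarded-User` header are hypothetical names chosen for the example. The function only builds the outgoing request and does not send it.

```python
from dataclasses import dataclass
from urllib.request import Request


@dataclass(frozen=True)
class ClusterRoute:
    """Hypothetical routing record: where a cluster's Data API is reachable,
    and the per-cluster service credential used to call it."""

    data_api_base_url: str
    service_token: str


class UnknownCluster(Exception):
    pass


def build_forwarded_request(
    routes: dict[str, ClusterRoute],
    cluster_id: str,
    path: str,
    caller: str,
) -> Request:
    """Build (but do not send) the request the proxy would forward."""
    route = routes.get(cluster_id)
    if route is None:
        raise UnknownCluster(cluster_id)
    # Accept only plain absolute paths so a caller cannot steer the request
    # to another host or climb out of the intended path.
    if not path.startswith("/") or path.startswith("//") or ".." in path:
        raise ValueError(f"refusing suspicious path: {path!r}")
    request = Request(route.data_api_base_url.rstrip("/") + path, method="GET")
    # The proxy attaches the service credential; the caller never sees it.
    request.add_header("Authorization", f"Bearer {route.service_token}")
    # Hypothetical header carrying the already-authenticated caller identity.
    request.add_header("X-Forwarded-User", caller)
    return request
```

Keeping the credential lookup inside the proxy means workers only ever name a cluster and a path, which is one way to realize the "never connect directly" property described above.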
3. Metadata-Only Sync
Data clusters push only metadata to the platform — dataset names, entry counts, and sync status — via periodic batch sync. The platform's dataset catalog stores pointers, never content. This is what enables cross-cluster discovery without centralizing data.
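One way to keep a sync payload metadata-only is to build it from an allow-list of fields rather than by stripping known content fields. The sketch below assumes a hypothetical record shape and payload format; `dataset_id` stands in for the "pointer" and is not a documented field name.

```python
import json

# Only these fields ever leave the cluster. `dataset_id` is a hypothetical
# pointer field; the others mirror the metadata named above.
SYNC_FIELDS = ("dataset_id", "name", "entry_count", "sync_status")


def build_sync_batch(datasets: list[dict]) -> str:
    """Serialize an allow-listed metadata batch for the platform catalog."""
    batch = []
    for dataset in datasets:
        missing = [field for field in SYNC_FIELDS if field not in dataset]
        if missing:
            raise ValueError(f"dataset record is missing {missing}")
        # Copy by allow-list rather than deleting sensitive keys, so a field
        # added to the record later is excluded by default.
        batch.append({field: dataset[field] for field in SYNC_FIELDS})
    return json.dumps({"datasets": batch}, sort_keys=True)
```

With this shape, a record that also carries document text produces a payload containing only the four listed fields.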
4. No Data Egress Paths
The Data API has no endpoints that bulk-export data to the platform. Every access is authenticated, scoped to a single entry, and logged. There is no mechanism in the API to stream or export an entire dataset back to the platform.
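The shape of such an access surface can be sketched as a reader whose only public operation returns one entry and records the attempt. `SingleEntryReader` and its audit callback are hypothetical, and a non-empty `caller` string stands in for real authentication; the actual Data API endpoints are not shown in this document.

```python
from typing import Callable


class SingleEntryReader:
    """Hypothetical read surface: one entry per call, no list or export."""

    def __init__(
        self,
        entries: dict[tuple[str, str], bytes],
        audit: Callable[[dict], None],
    ) -> None:
        self._entries = dict(entries)
        self._audit = audit

    def read(self, caller: str, dataset_id: str, entry_id: str) -> bytes:
        # Stand-in for authentication: reject requests with no caller identity.
        if not caller:
            raise PermissionError("unauthenticated request")
        key = (dataset_id, entry_id)
        found = key in self._entries
        # Log every authenticated attempt, including misses, before returning.
        self._audit({
            "caller": caller,
            "dataset_id": dataset_id,
            "entry_id": entry_id,
            "found": found,
        })
        if not found:
            raise KeyError(key)
        return self._entries[key]
```

Because the class exposes no iteration or bulk method, a consumer that wants a whole dataset would need one logged call per entry, which is the property the section describes.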
5. Network Topology (On-Premise)
For on-premise deployments, data clusters initiate outbound-only connections to the platform through Skupper mTLS tunnels. No inbound ports or firewall rules need to be opened, so the platform cannot initiate a connection into client infrastructure. The tunnel is encrypted end-to-end and authenticated with mutual TLS certificates.
Technology Stack
Application Layer
| Technology | Layer | Role |
|---|---|---|
| AdonisJS | Platform | Full-featured TypeScript backend: ORM, authorization, validation |
| Next.js | Platform | React frontend with server components |
| FastAPI | Data Cluster | High-performance async Python API with auto-generated OpenAPI |
| FastMCP | Platform | MCP server framework with OAuth PKCE support |
| Kopf | Data Cluster | Python Kubernetes operator for CRD-driven automation |
| Hera SDK | Data Cluster | Python-native Argo Workflow template authoring |
Data Layer
| Technology | Layer | Role |
|---|---|---|
| PostgreSQL | Both | ACID-compliant relational database, JSONB support, HA via CloudNativePG |
| Qdrant | Data Cluster | Purpose-built vector database: JWT RBAC, replication, payload filtering |
| MinIO | Data Cluster | S3-compatible erasure-coded object storage, multi-tenant buckets |
| Meilisearch | Data Cluster | Typo-tolerant, faceted keyword search |
| Redis | Platform | Session storage and MCP OAuth state |
AI and ML
| Technology | Layer | Role |
|---|---|---|
| OpenAI, Anthropic, Mistral, Google | Platform (Workers) | Multi-provider LLM completions |
| Mistral OCR | Data Cluster (Pipelines) | PDF text and figure extraction |
| LangChain + LangGraph | Platform (Workers) | LLM orchestration and multi-agent state machines |
Infrastructure
| Technology | Layer | Role |
|---|---|---|
| Kubernetes | Both | Container orchestration |
| ArgoCD | Both | GitOps continuous delivery |
| Istio | Platform | Service mesh with mTLS between services |
| Skupper | Both | Cross-cluster mTLS tunnels without inbound firewall rules |
| KEDA | Platform | Event-driven autoscaling (SQS queue depth) |
| cert-manager | Platform | Automated TLS certificate management |
| External Secrets Operator | Both | Secret management integration |
| CloudNativePG | Data Cluster | PostgreSQL high-availability operator |
| MinIO Operator | Data Cluster | MinIO tenant lifecycle management |
Next Steps
- Data Sovereignty — Deep dive into data isolation and on-premise deployment
- Data Clusters — How tenant isolation works in practice
- Pipelines — How documents are processed into searchable knowledge
- Deployment Model — Alien Hosted and on-premise deployment details