Architecture Overview

The Alien Intelligence platform is split into two architectural layers: the platform (orchestration) and data clusters (data storage and processing). This separation is the foundation of the platform's data isolation guarantee — each tenant's data lives in its own isolated cluster with dedicated databases, storage, and search engines.

Platform and Data Cluster Model

By default, both the platform and data clusters are hosted and managed by Alien. For enterprise clients with strict data sovereignty requirements, data clusters can instead be deployed on your own infrastructure, either on-premise or in your own cloud account. In on-premise deployments, data clusters initiate outbound-only connections to the platform via encrypted mTLS tunnels; no inbound firewall rules are required on your side.

Platform Topology

Platform Services

Data Cluster Infrastructure

Each data cluster provides isolated, per-tenant infrastructure:

Connectivity

Component Map

Platform

Component	Technology	Purpose
Backend API	AdonisJS (TypeScript)	User auth, dataset catalog, job dispatch, cluster management, billing
Workers	Python (async DAG engine)	AI workflow execution: LLM calls, vector search fan-out
Frontend	Next.js (React)	User dashboard, workflow editor, cluster management UI
Skupper Gateway	FastAPI (Python)	Manage cross-cluster mTLS tunnels
MCP Servers	FastMCP (Python)	AI agent tools for data access, research intelligence, and more
Identity Provider	OIDC Provider	OIDC single sign-on for users and MCP OAuth

Data Cluster

Component	Technology	Purpose
Data API	FastAPI (Python)	REST API for all data operations on customer data
PostgreSQL	CloudNativePG (HA)	Entry metadata, dataset configuration, manifests
MinIO	MinIO Operator	S3-compatible object storage for documents and processed files
Qdrant	Replicated StatefulSet	Vector database for semantic search
Meilisearch	Single instance	Full-text keyword search with typo tolerance
Argo Workflows	Controller + executor	Document processing pipeline orchestration
Operator	Kopf (Python)	Automated tenant provisioning and lifecycle management
Skupper	Site connector	Outbound-only mTLS tunnel to platform

Data Isolation Enforcement

The platform enforces data isolation through multiple independent mechanisms. These apply to all data clusters — whether Alien-hosted or on-premise.

1. Namespace Isolation

Each tenant gets a dedicated Kubernetes namespace with its own:

PostgreSQL database (separate database, scoped credentials)
MinIO storage bucket (bucket-level IAM policies)
Qdrant vector collection (JWT-scoped access)
Meilisearch indexes (API key-scoped)
Data API deployment and network connector

There is no cross-tenant namespace access. Credentials are scoped per tenant, and network policies enforce boundaries.
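
As a rough illustration of the per-tenant scoping described above, the Python sketch below derives one set of resource names per tenant and builds a Kubernetes NetworkPolicy that admits traffic only from the tenant's own namespace. The `tenant-<id>` naming scheme, the `TenantResources` type, and both helper functions are hypothetical and chosen for illustration; the actual operator may name and provision resources differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantResources:
    """Names of the per-tenant resources (hypothetical naming scheme)."""
    namespace: str
    pg_database: str
    minio_bucket: str
    qdrant_collection: str
    meili_index_prefix: str


def resources_for(tenant_id: str) -> TenantResources:
    # Every name is derived from the tenant ID, so no two tenants share a resource.
    slug = tenant_id.lower()
    return TenantResources(
        namespace=f"tenant-{slug}",
        pg_database=f"tenant_{slug}",
        minio_bucket=f"tenant-{slug}-docs",
        qdrant_collection=f"tenant_{slug}",
        meili_index_prefix=f"tenant_{slug}_",
    )


def same_namespace_only_policy(namespace: str) -> dict:
    """NetworkPolicy manifest: pods accept ingress only from their own namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress"],
            # A bare podSelector under "from" matches pods in this namespace only.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
```

Calling `same_namespace_only_policy(resources_for("Acme").namespace)` yields a manifest scoped to `tenant-acme`. Because the `from` clause has no `namespaceSelector`, pods in other tenants' namespaces are not matched.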

2. Proxy Architecture

Platform workers and services never connect to data clusters directly. All data access goes through an authenticated proxy endpoint, which forwards requests to the cluster's Data API using per-cluster service credentials. The platform never holds a direct network route to your storage systems.
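
The forwarding step can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the platform's implementation: `ClusterRoute`, `ForwardedRequest`, `ProxyError`, and `forward` are hypothetical names, the bearer-token header is an assumed credential format, and the function only builds the outgoing request rather than sending it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRoute:
    """Where a cluster's Data API is reachable and which credential to present."""
    data_api_url: str
    service_token: str


@dataclass(frozen=True)
class ForwardedRequest:
    method: str
    url: str
    headers: dict


class ProxyError(Exception):
    """Raised when a request must not be forwarded."""


def forward(routes: dict, caller_clusters: set, cluster_id: str,
            method: str, path: str) -> ForwardedRequest:
    """Build the request the proxy would send to a cluster's Data API."""
    # 1. The authenticated caller must be allowed to use the target cluster.
    if cluster_id not in caller_clusters:
        raise ProxyError(f"caller is not authorized for cluster {cluster_id!r}")
    # 2. The target must be a registered cluster.
    route = routes.get(cluster_id)
    if route is None:
        raise ProxyError(f"unknown cluster {cluster_id!r}")
    # 3. The per-cluster service credential is attached here; the caller never sees it.
    return ForwardedRequest(
        method=method,
        url=route.data_api_url.rstrip("/") + "/" + path.lstrip("/"),
        headers={"Authorization": f"Bearer {route.service_token}"},
    )
```

The point of the design is that workers hold only a cluster identifier: the address and credential for each Data API live in the proxy's routing table.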

3. Metadata-Only Sync

Data clusters push only metadata to the platform — dataset names, entry counts, and sync status — via periodic batch sync. The platform's dataset catalog stores pointers, never content. This is what enables cross-cluster discovery without centralizing data.
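
To make the metadata-only rule concrete, here is a small Python sketch of how a sync batch could be assembled on the data cluster side. The `DatasetMetadata` type, the `build_sync_batch` function, and the JSON field names are assumptions for illustration; only the three kinds of field (dataset name, entry count, sync status) come from the description above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """The only fields that leave the data cluster in this sketch."""
    dataset_name: str
    entry_count: int
    sync_status: str


def build_sync_batch(cluster_id: str, datasets: list) -> str:
    """Serialize a metadata batch from local dataset records (plain dicts)."""
    items = [
        DatasetMetadata(
            dataset_name=d["name"],
            entry_count=len(d["entries"]),  # a count, never the entries themselves
            sync_status=d["status"],
        )
        for d in datasets
    ]
    return json.dumps({"cluster_id": cluster_id, "datasets": [asdict(i) for i in items]})
```

Because the batch is built from an explicit, fixed set of fields rather than by filtering the full record, a new content field added to a dataset later is not included unless someone adds it to `DatasetMetadata` on purpose.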

4. No Data Egress Paths

The Data API has no endpoints that bulk-export data to the platform. Every access is authenticated, scoped to a single entry, and logged. There is no mechanism in the API to stream or export an entire dataset back to the platform.
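
The shape of such an API surface can be sketched with an in-memory stand-in. `EntryStore`, its token check, and the `data_api.audit` logger name are hypothetical; the sketch only illustrates the three properties named above (authenticated, single-entry, logged) and the absence of a bulk read.

```python
import logging

audit_log = logging.getLogger("data_api.audit")


class EntryStore:
    """In-memory stand-in for a tenant's entries, exposing single-entry reads only."""

    def __init__(self, entries: dict, valid_tokens: set):
        self._entries = entries
        self._valid_tokens = valid_tokens

    def get_entry(self, token: str, entry_id: str) -> dict:
        # Authenticate first, then record the access, then return exactly one entry.
        if token not in self._valid_tokens:
            audit_log.warning("denied entry=%s", entry_id)
            raise PermissionError("invalid token")
        audit_log.info("read entry=%s", entry_id)
        return self._entries[entry_id]

    # Deliberately no list_all() or export() method: a caller has to request
    # entries one at a time, and each request leaves an audit record.
```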

5. Network Topology (On-Premise)

For on-premise deployments, data clusters initiate outbound-only connections to the platform through Skupper mTLS tunnels. There are no inbound ports, no open firewall rules, and no way for the platform to "reach in" to client infrastructure. The tunnel is encrypted end-to-end and authenticated with mutual TLS certificates.
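
In the platform these tunnels are handled by Skupper, so no application code is involved. Purely to illustrate what "outbound-only with mutual TLS" means at the socket level, the Python sketch below configures a client-side TLS context using the standard `ssl` module. The function name and its file-path parameters are hypothetical, and this is not how Skupper itself is configured.

```python
import ssl
from typing import Optional


def outbound_mtls_context(ca_file: Optional[str] = None,
                          cert_file: Optional[str] = None,
                          key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a client that dials out and authenticates with a certificate."""
    # Purpose.SERVER_AUTH creates a client-side context that verifies the server.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        # Presenting a client certificate is the client's half of mutual TLS.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

The side that holds this context only ever opens connections; it never listens on a port, which is why no inbound firewall rule is needed. For the session to be mutually authenticated, the receiving side must also be configured to require and verify the client certificate.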

Technology Stack

Application Layer

Technology	Layer	Role
AdonisJS	Platform	Full-featured TypeScript backend: ORM, authorization, validation
Next.js	Platform	React frontend with server components
FastAPI	Data Cluster	High-performance async Python API with auto-generated OpenAPI
FastMCP	Platform	MCP server framework with OAuth PKCE support
Kopf	Data Cluster	Python Kubernetes operator for CRD-driven automation
Hera SDK	Data Cluster	Python-native Argo Workflow template authoring

Data Layer

Technology	Layer	Role
PostgreSQL	Both	ACID-compliant relational database, JSONB support, HA via CloudNativePG
Qdrant	Data Cluster	Purpose-built vector database: JWT RBAC, replication, payload filtering
MinIO	Data Cluster	S3-compatible erasure-coded object storage, multi-tenant buckets
Meilisearch	Data Cluster	Typo-tolerant, faceted keyword search
Redis	Platform	Session storage and MCP OAuth state

AI and ML

Technology	Layer	Role
OpenAI, Anthropic, Mistral, Google	Platform (Workers)	Multi-provider LLM completions
Mistral OCR	Data Cluster (Pipelines)	PDF text and figure extraction
LangChain + LangGraph	Platform (Workers)	LLM orchestration and multi-agent state machines

Infrastructure

Technology	Layer	Role
Kubernetes	Both	Container orchestration
ArgoCD	Both	GitOps continuous delivery
Istio	Platform	Service mesh with mTLS between services
Skupper	Both	Cross-cluster mTLS tunnels without inbound firewall rules
KEDA	Platform	Event-driven autoscaling (SQS queue depth)
cert-manager	Platform	Automated TLS certificate management
External Secrets Operator	Both	Secret management integration
CloudNativePG	Data Cluster	PostgreSQL high-availability operator
MinIO Operator	Data Cluster	MinIO tenant lifecycle management

Next Steps

Data Sovereignty — Deep dive into data isolation and on-premise deployment
Data Clusters — How tenant isolation works in practice
Pipelines — How documents are processed into searchable knowledge
Deployment Model — Alien Hosted and on-premise deployment details
