
Architecture Overview

The Alien Intelligence platform is split into two architectural layers: the platform (orchestration) and data clusters (data storage and processing). This separation is the foundation of the platform's data isolation guarantee — each tenant's data lives in its own isolated cluster with dedicated databases, storage, and search engines.

Platform and Data Cluster Model

[Figure: The platform and data cluster model]

By default, both the platform and data clusters are hosted and managed by Alien. For enterprise clients with strict data sovereignty requirements, data clusters can be deployed on your own infrastructure — on-premises or in your cloud account. In on-premise deployments, data clusters initiate outbound-only connections to the platform via encrypted mTLS tunnels; no inbound firewall rules are required on your side.

Platform Topology

Platform Services

[Figure: Platform services]

Data Cluster Infrastructure

Each data cluster provides isolated, per-tenant infrastructure:

[Figure: Data cluster infrastructure]

Connectivity

[Figure: Connectivity]

Component Map

Platform

Component | Technology | Purpose
--- | --- | ---
Backend API | AdonisJS (TypeScript) | User auth, dataset catalog, job dispatch, cluster management, billing
Workers | Python (async DAG engine) | AI workflow execution: LLM calls, vector search fan-out
Frontend | Next.js (React) | User dashboard, workflow editor, cluster management UI
Skupper Gateway | FastAPI (Python) | Cross-cluster mTLS tunnel management
MCP Servers | FastMCP (Python) | AI agent tools for data access, research intelligence, and more
Identity Provider | OIDC Provider | OIDC single sign-on for users and MCP OAuth
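The Workers row describes an async DAG engine that fans out vector searches. A minimal sketch of that pattern using the standard library's `graphlib` and `asyncio`; `run_dag`, `make_search`, and the node names are hypothetical, and this is not the platform's actual engine.

```python
import asyncio
from graphlib import TopologicalSorter
from typing import Any, Awaitable, Callable

# A node function receives its dependencies' results and returns its own result.
NodeFn = Callable[[dict[str, Any]], Awaitable[Any]]


async def run_dag(graph: dict[str, set[str]], tasks: dict[str, NodeFn]) -> dict[str, Any]:
    """Run async tasks in dependency order, fanning out whatever is ready.

    graph maps a node to the set of nodes it depends on.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results: dict[str, Any] = {}
    while sorter.is_active():
        ready = sorter.get_ready()
        # All nodes whose dependencies are satisfied run concurrently.
        outputs = await asyncio.gather(
            *(tasks[node]({dep: results[dep] for dep in graph.get(node, ())})
              for node in ready)
        )
        for node, output in zip(ready, outputs):
            results[node] = output
            sorter.done(node)
    return results


def make_search(cluster: str) -> NodeFn:
    async def search(_upstream: dict[str, Any]) -> list[str]:
        await asyncio.sleep(0)  # stand-in for a proxied vector search call
        return [f"{cluster}:hit"]
    return search


async def merge(upstream: dict[str, Any]) -> list[str]:
    # Combine the fanned-out search results into one list.
    return sorted(hit for hits in upstream.values() for hit in hits)


graph = {"merge": {"search_a", "search_b"}}
tasks = {"search_a": make_search("a"), "search_b": make_search("b"), "merge": merge}
results = asyncio.run(run_dag(graph, tasks))
print(results["merge"])  # ['a:hit', 'b:hit']
```

This version runs the DAG in waves: every ready node is gathered before the next batch is released. A production engine would more likely start each node as soon as its own dependencies finish.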

Data Cluster

Component | Technology | Purpose
--- | --- | ---
Data API | FastAPI (Python) | REST API for all data operations on customer data
PostgreSQL | CloudNativePG (HA) | Entry metadata, dataset configuration, manifests
MinIO | MinIO Operator | S3-compatible object storage for documents and processed files
Qdrant | Replicated StatefulSet | Vector database for semantic search
Meilisearch | Single instance | Full-text keyword search with typo tolerance
Argo Workflows | Controller + executor | Document processing pipeline orchestration
Operator | Kopf (Python) | Automated tenant provisioning and lifecycle management
Skupper | Site connector | Outbound-only mTLS tunnel to platform
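The Operator row covers tenant provisioning and lifecycle management. Kubernetes operators generally work by reconciliation: compare the resources a tenant should have with the ones that exist, then act on the difference. A framework-free sketch of that diff, with hypothetical resource kinds and key format; in a Kopf-based operator, logic like this would run inside the operator's handlers.

```python
# Hypothetical set of resources every tenant is expected to own.
RESOURCE_KINDS = ("database", "bucket", "collection", "search-index", "data-api")


def desired_state(tenant_ids: list[str]) -> set[str]:
    """Every tenant should own exactly one resource of each kind."""
    return {f"{tenant}/{kind}" for tenant in tenant_ids for kind in RESOURCE_KINDS}


def reconcile(desired: set[str], actual: set[str]) -> dict[str, list[str]]:
    """Actions that move the actual state to the desired state."""
    return {
        "create": sorted(desired - actual),  # missing resources to provision
        "delete": sorted(actual - desired),  # leftovers of removed tenants to tear down
    }


actual = {"acme/database", "acme/bucket", "oldco/database"}
plan = reconcile(desired_state(["acme"]), actual)
print(plan["create"])  # ['acme/collection', 'acme/data-api', 'acme/search-index']
print(plan["delete"])  # ['oldco/database']
```

Because `reconcile` is a pure function of the two sets, it is safe to run repeatedly: once the plan has been applied, the next pass returns two empty lists.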

Data Isolation Enforcement

The platform enforces data isolation through multiple independent mechanisms. These apply to all data clusters — whether Alien-hosted or on-premise.

1. Namespace Isolation

Each tenant gets a dedicated Kubernetes namespace with its own:

  • PostgreSQL database (separate database, scoped credentials)
  • MinIO storage bucket (bucket-level IAM policies)
  • Qdrant vector collection (JWT-scoped access)
  • Meilisearch indexes (API key-scoped)
  • Data API deployment and network connector

There is no cross-tenant namespace access. Credentials are scoped per tenant, and network policies enforce boundaries.
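A minimal sketch of per-tenant scoping, assuming a hypothetical naming convention in which every resource name is derived from the tenant ID. In the real system the boundary is enforced by each backend's own credentials (PostgreSQL roles, MinIO bucket policies, Qdrant JWTs, Meilisearch keys) and by network policies; this only models the rule those credentials encode. `TenantResources`, `Credential`, and `can_access` are illustrative names.

```python
import re
from dataclasses import dataclass

# Hypothetical tenant ID format: lowercase alphanumerics only, so derived
# names are valid for Kubernetes, PostgreSQL, MinIO, Qdrant and Meilisearch.
_TENANT_ID = re.compile(r"[a-z0-9]{1,40}")


@dataclass(frozen=True)
class TenantResources:
    namespace: str
    database: str
    bucket: str
    collection: str
    index_prefix: str


def resources_for(tenant_id: str) -> TenantResources:
    """Derive every per-tenant resource name from the tenant ID (illustrative convention)."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return TenantResources(
        namespace=f"tenant-{tenant_id}",
        database=f"tenant_{tenant_id}",
        bucket=f"tenant-{tenant_id}",
        collection=f"tenant_{tenant_id}",
        index_prefix=f"tenant_{tenant_id}_",  # trailing "_" keeps "acme" from matching "acmex"
    )


@dataclass(frozen=True)
class Credential:
    tenant_id: str  # a credential is issued for exactly one tenant


def can_access(cred: Credential, resource: str) -> bool:
    """True only for resources derived from the credential's own tenant."""
    own = resources_for(cred.tenant_id)
    exact = {own.namespace, own.database, own.bucket, own.collection}
    return resource in exact or resource.startswith(own.index_prefix)


acme = Credential("acme")
print(can_access(acme, "tenant_acme_papers"))  # True
print(can_access(acme, "tenant_globex"))       # False
```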

2. Proxy Architecture

Platform workers and services never connect to data clusters directly. All data access goes through an authenticated proxy endpoint, which forwards requests to the cluster's Data API using per-cluster service credentials. The platform never holds a direct network route to your storage systems.
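The proxy pattern amounts to a credential swap: the caller authenticates to the proxy, and only the proxy knows the cluster's Data API address and service credential. A sketch under that reading; `DataProxy`, `ClusterRoute`, the token values, and the URL are hypothetical, and `send` stands in for whatever HTTP client the real proxy uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ClusterRoute:
    data_api_url: str   # known only to the proxy, never returned to callers
    service_token: str  # per-cluster service credential


class DataProxy:
    """Authenticate platform callers, then forward to the target cluster's Data API."""

    def __init__(self, routes: dict[str, ClusterRoute], caller_tokens: set[str],
                 send: Callable[[dict], dict]):
        self._routes = routes
        self._caller_tokens = caller_tokens  # a real proxy would verify signed tokens instead
        self._send = send                    # injected transport, e.g. an HTTP client

    def forward(self, caller_token: str, cluster_id: str, path: str) -> dict:
        if caller_token not in self._caller_tokens:
            raise PermissionError("unauthenticated caller")
        route = self._routes.get(cluster_id)
        if route is None:
            raise LookupError(f"unknown cluster: {cluster_id}")
        # The caller's token is never forwarded; the proxy substitutes
        # the per-cluster service credential.
        request = {
            "method": "GET",
            "url": f"{route.data_api_url.rstrip('/')}/{path.lstrip('/')}",
            "headers": {"Authorization": f"Bearer {route.service_token}"},
        }
        return self._send(request)


proxy = DataProxy(
    routes={"cluster-a": ClusterRoute("https://data-api.cluster-a.internal", "svc-token-a")},
    caller_tokens={"worker-token"},
    send=lambda req: {"status": 200, "request": req},  # fake transport for illustration
)
resp = proxy.forward("worker-token", "cluster-a", "/entries/42")
```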

3. Metadata-Only Sync

Data clusters push only metadata to the platform — dataset names, entry counts, and sync status — via periodic batch sync. The platform's dataset catalog stores pointers, never content. This is what enables cross-cluster discovery without centralizing data.
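A sketch of a metadata-only sync payload, assuming a hypothetical `Dataset` record held inside the cluster. The payload is built from an explicit allowlist of fields (name, entry count, sync status), so entry content cannot leak into it even if the record gains new fields.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    entries: list[str] = field(default_factory=list)  # content: stays in the cluster
    sync_status: str = "synced"


def build_sync_payload(datasets: list[Dataset]) -> str:
    """Serialize one batch of catalog metadata for the platform.

    Fields are copied from an explicit allowlist, so entry content cannot
    end up in the payload even if Dataset gains new fields.
    """
    records = [
        {"name": d.name, "entry_count": len(d.entries), "sync_status": d.sync_status}
        for d in datasets
    ]
    return json.dumps({"datasets": records}, sort_keys=True)


payload = build_sync_payload([Dataset("papers", ["confidential text", "more text"])])
print(payload)  # {"datasets": [{"entry_count": 2, "name": "papers", "sync_status": "synced"}]}
```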

4. No Data Egress Paths

The Data API has no endpoints that bulk-export data to the platform. Every access is authenticated, scoped to a single entry, and logged. There is no mechanism in the API to stream or export an entire dataset back to the platform.
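The per-entry access rule can be illustrated with a hypothetical `EntryStore` whose only read method takes a single entry ID, authenticates the caller, and appends to an audit log. It is a sketch of the constraint, not the Data API's real interface.

```python
from datetime import datetime, timezone


class EntryStore:
    """Illustrative per-entry accessor: no list, bulk, or export method exists."""

    def __init__(self, entries: dict[str, str], tokens: dict[str, str]):
        self._entries = entries  # entry_id -> content
        self._tokens = tokens    # bearer token -> principal name
        self.audit_log: list[dict] = []

    def get_entry(self, token: str, entry_id: str) -> str:
        principal = self._tokens.get(token)
        if principal is None:
            raise PermissionError("unauthenticated")
        # Log the access before reading, so lookups of missing entries are recorded too.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "principal": principal,
            "entry_id": entry_id,
        })
        return self._entries[entry_id]  # KeyError if the entry does not exist


store = EntryStore({"e1": "text of entry one"}, {"tok-1": "worker"})
content = store.get_entry("tok-1", "e1")
```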

5. Network Topology (On-Premise)

For on-premise deployments, data clusters initiate outbound-only connections to the platform through Skupper mTLS tunnels. There are no inbound ports, no open firewall rules, and no way for the platform to "reach in" to client infrastructure. The tunnel is encrypted end-to-end and authenticated with mutual TLS certificates.
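For illustration, this is what the dialing side of an outbound mTLS connection looks like with Python's standard `ssl` module: it verifies the server's certificate and presents its own. Skupper sets up and manages its own TLS configuration; this generic sketch is not Skupper's implementation, and `build_mtls_client_context` and the file paths are placeholders.

```python
import ssl
from typing import Optional


def build_mtls_client_context(ca_file: Optional[str] = None,
                              cert_file: Optional[str] = None,
                              key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for the dialing (client) side of a mutually authenticated link."""
    # SERVER_AUTH: verify the peer's certificate chain and hostname.
    # With ca_file=None the system trust store is used instead of a private CA.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Presenting a client certificate is what makes the handshake mutual.
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


# Placeholder paths; a real deployment would use its own CA and site certificate:
# ctx = build_mtls_client_context("ca.pem", "site-cert.pem", "site-key.pem")
ctx = build_mtls_client_context()  # no client cert: verification settings only
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

A connector would open the TCP connection outward with `socket.create_connection` and wrap it with `ctx.wrap_socket(sock, server_hostname=...)`; nothing on the data cluster side needs to listen for inbound connections.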

Technology Stack

Application Layer

Technology | Layer | Role
--- | --- | ---
AdonisJS | Platform | Full-featured TypeScript backend: ORM, authorization, validation
Next.js | Platform | React frontend with server components
FastAPI | Data Cluster | High-performance async Python API with auto-generated OpenAPI
FastMCP | Platform | MCP server framework with OAuth PKCE support
Kopf | Data Cluster | Python Kubernetes operator for CRD-driven automation
Hera SDK | Data Cluster | Python-native Argo Workflow template authoring

Data Layer

Technology | Layer | Role
--- | --- | ---
PostgreSQL | Both | ACID-compliant relational database, JSONB support, HA via CloudNativePG
Qdrant | Data Cluster | Purpose-built vector database: JWT RBAC, replication, payload filtering
MinIO | Data Cluster | S3-compatible erasure-coded object storage, multi-tenant buckets
Meilisearch | Data Cluster | Typo-tolerant, faceted keyword search
Redis | Platform | Session storage and MCP OAuth state
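Qdrant's JWT RBAC lets a token grant access to specific collections. A minimal HS256 token builder using only the standard library, assuming the claim shape Qdrant documents for collection-level access (an `access` list of `{collection, access}` objects) and that tokens are signed with the instance's API key; verify both against the Qdrant version you run. `collection_scoped_token` and the collection name are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _segment(obj: dict) -> str:
    return _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def collection_scoped_token(secret: str, collection: str, access: str = "r",
                            ttl_s: int = 3600, now: Optional[float] = None) -> str:
    """Build an HS256 JWT granting `access` ("r" or "rw") to one collection."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "exp": issued + ttl_s,
        # Assumed claim shape for collection-level access; check your Qdrant version.
        "access": [{"collection": collection, "access": access}],
    }
    signing_input = f"{_segment(header)}.{_segment(claims)}"
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


token = collection_scoped_token("qdrant-api-key", "tenant_acme", now=1_000)
```

Issuing one read-scoped token per tenant collection keeps a leaked token from reaching any other tenant's vectors.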

AI and ML

Technology | Layer | Role
--- | --- | ---
OpenAI, Anthropic, Mistral, Google | Platform (Workers) | Multi-provider LLM completions
Mistral OCR | Data Cluster (Pipelines) | PDF text and figure extraction
LangChain + LangGraph | Platform (Workers) | LLM orchestration and multi-agent state machines

Infrastructure

Technology | Layer | Role
--- | --- | ---
Kubernetes | Both | Container orchestration
ArgoCD | Both | GitOps continuous delivery
Istio | Platform | Service mesh with mTLS between services
Skupper | Both | Cross-cluster mTLS tunnels without inbound firewall rules
KEDA | Platform | Event-driven autoscaling (SQS queue depth)
cert-manager | Platform | Automated TLS certificate management
External Secrets Operator | Both | Secret management integration
CloudNativePG | Data Cluster | PostgreSQL high-availability operator
MinIO Operator | Data Cluster | MinIO tenant lifecycle management
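KEDA scales on SQS queue depth by feeding the metric to a Horizontal Pod Autoscaler with a per-replica target (by default an AverageValue metric). As a worked example with a hypothetical target of 5 messages per replica and a maximum of 10 replicas, a backlog of 23 messages gives ceil(23 / 5) = 5 replicas. The sketch below ignores HPA tolerance, stabilization windows, and KEDA's activation logic for scaling to and from zero; `desired_replicas` is an illustrative helper, not a KEDA API.

```python
import math


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Approximate worker count for a queue-depth trigger.

    With an AverageValue target the HPA aims for ceil(total / target) replicas,
    clamped to [min_replicas, max_replicas].
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(23, 5))   # 5
print(desired_replicas(500, 5))  # 10 (capped at max_replicas)
print(desired_replicas(0, 5))    # 0
```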

Next Steps

  • Data Sovereignty — Deep dive into data isolation and on-premise deployment
  • Data Clusters — How tenant isolation works in practice
  • Pipelines — How documents are processed into searchable knowledge
  • Deployment Model — Alien Hosted and on-premise deployment details