Data Planes
A data plane is the infrastructure layer that hosts one or more data clusters. It represents a Kubernetes cluster with all the shared services needed to run tenant workloads — databases, object storage, vector search, workflow orchestration, and networking.
What Is a Data Plane?
Think of a data plane as a deployment target. When you create a new data cluster for a tenant, you choose which data plane hosts it. The data plane provides the shared infrastructure, and the data cluster provides the per-tenant isolation within that infrastructure.
A single data plane can host many data clusters. Each cluster gets its own namespace, database, storage bucket, vector collection, and search indexes — but they share the underlying infrastructure managed by the data plane.
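Here is a minimal Python sketch showing how one tenant ID could map to that tenant's slice of each shared service. The `dc-` / `dc_` naming scheme, the tenant-ID rule, and the `TenantResources` type are hypothetical illustrations. They are not the operator's actual conventions.

```python
import re
from dataclasses import dataclass

# Hypothetical tenant-ID rule: lowercase DNS-label style, so the ID can be
# reused in namespace and bucket names without escaping.
TENANT_ID = re.compile(r"^[a-z0-9]([a-z0-9-]{0,40}[a-z0-9])?$")


@dataclass(frozen=True)
class TenantResources:
    """Per-tenant slices of the data plane's shared services."""
    namespace: str     # Kubernetes namespace for the data cluster
    database: str      # database inside the shared PostgreSQL cluster
    bucket: str        # bucket in the shared MinIO deployment
    collection: str    # collection in the shared Qdrant cluster
    index_prefix: str  # prefix for the tenant's Meilisearch indexes


def resources_for(tenant_id: str) -> TenantResources:
    """Derive every per-tenant resource name from one tenant ID (hypothetical scheme)."""
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    # Unquoted PostgreSQL identifiers cannot contain '-', so use '_' where needed.
    snake = tenant_id.replace("-", "_")
    return TenantResources(
        namespace=f"dc-{tenant_id}",
        database=f"dc_{snake}",
        bucket=f"dc-{tenant_id}",
        collection=f"dc_{snake}",
        index_prefix=f"dc_{snake}_",
    )


print(resources_for("acme-prod"))
```

In this sketch every name is derived from a single ID. Provisioning and deletion are therefore deterministic: the set of resources that belongs to a tenant can always be recomputed without a separate lookup table.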
Data Plane vs Data Cluster
| Aspect | Data Plane | Data Cluster |
|---|---|---|
| Scope | Entire Kubernetes cluster | Single tenant namespace |
| Multiplicity | One per Kubernetes cluster | Many per data plane |
| Manages | Shared infrastructure (databases, storage, networking) | Per-tenant data (documents, embeddings, search indexes) |
| Operated by | Infrastructure team / platform operator | Provisioned automatically by the cluster operator |
| Lifecycle | Long-lived, rarely changed | Created and deleted as tenants come and go |
| Configuration | Infrastructure sizing, AI providers, chart versions | Dataset schemas, pipeline config, access credentials |
What a Data Plane Provides
When a data plane is set up, it deploys and manages these shared infrastructure components:
Database — PostgreSQL (CloudNativePG)
A high-availability PostgreSQL cluster shared across all tenants on the data plane. Each tenant gets a separate database within the cluster, with its own credentials and connection pool. CloudNativePG handles replication, failover, and connection management.
Object Storage — MinIO
An S3-compatible object storage deployment with erasure coding for data protection. Each tenant gets a dedicated bucket with IAM-scoped credentials. MinIO handles file storage for uploaded documents, processed content, extracted figures, and pipeline artifacts.
Vector Database — Qdrant
A replicated vector database for semantic search. Each tenant gets a separate collection with JWT-scoped access. Qdrant stores embedding vectors and chunk metadata, enabling similarity search across document collections.
Full-Text Search — Meilisearch
A keyword search engine with typo tolerance and faceted filtering. Each tenant gets separate indexes with API-key-scoped access. Meilisearch provides sub-50ms keyword search over document content and metadata.
Pipeline Orchestration — Argo Workflows
A workflow engine that runs document processing pipelines as Kubernetes pods. Pipelines handle OCR, chunking, embedding generation, and content registration. Each pipeline run gets dedicated scratch space and executes within the tenant's namespace.
Cluster Operator
A Kubernetes operator (Kopf-based) that automates tenant provisioning and lifecycle management. When a new data cluster is created, the operator provisions all per-tenant resources automatically. It also handles health monitoring, reconciliation, and deletion.
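The create-and-delete lifecycle can be sketched in plain Python. In a Kopf-based operator, logic like this would typically sit inside `@kopf.on.create` and `@kopf.on.delete` handlers. The version below runs without a cluster. The step list, the rollback-on-failure policy, and the `provision` / `deprovision` helpers are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical step: (resource name, create(tenant), delete(tenant)).
Step = Tuple[str, Callable[[str], None], Callable[[str], None]]


def provision(tenant: str, steps: List[Step], log: List[str]) -> None:
    """Create resources in order; if one fails, undo the completed ones in reverse."""
    done: List[Step] = []
    try:
        for step in steps:
            name, create, _ = step
            create(tenant)
            log.append(f"created {name}")
            done.append(step)
    except Exception:
        for name, _, delete in reversed(done):
            delete(tenant)
            log.append(f"rolled back {name}")
        raise


def deprovision(tenant: str, steps: List[Step], log: List[str]) -> None:
    """Delete resources in reverse creation order."""
    for name, _, delete in reversed(steps):
        delete(tenant)
        log.append(f"deleted {name}")


def noop(_tenant: str) -> None:
    pass


def fail(_tenant: str) -> None:
    raise RuntimeError("simulated MinIO error")


steps = [("namespace", noop, noop), ("database", noop, noop), ("bucket", fail, noop)]
log: List[str] = []
try:
    provision("acme", steps, log)
except RuntimeError:
    pass
print(log)
# ['created namespace', 'created database', 'rolled back database', 'rolled back namespace']
```

When a step fails partway through, the rollback undoes the steps that already succeeded. This means a failed tenant creation does not leave orphaned databases or namespaces behind for the reconciliation loop to clean up.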
Network Connectivity — Skupper
Encrypted tunnels connecting the data plane to the Alien Intelligence platform. For on-premise deployments, Skupper provides secure, outbound-only mTLS connectivity — no inbound firewall rules are needed on the data plane side. For Alien-hosted data planes, connectivity is managed internally.
Data Plane Registration
When a new data plane is set up, it goes through a registration process to establish its identity and connectivity with the platform.
Registration Flow
1. Deploy infrastructure. The data plane's Kubernetes cluster is provisioned with the required Helm charts (infrastructure operators, shared services, the cluster operator).
2. Operator startup. The cluster operator boots and checks for existing credentials. On first run, it has a one-time registration token.
3. Register with platform. The operator exchanges the registration token for a permanent service API key. The platform assigns a data plane ID and records the data plane's provider, region, and configuration.
4. Establish connectivity. For on-premise data planes, the operator redeems a Skupper access grant to establish an mTLS tunnel. For Alien-hosted data planes, connectivity is configured automatically.
5. Begin heartbeat. Once registered and connected, the operator sends heartbeats every 60 seconds with infrastructure status, tenant list, and chart version information.
The registration token is single-use. Once the operator has exchanged it for a service API key, the token is no longer valid. The service API key is stored as a Kubernetes Secret on the data plane.
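Here is a minimal sketch of the token exchange. It assumes an in-memory stand-in for the platform and uses a dict in place of the Kubernetes Secret. `Platform`, `ensure_registered`, and the `provider` / `region` values are hypothetical. The sketch demonstrates the single-use property: the token is removed the moment it is redeemed, and later restarts rely only on the stored service API key.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Platform:
    """Hypothetical in-memory stand-in for the platform's registration endpoint."""
    pending_tokens: Set[str] = field(default_factory=set)
    data_planes: Dict[str, dict] = field(default_factory=dict)

    def issue_token(self) -> str:
        token = secrets.token_urlsafe(32)
        self.pending_tokens.add(token)
        return token

    def register(self, token: str, provider: str, region: str) -> Tuple[str, str]:
        """Exchange a one-time token for (data_plane_id, service_api_key)."""
        if token not in self.pending_tokens:
            raise PermissionError("registration token unknown or already used")
        self.pending_tokens.remove(token)  # single use: burn the token on redemption
        dp_id = f"dp-{secrets.token_hex(4)}"
        self.data_planes[dp_id] = {"provider": provider, "region": region}
        return dp_id, secrets.token_urlsafe(32)


def ensure_registered(secret: Dict[str, str], platform: Platform,
                      token: Optional[str]) -> str:
    """Operator startup: reuse stored credentials, else redeem the registration token.

    `secret` is a plain dict standing in for the Kubernetes Secret.
    """
    if "service_api_key" in secret:
        return secret["data_plane_id"]
    if token is None:
        raise RuntimeError("no stored credentials and no registration token")
    dp_id, api_key = platform.register(token, provider="on-prem", region="eu-west")
    secret.update(data_plane_id=dp_id, service_api_key=api_key)
    return dp_id


platform_api = Platform()
token = platform_api.issue_token()
k8s_secret: Dict[str, str] = {}
dp_id = ensure_registered(k8s_secret, platform_api, token)
# A restart finds the stored key and does not need the token again.
assert ensure_registered(k8s_secret, platform_api, None) == dp_id
```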
Data Plane Lifecycle
States
| State | Meaning |
|---|---|
| Pending | DataPlane custom resource (CR) created, operator not yet registered |
| Registered | Operator has registered with platform, heartbeat active |
Health Monitoring
The data plane operator continuously monitors the health of shared infrastructure components:
- Infrastructure status — ArgoCD sync status and rollout state for each component (PostgreSQL, MinIO, Qdrant, Meilisearch, Argo Workflows)
- Infrastructure usage — Storage consumption, memory usage, and CPU metrics collected every 120 seconds via direct component APIs
- Tenant status — Current list of all provisioned tenants with their health states
- Connectivity — Skupper tunnel status and link health
This information is sent to the platform with each heartbeat, giving platform operators a consolidated view of all data planes in the dashboard.
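The following sketch shows how a heartbeat body might combine the four kinds of information above. It respects both documented intervals: heartbeats every 60 seconds and usage collection every 120 seconds. The payload field names, `HeartbeatBuilder`, and the `collect_usage` callable are hypothetical. The injectable clock exists only to make the caching testable.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

USAGE_INTERVAL_S = 120  # usage metrics refresh interval stated in the docs


@dataclass
class HeartbeatBuilder:
    """Assembles one heartbeat body; a caller would send it every 60 seconds."""
    collect_usage: Callable[[], dict]  # hypothetical: queries the component APIs
    clock: Callable[[], float] = time.monotonic
    _usage: dict = field(default_factory=dict, init=False)
    _usage_at: Optional[float] = field(default=None, init=False)

    def usage(self) -> dict:
        """Return cached usage, re-collecting only when it is 120 s or older."""
        now = self.clock()
        if self._usage_at is None or now - self._usage_at >= USAGE_INTERVAL_S:
            self._usage = self.collect_usage()
            self._usage_at = now
        return self._usage

    def build(self, infra: Dict[str, str], tenants: List[dict],
              charts: Dict[str, str], link_up: bool) -> str:
        # Field names are illustrative, not the platform's actual schema.
        return json.dumps({
            "infrastructure": infra,
            "usage": self.usage(),
            "tenants": tenants,
            "charts": charts,
            "connectivity": {"skupper_link": "up" if link_up else "down"},
        }, sort_keys=True)
```

Usage is cached, so roughly every other heartbeat reuses the previous usage sample. This avoids querying PostgreSQL, MinIO, Qdrant, and Meilisearch again on each heartbeat.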
Reconciliation
The operator periodically checks the actual state of all infrastructure against the desired state declared in the DataPlane custom resource. If it detects drift — a missing component, a stale configuration, or a failed deployment — it takes corrective action automatically.
Reconciliation runs every 300 seconds and covers:
- Re-checking infrastructure endpoints (PostgreSQL, MinIO, Qdrant, Meilisearch)
- Verifying ArgoCD application sync states
- Refreshing cached endpoint information
- Updating the DataPlane status with current infrastructure state
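The drift check at the heart of each reconciliation pass can be sketched as a pure function. It compares the desired chart versions with what ArgoCD reports. The `Action` vocabulary and the application names are hypothetical. The three branches mirror the three kinds of drift named above: a missing component, a stale configuration, and a failed deployment. In a Kopf-based operator, a function like this could be called from a `@kopf.timer` handler with a 300-second interval.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    kind: str    # "deploy", "patch", or "resync" (hypothetical vocabulary)
    app: str     # ArgoCD application name (hypothetical naming)
    detail: str


def plan(desired: Dict[str, str], observed: Dict[str, dict]) -> List[Action]:
    """Compare desired chart versions with what ArgoCD reports; return fixes.

    desired:  app -> chart version from the DataPlane spec
    observed: app -> {"version": ..., "sync": "Synced" | "OutOfSync" | "Unknown"}
    """
    actions: List[Action] = []
    for app, version in sorted(desired.items()):
        state = observed.get(app)
        if state is None:                       # component missing entirely
            actions.append(Action("deploy", app, version))
        elif state.get("version") != version:   # stale configuration
            actions.append(Action("patch", app, f"{state.get('version')} -> {version}"))
        elif state.get("sync") != "Synced":     # drifted or failed rollout
            actions.append(Action("resync", app, state.get("sync", "Unknown")))
    return actions
```

Keeping the comparison side-effect free makes it easy to test. It also means the same plan could be written to the DataPlane status before any corrective action runs.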
Data Plane Configuration
Data planes are configured through a DataPlane custom resource on the Kubernetes cluster. This resource controls:
Chart Configuration
Pinned Helm chart versions for each infrastructure component. When a chart version is updated in the DataPlane spec, the operator patches the corresponding ArgoCD application to trigger a rolling upgrade.
```yaml
chartConfiguration:
  infrastructureOperators:
    version: "1.2.0"
  infrastructure:
    version: "2.1.0"
  dataClusterOperator:
    version: "0.8.0"
  dataApi:
    version: "1.5.0"
```
Data API chart versions can also be pinned per tenant. When the global version in the DataPlane is updated, tenants with an individually pinned version keep it, and new tenants use the data plane default.
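Here is a sketch of the two version decisions described above. The first determines which pinned charts changed between two `chartConfiguration` snapshots, since those charts need their ArgoCD application patched. The second resolves which Data API version a tenant uses. The function names are hypothetical. Treating unpinned existing tenants the same way as new ones is an assumption; the docs do not state this.

```python
from typing import Dict, Optional, Tuple


def changed_charts(old: Dict[str, dict],
                   new: Dict[str, dict]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {component: (old_version, new_version)} for each chart whose pin changed."""
    return {
        name: (old.get(name, {}).get("version"), spec["version"])
        for name, spec in new.items()
        if old.get(name, {}).get("version") != spec["version"]
    }


def data_api_version(plane_default: str, tenant_pin: Optional[str]) -> str:
    """A tenant's own pin wins; otherwise fall back to the data plane default.

    The docs state the fallback for new tenants; applying it to unpinned
    existing tenants is an assumption of this sketch.
    """
    return tenant_pin if tenant_pin is not None else plane_default


old = {"infrastructure": {"version": "2.1.0"}, "dataApi": {"version": "1.5.0"}}
new = {"infrastructure": {"version": "2.1.0"}, "dataApi": {"version": "1.6.0"}}
print(changed_charts(old, new))            # {'dataApi': ('1.5.0', '1.6.0')}
print(data_api_version("1.6.0", "1.5.0"))  # 1.5.0: the pinned tenant keeps its version
print(data_api_version("1.6.0", None))     # 1.6.0: a new tenant takes the default
```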
Infrastructure Configuration
Resource sizing for shared infrastructure components. Changes to this configuration are propagated to the infrastructure ArgoCD application atomically.
```yaml
infrastructureConfig:
  postgresql:
    instances: 3
    storage: "100Gi"
  minio:
    servers: 4
    volumes: 4
    volumeSize: "100Gi"
  qdrant:
    replicas: 3
    storage: "100Gi"
  meilisearch:
    storage: "100Gi"
```
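Before an infrastructure change is pushed as a single update, it can help to compute the full set of changed settings and validate that set first. The sketch below flattens two `infrastructureConfig` snapshots, lists the settings that changed, and refuses to shrink any storage size. Kubernetes PersistentVolumeClaims can be expanded (when the StorageClass allows it) but not shrunk. The helper names and the choice to reject shrinks up front are assumptions, not documented operator behavior.

```python
from typing import Any, Dict, List

# Binary suffixes used in the example config; decimal suffixes (G, M) are omitted.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_SIZE_KEYS = ("storage", "volumeSize")


def to_bytes(quantity: str) -> int:
    """Parse a Kubernetes-style binary quantity such as "100Gi"."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def leaves(tree: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested config into {"postgresql.storage": "100Gi", ...}."""
    out: Dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(leaves(value, path))
        else:
            out[path] = value
    return out


def plan_infra_change(current: Dict[str, Any], desired: Dict[str, Any]) -> List[str]:
    """List changed settings; refuse volume shrinks before anything is applied."""
    old, new = leaves(current), leaves(desired)
    changed = sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
    for path in changed:
        if path.rsplit(".", 1)[-1] in _SIZE_KEYS and path in old and path in new:
            if to_bytes(new[path]) < to_bytes(old[path]):
                raise ValueError(f"{path}: {old[path]} -> {new[path]} would shrink a volume")
    return changed
```

Validating the whole change set first means that an invalid edit is rejected as a unit. It is never applied partially.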
Observability Configuration
Metrics and logging configuration that is propagated to all tenant deployments on the data plane. When observability settings change, the operator updates every tenant's ArgoCD application to apply the new configuration.
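Here is a minimal sketch of that propagation. It assumes each tenant's ArgoCD application exposes its current observability values, and that `patch` is a hypothetical callable that updates one application. The setting keys in the example are also hypothetical. Skipping applications whose settings already match is an optimization assumed here.

```python
from typing import Callable, Dict, List


def propagate_observability(apps: Dict[str, dict], settings: dict,
                            patch: Callable[[str, dict], None]) -> List[str]:
    """Push the data plane's observability settings to every tenant application.

    `apps` maps a tenant's ArgoCD application name to its current values;
    `patch` is a hypothetical callable that updates one application.
    Returns the names of the applications that were patched.
    """
    patched: List[str] = []
    for name in sorted(apps):
        # Skipping apps that already match is an assumption, not documented behavior.
        if apps[name].get("observability") != settings:
            patch(name, {"observability": settings})
            patched.append(name)
    return patched


calls = []
apps = {"dc-acme": {"observability": {"logLevel": "info"}}, "dc-globex": {}}
settings = {"logLevel": "debug", "metrics": True}
print(propagate_observability(apps, settings, lambda n, v: calls.append((n, v))))
```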
Multi-Data-Plane Deployments
Organizations can operate multiple data planes for various reasons:
| Scenario | Example |
|---|---|
| Geographic distribution | One data plane in EU, one in US, for data residency |
| Environment separation | Separate data planes for production and staging |
| Provider diversity | One data plane on AWS, one on-premises |
| Scale isolation | A dedicated data plane for a high-volume tenant |
| Regulatory boundaries | Separate data planes for different compliance regimes |
The platform dashboard shows all data planes with their health status, tenant count, and resource utilization. Cross-data-plane operations — like searching across datasets on different data planes — are handled transparently by the platform's multi-cluster fan-out search.
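The fan-out idea can be sketched with `asyncio`. The sketch queries every data plane concurrently, then merges the per-plane results into one global top-k list. Several parts are assumptions: the `SearchFn` signature, tolerating an unreachable plane, and comparing raw scores across planes. The platform's actual merge strategy is not described here.

```python
import asyncio
import heapq
from typing import Awaitable, Callable, Dict, List, Tuple

Hit = Tuple[float, str, str]  # (score, data_plane_id, document_id)
SearchFn = Callable[[str, int], Awaitable[List[Tuple[float, str]]]]


async def fan_out_search(planes: Dict[str, SearchFn], query: str, k: int) -> List[Hit]:
    """Query every data plane concurrently and keep the global top-k hits."""

    async def one(dp_id: str, search: SearchFn) -> List[Hit]:
        results = await search(query, k)
        return [(score, dp_id, doc) for score, doc in results]

    # return_exceptions=True: an unreachable data plane drops out instead of
    # failing the whole search (a policy choice assumed for this sketch).
    batches = await asyncio.gather(*(one(dp, fn) for dp, fn in planes.items()),
                                   return_exceptions=True)
    hits = [hit for batch in batches if not isinstance(batch, BaseException)
            for hit in batch]
    # Assumes scores from different planes are on a comparable scale.
    return heapq.nlargest(k, hits)
```

Scores from different planes may not be comparable, for example if the planes use different embedding models. In that case, a rank-based merge such as reciprocal rank fusion would be safer than sorting by raw score.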
Next Steps
- Data Clusters — How per-tenant isolation works within a data plane
- Architecture Overview — The full platform architecture
- Data Sovereignty — How the platform enforces data isolation
- Create a Data Plane — Step-by-step guide to deploying a data plane