Data Planes

A data plane is the infrastructure layer that hosts one or more data clusters. It represents a Kubernetes cluster with all the shared services needed to run tenant workloads — databases, object storage, vector search, workflow orchestration, and networking.

What Is a Data Plane?

Think of a data plane as a deployment target. When you create a new data cluster for a tenant, you choose which data plane hosts it. The data plane provides the shared infrastructure, and the data cluster provides the per-tenant isolation within that infrastructure.

A single data plane can host many data clusters. Each cluster gets its own namespace, database, storage bucket, vector collection, and search indexes — but they share the underlying infrastructure managed by the data plane.
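As a rough illustration of that per-tenant split, the sketch below derives one name per shared service from a tenant ID. The naming scheme, the `TenantResources` type, and the `resources_for` helper are all hypothetical; the real operator may name things differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantResources:
    """Names of the per-tenant resources carved out of the shared services."""
    namespace: str      # Kubernetes namespace
    database: str       # PostgreSQL database
    bucket: str         # MinIO bucket
    collection: str     # Qdrant collection
    index_prefix: str   # prefix for Meilisearch indexes


def resources_for(tenant_id: str) -> TenantResources:
    # Hypothetical naming scheme, for illustration only.
    slug = tenant_id.lower().replace("_", "-")
    return TenantResources(
        namespace=f"tenant-{slug}",
        database=f"tenant_{slug.replace('-', '_')}",  # avoid hyphens in SQL identifiers
        bucket=f"tenant-{slug}",
        collection=f"tenant-{slug}",
        index_prefix=f"tenant-{slug}-",
    )


acme = resources_for("Acme_Corp")
globex = resources_for("globex")
```

Two tenants on the same data plane end up with disjoint resource names while still pointing at the same PostgreSQL, MinIO, Qdrant, and Meilisearch deployments.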

Data Plane vs Data Cluster

| Aspect | Data Plane | Data Cluster |
| --- | --- | --- |
| Scope | Entire Kubernetes cluster | Single tenant namespace |
| Multiplicity | One per Kubernetes deployment | Many per data plane |
| Manages | Shared infrastructure (databases, storage, networking) | Per-tenant data (documents, embeddings, search indexes) |
| Operated by | Infrastructure team / platform operator | Provisioned automatically by the cluster operator |
| Lifecycle | Long-lived, rarely changed | Created and deleted as tenants come and go |
| Configuration | Infrastructure sizing, AI providers, chart versions | Dataset schemas, pipeline config, access credentials |

What a Data Plane Provides

When a data plane is set up, it deploys and manages these shared infrastructure components:

Database — PostgreSQL (CloudNativePG)

A high-availability PostgreSQL cluster shared across all tenants on the data plane. Each tenant gets a separate database within the cluster, with its own credentials and connection pool. CloudNativePG handles replication, failover, and connection management.

Object Storage — MinIO

An S3-compatible object storage deployment with erasure coding for data protection. Each tenant gets a dedicated bucket with IAM-scoped credentials. MinIO handles file storage for uploaded documents, processed content, extracted figures, and pipeline artifacts.

Vector Database — Qdrant

A replicated vector database for semantic search. Each tenant gets a separate collection with JWT-scoped access. Qdrant stores embedding vectors and chunk metadata, enabling similarity search across document collections.

Full-Text Search — Meilisearch

A keyword search engine with typo tolerance and faceted filtering. Each tenant gets separate indexes with API key-scoped access. Meilisearch provides sub-50ms keyword search over document content and metadata.

Pipeline Orchestration — Argo Workflows

A workflow engine that runs document processing pipelines as Kubernetes pods. Pipelines handle OCR, chunking, embedding generation, and content registration. Each pipeline run gets dedicated scratch space and executes within the tenant's namespace.

Cluster Operator

A Kubernetes operator (Kopf-based) that automates tenant provisioning and lifecycle management. When a new data cluster is created, the operator provisions all per-tenant resources automatically. It also handles health monitoring, reconciliation, and deletion.

Network Connectivity — Skupper

Encrypted tunnels connecting the data plane to the Alien Intelligence platform. For on-premise deployments, Skupper provides secure, outbound-only mTLS connectivity — no inbound firewall rules are needed on the data plane side. For Alien-hosted data planes, connectivity is managed internally.

Data Plane Registration

When a new data plane is set up, it goes through a registration process to establish its identity and connectivity with the platform.

Registration Flow

  1. Deploy infrastructure. The data plane's Kubernetes cluster is provisioned with the required Helm charts (infrastructure operators, shared services, the cluster operator).

  2. Operator startup. The cluster operator boots and checks for existing credentials. On first run it finds none and falls back to its one-time registration token.

  3. Register with platform. The operator exchanges the registration token for a permanent service API key. The platform assigns a data plane ID and records the data plane's provider, region, and configuration.

  4. Establish connectivity. For on-premise data planes, the operator redeems a Skupper access grant to establish an mTLS tunnel. For Alien-hosted data planes, connectivity is configured automatically.

  5. Begin heartbeat. Once registered and connected, the operator sends heartbeats every 60 seconds with infrastructure status, tenant list, and chart version information.

Tip: The registration token is single-use. Once the operator has exchanged it for a service API key, the token is no longer valid. The service API key is stored as a Kubernetes Secret on the data plane.
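The token exchange in steps 2 and 3 can be sketched with an in-memory stand-in for the platform. Everything here is an assumption made for illustration: the `Platform` class, the `operator_startup` function, the ID format, and the field names are not the platform's actual API.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Platform:
    """In-memory stand-in for the platform's registration endpoint (hypothetical)."""
    pending_tokens: set = field(default_factory=set)
    data_planes: dict = field(default_factory=dict)

    def register(self, token: str, provider: str, region: str) -> dict:
        # A registration token is single-use: consume it on the first exchange.
        if token not in self.pending_tokens:
            raise PermissionError("registration token is invalid or already used")
        self.pending_tokens.remove(token)
        data_plane_id = f"dp-{len(self.data_planes) + 1}"   # illustrative ID format
        self.data_planes[data_plane_id] = {"provider": provider, "region": region}
        return {
            "data_plane_id": data_plane_id,
            "service_api_key": secrets.token_urlsafe(32),
        }


def operator_startup(platform, stored_secret, token, provider, region):
    """Return credentials, registering only when none are stored yet."""
    if stored_secret is not None:
        return stored_secret            # already registered: reuse the stored key
    return platform.register(token, provider, region)


platform = Platform(pending_tokens={"one-time-token"})
creds = operator_startup(platform, None, "one-time-token", "aws", "eu-west-1")
again = operator_startup(platform, creds, "one-time-token", "aws", "eu-west-1")
```

On a restart the operator finds the stored credentials and skips registration, so the spent token is never presented a second time.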

Data Plane Lifecycle

States

| State | Meaning |
| --- | --- |
| Pending | Data plane CR created, operator not yet registered |
| Registered | Operator has registered with platform, heartbeat active |

Health Monitoring

The cluster operator continuously monitors the health of shared infrastructure components:

  • Infrastructure status — ArgoCD sync status and rollout state for each component (PostgreSQL, MinIO, Qdrant, Meilisearch, Argo Workflows)
  • Infrastructure usage — Storage consumption, memory usage, and CPU metrics collected every 120 seconds via direct component APIs
  • Tenant status — Current list of all provisioned tenants with their health states
  • Connectivity — Skupper tunnel status and link health

This information is sent to the platform with each heartbeat, giving platform operators a consolidated view of all data planes in the platform dashboard.
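Because heartbeats go out every 60 seconds but usage metrics are collected every 120 seconds, some heartbeats necessarily carry a cached usage reading. The sketch below shows one way that could work; the `Heartbeat` fields, the `UsageCache` class, and the status values are illustrative assumptions, not the platform's wire format.

```python
import json
from dataclasses import dataclass, asdict

HEARTBEAT_INTERVAL_S = 60   # heartbeat cadence stated in the registration flow
USAGE_INTERVAL_S = 120      # usage metrics are collected less often


class UsageCache:
    """Re-collect usage metrics at most once per USAGE_INTERVAL_S."""

    def __init__(self, collect):
        self._collect = collect
        self._value = None
        self._collected_at = None

    def get(self, now: float) -> dict:
        stale = self._collected_at is None or now - self._collected_at >= USAGE_INTERVAL_S
        if stale:
            self._value = self._collect()
            self._collected_at = now
        return self._value


@dataclass
class Heartbeat:
    # Field names are illustrative only.
    infrastructure_status: dict
    infrastructure_usage: dict
    tenants: list
    connectivity: dict


collections = []


def collect_usage() -> dict:
    collections.append(1)   # count how often the component APIs are queried
    return {"postgresql": {"storage_bytes": 0}}


usage = UsageCache(collect_usage)
payloads = []
for tick in range(3):       # three consecutive heartbeat ticks: t = 0, 60, 120
    now = tick * HEARTBEAT_INTERVAL_S
    beat = Heartbeat(
        infrastructure_status={"postgresql": {"sync": "Synced", "rollout": "Healthy"}},
        infrastructure_usage=usage.get(now),
        tenants=[{"name": "tenant-a", "health": "Healthy"}],
        connectivity={"skupper_link": "up"},
    )
    payloads.append(json.dumps(asdict(beat)))
```

Three heartbeats are produced, but the usage collector runs only twice: at t = 0 and again at t = 120.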

Reconciliation

The operator periodically checks the actual state of all infrastructure against the desired state declared in the DataPlane custom resource. If it detects drift — a missing component, a stale configuration, or a failed deployment — it takes corrective action automatically.

Reconciliation runs every 300 seconds and covers:

  • Re-checking infrastructure endpoints (PostgreSQL, MinIO, Qdrant, Meilisearch)
  • Verifying ArgoCD application sync states
  • Refreshing cached endpoint information
  • Updating the DataPlane status with current infrastructure state
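The drift check at the heart of reconciliation can be pictured as a comparison between desired and observed state. This is a minimal sketch, assuming each component is described by a version and a health value; the `detect_drift` function and the dictionary shapes are hypothetical.

```python
RECONCILE_INTERVAL_S = 300   # reconciliation cadence stated above


def detect_drift(desired: dict, actual: dict) -> dict:
    """Map each drifting component to a short reason string."""
    drift = {}
    for name, want in desired.items():
        have = actual.get(name)
        if have is None:
            drift[name] = "missing"
        elif have.get("health") == "Degraded":
            drift[name] = "failed deployment"
        elif have.get("version") != want.get("version"):
            drift[name] = "stale configuration"
    return drift


desired = {
    "postgresql": {"version": "2.1.0"},
    "minio": {"version": "2.1.0"},
    "qdrant": {"version": "2.1.0"},
    "meilisearch": {"version": "2.1.0"},
}
actual = {
    "postgresql": {"version": "2.1.0", "health": "Healthy"},
    "minio": {"version": "2.0.0", "health": "Healthy"},     # older than desired
    "qdrant": {"version": "2.1.0", "health": "Degraded"},   # rollout failed
    # meilisearch is absent entirely
}

drift = detect_drift(desired, actual)
```

A real reconciler would follow each detected entry with a corrective action and then write the outcome back to the DataPlane status.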

Data Plane Configuration

Data planes are configured through a DataPlane custom resource on the Kubernetes cluster. This resource controls:

Chart Configuration

Pinned Helm chart versions for each infrastructure component. When a chart version is updated in the DataPlane spec, the operator patches the corresponding ArgoCD application to trigger a rolling upgrade.

chartConfiguration:
  infrastructureOperators:
    version: "1.2.0"
  infrastructure:
    version: "2.1.0"
  dataClusterOperator:
    version: "0.8.0"
  dataApi:
    version: "1.5.0"

Note: Data API chart versions can also be pinned per-tenant. When a global version is updated in the DataPlane, existing tenants keep their individually pinned version. New tenants use the data plane default.
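The two version rules in this section can be sketched in a few lines: find which charts changed between two specs, and resolve the Data API version a given tenant should run. Both helper functions are hypothetical, the `1.6.0` upgrade is made up, and treating an unpinned tenant as following the data plane default is an assumption.

```python
def changed_charts(old: dict, new: dict) -> dict:
    """Return {chart: new_version} for every chart whose pinned version changed."""
    return {
        name: cfg["version"]
        for name, cfg in new.items()
        if old.get(name, {}).get("version") != cfg["version"]
    }


def effective_data_api_version(data_plane_default: str, tenant_pin=None) -> str:
    """A tenant-level pin wins; otherwise fall back to the data plane default."""
    return tenant_pin if tenant_pin is not None else data_plane_default


old_spec = {
    "infrastructure": {"version": "2.1.0"},
    "dataApi": {"version": "1.5.0"},
}
new_spec = {
    "infrastructure": {"version": "2.1.0"},
    "dataApi": {"version": "1.6.0"},   # hypothetical upgrade
}

to_patch = changed_charts(old_spec, new_spec)   # charts whose ArgoCD app needs a patch
pinned = effective_data_api_version(new_spec["dataApi"]["version"], tenant_pin="1.5.0")
fresh = effective_data_api_version(new_spec["dataApi"]["version"])
```

Only the `dataApi` chart shows up as changed, the pinned tenant stays on `1.5.0`, and a tenant without a pin resolves to the new default.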

Infrastructure Configuration

Resource sizing for shared infrastructure components. Changes to this configuration are propagated to the infrastructure ArgoCD application atomically.

infrastructureConfig:
  postgresql:
    instances: 3
    storage: "100Gi"
  minio:
    servers: 4
    volumes: 4
    volumeSize: "100Gi"
  qdrant:
    replicas: 3
    storage: "100Gi"
  meilisearch:
    storage: "100Gi"
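To get a feel for what these numbers add up to, the sketch below totals the raw storage each component would request. It rests on two assumptions that the configuration does not state: that `volumes` counts volumes per MinIO server, and that `storage` is allocated per PostgreSQL instance and per Qdrant replica. The helper names are hypothetical.

```python
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity: str) -> int:
    """Convert a binary-suffixed quantity such as "100Gi" to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def raw_storage_bytes(cfg: dict) -> dict:
    """Raw provisioned storage per component, before replication or erasure coding."""
    pg, minio, qdrant, meili = (cfg[k] for k in ("postgresql", "minio", "qdrant", "meilisearch"))
    return {
        # Assumption: one volume of `storage` per PostgreSQL instance.
        "postgresql": pg["instances"] * parse_quantity(pg["storage"]),
        # Assumption: `volumes` is the number of volumes on each server.
        "minio": minio["servers"] * minio["volumes"] * parse_quantity(minio["volumeSize"]),
        # Assumption: one volume of `storage` per Qdrant replica.
        "qdrant": qdrant["replicas"] * parse_quantity(qdrant["storage"]),
        "meilisearch": parse_quantity(meili["storage"]),
    }


config = {
    "postgresql": {"instances": 3, "storage": "100Gi"},
    "minio": {"servers": 4, "volumes": 4, "volumeSize": "100Gi"},
    "qdrant": {"replicas": 3, "storage": "100Gi"},
    "meilisearch": {"storage": "100Gi"},
}

totals = raw_storage_bytes(config)
total_gib = sum(totals.values()) // 2**30
```

These are raw figures. Usable capacity is lower wherever data is replicated or erasure-coded, as it is for PostgreSQL, Qdrant, and MinIO.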

Observability Configuration

Metrics and logging configuration that is propagated to all tenant deployments on the data plane. When observability settings change, the operator updates every tenant's ArgoCD application to apply the new configuration.

Multi-Data-Plane Deployments

Organizations can operate multiple data planes for various reasons:

| Scenario | Example |
| --- | --- |
| Geographic distribution | One data plane in EU, one in US, for data residency |
| Environment separation | Separate data planes for production and staging |
| Provider diversity | One data plane on AWS, one on-premises |
| Scale isolation | A dedicated data plane for a high-volume tenant |
| Regulatory boundaries | Separate data planes for different compliance regimes |

The platform dashboard shows all data planes with their health status, tenant count, and resource utilization. Cross-data-plane operations — like searching across datasets on different data planes — are handled transparently by the platform's multi-cluster fan-out search.
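A fan-out search of this kind generally sends the same query to every data plane in parallel and merges the ranked results. The sketch below shows that general pattern only; it is not the platform's implementation, and the `fan_out_search` function, the backend callables, and the hit format are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def fan_out_search(backends: dict, query: str, limit: int = 10) -> list:
    """Query every data plane concurrently and return the top `limit` hits by score."""
    def run(item):
        name, search = item
        # Tag each hit with the data plane it came from.
        return [{**hit, "data_plane": name} for hit in search(query)]

    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        result_lists = list(pool.map(run, backends.items()))

    merged = [hit for hits in result_lists for hit in hits]
    return heapq.nlargest(limit, merged, key=lambda hit: hit["score"])


# Stand-ins for per-data-plane search clients.
def search_eu(query):
    return [{"id": "eu-1", "score": 0.9}, {"id": "eu-2", "score": 0.4}]


def search_us(query):
    return [{"id": "us-1", "score": 0.7}]


top = fan_out_search({"eu": search_eu, "us": search_us}, "contract renewal", limit=2)
top_ids = [hit["id"] for hit in top]
```

Merging by raw score assumes scores are comparable across data planes. If they are not, a normalization or re-ranking step would be needed before the merge.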
