What is Alien Intelligence?
Alien Intelligence is a data platform for organizations that need to manage large document collections. It combines AI-powered processing, search, and agent integration with strong per-tenant data isolation.
The platform separates orchestration (the platform layer) from data storage and processing (isolated data clusters). Each tenant gets dedicated databases, storage, vector indexes, and search engines — fully isolated from other tenants. By default, Alien hosts and manages everything for you. For enterprises with strict data sovereignty requirements (GDPR, HIPAA, regulated industries), data clusters can optionally be deployed on your own infrastructure.
Key Capabilities
Data Management
Create, organize, and version large document collections from a single control plane. Datasets support typed schemas, lifecycle tracking, and manifest-based storage that separates metadata from files. The platform catalog stores only metadata pointers — content stays in your isolated data cluster.
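To make the metadata/content split concrete, here is a minimal sketch of manifest-based storage. The class and field names (`ManifestEntry`, `content_uri`, `catalog_view`) are illustrative assumptions, not the platform's actual schema; the point is that the catalog derives only counts and identifiers, never content locations.

```python
# Hypothetical sketch: the platform catalog sees metadata pointers only,
# while content URIs and files stay inside the tenant's data cluster.
# All names and fields here are illustrative, not the real schema.
from dataclasses import dataclass


@dataclass
class ManifestEntry:
    entry_id: str      # stable identifier, safe to sync to the catalog
    content_uri: str   # resolvable only inside the tenant's data cluster
    checksum: str      # integrity check for the stored file
    size_bytes: int


def catalog_view(entries: list[ManifestEntry]) -> dict:
    """What the platform catalog stores: IDs and counts, never content URIs."""
    return {
        "entry_count": len(entries),
        "entry_ids": [e.entry_id for e in entries],
    }


entries = [
    ManifestEntry("doc-001", "s3://tenant-a/raw/doc-001.pdf", "sha256:ab12", 48210),
    ManifestEntry("doc-002", "s3://tenant-a/raw/doc-002.pdf", "sha256:cd34", 19554),
]
print(catalog_view(entries))
```

Versioning then becomes a matter of snapshotting manifests, since the manifest, not the file store, defines what a dataset contains at a point in time.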
Intelligent Processing
Raw documents are automatically transformed into searchable, AI-queryable knowledge bases through composable pipeline components. The platform ships with pre-built pipeline presets for common formats (PDF, scientific XML, DOCX) and supports custom pipelines by composing existing components. Processing runs as parallelized workflows with automatic retry and scratch space per job.
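The composition model above can be sketched as functions chained over a document record. The component names (`extract_text`, `chunk`) and the `compose` helper are hypothetical stand-ins for the platform's pipeline components, shown only to illustrate how a preset is assembled from reusable parts.

```python
# Illustrative sketch of composing a pipeline preset from components.
# Component names and the document shape are assumptions for this example.
from typing import Callable

Component = Callable[[dict], dict]


def extract_text(doc: dict) -> dict:
    # Stand-in for format-specific extraction (PDF, XML, DOCX, ...).
    doc["text"] = doc["raw"].decode("utf-8", errors="ignore")
    return doc


def chunk(doc: dict) -> dict:
    # Fixed-width chunking keeps the example short; real chunkers vary.
    text = doc["text"]
    doc["chunks"] = [text[i:i + 40] for i in range(0, len(text), 40)]
    return doc


def compose(*components: Component) -> Component:
    """Build a pipeline that runs each component in order."""
    def pipeline(doc: dict) -> dict:
        for component in components:
            doc = component(doc)
        return doc
    return pipeline


pdf_preset = compose(extract_text, chunk)
result = pdf_preset({"raw": b"Alien Intelligence processes documents into chunks."})
print(len(result["chunks"]))  # 51 characters at width 40 -> 2 chunks
```

A custom pipeline is the same idea with a different component list; retries and scratch space wrap each component invocation rather than the whole pipeline.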
Search and Discovery
Two complementary search engines run on every data cluster:
- Keyword search with typo tolerance and faceted filtering, returning results in under 50ms
- Vector similarity search across embedded document chunks for semantic discovery
Multi-cluster fan-out lets a single query span datasets stored on different clusters simultaneously, with results merged and ranked by relevance.
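Fan-out can be sketched as concurrent per-cluster queries followed by a single ranked merge. The cluster names, stored hits, and scoring below are illustrative; a real deployment would call each cluster's search endpoint instead of reading a local dict.

```python
# Minimal sketch of multi-cluster fan-out: query clusters concurrently,
# then merge hits into one relevance-ranked list. Data is illustrative.
from concurrent.futures import ThreadPoolExecutor

CLUSTERS = {
    "cluster-eu": [{"id": "a", "score": 0.91}, {"id": "b", "score": 0.40}],
    "cluster-us": [{"id": "c", "score": 0.77}],
}


def search_cluster(name: str, query: str) -> list[dict]:
    # Stand-in for a per-cluster search call over the network.
    return CLUSTERS[name]


def fan_out(query: str, clusters: list[str]) -> list[dict]:
    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(lambda c: search_cluster(c, query), clusters))
    merged = [hit for hits in result_sets for hit in hits]
    return sorted(merged, key=lambda h: h["score"], reverse=True)


print([h["id"] for h in fan_out("alloys", ["cluster-eu", "cluster-us"])])
# → ['a', 'c', 'b']
```

Because each cluster answers independently, a slow or unreachable cluster delays only its own shard of the results rather than blocking the other queries.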
AI Agent Integration
AI assistants (Claude, GPT-4, custom agents) can directly search, read, and analyze your document collections through the Model Context Protocol (MCP). The platform provides tools covering the complete data access workflow, from dataset discovery to reading individual figures from processed documents. Access control follows end-user permissions: agents never receive more access than the human who authorized them.
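The permission model can be illustrated with a small sketch: every agent tool call is checked against the authorizing user's grants before it is served. The user names, dataset names, and tool names here are hypothetical and stand in for the platform's actual access checks.

```python
# Hedged sketch of end-user-scoped agent access: an agent acting for a
# user can only reach datasets that user was granted. Names are invented.
USER_GRANTS = {
    "alice": {"papers-2024"},
    "bob": {"papers-2024", "internal-memos"},
}


def agent_tool_call(user: str, tool: str, dataset: str) -> str:
    """Dispatch an agent tool call, enforcing the authorizing user's grants."""
    if dataset not in USER_GRANTS.get(user, set()):
        raise PermissionError(f"{user} may not access {dataset}")
    return f"{tool} ok on {dataset}"


print(agent_tool_call("alice", "search", "papers-2024"))
try:
    agent_tool_call("alice", "read_figure", "internal-memos")
except PermissionError as exc:
    print("denied:", exc)
```

The key property is that the check keys on the human's identity, not the agent's, so delegating to an agent never widens the accessible surface.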
Research Intelligence
The platform includes built-in access to public research databases: over 600 million research products via OpenAIRE and 14 million bibliographic records from France's national library (BnF). AI agents can conduct literature reviews, profile authors, track funding outputs, and navigate historical document collections without additional subscriptions.
Data Isolation by Design
Every tenant on Alien Intelligence gets strong data isolation, regardless of where the cluster is hosted:
- Namespace isolation: Each tenant gets dedicated databases, storage buckets, vector collections, and search indexes with scoped credentials.
- Metadata-only sync: Only dataset names, entry counts, and sync status flow to the platform catalog — never content.
- Proxy architecture: The platform never holds a direct route to your storage. All data access is authenticated and per-entry.
- No bulk egress: There are no API endpoints that export data back to the platform.
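The metadata-only sync rule above can be sketched as a whitelist at the catalog boundary: sync payloads may carry status fields and nothing else. The field names are illustrative assumptions, not the actual sync schema.

```python
# Hedged sketch of metadata-only sync: the catalog sync path accepts a
# fixed whitelist of status fields and rejects anything content-shaped.
# Field names are illustrative.
ALLOWED_SYNC_FIELDS = {"dataset_name", "entry_count", "sync_status"}


def validate_sync_payload(payload: dict) -> dict:
    """Reject any sync payload carrying fields outside the metadata whitelist."""
    extra = set(payload) - ALLOWED_SYNC_FIELDS
    if extra:
        raise ValueError(f"fields blocked from catalog sync: {sorted(extra)}")
    return payload


print(validate_sync_payload(
    {"dataset_name": "papers-2024", "entry_count": 1200, "sync_status": "synced"}
))
```

Enforcing the whitelist on the platform side complements the proxy architecture: even a misbehaving cluster component cannot push content into the catalog.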
On-Premise Deployment (Enterprise)
For organizations in regulated industries — healthcare, defense, financial services, government research — data clusters can be deployed on your own infrastructure. This provides full data sovereignty:
- Network topology: On-premise clusters initiate outbound-only connections via mutual-TLS (mTLS) tunnels. No inbound firewall rules required.
- Physical data residency: Documents, embeddings, and indexes physically reside on infrastructure you control.
On-premise deployment is recommended only for teams with the capacity to manage Kubernetes infrastructure. Alien-hosted clusters are maintained by Alien, with faster support response times and automatic updates.
Both deployment modes are structurally compatible with GDPR, HIPAA, and ISO 27001 requirements.
Who Uses Alien Intelligence?
- Research institutions building searchable corpora of scientific literature with semantic search and AI-powered analysis
- Content providers ingesting proprietary document formats into structured, AI-queryable collections
- Cultural heritage organizations providing AI-native access to large historical document archives
- Regulated enterprises that need vector search, LLM analysis, and multi-agent workflows with strong data isolation — optionally on their own infrastructure
Where to Go Next
Core Concepts
Understand the platform architecture, data clusters, datasets, pipelines, and search.
How-To Guides
Step-by-step guides for creating clusters, uploading documents, and configuring pipelines.
API Reference
Interactive API documentation for the Platform API and Data API with request/response schemas.
SDK Reference
Python and TypeScript client libraries for programmatic access to the Data API.
Architecture
Deep dives into deployment models, networking, infrastructure, processing engine, and compliance.