
AI Capabilities

The platform integrates AI at multiple levels: document processing pipelines use OCR and embedding models to make documents searchable, the workflow engine orchestrates LLM calls across multiple providers, and MCP servers give AI agents direct access to your data. This page covers the AI architecture and how these capabilities work together.

Multi-Provider LLM Support

The platform supports multiple LLM providers, configurable per-node in workflows and per-tenant for embeddings. This avoids vendor lock-in and enables cost optimization by routing different tasks to the most appropriate model.

Supported Providers

| Provider | Capabilities | Typical Use Cases |
|---|---|---|
| OpenAI | Chat completion, structured output, embeddings | General-purpose reasoning, JSON output, embeddings |
| Anthropic | Chat completion, long-context analysis | Complex analysis, research synthesis, long documents |
| Mistral | Chat completion, OCR, embeddings | European data processing, document extraction, multilingual |
| Google Gemini | Chat completion, embeddings | Cost-effective batch processing, embeddings |

Provider Selection

Provider selection happens at two levels:

  1. Per-node in workflows — each node in the visual workflow editor can be configured with a specific provider and model. A single workflow can use OpenAI for summarization, Anthropic for analysis, and Mistral for translation.

  2. Per-tenant for embeddings — each data cluster tenant has a configured embedding provider. All documents in that tenant's datasets use the same embedding model for consistency in vector search.
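Per-node provider selection can be pictured as a workflow definition in which each node names its own provider and model. The sketch below is illustrative only: the field names (`provider`, `model`, `prompt`) and the `{{node_id.output}}` reference style are assumptions, not the platform's actual schema.

```python
# Hypothetical workflow definition mixing providers across nodes.
# Field names and template syntax are illustrative, not the real schema.
workflow = {
    "nodes": [
        {"id": "summarize", "type": "llm.chat", "provider": "openai",
         "model": "gpt-4o-mini", "prompt": "Summarize: {{input.text}}"},
        {"id": "analyze", "type": "llm.chat", "provider": "anthropic",
         "model": "claude-sonnet", "prompt": "Analyze: {{summarize.output}}"},
        {"id": "translate", "type": "llm.chat", "provider": "mistral",
         "model": "mistral-large", "prompt": "Translate: {{analyze.output}}"},
    ],
    # Edges define data flow: summarize -> analyze -> translate
    "edges": [("summarize", "analyze"), ("analyze", "translate")],
}

# A single workflow fans out across three providers:
providers = {node["provider"] for node in workflow["nodes"]}
```

Because the provider is a per-node setting, swapping a model for a cheaper or more capable one is a local change that does not affect the rest of the workflow.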

Info: Embedding provider and vector dimensions are configured when creating a data cluster. Changing the embedding provider after documents have been processed requires re-embedding all existing documents. Choose your embedding provider carefully at cluster creation time.

Embedding Generation

Embeddings are vector representations of text that enable semantic search — finding documents by meaning rather than exact keyword matches.

How Embedding Works in Pipelines

During document processing, the embedding step runs after chunking: each chunk produces a fixed-dimensional vector (the dimension depends on the model). These vectors are stored in Qdrant alongside the chunk text and metadata, enabling semantic similarity search.
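The chunk-to-vector step can be sketched as follows. The `toy_embed` function below is a deterministic stand-in for a provider API call (real models produce, e.g., 1536-dimensional vectors for OpenAI's small embedding model); the point shape stored per chunk (vector, text, metadata) mirrors the description above.

```python
import hashlib
import math

EMBEDDING_DIM = 8  # stand-in; real models use hundreds to thousands of dims

def toy_embed(text: str, dim: int = EMBEDDING_DIM) -> list[float]:
    """Deterministic stand-in for a provider embedding call:
    hash the text into `dim` components, then L2-normalize."""
    digest = hashlib.sha256(text.encode()).digest()
    raw = [digest[i % len(digest)] - 127.5 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

chunks = ["## Intro\nThe platform overview...", "## Usage\nTo get started..."]

# One point per chunk: fixed-dimensional vector plus the chunk text and metadata.
points = [
    {"vector": toy_embed(chunk), "text": chunk, "metadata": {"chunk_index": i}}
    for i, chunk in enumerate(chunks)
]
```

Every point carries a vector of the same dimension, which is what the vector store requires for a single collection.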

Embedding Providers

| Provider | Characteristics |
|---|---|
| OpenAI | High-quality embeddings, multiple model sizes available |
| Mistral | Strong multilingual support |
| Google Gemini | Cost-effective, competitive quality |

The embedding provider is configured per-tenant, ensuring all vectors in a collection use the same model and dimensions. This is critical for search quality — mixing embedding models in the same collection produces unreliable similarity scores.
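The warning above can be made concrete: cosine similarity is only defined for vectors of the same dimension, and even same-dimension vectors from different models occupy unrelated spaces, so cross-model scores are meaningless. A minimal similarity function that at least rejects the dimension mismatch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors. Rejects dimension mismatches;
    note that even same-dimension vectors from *different* embedding models
    yield meaningless scores, which a dimension check cannot catch."""
    if len(a) != len(b):
        raise ValueError("vectors have different dimensions; "
                         "they likely come from different embedding models")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Enforcing one embedding model per tenant collection removes both failure modes at the source.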

OCR via Mistral Document AI

The platform uses Mistral's Document AI service for optical character recognition (OCR). This is the first step in making PDF documents searchable.

What OCR Produces

| Output | Format | Description |
|---|---|---|
| Extracted text | Markdown | Full document text with heading structure preserved |
| Figures | Images (base64) | All images, diagrams, and figures extracted from the document |
| OCR metadata | JSON | Page-by-page extraction details |

The OCR output preserves document structure — headings, paragraphs, lists, and tables are represented in Markdown. This structure is important for the chunking step, which uses heading boundaries to create semantically coherent chunks.
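Heading-aware chunking of the OCR Markdown can be sketched as below. This is a simplified illustration: the platform's actual chunker may also enforce size limits and chunk overlap, which are omitted here.

```python
import re

def chunk_by_headings(markdown: str) -> list[str]:
    """Split Markdown at heading boundaries so each chunk covers one
    semantically coherent section (simplified sketch)."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # Start a new chunk at every Markdown heading (#, ##, ... ######).
        if re.match(r"^#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]

doc = "# Title\nIntro text.\n## Methods\nDetails.\n## Results\nFindings."
sections = chunk_by_headings(doc)
```

Because the OCR step preserved the heading structure, each resulting chunk corresponds to one section of the source document rather than an arbitrary window of text.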

Figure Extraction

Figures extracted during OCR are processed through a figure linking step that:

  1. Resolves figure references in the Markdown text (e.g., "Figure 1" links to the actual image)
  2. Converts all images to a consistent format (PNG)
  3. Stores figures alongside the processed document in MinIO
  4. Makes figures accessible via the Data API
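Step 1 above (resolving figure references) can be sketched with a simple regex pass. The URL scheme in the example is hypothetical; the real step resolves references against the figures stored in MinIO.

```python
import re

def link_figures(markdown: str, figure_urls: dict[int, str]) -> str:
    """Replace textual references like 'Figure 1' with Markdown links
    to the extracted images (simplified sketch; URL scheme is made up)."""
    def repl(match: re.Match) -> str:
        number = int(match.group(1))
        url = figure_urls.get(number)
        # Leave unresolved references untouched.
        return f"[Figure {number}]({url})" if url else match.group(0)
    return re.sub(r"Figure (\d+)", repl, markdown)

text = "As shown in Figure 1, throughput scales linearly."
linked = link_figures(text, {1: "/data/doc-123/figures/fig-1.png"})
```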

The MCP Architecture

The platform implements the Model Context Protocol (MCP) — an open standard for giving AI agents structured access to external data and tools. MCP servers expose your data to AI assistants like Claude, GPT-4, and custom agents through a standardized tool interface.

How MCP Works

The critical property of this architecture is that the AI agent accesses data with the authorizing user's permissions. The agent cannot see data the user cannot see, and every access is logged in the platform's audit trail.

Available MCP Servers

| Server | Tools | Data Source | Purpose |
|---|---|---|---|
| Data Cluster | 7 tools | Your document collections | Browse datasets, keyword search, semantic search, read documents, view figures |
| OpenAIRE | 29 tools | OpenAIRE Graph (600M+ products) | Literature review, citation analysis, author profiling, research trends |
| BnF | 15 tools | Bibliothèque nationale de France | Historical documents, bibliographic records, digitized collections |

MCP Authentication

MCP servers support three authentication paths, all of which enforce the same per-tool RBAC:

  1. OAuth PKCE — the standard path for interactive AI assistants. The user authorizes the agent through a browser-based OAuth flow.
  2. JWT relay — for web applications or services that already hold a valid JWT from the identity provider.
  3. API token — for enterprise integrations using platform API tokens.

In all cases, the MCP server extracts the user's identity and permissions from the token and enforces per-tool access control. Each tool declares what abilities it requires (e.g., dataset:read, entry:read), and the framework checks permissions before executing the tool.
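The per-tool check can be sketched as below. The ability names (`dataset:read`, `entry:read`) come from the text above, but the registry and dispatch shape are assumptions, not the framework's actual API.

```python
# Hypothetical tool registry: each tool declares the abilities it requires.
TOOL_ABILITIES: dict[str, set[str]] = {
    "semantic_search": {"dataset:read", "entry:read"},
    "read_document": {"entry:read"},
}

def call_tool(tool: str, user_abilities: set[str], **kwargs) -> dict:
    """Check the caller's abilities (extracted from their token)
    against the tool's declared requirements before executing."""
    required = TOOL_ABILITIES[tool]
    missing = required - user_abilities
    if missing:
        raise PermissionError(f"{tool} requires abilities: {sorted(missing)}")
    # Permission granted: execute the tool (stubbed here).
    return {"tool": tool, "status": "executed", "args": kwargs}

result = call_tool("read_document", {"entry:read"}, entry_id="doc-42")
```

The key property is that the check runs on every call with the identity from the presented token, so the same enforcement applies regardless of which of the three authentication paths was used.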

How AI Agents Access Data Securely

The MCP architecture enforces multiple security boundaries:

  1. User-scoped access — the agent inherits the authorizing user's permissions
  2. Per-tool RBAC — each tool declares required abilities, checked before execution
  3. Proxy layer — all data access goes through the platform's authenticated proxy (no direct cluster access)
  4. Audit logging — every tool call is logged with the user identity, MCP session ID, and request details
  5. Organization scoping — agents can only access data within the user's organization

Workflow Orchestration

The visual workflow editor enables building multi-step AI pipelines without writing code. Workflows are defined as directed acyclic graphs (DAGs) where nodes represent operations and edges define data flow.

Node Categories

| Category | Examples | Description |
|---|---|---|
| Data Access | Vector search, keyword search, download entry | Retrieve data from your clusters |
| LLM | Chat completion, structured output | Call language models with configurable providers |
| Document Processing | Text splitter, summarizer, translator | Transform and analyze text |
| Audio | Text-to-speech | Generate audio from text |
| Research | OpenAIRE search, citation analysis | Access research intelligence tools |
| Agents | Agent node, group node | Multi-agent orchestration and composable sub-workflows |
| System | Conditional, loop, merge | Control flow and data routing |

Building a Workflow

  1. Add nodes to the canvas — each node represents one operation
  2. Connect nodes with edges — defines the data flow between steps
  3. Configure parameters — set model, prompt, search query, etc. per node
  4. Reference upstream outputs — use template syntax to pass data between nodes
  5. Run the workflow — the platform executes the DAG, tracking cost and status per node
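The execution model behind steps 4 and 5 can be sketched with a minimal DAG runner: nodes execute in topological order, and a `{{node_id}}` template pulls the referenced node's output into a downstream input. The template syntax and node shape here are assumptions for illustration.

```python
import re
from graphlib import TopologicalSorter

# Toy two-node workflow: a search node feeding a summarization node.
# "op" stands in for the real node handler (LLM call, search, etc.).
nodes = {
    "search": {"op": lambda _: "3 matching chunks", "input": ""},
    "summarize": {"op": lambda text: f"summary of: {text}",
                  "input": "{{search}}"},
}
edges = {"summarize": {"search"}}  # summarize depends on search

outputs: dict[str, str] = {}
for node_id in TopologicalSorter(edges).static_order():
    spec = nodes[node_id]
    # Resolve {{node_id}} references against already-computed outputs.
    resolved = re.sub(r"\{\{(\w+)\}\}",
                      lambda m: outputs[m.group(1)], spec["input"])
    outputs[node_id] = spec["op"](resolved)
```

Topological ordering is what guarantees every upstream output exists before a downstream node references it, which is why workflows must be acyclic.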

Multi-Agent Orchestration

For complex tasks, the platform supports hierarchical multi-agent execution:

  • Agent nodes execute autonomous AI agents with access to tools and sub-agents
  • Group nodes define composable sub-workflows that dissolve into the parent DAG
  • Tool routing — agents can call vector search, keyword search, and other data access tools as part of their reasoning
  • State management — agent conversations and intermediate state are tracked for debugging

This enables building sophisticated AI applications — for example, a research agent that searches your document collection, cross-references findings with the global research graph (via OpenAIRE), and produces a structured analysis.

AI Cost Tracking

The platform tracks AI costs at multiple levels:

| Level | What Is Tracked |
|---|---|
| Per-node | Token counts (input/output), API call costs, model used |
| Per-job | Aggregate cost across all nodes in the workflow |
| Per-organization | Historical cost data for billing and budgeting |

Cost data is available in the job execution summary, allowing you to understand which steps in your workflows are most expensive and optimize accordingly — for example, by switching to a smaller model for simple classification tasks or batching multiple documents into a single LLM call.
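The per-node to per-job roll-up can be sketched as below. The record fields and dollar figures are illustrative, not the platform's billing schema.

```python
# Hypothetical per-node cost records from one job execution.
node_costs = [
    {"node": "summarize", "model": "gpt-4o-mini",
     "input_tokens": 1200, "output_tokens": 300, "cost_usd": 0.00045},
    {"node": "analyze", "model": "claude-sonnet",
     "input_tokens": 900, "output_tokens": 600, "cost_usd": 0.01170},
]

# Per-job cost is the sum over nodes; the max identifies the
# optimization target (e.g., a candidate for a smaller model).
job_cost = sum(n["cost_usd"] for n in node_costs)
most_expensive = max(node_costs, key=lambda n: n["cost_usd"])
```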

Next Steps