Processing Engine

The platform has two distinct execution engines, each designed for a different type of workload:

  • Argo Workflows — runs document processing pipelines (OCR, chunking, embedding) on data clusters, close to the data
  • Workers — execute AI workflows (LLM calls, vector search, multi-agent orchestration) on the platform, with access to multiple providers

These engines complement each other: Argo Workflows turns raw documents into searchable knowledge bases, and Workers use that knowledge to power AI-driven analysis and automation.

Two Execution Engines

| Aspect | Argo Workflows | Workers |
| --- | --- | --- |
| Location | Data cluster (close to data) | Platform (close to AI providers) |
| Triggered by | File upload or manual trigger | User runs workflow from UI |
| Use case | Document ingestion and processing | AI analysis and automation |
| Parallelism | Kubernetes pods per pipeline step | Thread pool with async execution |
| Data access | Direct access to storage services | Via platform proxy to data clusters |
| Output | Stored in data cluster (MinIO, Qdrant, Meilisearch) | Returned as workflow result |
| Retry | Per-step retry with backoff | Per-node retry + queue-based redelivery |

Argo Workflows: Document Processing

Argo Workflows is a Kubernetes-native workflow engine. Each pipeline runs as a series of pods in the tenant's namespace, with each step producing artifacts that are passed to the next step.

How a Pipeline Runs

When a document is uploaded, the Data API checks whether the dataset has an auto-trigger pipeline configured. If it does, the pipeline is submitted to Argo Workflows.

Each step runs as a container with defined resource limits, retry policies, and artifact I/O. Steps execute in dependency order — the chunker waits for OCR to complete, the embedder waits for the chunker, and so on.
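
A submitted pipeline looks roughly like the following Argo `Workflow` manifest. This is an illustrative sketch only: the `templateRef` names (`data-api-entry-acquisition`, `mistral-ocr`, and so on) are guessed from the component names listed below, not the platform's actual WorkflowTemplate identifiers.

```yaml
# Hypothetical PDF OCR pipeline as an Argo DAG. Template names and the
# tenant namespace are illustrative assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pdf-ocr-pipeline-
  namespace: tenant-a            # runs in the tenant's namespace
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: acquire
            templateRef: {name: data-api-entry-acquisition, template: run}
          - name: ocr
            dependencies: [acquire]
            templateRef: {name: mistral-ocr, template: run}
          - name: chunk
            dependencies: [ocr]
            templateRef: {name: markdown-chunker, template: run}
          - name: embed
            dependencies: [chunk]
            templateRef: {name: embedding-generator, template: run}
          - name: register
            dependencies: [embed]
            templateRef: {name: chunks-registration, template: run}
```

The `dependencies` lists encode the execution order described above: `chunk` waits for `ocr`, `embed` waits for `chunk`, and so on.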

Pipeline Components

Pipelines are composed of reusable components. Each component is a pre-deployed WorkflowTemplate on the data cluster, versioned and discoverable via the Data API.

Acquisition — get the document into the pipeline:

| Component | Input | Output | Description |
| --- | --- | --- | --- |
| Data API Entry Acquisition | Entry ID | File artifact | Downloads the original file from the Data API |
| S3 Acquisition | S3 URI | File artifact | Downloads directly from S3-compatible storage |

Processing — transform the document:

| Component | Input | Output | Description |
| --- | --- | --- | --- |
| Mistral OCR | PDF file | Markdown + figures + OCR JSON | Extracts text and images using Mistral Document AI |
| Figure Linker | Markdown + figures | Resolved markdown + PNG figures | Resolves figure references, converts images to PNG |
| Image Processor | Image files | Optimized images | Batch image optimization |
| Metadata Filter | Entry metadata | Filtered metadata | Selects fields relevant for vector storage |

Chunking and Embedding — prepare for search:

| Component | Input | Output | Description |
| --- | --- | --- | --- |
| Markdown Chunker | Markdown text | Chunk array (JSON) | Semantic splitting with heading awareness and token counting |
| Embedding Generator | Chunk array | Embedding vectors | Generates vector embeddings via configurable provider |
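
The core idea behind heading-aware chunking can be sketched in a few lines. This is not the Markdown Chunker's actual implementation; it approximates token counting with a word count, where the real component would use the embedding model's tokenizer.

```python
import re

def chunk_markdown(text: str, max_tokens: int = 200) -> list[dict]:
    """Split markdown at heading boundaries, then by size budget.

    Token counting is approximated by whitespace word count here; a real
    chunker would count tokens with the embedding model's tokenizer.
    """
    # Split before each heading line, keeping the heading with its body.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in filter(None, sections):
        words = section.split()
        # Fall back to fixed-size windows when a section exceeds the budget.
        for i in range(0, len(words), max_tokens):
            window = words[i:i + max_tokens]
            chunks.append({"text": " ".join(window), "tokens": len(window)})
    return chunks

chunks = chunk_markdown("# Intro\nHello world.\n# Methods\nWe measure things.")
```

Splitting at headings keeps each chunk semantically coherent, which improves retrieval quality over blind fixed-size windows.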

Registration — store the results:

| Component | Input | Output | Description |
| --- | --- | --- | --- |
| Chunks Registration | Chunks + embeddings | Qdrant points | Upserts vectors with metadata into tenant's Qdrant collection |
| Processed Content Registration | Markdown + figures | MinIO objects | Stores processed text and figures in object storage |
| Processed Files Registration | File artifacts | MinIO objects | Uploads additional processed files |
| Entry Status Registration | Status update | DB update | Sets entry status to "processed" (or "error" on failure) |

Pipeline Presets

The platform includes pre-built pipeline configurations for common document types:

| Preset | Steps | Use Case |
| --- | --- | --- |
| PDF OCR | Acquire, OCR, Figure Link, Chunk, Embed, Register | PDFs and scanned documents |
| JATS XML | Acquire, Extract MECA, Parse JATS, Figure Link, Chunk, Embed, Register | Scientific articles in JATS XML format |

You can apply a preset to any dataset, and the pipeline will automatically trigger when documents are uploaded. Custom pipeline configurations can combine any available components in a DAG structure.
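
A custom pipeline configuration is, in essence, a list of component steps plus their dependencies. The schema below is an illustrative assumption (the platform's actual configuration format is not shown here), together with the one invariant any DAG configuration must satisfy:

```python
# Hypothetical custom pipeline definition; the "component"/"depends_on"
# field names are illustrative, not the platform's real schema.
pipeline = {
    "name": "pdf-ocr",
    "steps": [
        {"component": "data-api-entry-acquisition", "depends_on": []},
        {"component": "mistral-ocr", "depends_on": ["data-api-entry-acquisition"]},
        {"component": "figure-linker", "depends_on": ["mistral-ocr"]},
        {"component": "markdown-chunker", "depends_on": ["figure-linker"]},
        {"component": "embedding-generator", "depends_on": ["markdown-chunker"]},
        {"component": "chunks-registration", "depends_on": ["embedding-generator"]},
    ],
}

def validate(p: dict) -> bool:
    """Every dependency must name an earlier step, so the graph is acyclic."""
    seen: set[str] = set()
    for step in p["steps"]:
        if not all(dep in seen for dep in step["depends_on"]):
            return False
        seen.add(step["component"])
    return True
```

Because components only agree on artifact shapes (file in, chunks out, vectors out), any validated combination of them forms a runnable pipeline.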

Composability

Pipeline components are designed to be composed. Adding support for a new document format requires writing only the format-specific parser — the chunking, embedding, and registration components are reused. This means:

  • New parsers integrate into existing pipelines by adding a single step
  • Multiple pipeline configurations can share the same components
  • Components are versioned independently — updating a parser does not affect the chunker

Workers: AI Workflow Execution

Workers execute AI-powered workflows defined in the platform's visual workflow editor. Unlike Argo Workflows (which process documents), Workers orchestrate AI operations: LLM calls, vector search, multi-agent coordination, text-to-speech, and more.

How Workers Execute Jobs
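
A Worker pulls a job from the queue and executes the workflow definition as a directed acyclic graph: a node runs as soon as all of its upstream outputs are available. The sketch below captures that core loop sequentially; it is a simplification, since real Workers run independent branches concurrently and stream per-node status over SSE.

```python
from graphlib import TopologicalSorter

def run_workflow(nodes: dict, edges: list[tuple[str, str]]) -> dict:
    """Execute a workflow DAG sequentially in dependency order.

    `nodes` maps node id -> a callable taking a dict of upstream outputs;
    `edges` are (source, destination) pairs defining data flow.
    """
    deps: dict[str, set[str]] = {node_id: set() for node_id in nodes}
    for src, dst in edges:
        deps[dst].add(src)
    outputs: dict = {}
    # static_order() yields each node only after all its predecessors.
    for node_id in TopologicalSorter(deps).static_order():
        outputs[node_id] = nodes[node_id]({d: outputs[d] for d in deps[node_id]})
    return outputs

results = run_workflow(
    {
        "search": lambda ins: ["doc-1", "doc-2"],
        "summarize": lambda ins: f"{len(ins['search'])} hits",
    },
    [("search", "summarize")],
)
```

The same topological-sort model underlies the execution characteristics listed below: independent branches have no edge between them, so nothing forces them to wait on each other.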

Node Types

Workers support a rich set of node types organized by category:

Data Access — retrieve data from clusters:

| Node | Description |
| --- | --- |
| Vector Search | Semantic search across one or more clusters, with automatic result merging |
| Keyword Search | Full-text search via Meilisearch |
| Download Entry | Retrieve document files via the platform proxy |
| Get Entries | Fetch entry metadata in batch |
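
The "automatic result merging" of the Vector Search node can be sketched as follows. This is an assumed strategy, not the node's documented algorithm: it presumes hits carry comparable similarity scores (i.e., the clusters use the same embedding model) and deduplicates by entry id.

```python
def merge_results(result_sets: list[list[dict]], limit: int = 5) -> list[dict]:
    """Merge per-cluster hit lists into a single ranking.

    Assumes each hit has an "id" and a comparable "score" (higher is
    better); keeps the best-scoring copy of any duplicate entry.
    """
    best: dict[str, dict] = {}
    for hits in result_sets:
        for hit in hits:
            current = best.get(hit["id"])
            if current is None or hit["score"] > current["score"]:
                best[hit["id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:limit]

merged = merge_results([
    [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}],
    [{"id": "b", "score": 0.7}, {"id": "c", "score": 0.6}],
])
```

If clusters used different embedding models, raw scores would not be comparable and a rank-based fusion (e.g., reciprocal rank fusion) would be the safer merge.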

LLM — interact with language models:

| Node | Description |
| --- | --- |
| Chat Completion | Call any supported LLM provider with configurable model, temperature, and prompt |
| Structured Output | LLM call with JSON schema enforcement |
| Multi-turn Conversation | Stateful conversation with context management |

Document Processing — transform content:

| Node | Description |
| --- | --- |
| Text Splitter | Split text into chunks with configurable strategy |
| Summarizer | Generate summaries using LLM |
| Translator | Translate text between languages |

Audio — voice and speech:

| Node | Description |
| --- | --- |
| Text-to-Speech | Generate audio from text with voice selection |

Research — specialized research tools:

| Node | Description |
| --- | --- |
| OpenAIRE Search | Query the global research graph (600M+ products) |
| Citation Analysis | Build citation networks and bibliometric profiles |

Agents — multi-agent orchestration:

| Node | Description |
| --- | --- |
| Agent Node | Hierarchical multi-agent execution with tool access |
| Group Node | Composable sub-workflow that dissolves into the parent DAG |

System — workflow control:

| Node | Description |
| --- | --- |
| Conditional | Branch execution based on conditions |
| Loop | Iterate over collections |
| Merge | Combine outputs from parallel branches |

Multi-Provider AI Routing

Workers route LLM calls to the provider configured for each node. A single workflow can use different providers for different steps:

| Provider | Models | Use Cases |
| --- | --- | --- |
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3 | General-purpose reasoning, structured output |
| Anthropic | Claude Sonnet, Claude Opus | Long-context analysis, complex reasoning |
| Mistral | Mistral Large, Mistral Small | European data processing, multilingual tasks |
| Google | Gemini Pro, Gemini Flash | Cost-effective batch processing |

Provider selection is per-node, not per-workflow. This enables cost optimization — use a smaller model for simple tasks and a more capable model for complex reasoning, within the same workflow.
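
Per-node routing amounts to looking up a provider client from the node's configuration. The registry and client interface below are hypothetical placeholders; a real Worker would hold authenticated SDK clients per provider rather than stub callables.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class NodeConfig:
    provider: str  # e.g. "openai", "anthropic", "mistral", "google"
    model: str

# Hypothetical registry of provider clients; each takes (model, prompt)
# and returns the completion text.
CLIENTS: dict[str, Callable[[str, str], str]] = {
    "openai": lambda model, prompt: f"[{model}] {prompt[:20]}",
    "anthropic": lambda model, prompt: f"[{model}] {prompt[:20]}",
}

def run_llm_node(cfg: NodeConfig, prompt: str) -> str:
    """Route a Chat Completion node to the provider configured on the node."""
    try:
        client = CLIENTS[cfg.provider]
    except KeyError:
        raise ValueError(f"no client configured for provider {cfg.provider!r}")
    return client(cfg.model, prompt)
```

Because the lookup happens per node, one workflow can send a cheap classification step to a small model and a final synthesis step to a larger one.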

The Visual Workflow Editor

The platform includes a visual editor for building AI workflows using a drag-and-drop canvas:

  • Nodes represent operations (LLM calls, search, data access, etc.)
  • Edges connect nodes to define data flow and dependencies
  • Parameters are configured per-node (model selection, prompts, search queries)
  • Template syntax allows nodes to reference outputs from upstream nodes
  • DAG execution ensures nodes run in the correct dependency order

The editor produces a workflow definition (nodes + edges as JSON) that Workers execute as a directed acyclic graph. The same definition can be run multiple times with different inputs.
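
Template resolution before a node runs can be sketched as a substitution pass over the node's parameters. The `{{node_id.field}}` syntax here is an assumption for illustration; the editor's actual template language may differ.

```python
import re

def resolve_templates(params: dict, outputs: dict) -> dict:
    """Substitute upstream node outputs into a node's string parameters.

    `outputs` maps node id -> dict of that node's output fields; any
    "{{node_id.field}}" placeholder is replaced with the stored value.
    """
    def substitute(value):
        if not isinstance(value, str):
            return value
        return re.sub(
            r"\{\{(\w+)\.(\w+)\}\}",
            lambda m: str(outputs[m.group(1)][m.group(2)]),
            value,
        )
    return {key: substitute(val) for key, val in params.items()}

resolved = resolve_templates(
    {"prompt": "Summarize: {{search.top_hit}}"},
    {"search": {"top_hit": "Attention Is All You Need"}},
)
```

Templates are also what make a definition reusable: the same placeholder-bearing parameters resolve against different upstream outputs on each run.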

Execution Characteristics

| Property | Behavior |
| --- | --- |
| Execution model | DAG with topological sort — nodes run as soon as all dependencies are satisfied |
| Parallelism | Independent branches execute concurrently |
| Cost tracking | Per-node cost recorded (LLM tokens, API calls) with execution summary |
| Status streaming | Real-time job status updates via server-sent events (SSE) |
| Context persistence | Execution context saved for debugging and replay |
| Error handling | Per-node retry with configurable policy; failed nodes can be inspected |

How Processing Scales

Argo Workflows (Document Processing)

  • Concurrent workflows — the Argo controller manages multiple workflows simultaneously within configured limits
  • Dynamic parallelism — batch processing steps can fan out across multiple pods (e.g., processing 100 papers from an archive in parallel)
  • Per-workflow scratch space — each workflow gets dedicated temporary storage for intermediate artifacts
  • Resource limits — each step has defined CPU and memory limits to prevent resource contention

Workers (AI Workflows)

  • Queue-based scaling — KEDA monitors the job queue and scales worker pods based on pending job count
  • Concurrent execution — each worker pod processes multiple jobs simultaneously
  • Async I/O — network-bound operations (LLM API calls, proxy requests) use async execution to maximize throughput
  • Provider rate limits — workers respect per-provider rate limits and implement backoff strategies
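
A standard backoff strategy for rate-limited provider calls is exponential backoff with jitter. The sketch below shows the pattern generically; the retry schedule and exception handling in the actual Workers are not specified here.

```python
import random
import time

def with_backoff(call, max_attempts: int = 5, base: float = 0.5):
    """Retry a provider call with exponential backoff and full jitter.

    `call` should raise on failure (e.g. an HTTP 429 rate-limit error);
    the last attempt's exception is re-raised if all attempts fail.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random fraction of the exponential cap.
            time.sleep(random.uniform(0, base * 2 ** attempt))
```

Jitter matters under queue-based scaling: without it, many worker pods hitting the same rate limit would retry in lockstep and collide again.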

Next Steps