mistralOcr
Runs Mistral OCR on a PDF document reachable via URL. It extracts the full text, counts the pages, and collects any images found in the document. Images are returned as objects containing S3 paths. The actual API cost is tracked per run.
Concurrency is capped at 3 simultaneous API calls. Failed calls are retried up to 5 times with exponential backoff (base 2 s, multiplier 2×, capped at 32 s).
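The limits above can be sketched in Python as follows. This is only an illustration of the stated numbers, not the node's actual implementation: the names `backoff_delays` and `call_with_retry` are hypothetical, and the sketch assumes that the first retry waits the base delay, that any exception triggers a retry, and that the concurrency slot is released while waiting.

```python
import asyncio

MAX_CONCURRENCY = 3   # from the text: at most 3 simultaneous API calls
MAX_RETRIES = 5       # from the text: up to 5 retries
BASE_DELAY_S = 2.0    # from the text: base 2 s
MULTIPLIER = 2.0      # from the text: multiplier 2x
MAX_DELAY_S = 32.0    # from the text: cap 32 s


def backoff_delays(retries=MAX_RETRIES, base=BASE_DELAY_S,
                   multiplier=MULTIPLIER, cap=MAX_DELAY_S):
    """Delay before each retry: base * multiplier**n, clipped at cap."""
    return [min(base * multiplier ** n, cap) for n in range(retries)]


async def call_with_retry(call, semaphore, sleep=asyncio.sleep):
    """Run the coroutine function `call` under a shared semaphore.

    Assumption: every exception is treated as retryable. The semaphore is
    held only while the call runs, not while sleeping between attempts.
    """
    delays = backoff_delays()
    for attempt in range(len(delays) + 1):
        try:
            async with semaphore:
                return await call()
        except Exception:
            if attempt == len(delays):
                raise  # retries exhausted, surface the last error
            await sleep(delays[attempt])
```

Under these assumptions the waits between attempts are 2, 4, 8, 16, and 32 seconds. A caller would create one `asyncio.Semaphore(MAX_CONCURRENCY)` inside the running event loop and pass it to every `call_with_retry` invocation.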
Parameters
| Param | Type | Required | Description |
|---|---|---|---|
| file_url | string (URL) | Yes | URL of the PDF document to process |
| entry | object | Yes | Entry data dict containing dataset_id and entry_id for routing and cost tracking |
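As a rough illustration of the two parameters, the following Python sketch checks their shape before a run. The helper `validate_params` is hypothetical and not part of the node. It assumes that file_url uses an http or https scheme, which the table does not state, and it checks only that the two keys named in the table are present.

```python
from urllib.parse import urlparse


def validate_params(file_url, entry):
    """Return a list of problems with the two node parameters (empty if none)."""
    problems = []
    parsed = urlparse(file_url) if isinstance(file_url, str) else None
    # Assumption: only http(s) URLs are accepted.
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("file_url must be an http(s) URL")
    if not isinstance(entry, dict):
        problems.append("entry must be a dict")
    else:
        # The table names these two keys as used for routing and cost tracking.
        for key in ("dataset_id", "entry_id"):
            if key not in entry:
                problems.append(f"entry is missing {key}")
    return problems
```

For example, `validate_params("https://example.com/a.pdf", {"dataset_id": "d1", "entry_id": "e1"})` returns an empty list; the URL and IDs here are made-up placeholders.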
Output
| Field | Type | Description |
|---|---|---|
| text | string | The full extracted text from the document |
| images | ProcessedOCRImage[] | List of processed image objects (see below) |
| numPages | integer | Number of pages in the document |
Each ProcessedOCRImage:
| Field | Type | Description |
|---|---|---|
| image_url | string? | S3 path to the uploaded image |
| image_type | string? | MIME type of the image (e.g. image/jpeg) |
| width | integer? | Image width in pixels |
| height | integer? | Image height in pixels |
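The output shape described in the two tables can be written down as Python type hints. This is a sketch for readers consuming the output in Python, not a definition taken from the node's code. It assumes that the `?` suffix marks a field that may be null. The class name `MistralOcrOutput` and the helper `summarize` are hypothetical.

```python
from typing import List, Optional, TypedDict


class ProcessedOCRImage(TypedDict):
    image_url: Optional[str]   # S3 path to the uploaded image
    image_type: Optional[str]  # MIME type, e.g. image/jpeg
    width: Optional[int]       # pixels
    height: Optional[int]      # pixels


class MistralOcrOutput(TypedDict):
    text: str
    images: List[ProcessedOCRImage]
    numPages: int


def summarize(output: MistralOcrOutput) -> str:
    """One-line summary; counts only images that have an S3 path."""
    uploaded = [img for img in output["images"] if img["image_url"]]
    return (f'{output["numPages"]} page(s), {len(output["text"])} chars, '
            f'{len(uploaded)} uploaded image(s)')
```

Because every image field may be null under this reading, downstream code should check image_url before treating an image as uploaded, as `summarize` does.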
Note
The images output is intended to be forwarded as-is to a downstream saveEntries node.
Example
{
  "id": "ocrNode",
  "type": "mistralOcr",
  "data": {
    "label": "Mistral OCR",
    "isExecuted": false,
    "handles": ["inputs", "outputs"],
    "schema": {},
    "params": {
      "file_url": { "value": "{{ @downloadEntry.file_url }}", "isExpression": true, "isAttachedToInputNode": false },
      "entry": { "value": "{{ @fetchEntries }}", "isExpression": true, "isAttachedToInputNode": false }
    },
    "inputs": [], "outputs": [], "errors": []
  },
  "position": { "x": 300, "y": 0 },
  "isSelected": false,
  "isDragging": false
}