mistralOcr

Runs Mistral OCR on a PDF document reachable via URL. It extracts the full text, counts the pages, and collects any images found in the document. Images are returned as objects containing S3 paths. The actual API cost is tracked per run.

Concurrency is capped at 3 simultaneous API calls. Failed calls are retried up to 5 times with exponential backoff (base delay 2 s, multiplier 2×, capped at 32 s).
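The retry and concurrency policy can be sketched roughly as follows. This is a minimal illustration, not the node's actual implementation: the names `backoff_delays` and `call_with_retry` are hypothetical, and it assumes the first retry waits the base delay and that every exception is retryable.

```python
import asyncio

MAX_CONCURRENCY = 3  # documented cap on simultaneous API calls
MAX_RETRIES = 5      # documented retry limit


def backoff_delays(retries=MAX_RETRIES, base=2.0, multiplier=2.0, cap=32.0):
    """Delay in seconds before each retry: base * multiplier**i, clamped to cap."""
    return [min(base * multiplier ** i, cap) for i in range(retries)]


async def call_with_retry(call, semaphore, sleep=asyncio.sleep):
    """Run `call` under the shared semaphore, retrying with backoff on failure.

    `call` is any zero-argument coroutine function (hypothetical stand-in for
    the OCR request). The semaphore slot is released before sleeping, so a
    waiting retry does not block other calls.
    """
    delays = backoff_delays()
    for attempt in range(len(delays) + 1):
        try:
            async with semaphore:
                return await call()
        except Exception:
            if attempt == len(delays):
                raise  # all retries used up
            await sleep(delays[attempt])
```

A caller would create one `asyncio.Semaphore(MAX_CONCURRENCY)` inside the running event loop and share it across all OCR requests, so that at most three are in flight at any time.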

Parameters

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| `file_url` | string (URL) | Yes | URL of the PDF document to process |
| `entry` | object | Yes | Entry data dict containing `dataset_id` and `entry_id` for routing and cost tracking |

Output

| Field | Type | Description |
| --- | --- | --- |
| `text` | string | The full extracted text from the document |
| `images` | `ProcessedOCRImage[]` | List of processed image objects (see below) |
| `numPages` | integer | Number of pages in the document |

Each `ProcessedOCRImage`:

| Field | Type | Description |
| --- | --- | --- |
| `image_url` | string? | S3 path to the uploaded image |
| `image_type` | string? | MIME type of the image (e.g. `image/jpeg`) |
| `width` | integer? | Image width in pixels |
| `height` | integer? | Image height in pixels |
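For consumers that handle the output in Python, the shape above could be modeled along these lines. This is a sketch under assumptions: the `?` suffix is read as "may be null", `MistralOcrOutput` is a made-up name for the top-level object, and `uploaded_image_paths` is a hypothetical helper, not part of the node.

```python
from typing import List, Optional, TypedDict


class ProcessedOCRImage(TypedDict):
    image_url: Optional[str]   # S3 path to the uploaded image
    image_type: Optional[str]  # MIME type, e.g. "image/jpeg"
    width: Optional[int]       # pixels
    height: Optional[int]      # pixels


class MistralOcrOutput(TypedDict):
    text: str                        # full extracted text
    images: List[ProcessedOCRImage]  # processed image objects
    numPages: int                    # number of pages in the document


def uploaded_image_paths(output: MistralOcrOutput) -> List[str]:
    """Collect the S3 paths of all images that have a non-empty image_url."""
    return [img["image_url"] for img in output["images"] if img.get("image_url")]
```

Skipping entries without an `image_url` keeps downstream code from dereferencing a null path.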

Note

The `images` output is intended to be forwarded as-is to a downstream `saveEntries` node.

Example

{
  "id": "ocrNode",
  "type": "mistralOcr",
  "data": {
    "label": "Mistral OCR",
    "isExecuted": false,
    "handles": ["inputs", "outputs"],
    "schema": {},
    "params": {
      "file_url": { "value": "{{ @downloadEntry.file_url }}", "isExpression": true, "isAttachedToInputNode": false },
      "entry": { "value": "{{ @fetchEntries }}", "isExpression": true, "isAttachedToInputNode": false }
    },
    "inputs": [], "outputs": [], "errors": []
  },
  "position": { "x": 300, "y": 0 },
  "isSelected": false,
  "isDragging": false
}
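When workflows are generated programmatically, a node definition like the one above could be built with a small helper. The functions `expression_param` and `make_mistral_ocr_node` are hypothetical and simply mirror the fields shown in the example; they are not part of any published SDK.

```python
import json


def expression_param(expr: str) -> dict:
    """Wrap a template expression in the param structure used in the example."""
    return {"value": expr, "isExpression": True, "isAttachedToInputNode": False}


def make_mistral_ocr_node(node_id: str, file_url_expr: str, entry_expr: str,
                          x: int = 0, y: int = 0) -> dict:
    """Build a mistralOcr node definition with both required params set."""
    return {
        "id": node_id,
        "type": "mistralOcr",
        "data": {
            "label": "Mistral OCR",
            "isExecuted": False,
            "handles": ["inputs", "outputs"],
            "schema": {},
            "params": {
                "file_url": expression_param(file_url_expr),
                "entry": expression_param(entry_expr),
            },
            "inputs": [], "outputs": [], "errors": [],
        },
        "position": {"x": x, "y": y},
        "isSelected": False,
        "isDragging": False,
    }


# Reproduce the example node from this page.
node = make_mistral_ocr_node(
    "ocrNode", "{{ @downloadEntry.file_url }}", "{{ @fetchEntries }}", x=300
)
print(json.dumps(node, indent=2))
```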
