SaveProcessedContentResponse

Response after saving processed content

entry_id (integer, required)
Entry ID.

manifest (object, required)
Updated manifest.

  schema_version (string, required)
  Schema version (e.g., 'v3').

  dataset_schema_id (string, required)
  Dataset schema identifier (e.g., 'arxiv_papers_ocr').

  original (object or null)
  Original files section.

    files (array of objects)
    List of original files. Each item has the following fields:

      key (string, required)
      S3 key for the file.

      size (integer, required)
      File size in bytes.

      mime_type (string, required)
      MIME type of the file.

      hash (string or null)
      SHA-256 hash of the file.

      created_at (string <date-time> or null)
      File creation timestamp.

      expires_at (string <date-time> or null)
      File expiration timestamp (used for processing artifacts).

    metadata (object)
    Original metadata (e.g., title, authors). Keys are arbitrary; values may be of any type.

  processing (object or null)
  Processing artifacts section.

    steps_completed (array of strings)
    List of completed processing steps.

    files (array of objects)
    Intermediate processing files. Each item uses the same file object as original.files: key (string, required), size (integer, required), mime_type (string, required), hash (string or null), created_at (string <date-time> or null), expires_at (string <date-time> or null).
  processed (object or null)
  Processed content section.

    content_key (string or null)
    S3 key of the main content.json file.

    size (integer or null)
    Size of content.json in bytes.

    fields_summary (object)
    Quick stats for the UI (e.g., text_length, chunk_count). Keys are arbitrary; values may be of any type.

    completed_at (string <date-time> or null)
    Processing completion timestamp.

    additional_files (array of objects or null)
    Additional processed files (e.g., figures). Each item uses the same file object as original.files: key (string, required), size (integer, required), mime_type (string, required), hash (string or null), created_at (string <date-time> or null), expires_at (string <date-time> or null).
  full_manifest_key (string or null)
  S3 key of the full manifest. Set when the manifest exceeds 5 KB and is therefore stored externally.
success (boolean, default: true)
Whether the operation succeeded.
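The schema above can be mirrored in client code. The following is a minimal Python sketch, not an official client: the names FileInfo, Manifest, parse_timestamp, parse_files, and parse_response, as well as the flattened field names (original_files, processing_files, processed_files), are hypothetical, and only a subset of fields is modeled. It assumes each anyOf field is either its listed type or null, treats an absent section the same as null, and applies the documented default success = true.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string like '2025-11-04T10:00:00Z'; None stays None."""
    if value is None:
        return None
    # fromisoformat rejects a trailing 'Z' before Python 3.11, so rewrite it as +00:00.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


@dataclass
class FileInfo:
    """One file entry, shared by the original, processing, and processed sections."""
    key: str                        # S3 key (required)
    size: int                       # size in bytes (required)
    mime_type: str                  # MIME type (required)
    hash: Optional[str] = None      # SHA-256 hash, nullable
    created_at: Optional[datetime] = None
    expires_at: Optional[datetime] = None

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> FileInfo:
        return cls(
            key=d["key"],  # required fields raise KeyError if missing
            size=d["size"],
            mime_type=d["mime_type"],
            hash=d.get("hash"),
            created_at=parse_timestamp(d.get("created_at")),
            expires_at=parse_timestamp(d.get("expires_at")),
        )


def parse_files(items: Optional[list[dict[str, Any]]]) -> list[FileInfo]:
    """Parse a file list, treating null or absent as empty."""
    return [FileInfo.from_dict(item) for item in items or []]


@dataclass
class Manifest:
    """Flattened view of a subset of the manifest (hypothetical field names)."""
    schema_version: str
    dataset_schema_id: str
    original_files: list[FileInfo] = field(default_factory=list)
    original_metadata: dict[str, Any] = field(default_factory=dict)
    steps_completed: list[str] = field(default_factory=list)
    processing_files: list[FileInfo] = field(default_factory=list)
    content_key: Optional[str] = None
    processed_files: list[FileInfo] = field(default_factory=list)
    full_manifest_key: Optional[str] = None


@dataclass
class SaveProcessedContentResponse:
    entry_id: int
    manifest: Manifest
    success: bool = True  # documented default


def parse_response(payload: dict[str, Any]) -> SaveProcessedContentResponse:
    m = payload["manifest"]
    # The original, processing, and processed sections may be null or absent.
    original = m.get("original") or {}
    processing = m.get("processing") or {}
    processed = m.get("processed") or {}
    manifest = Manifest(
        schema_version=m["schema_version"],
        dataset_schema_id=m["dataset_schema_id"],
        original_files=parse_files(original.get("files")),
        original_metadata=dict(original.get("metadata") or {}),
        steps_completed=list(processing.get("steps_completed") or []),
        processing_files=parse_files(processing.get("files")),
        content_key=processed.get("content_key"),
        processed_files=parse_files(processed.get("additional_files")),
        full_manifest_key=m.get("full_manifest_key"),
    )
    return SaveProcessedContentResponse(
        entry_id=payload["entry_id"],
        manifest=manifest,
        success=payload.get("success", True),
    )
```

Flattening the three file lists into top-level attributes is only a convenience for callers; a client that needs every field (fields_summary, completed_at, the processed size) would model each section as its own class instead.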
Example (SaveProcessedContentResponse):

```json
{
  "entry_id": 0,
  "manifest": {
    "dataset_schema_id": "arxiv_papers_ocr",
    "original": {
      "files": [
        {
          "created_at": "2025-11-04T10:00:00Z",
          "hash": "sha256:abc123...",
          "key": "datasets/123/entries/456/original/paper.pdf",
          "mime_type": "application/pdf",
          "size": 2048000
        },
        {
          "key": "datasets/123/entries/456/original/thumbnail.jpg",
          "mime_type": "image/jpeg",
          "size": 50000
        }
      ],
      "metadata": {
        "arxiv_id": "2024.12345",
        "authors": [
          "John Doe",
          "Jane Smith"
        ],
        "published_date": "2024-11-01",
        "title": "Deep Learning for Computer Vision"
      }
    },
    "processed": {
      "additional_files": [
        {
          "key": "datasets/123/entries/456/processed/figures/fig_001.png",
          "mime_type": "image/png",
          "size": 80000
        }
      ],
      "completed_at": "2025-11-04T10:30:00Z",
      "content_key": "datasets/123/entries/456/processed/content.json",
      "fields_summary": {
        "chunk_count": 120,
        "figure_count": 8,
        "text_length": 45000
      },
      "size": 150000
    },
    "processing": {
      "files": [
        {
          "expires_at": "2025-12-04T10:00:00Z",
          "key": "datasets/123/entries/456/processing/embeddings.npy",
          "mime_type": "application/octet-stream",
          "size": 200000
        }
      ],
      "steps_completed": [
        "ocr",
        "chunking",
        "embedding"
      ]
    },
    "schema_version": "v3"
  },
  "success": true
}
```
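A few client-side checks follow from the field descriptions: a non-null full_manifest_key signals that the manifest exceeded 5 KB and is stored externally, expires_at marks when a processing artifact expires, and hash carries a SHA-256 digest. The sketch below works on the raw manifest dict (response["manifest"]). The helper names are hypothetical, and the 'sha256:<hex>' hash format is inferred from the example rather than stated by the schema.

```python
from __future__ import annotations

import hashlib
from datetime import datetime, timezone
from typing import Any, Optional


def has_external_manifest(manifest: dict[str, Any]) -> bool:
    """True when full_manifest_key is set, i.e. the full manifest (>5 KB) is in S3.

    Assumption: callers should then fetch the full manifest from that key,
    because the inline copy may not contain everything.
    """
    return bool(manifest.get("full_manifest_key"))


def expired_processing_files(
    manifest: dict[str, Any], now: Optional[datetime] = None
) -> list[str]:
    """Return S3 keys of processing artifacts whose expires_at is at or before `now`.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    files = (manifest.get("processing") or {}).get("files") or []
    expired: list[str] = []
    for f in files:
        raw = f.get("expires_at")
        if raw is None:
            continue  # no expiration recorded for this file
        ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        if ts <= now:
            expired.append(f["key"])
    return expired


def hash_matches(data: bytes, expected: Optional[str]) -> bool:
    """Check file bytes against a stored hash value.

    Assumption: the value is 'sha256:<hex>' as in the example; a bare hex
    digest is also accepted. A null hash cannot be verified, so it returns False.
    """
    if not expected:
        return False
    algo, sep, digest = expected.partition(":")
    if not sep:  # no algorithm prefix: treat the whole value as a hex digest
        algo, digest = "sha256", expected
    if algo.lower() != "sha256":
        return False
    return hashlib.sha256(data).hexdigest() == digest.lower()
```

Applied to the example manifest with a current time after 2025-12-04T10:00:00Z, expired_processing_files returns ["datasets/123/entries/456/processing/embeddings.npy"], and has_external_manifest returns False because full_manifest_key is absent.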