DatasetListResponse

Paginated dataset list response

datasets: object[], required

List of datasets

  Array [

    name: Name (string), required

    Dataset name

    slug: Slug (string), required

    URL-friendly slug

    description: Description (string), required

    Dataset description

    dataset_type: DatasetType (string), required

    Type of dataset

    Possible values: [text, audio, voice, images]

    id: Id (integer), required

    Dataset ID (synced from backend catalog)

    size_bytes: Size Bytes (integer)

    Total size in bytes

    Default value: 0

    entry_count: Entry Count (integer)

    Number of entries (cached)

    Default value: 0

    schema_definition: object, required

    Manifest-based schema definition

        schema_id: Schema Id (string), required

        Unique schema identifier

        version: Version (string), required

        Schema version (e.g., 'v3')

        description: Description (string), required

        Human-readable schema description

        original: object, required

        Original files schema

            required_files: string[]

            Required file patterns

            optional_files: string[]

            Optional file patterns

            metadata_schema: object

            JSONSchema7 for metadata validation

                property name*: any

                JSONSchema7 for metadata validation

        processed: object, required

        Processed content schema

            content_schema: object

            JSONSchema7 for content validation

                property name*: any

                JSONSchema7 for content validation

            required_files: string[]

            Required processed files

            optional_files: string[]

            Optional processed files

        processing: object

        Processing artifacts schema

            anyOf

                intermediate_files: string[]

                Intermediate file patterns

                retention_days: Retention Days (integer)

                Days to retain processing artifacts

                Default value: 7

    current_schema_version: Current Schema Version (string)

    Current schema version. Entries can be migrated incrementally by comparing each entry's manifest->>'schema_version' against this value.

    Default value: v1

    storage_path: Storage Path (string), required

    Base storage path in MinIO/S3 (e.g., 'datasets/123')

    created_at: string<date-time>, required

    Creation timestamp

    updated_at: string<date-time>, required

    Last update timestamp

    last_synced_at: object

    Last sync with backend catalog

        anyOf

            string<date-time>

    version: Version (integer)

    Version number for optimistic locking

    Default value: 1

  ]

total: Total (integer), required

Total number of datasets

page: Page (integer)

Current page number

Default value: 1

page_size: Page Size (integer)

Page size

Default value: 100

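
The pagination fields can be combined on the client side to work out how many pages a listing spans. The following is a minimal sketch, not part of the API: the helper names `page_count` and `has_next_page` are hypothetical, and it assumes pages are numbered from 1 (suggested by the default value of `page`, but not stated explicitly).

```python
import math


def page_count(total: int, page_size: int = 100) -> int:
    """Number of pages needed to hold `total` datasets.

    `total` and `page_size` correspond to the response fields of the
    same names; the default of 100 mirrors the documented default.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total / page_size)


def has_next_page(total: int, page: int = 1, page_size: int = 100) -> bool:
    """True if another page follows `page` (assumes 1-based page numbers)."""
    return page < page_count(total, page_size)


if __name__ == "__main__":
    # 250 datasets at 100 per page span 3 pages.
    print(page_count(250, 100))
    print(has_next_page(250, page=2, page_size=100))
```

A client would typically keep requesting the next page while `has_next_page` returns true, using the `total`, `page`, and `page_size` values from the most recent response.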
    DatasetListResponse
    {
      "datasets": [
        {
          "created_at": "2025-01-01T00:00:00Z",
          "current_schema_version": "v3",
          "dataset_type": "text",
          "description": "OCR processed academic papers from ArXiv",
          "entry_count": 1500,
          "id": 123,
          "last_synced_at": "2025-01-10T00:00:00Z",
          "name": "ArXiv Papers OCR",
          "schema_definition": {
            "description": "Schema for ArXiv papers with OCR, chunking, and embeddings",
            "original": {
              "metadata_schema": {
                "properties": {
                  "title": {
                    "type": "string"
                  },
                  "authors": {
                    "items": {
                      "type": "string"
                    },
                    "type": "array"
                  },
                  "arxiv_id": {
                    "type": "string"
                  },
                  "published_date": {
                    "format": "date",
                    "type": "string"
                  }
                },
                "required": [
                  "title",
                  "arxiv_id"
                ],
                "type": "object"
              },
              "optional_files": [
                "thumbnail.jpg"
              ],
              "required_files": [
                "paper.pdf"
              ]
            },
            "processed": {
              "content_schema": {
                "properties": {
                  "text": {
                    "type": "string"
                  },
                  "chunks": {
                    "type": "array"
                  },
                  "figures": {
                    "type": "array"
                  }
                },
                "required": [
                  "text",
                  "chunks"
                ],
                "type": "object"
              },
              "optional_files": [
                "figures/*.png"
              ],
              "required_files": [
                "content.json"
              ]
            },
            "processing": {
              "intermediate_files": [
                "embeddings.npy",
                "chunks.json"
              ],
              "retention_days": 7
            },
            "schema_id": "arxiv_papers_ocr",
            "version": "v3"
          },
          "size_bytes": 10485760,
          "slug": "arxiv-papers-ocr",
          "storage_path": "datasets/123",
          "updated_at": "2025-01-05T00:00:00Z"
        }
      ],
      "total": 0,
      "page": 1,
      "page_size": 100
    }
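
The `required_files` and `optional_files` lists in the example hold file patterns such as `figures/*.png`. As an illustration of how a client might check an entry's files against them, here is a hedged sketch. It assumes the patterns follow shell-style glob semantics (the documentation does not say which pattern syntax the service uses), and both the `check_files` helper and the idea of flagging unlisted files as unexpected are hypothetical.

```python
from fnmatch import fnmatchcase


def check_files(files, required_patterns, optional_patterns=()):
    """Compare file names against required/optional glob patterns.

    Returns (missing, unexpected):
      missing    - required patterns that no file matches
      unexpected - files that match neither a required nor an optional pattern

    Assumption: patterns are shell-style globs, matched case-sensitively.
    """
    allowed = [*required_patterns, *optional_patterns]
    missing = [
        pattern
        for pattern in required_patterns
        if not any(fnmatchcase(name, pattern) for name in files)
    ]
    unexpected = [
        name
        for name in files
        if not any(fnmatchcase(name, pattern) for pattern in allowed)
    ]
    return missing, unexpected


if __name__ == "__main__":
    # Patterns taken from the "processed" section of the example response.
    missing, unexpected = check_files(
        ["content.json", "figures/fig1.png"],
        required_patterns=["content.json"],
        optional_patterns=["figures/*.png"],
    )
    print(missing, unexpected)
```

Note that in `fnmatchcase` the `*` wildcard also matches path separators, so `figures/*.png` would accept files in nested subdirectories; a stricter matcher may be needed if the service treats directories differently.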