DatasetListResponse
Paginated dataset list response
datasets object[] required
List of datasets
Dataset name
URL-friendly slug
Dataset description
Type of dataset
Possible values: [text, audio, voice, images]
Dataset ID (synced from backend catalog)
Total size in bytes
Default: 0
Number of entries (cached)
Default: 0
schema_definition object required
Manifest-based schema definition
Unique schema identifier
Schema version (e.g., 'v3')
Human-readable schema description
original object required
Original files schema
Required file patterns
Optional file patterns
metadata_schema object
JSONSchema7 for metadata validation
processed object required
Processed content schema
content_schema object
JSONSchema7 for content validation
Required processed files
Optional processed files
processing object
Processing artifacts schema
- DatasetSchemaProcessing
- null
Intermediate file patterns
Days to retain processing artifacts
Default: 7
Current schema version. Entries can be migrated incrementally by comparing manifest->>'schema_version'
Default: v1
Base storage path in MinIO/S3 (e.g., 'datasets/123')
Creation timestamp
Last update timestamp
last_synced_at object
Last sync with backend catalog
- string<date-time>
- null
Version number for optimistic locking
Default: 1
Total number of datasets
Current page number
Default: 1
Page size
Default: 100
{
  "datasets": [
    {
      "created_at": "2025-01-01T00:00:00Z",
      "current_schema_version": "v3",
      "dataset_type": "text",
      "description": "OCR processed academic papers from ArXiv",
      "entry_count": 1500,
      "id": 123,
      "last_synced_at": "2025-01-10T00:00:00Z",
      "name": "ArXiv Papers OCR",
      "schema_definition": {
        "description": "Schema for ArXiv papers with OCR, chunking, and embeddings",
        "original": {
          "metadata_schema": {
            "properties": {
              "title": {
                "type": "string"
              },
              "authors": {
                "items": {
                  "type": "string"
                },
                "type": "array"
              },
              "arxiv_id": {
                "type": "string"
              },
              "published_date": {
                "format": "date",
                "type": "string"
              }
            },
            "required": [
              "title",
              "arxiv_id"
            ],
            "type": "object"
          },
          "optional_files": [
            "thumbnail.jpg"
          ],
          "required_files": [
            "paper.pdf"
          ]
        },
        "processed": {
          "content_schema": {
            "properties": {
              "text": {
                "type": "string"
              },
              "chunks": {
                "type": "array"
              },
              "figures": {
                "type": "array"
              }
            },
            "required": [
              "text",
              "chunks"
            ],
            "type": "object"
          },
          "optional_files": [
            "figures/*.png"
          ],
          "required_files": [
            "content.json"
          ]
        },
        "processing": {
          "intermediate_files": [
            "embeddings.npy",
            "chunks.json"
          ],
          "retention_days": 7
        },
        "schema_id": "arxiv_papers_ocr",
        "version": "v3"
      },
      "size_bytes": 10485760,
      "slug": "arxiv-papers-ocr",
      "storage_path": "datasets/123",
      "updated_at": "2025-01-05T00:00:00Z"
    }
  ],
  "total": 0,
  "page": 1,
  "page_size": 100
}
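
As a rough illustration of how a client might consume this response, the Python sketch below derives a page count from total and page_size and checks a list of file names against the required and optional file patterns of one schema section. The key names are taken from the example above. The helper names page_count and check_files are hypothetical, and treating the patterns as shell-style globs is an assumption based on the 'figures/*.png' entry, not documented behavior.

```python
import json
import math
from fnmatch import fnmatchcase

# Trimmed copy of the example response; only the keys used below are kept.
SAMPLE = json.loads("""
{
  "datasets": [
    {
      "id": 123,
      "slug": "arxiv-papers-ocr",
      "schema_definition": {
        "original": {
          "required_files": ["paper.pdf"],
          "optional_files": ["thumbnail.jpg"]
        },
        "processed": {
          "required_files": ["content.json"],
          "optional_files": ["figures/*.png"]
        }
      }
    }
  ],
  "total": 0,
  "page": 1,
  "page_size": 100
}
""")


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to list `total` datasets (hypothetical helper)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total / page_size)


def check_files(section: dict, files: list) -> dict:
    """Compare file names against one schema section ('original' or 'processed').

    Assumption: patterns are shell-style globs, matched case-sensitively.
    """
    required = section.get("required_files", [])
    allowed = required + section.get("optional_files", [])
    # A required pattern is missing if no file matches it.
    missing = [p for p in required if not any(fnmatchcase(f, p) for f in files)]
    # A file is unexpected if it matches neither a required nor an optional pattern.
    unexpected = [f for f in files if not any(fnmatchcase(f, p) for p in allowed)]
    return {"missing": missing, "unexpected": unexpected}


if __name__ == "__main__":
    processed = SAMPLE["datasets"][0]["schema_definition"]["processed"]
    print(page_count(SAMPLE["total"], SAMPLE["page_size"]))
    print(check_files(processed, ["content.json", "figures/fig1.png", "notes.txt"]))
```

Note that fnmatchcase lets '*' match across '/' characters, so 'figures/*.png' would also accept nested paths; a stricter matcher may be preferable if the backend treats patterns differently.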