DatasetSchemaDefinition

Dataset schema definition for manifest-based storage

schema_id (string, required)

Unique schema identifier

version (string, required)

Schema version (e.g., 'v3')

description (string, required)

Human-readable schema description

original (object, required)

Original files schema

  required_files (string[])

  Required file patterns

  optional_files (string[])

  Optional file patterns

  metadata_schema (object)

  JSONSchema7 for metadata validation; arbitrary property names with values of any type are accepted
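
This reference does not state how file patterns are matched. As a minimal sketch, assuming glob-style patterns and a flat list of file names, a check for unmet `required_files` patterns could look like the following. The helper `missing_required` and the sample patterns are hypothetical and not part of the API.

```python
from fnmatch import fnmatchcase


def missing_required(patterns, filenames):
    """Return the patterns that no file name matches (glob semantics assumed)."""
    return [
        pattern
        for pattern in patterns
        if not any(fnmatchcase(name, pattern) for name in filenames)
    ]


# Hypothetical `original` section of a schema definition.
original = {
    "required_files": ["*.csv", "manifest.json"],
    "optional_files": ["README*"],
}

uploaded = ["data.csv", "README.md"]
print(missing_required(original["required_files"], uploaded))  # ['manifest.json']
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform; whether the service itself is case-sensitive is not documented here.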

processed (object, required)

Processed content schema

  content_schema (object)

  JSONSchema7 for content validation; arbitrary property names with values of any type are accepted

  required_files (string[])

  Required processed files

  optional_files (string[])

  Optional processed files

processing (object)

Processing artifacts schema

anyOf

  intermediate_files (string[])

  Intermediate file patterns

  retention_days (integer)

  Days to retain processing artifacts

  Default value: 7
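
How retention is enforced is not described in this reference. The sketch below only illustrates the documented default: when `processing` or `retention_days` is absent, 7 days is used. The helper `artifact_expiry` and the idea of counting from a creation timestamp are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 7  # documented default for retention_days


def artifact_expiry(created_at, processing=None):
    """Return the time after which processing artifacts may be dropped.

    `processing` is the optional `processing` object of a schema definition.
    """
    days = (processing or {}).get("retention_days", DEFAULT_RETENTION_DAYS)
    return created_at + timedelta(days=days)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(artifact_expiry(created))                            # default of 7 days
print(artifact_expiry(created, {"retention_days": 30}))    # explicit override
```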
DatasetSchemaDefinition
{
  "schema_id": "string",
  "version": "string",
  "description": "string",
  "original": {
    "required_files": [
      "string"
    ],
    "optional_files": [
      "string"
    ],
    "metadata_schema": {}
  },
  "processed": {
    "content_schema": {},
    "required_files": [
      "string"
    ],
    "optional_files": [
      "string"
    ]
  },
  "processing": {
    "intermediate_files": [
      "string"
    ],
    "retention_days": 7
  }
}
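
For orientation, here is a filled-in definition with a check for the top-level fields marked required above. All sample values (`sensor-logs`, the file patterns, the metadata schema) and the helper `missing_keys` are hypothetical; this is a sketch, not the service's own validation.

```python
import json

# Top-level fields marked "required" in the field list.
REQUIRED_KEYS = ("schema_id", "version", "description", "original", "processed")


def missing_keys(definition):
    """Return the required top-level keys absent from a definition."""
    return [key for key in REQUIRED_KEYS if key not in definition]


definition = {
    "schema_id": "sensor-logs",  # hypothetical identifier
    "version": "v3",
    "description": "Raw sensor logs and their parsed form",
    "original": {
        "required_files": ["*.log"],
        "optional_files": ["notes.txt"],
        "metadata_schema": {"type": "object"},
    },
    "processed": {
        "content_schema": {"type": "object"},
        "required_files": ["parsed.json"],
        "optional_files": [],
    },
    # "processing" is not marked required, so it is omitted here.
}

print(missing_keys(definition))          # []
print(json.dumps(definition, indent=2))  # JSON body in the shape shown above
```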