SDF Document Schema

The SDF document schema (sdf-document.schema.json) defines the structural requirements for all SDF documents. It is written in JSON Schema draft 2020-12.

Schema header
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://sdfprotocol.org/schemas/sdf-document.schema.json",
  "title": "SDF Document",
  "description": "Schema for Structured Data Format (SDF) documents",
  "type": "object"
}
Required fields
{
  "required": [
    "sdf_version",
    "id",
    "parent_type",
    "type",
    "source",
    "summary",
    "entities",
    "type_data",
    "provenance"
  ]
}
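
As a rough illustration of what the required list implies for a consumer, the sketch below checks a parsed document for missing top-level fields using only the Python standard library. The function name missing_required and the example document are hypothetical; a real validator would run the full JSON Schema instead.

```python
# Required top-level fields, copied from the schema's "required" array.
REQUIRED_FIELDS = [
    "sdf_version", "id", "parent_type", "type", "source",
    "summary", "entities", "type_data", "provenance",
]


def missing_required(doc: dict) -> list[str]:
    """Return the required field names absent from doc, in schema order.

    Like JSON Schema's "required", this checks presence only, not the value.
    """
    return [name for name in REQUIRED_FIELDS if name not in doc]


# Hypothetical, deliberately incomplete document.
example = {"sdf_version": "0.2.0", "id": "sdf_example", "parent_type": "article"}
print(missing_required(example))
```

An empty list means every required field is present; it says nothing about whether the values themselves are valid.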
Property definitions
{
  "properties": {
    "sdf_version": {
      "type": "string",
      "pattern": "^\\d+\\.\\d+\\.\\d+$",
      "description": "SDF protocol version (semver)"
    },
    "id": {
      "type": "string",
      "pattern": "^sdf_",
      "description": "Unique document identifier, prefixed with sdf_"
    },
    "parent_type": {
      "type": "string",
      "enum": [
        "article",
        "documentation",
        "commerce",
        "discussion",
        "reference",
        "data",
        "code",
        "profile",
        "event",
        "media"
      ],
      "description": "Primary content type classification"
    },
    "type": {
      "type": "string",
      "pattern": "^(article|documentation|commerce|discussion|reference|data|code|profile|event|media)\\.",
      "description": "Qualified type as parent_type.subtype"
    },
    "aspects": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Secondary type classifications for multi-type content"
    },
    "source": {
      "$ref": "common/source.schema.json"
    },
    "summary": {
      "$ref": "common/summary.schema.json"
    },
    "entities": {
      "type": "array",
      "items": { "$ref": "common/entity.schema.json" },
      "description": "Extracted named entities"
    },
    "claims": {
      "type": "array",
      "items": { "$ref": "common/claim.schema.json" },
      "description": "Factual assertions"
    },
    "topics": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Topic classification labels"
    },
    "relationships": {
      "type": "array",
      "items": { "$ref": "common/relationship.schema.json" },
      "description": "Entity relationship triples"
    },
    "type_data": {
      "type": "object",
      "description": "Type-specific structured fields"
    },
    "sections": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "heading": { "type": "string" },
          "level": { "type": "integer", "minimum": 1, "maximum": 6 },
          "content": { "type": "string" },
          "word_count": { "type": "integer" }
        }
      },
      "description": "Document structural sections"
    },
    "metadata": {
      "type": "object",
      "description": "Page-level metadata"
    },
    "provenance": {
      "$ref": "common/provenance.schema.json"
    },
    "temporal": {
      "$ref": "common/temporal.schema.json"
    },
    "links": {
      "type": "array",
      "items": { "$ref": "common/link.schema.json" },
      "description": "Outbound link analysis"
    },
    "embeddings": {
      "type": "object",
      "description": "Optional vector representations"
    },
    "extensions": {
      "type": "object",
      "patternProperties": {
        "^x-": { "type": "object" }
      },
      "additionalProperties": false,
      "description": "Vendor-namespaced custom fields"
    }
  }
}
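
To make the string patterns above concrete, here is a minimal sketch that applies them with Python's re module. The helper pattern_errors and the sample document are hypothetical, and Python's regex dialect is close to but not identical to the ECMA-262 dialect that JSON Schema assumes, so treat this as an approximation rather than a substitute for schema validation.

```python
import re

# Patterns transcribed from the schema; JSON's "\\d" becomes "\d" in a raw string.
# re.ASCII keeps \d limited to 0-9, which is closer to ECMA-262 behaviour.
SDF_VERSION = re.compile(r"^\d+\.\d+\.\d+$", re.ASCII)
SDF_ID = re.compile(r"^sdf_")
PARENT_TYPES = [
    "article", "documentation", "commerce", "discussion", "reference",
    "data", "code", "profile", "event", "media",
]
SDF_TYPE = re.compile(r"^(" + "|".join(PARENT_TYPES) + r")\.")
EXTENSION_KEY = re.compile(r"^x-")


def pattern_errors(doc: dict) -> list[str]:
    """Return the names of fields whose string values fail their pattern."""
    errors = []
    for field, pattern in [("sdf_version", SDF_VERSION), ("id", SDF_ID), ("type", SDF_TYPE)]:
        value = doc.get(field)
        # JSON Schema "pattern" is unanchored, so search() is the right call.
        if isinstance(value, str) and not pattern.search(value):
            errors.append(field)
    extensions = doc.get("extensions")
    if isinstance(extensions, dict):
        # additionalProperties: false rejects any key that does not match ^x-.
        for key in extensions:
            if not EXTENSION_KEY.search(key):
                errors.append(f"extensions.{key}")
    return errors


# Hypothetical document with three deliberate violations.
sample = {
    "sdf_version": "0.2",
    "id": "sdf_abc",
    "type": "blog.post",
    "extensions": {"x-acme": {}, "acme": {}},
}
print(pattern_errors(sample))
```

The sketch only covers the pattern-based constraints; type checks, enums, and the referenced sub-schemas are left to a full validator.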
| Field | JSON Type | Constraint | Notes |
|---|---|---|---|
| sdf_version | string | Semver pattern ^\d+\.\d+\.\d+$ | Must match "0.2.0" for current spec |
| id | string | Prefix ^sdf_ | Unique across documents |
| parent_type | string | Enum of 10 values | Determines type_data schema |
| type | string | Pattern ^{parent_type}\. | Must start with parent_type |
| aspects | array of strings | | Optional secondary types |
| source | object | Ref: source schema | URL, domain, timestamp required |
| summary | object | Ref: summary schema | one_line and key_points required |
| entities | array of objects | Ref: entity schema | name and type required per entity |
| claims | array of objects | Ref: claim schema | claim, source_type, confidence required |
| topics | array of strings | | Free-text topic labels |
| relationships | array of objects | Ref: relationship schema | subject, predicate, object required |
| type_data | object | Conditional by parent_type | Type-specific validation |
| sections | array of objects | | heading, level, content per section |
| metadata | object | | Free-form page metadata |
| provenance | object | Ref: provenance schema | converter, model, content_hash required |
| temporal | object | Ref: temporal schema | ISO 8601 date fields |
| links | array of objects | Ref: link schema | url and relationship required |
| embeddings | object | | Model, dimensions, vectors |
| extensions | object | Keys must match ^x- | Vendor-namespaced only |
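
The table notes that type must start with the document's own parent_type, while the regular expression shown for type accepts any of the ten prefixes. Whether the full schema enforces the agreement elsewhere is not shown here, so a consumer may want a small cross-field check. The sketch below is one possible way to do that; the function name and the subtypes used in the examples ("article.news", "commerce.product") are hypothetical.

```python
def type_matches_parent(doc: dict) -> bool:
    """Return True when doc["type"] has the form "<parent_type>.<subtype>"."""
    parent = doc.get("parent_type")
    qualified = doc.get("type")
    if not isinstance(parent, str) or not isinstance(qualified, str):
        return False
    # The trailing dot prevents "articles.x" from matching parent "article".
    return qualified.startswith(parent + ".")


# Hypothetical subtypes, used only for illustration.
print(type_matches_parent({"parent_type": "article", "type": "article.news"}))
print(type_matches_parent({"parent_type": "article", "type": "commerce.product"}))
```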

The root schema applies type-specific validation to type_data using JSON Schema conditional composition:

{
  "allOf": [
    {
      "if": { "properties": { "parent_type": { "const": "article" } } },
      "then": { "properties": { "type_data": { "$ref": "types/article.schema.json" } } }
    },
    {
      "if": { "properties": { "parent_type": { "const": "commerce" } } },
      "then": { "properties": { "type_data": { "$ref": "types/commerce.schema.json" } } }
    }
  ]
}

This pattern ensures that type_data for an article document is validated against the article schema, type_data for a commerce document is validated against the commerce schema, and so on.
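
The effect of the if/then branches can be pictured as a lookup from parent_type to a type_data schema. The sketch below mirrors only the two branches shown in the excerpt; the function name type_data_schema_for is hypothetical, and the mapping deliberately omits the other parent types because their schema paths are not listed here.

```python
# Only the two $ref targets that appear in the excerpt above.
TYPE_DATA_SCHEMAS = {
    "article": "types/article.schema.json",
    "commerce": "types/commerce.schema.json",
}


def type_data_schema_for(doc: dict):
    """Return the type_data schema path selected by parent_type, or None.

    None corresponds to "no branch matched": no extra constraint is applied.
    """
    parent = doc.get("parent_type")
    if not isinstance(parent, str):
        return None
    return TYPE_DATA_SCHEMAS.get(parent)


print(type_data_schema_for({"parent_type": "commerce"}))
print(type_data_schema_for({"parent_type": "event"}))
```

One difference from the real schema is worth keeping in mind: an if clause built only from properties also matches when parent_type is absent. Because parent_type is a required field, such a document is rejected regardless.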

The complete schema files are maintained in the SDF GitHub repository:

github.com/sdfprotocol

Schema files are versioned alongside the protocol specification, so each protocol version has a corresponding set of schema files.