Document Model
Complete field reference for SDF documents. See Document Model.
SDF Protocol v0.2.0 is a draft specification. The protocol is under active development. Breaking changes may occur between minor versions until v1.0.
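Because breaking changes may occur between minor versions before v1.0, a consumer cannot assume that a 0.3.x document reads like a 0.2.x one. The sketch below shows one way a reader might gate on `sdf_version`. The function names are hypothetical. The post-1.0 rule (same major version is compatible) is an assumption borrowed from semantic-versioning convention, not something this specification states.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(reader_version: str, document_version: str) -> bool:
    """Hypothetical check: can a reader built for reader_version consume document_version?"""
    r_major, r_minor, _ = parse_version(reader_version)
    d_major, d_minor, _ = parse_version(document_version)
    if r_major != d_major:
        return False
    if r_major == 0:
        # Pre-1.0: minor versions may introduce breaking changes, so they must match.
        return r_minor == d_minor
    # Assumption for >= 1.0: same major version means compatible (semver convention).
    return True
```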
| Goal | Description |
|---|---|
| Agent-first | Every design decision optimizes for machine consumption, not human reading |
| Schema-validated | All documents must pass JSON Schema validation (draft 2020-12) |
| Type-aware | Hierarchical type system with parent types and subtypes drives field selection |
| Provenance-complete | Every document carries a full conversion audit trail |
| Cacheable | Content hashing enables deduplication and incremental updates |
| Extensible | Vendor namespaces allow domain-specific extensions without protocol changes |
| Resolution-flexible | Compact, standard, and full resolution levels serve different consumer needs |
| Transport-agnostic | SDF documents are JSON payloads; delivery mechanism is orthogonal |
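The "Cacheable" goal relies on content hashing. This section does not specify the hash algorithm or which fields it covers, so the sketch below assumes SHA-256 over canonical JSON (sorted keys, no insignificant whitespace) and excludes the per-document `id` and the `provenance` block, on the assumption that two conversions of the same content differ only there. `content_hash`, `deduplicate`, and `VOLATILE_FIELDS` are hypothetical names.

```python
import hashlib
import json

# Assumption: fields that change per conversion and should not affect the content hash.
VOLATILE_FIELDS = frozenset({"id", "provenance"})


def content_hash(doc: dict, exclude: frozenset[str] = VOLATILE_FIELDS) -> str:
    """SHA-256 hex digest of the document's canonical JSON, minus volatile fields."""
    payload = {key: value for key, value in doc.items() if key not in exclude}
    # sort_keys also sorts nested objects, so key order never changes the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(documents: list[dict]) -> list[dict]:
    """Keep the first document seen for each distinct content hash."""
    seen: set[str] = set()
    unique = []
    for doc in documents:
        digest = content_hash(doc)
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique
```

Canonicalizing before hashing matters because JSON object key order is not significant: without it, two semantically identical documents could produce different hashes and defeat deduplication.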
Every SDF document begins with four required header fields:
```json
{
  "sdf_version": "0.2.0",
  "id": "sdf_a1b2c3d4e5f6",
  "parent_type": "article",
  "type": "article.news"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| sdf_version | string | Yes | Semantic version of the SDF protocol used to produce this document |
| id | string | Yes | Unique document identifier, prefixed with sdf_ |
| parent_type | string | Yes | One of 10 parent types (e.g., article, commerce, code) |
| type | string | Yes | Fully qualified type as parent_type.subtype (e.g., article.news) |
The parent_type field determines which type_data schema applies. The type field provides finer-grained classification within that parent type.
See Type System for the complete taxonomy.
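The header rules above can be checked before any schema-level validation. This is a minimal sketch, assuming a document already parsed into a Python `dict`. `validate_header` is a hypothetical helper; it checks only the constraints stated in the table (string types, semantic version format, the `sdf_` prefix, and that `type` is `parent_type.subtype`). It does not check `parent_type` against the list of 10 parent types, since that list lives in Type System.

```python
import re

REQUIRED_HEADER_FIELDS = ("sdf_version", "id", "parent_type", "type")
SEMVER = re.compile(r"\d+\.\d+\.\d+")


def validate_header(doc: dict) -> list[str]:
    """Return problems with the four required SDF header fields (empty list if valid)."""
    errors = [
        f"{field}: missing or not a string"
        for field in REQUIRED_HEADER_FIELDS
        if not isinstance(doc.get(field), str)
    ]
    if errors:
        return errors
    if not SEMVER.fullmatch(doc["sdf_version"]):
        errors.append("sdf_version: expected MAJOR.MINOR.PATCH")
    if not doc["id"].startswith("sdf_"):
        errors.append("id: must start with 'sdf_'")
    # type must be parent_type.subtype, and its prefix must agree with parent_type.
    parent, dot, subtype = doc["type"].partition(".")
    if not dot or not subtype:
        errors.append("type: expected parent_type.subtype")
    elif parent != doc["parent_type"]:
        errors.append("type: prefix does not match parent_type")
    return errors
```

Returning a list of messages rather than raising on the first problem lets an agent report every header defect in one pass.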
Beyond the header, an SDF document contains these top-level sections:
- source: Origin metadata (URL, domain, fetch timestamp)
- summary: Multi-level summarization (one-line, key points, abstract)
- entities: Named entities with types, roles, and salience
- claims: Factual assertions with confidence and attribution
- topics: Topic classification labels
- relationships: Subject-predicate-object triples
- type_data: Type-specific structured fields
- sections: Document structure (headings, content blocks)
- metadata: Page-level metadata (title, description, keywords)
- provenance: Conversion audit trail
- temporal: Time-related metadata
- links: Outbound link analysis
- embeddings: Optional vector representations
- extensions: Vendor-namespaced custom fields

See Document Model for complete field definitions.
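A consumer can use the section list above to see which parts of a document are populated and to flag top-level keys it does not recognize. The sketch below is hypothetical: it takes the section names verbatim from the list, treats every section as optional (this page does not say which are required), and would need extending for fields defined elsewhere, such as `aspects`.

```python
HEADER_FIELDS = frozenset({"sdf_version", "id", "parent_type", "type"})
SECTIONS = frozenset({
    "source", "summary", "entities", "claims", "topics", "relationships",
    "type_data", "sections", "metadata", "provenance", "temporal", "links",
    "embeddings", "extensions",
})


def classify_top_level(doc: dict) -> dict[str, list[str]]:
    """Sort a document's top-level keys into present, missing, and unknown sections."""
    return {
        "present": sorted(key for key in doc if key in SECTIONS),
        # difference() accepts any iterable; iterating a dict yields its keys.
        "missing": sorted(SECTIONS.difference(doc)),
        "unknown": sorted(key for key in doc if key not in SECTIONS and key not in HEADER_FIELDS),
    }
```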
The protocol also defines:

- aspects field for multi-type content
- content_hash in provenance for deduplication
- extensions field with vendor namespace support
- /.well-known/sdf.json discovery mechanism
- Accept: application/sdf+json request header for content negotiation
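Discovery and content negotiation can be exercised with the Python standard library alone. This sketch only builds the discovery URL for a page's origin and a request that advertises the `application/sdf+json` media type; it does not send anything. The helper names are hypothetical, and the contents of `sdf.json` and the server's fallback behavior are not described on this page.

```python
from urllib.parse import urljoin
from urllib.request import Request

SDF_MEDIA_TYPE = "application/sdf+json"


def discovery_url(page_url: str) -> str:
    """Return the /.well-known/sdf.json URL for the origin serving page_url."""
    # An absolute-path reference replaces the path and drops the page's query string.
    return urljoin(page_url, "/.well-known/sdf.json")


def sdf_request(page_url: str) -> Request:
    """Build (but do not send) a GET request asking for the SDF representation."""
    return Request(page_url, headers={"Accept": SDF_MEDIA_TYPE})
```

Passing the result of `sdf_request(...)` to `urllib.request.urlopen` would perform the fetch. Checking the response's `Content-Type` before parsing is a sensible guard, since a server may ignore the `Accept` header.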
Type System
Hierarchical type taxonomy with 10 parent types and 50+ subtypes. See Type System.