Protocol v0.2

Draft v0.2.0

SDF Protocol v0.2.0 is a draft specification. The protocol is under active development. Breaking changes may occur between minor versions until v1.0.

| Goal | Description |
| --- | --- |
| Agent-first | Every design decision optimizes for machine consumption, not human reading |
| Schema-validated | All documents must pass JSON Schema validation (draft 2020-12) |
| Type-aware | Hierarchical type system with parent types and subtypes drives field selection |
| Provenance-complete | Every document carries full conversion audit trail |
| Cacheable | Content hashing enables deduplication and incremental updates |
| Extensible | Vendor namespaces allow domain-specific extensions without protocol changes |
| Resolution-flexible | Compact, standard, and full resolution levels serve different consumer needs |
| Transport-agnostic | SDF documents are JSON payloads; delivery mechanism is orthogonal |
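
As an illustration of the "Cacheable" goal, here is a minimal sketch of content hashing for deduplication. The hash algorithm (SHA-256), the canonical JSON form, the `sha256:` prefix, and the function names `content_hash` and `is_unchanged` are all assumptions made for this example; this page does not say how SDF computes its hashes.

```python
import hashlib
import json


def content_hash(payload: dict) -> str:
    """Hash a JSON-serializable payload independently of key order.

    Assumption: SHA-256 over canonical JSON (sorted keys, compact
    separators). The SDF page does not name the algorithm or the
    canonical form, so treat this as a stand-in.
    """
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unchanged(cached_hash: str, payload: dict) -> bool:
    """Return True when the payload still hashes to the cached value."""
    return cached_hash == content_hash(payload)
```

A consumer could store the hash alongside each cached document and skip reprocessing whenever `is_unchanged` returns True.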

Every SDF document begins with four required header fields:

{
  "sdf_version": "0.2.0",
  "id": "sdf_a1b2c3d4e5f6",
  "parent_type": "article",
  "type": "article.news"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| sdf_version | string | Yes | Semantic version of the SDF protocol used to produce this document |
| id | string | Yes | Unique document identifier, prefixed with sdf_ |
| parent_type | string | Yes | One of 10 parent types (e.g., article, commerce, code) |
| type | string | Yes | Fully qualified type as parent_type.subtype (e.g., article.news) |

The parent_type field determines which type_data schema applies. The type field provides finer-grained classification within that parent type.
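
The header rules above can be checked with a few lines of code. The following is a rough sketch, not the official validator: the function name `header_errors` and the simplified version pattern are assumptions, and it does not check `parent_type` against the full list of 10 parent types because only three are named here.

```python
import re

REQUIRED_HEADER_FIELDS = ("sdf_version", "id", "parent_type", "type")
# Simplified MAJOR.MINOR.PATCH check; pre-release and build tags are ignored.
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def header_errors(doc: dict) -> list[str]:
    """Return a list of problems found in the four required header fields."""
    errors = []
    for field in REQUIRED_HEADER_FIELDS:
        if not isinstance(doc.get(field), str):
            errors.append(f"{field}: missing or not a string")
    if errors:
        return errors  # the format checks below need all four strings

    if not SEMVER.match(doc["sdf_version"]):
        errors.append("sdf_version: not a MAJOR.MINOR.PATCH version")
    if not doc["id"].startswith("sdf_"):
        errors.append("id: must start with 'sdf_'")
    # type must be parent_type, a dot, then a non-empty subtype.
    prefix = doc["parent_type"] + "."
    if not doc["type"].startswith(prefix) or len(doc["type"]) == len(prefix):
        errors.append("type: must be '<parent_type>.<subtype>'")
    return errors
```

An empty list means the header is consistent with the rules stated on this page; full conformance would still require validation against the JSON Schema.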

See Type System for the complete taxonomy.

Beyond the header, an SDF document contains these top-level sections:

  • source — Origin metadata (URL, domain, fetch timestamp)
  • summary — Multi-level summarization (one-line, key points, abstract)
  • entities — Named entities with types, roles, and salience
  • claims — Factual assertions with confidence and attribution
  • topics — Topic classification labels
  • relationships — Subject-predicate-object triples
  • type_data — Type-specific structured fields
  • sections — Document structure (headings, content blocks)
  • metadata — Page-level metadata (title, description, keywords)
  • provenance — Conversion audit trail
  • temporal — Time-related metadata
  • links — Outbound link analysis
  • embeddings — Optional vector representations
  • extensions — Vendor-namespaced custom fields

See Document Model for complete field definitions.
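
To make the layout concrete, the sketch below separates a parsed document into header fields, the top-level sections listed above, and everything else. The helper name `split_document` is hypothetical, and a key landing in the "other" bucket is only a hint, since this page does not claim the list is exhaustive (the `aspects` field, for example, is mentioned elsewhere but not listed).

```python
HEADER_FIELDS = {"sdf_version", "id", "parent_type", "type"}
TOP_LEVEL_SECTIONS = {
    "source", "summary", "entities", "claims", "topics", "relationships",
    "type_data", "sections", "metadata", "provenance", "temporal", "links",
    "embeddings", "extensions",
}
KNOWN_KEYS = HEADER_FIELDS | TOP_LEVEL_SECTIONS


def split_document(doc: dict) -> tuple[dict, dict, dict]:
    """Split a parsed SDF document into (header, sections, other)."""
    header = {k: v for k, v in doc.items() if k in HEADER_FIELDS}
    sections = {k: v for k, v in doc.items() if k in TOP_LEVEL_SECTIONS}
    # Keys not named on this page; not necessarily invalid.
    other = {k: v for k, v in doc.items() if k not in KNOWN_KEYS}
    return header, sections, other
```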

Changes in v0.2

  • Added aspects field for multi-type content
  • Introduced type normalization cascade (5 stages)
  • Added content_hash to provenance for deduplication
  • Expanded type taxonomy to 50+ subtypes
  • Added extensions field with vendor namespace support
  • Introduced resolution levels (compact, standard, full)
  • Added /.well-known/sdf.json discovery mechanism
  • Content negotiation via Accept: application/sdf+json

Initial release

  • Initial protocol definition
  • 10 parent types with basic subtypes
  • Core document model: source, summary, entities, claims, relationships
  • Provenance tracking
  • JSON Schema validation
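
The discovery path and the Accept header mentioned above might be used by a client roughly as follows. This is a sketch using only the Python standard library; the helper names `discovery_url` and `sdf_request` are made up for illustration, and the contents of `sdf.json` and the server's behavior when no SDF representation exists are not described on this page.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request

SDF_MEDIA_TYPE = "application/sdf+json"
DISCOVERY_PATH = "/.well-known/sdf.json"


def discovery_url(page_url: str) -> str:
    """Build the well-known discovery URL for the origin serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the page's path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, DISCOVERY_PATH, "", ""))


def sdf_request(page_url: str) -> Request:
    """Prepare (but do not send) a GET request asking for an SDF document."""
    return Request(page_url, headers={"Accept": SDF_MEDIA_TYPE})
```

Passing the prepared request to `urllib.request.urlopen` would send it; a client should still check the response's Content-Type before parsing the body as SDF.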

Document Model

Complete field reference for SDF documents. See Document Model.

Type System

Hierarchical type taxonomy with 10 parent types and 50+ subtypes. See Type System.