Scope & Non-Goals

SDF is a data interchange format. It defines how to represent pre-extracted semantic content from web pages as structured JSON so that any consumer — AI agent, search engine, analytics pipeline, or application — can read it without re-parsing the original source.

Specifically, SDF defines:

  • A document schema with required and optional fields
  • A hierarchical type system (10 parent types, 50+ subtypes) that determines which structured fields apply
  • Provenance metadata so consumers know how and when a document was produced
  • Content negotiation for serving different resolution levels (compact, standard, full)
  • Discovery mechanisms (/.well-known/sdf.json) so agents can find SDF endpoints
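
As a rough illustration of the discovery and resolution bullets, the sketch below derives the well-known URL for a site and checks a resolution level against the three names listed above. Only the path `/.well-known/sdf.json` and the level names come from this page. The function names are hypothetical, and how a level is actually requested (header, query parameter, or otherwise) is not stated here, so the sketch does not attempt it.

```python
from urllib.parse import urlsplit, urlunsplit

# Resolution levels named in this section.
RESOLUTION_LEVELS = ("compact", "standard", "full")

# Discovery path named in this section.
WELL_KNOWN_PATH = "/.well-known/sdf.json"


def discovery_url(page_url: str) -> str:
    """Return the well-known SDF discovery URL for the origin of page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


def check_resolution(level: str) -> str:
    """Validate a resolution level name (hypothetical helper)."""
    if level not in RESOLUTION_LEVELS:
        raise ValueError(f"unknown resolution level: {level!r}")
    return level
```

An agent holding any page URL from a site could call `discovery_url` once per origin and cache the result; what the discovery document contains is defined by the protocol, not by this sketch.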

The non-goals below are intentional boundaries and will not change in future protocol versions.

SDF contains no agent instructions, tool definitions, function calls, or execution semantics. It does not tell agents what to do — it gives them structured data to reason about. How an agent uses SDF is entirely outside the protocol’s scope.

SDF documents contain no layout hints, rendering instructions, CSS classes, or display metadata. SDF is machine-first by design. If you need to render content for humans, use the original HTML.

SDF is a complementary layer, not a competitor to HTML, RSS, Atom, or any existing web standard. Publishers continue to serve HTML to browsers. SDF serves the same content in a pre-extracted, machine-optimized form alongside it.

SDF does not encode pricing rules, access control, rate limits, authentication, or API behavior. These concerns belong to the server and the transport layer, not the document format.

SDF does not require any particular LLM, embedding model, or NLP pipeline. The protocol is model-agnostic. Any system capable of producing valid SDF JSON — whether an LLM, a rule-based extractor, or a human — can be an SDF converter.

SDF Core will never contain vendor-specific fields, required proprietary extensions, or features that only work with a particular provider’s tooling. The extensions namespace exists for vendor-specific additions, but Core remains neutral.
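
One way a consumer might keep Core and vendor data apart is sketched below. It assumes vendor additions live under a top-level `extensions` key, grouped by vendor name; that key name, the grouping, and every field in the sample document are illustrative assumptions, not taken from the SDF schema.

```python
from typing import Any

# Assumption: vendor additions live under a top-level "extensions" key.
# The key name and layout are illustrative only.
EXTENSIONS_KEY = "extensions"


def split_core_and_extensions(
    doc: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate vendor-neutral fields from vendor-specific additions."""
    core = {k: v for k, v in doc.items() if k != EXTENSIONS_KEY}
    extensions = doc.get(EXTENSIONS_KEY, {})
    if not isinstance(extensions, dict):
        raise TypeError("extensions must be a JSON object")
    return core, extensions


doc = {
    "type": "article",   # hypothetical core field
    "title": "Example",  # hypothetical core field
    "extensions": {"acme": {"score": 0.9}},  # hypothetical vendor data
}
core, ext = split_core_and_extensions(doc)
```

A consumer that ignores `ext` entirely still has a complete document in `core`, which is the practical meaning of Core staying neutral.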

| Principle | Implication |
| --- | --- |
| Machine-first | Every design decision optimizes for programmatic consumption, not human reading |
| Schema-validated | If it doesn’t pass JSON Schema validation, it’s not SDF |
| Transport-agnostic | SDF is a JSON payload; how you deliver it (HTTP, file, message queue) is orthogonal |
| Model-agnostic | No assumption about which LLM or extractor produces the document |
| Minimal Core | New fields require clear justification; the default answer to “should we add X?” is no |
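
The schema-validated and transport-agnostic principles together suggest a single gate that takes a JSON payload from any source and rejects it unless it validates. A real consumer would check the payload against the published SDF JSON Schema with a JSON Schema validator. The sketch below is a deliberately simplified stand-in that uses only the standard library and checks a few required fields; the field names in `REQUIRED_FIELDS` are hypothetical.

```python
import json
from typing import Any

# Hypothetical required fields and their expected Python types. The real
# list is defined by the SDF JSON Schema, not by this sketch.
REQUIRED_FIELDS: dict[str, type] = {
    "type": str,
    "title": str,
    "provenance": dict,
}


def validation_errors(doc: Any) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    if not isinstance(doc, dict):
        return ["document must be a JSON object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in doc:
            errors.append(f"missing required field: {name}")
        elif not isinstance(doc[name], expected):
            errors.append(f"field {name} must be {expected.__name__}")
    return errors


def load_sdf(payload: str) -> dict[str, Any]:
    """Parse a JSON payload and reject it unless it passes the check.

    The payload is just a string, so it can come from an HTTP response,
    a file, or a message queue without changing this function.
    """
    doc = json.loads(payload)
    errors = validation_errors(doc)
    if errors:
        raise ValueError("not SDF: " + "; ".join(errors))
    return doc
```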