Scope & Non-Goals
What SDF is
SDF is a data interchange format. It defines how to represent pre-extracted semantic content from web pages as structured JSON so that any consumer (an AI agent, search engine, analytics pipeline, or application) can read it without re-parsing the original source.
Specifically, SDF defines:
- A document schema with required and optional fields
- A hierarchical type system (10 parent types, 50+ subtypes) that determines which structured fields apply
- Provenance metadata so consumers know how and when a document was produced
- Content negotiation for serving different resolution levels (compact, standard, full)
- Discovery mechanisms (/.well-known/sdf.json) so agents can find SDF endpoints
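As a rough illustration of the discovery step, the sketch below derives the discovery URL for the site that hosts a given page. It assumes the file is served from the root of the site's origin, as is conventional for well-known URIs. The helper name `sdf_discovery_url` is hypothetical, and the contents of the discovery file are not covered here.

```python
from urllib.parse import urlsplit

# Path taken from the list above; everything else in this sketch is an assumption.
SDF_WELL_KNOWN_PATH = "/.well-known/sdf.json"


def sdf_discovery_url(page_url: str) -> str:
    """Return the assumed SDF discovery URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {page_url!r}")
    # Well-known URIs live at the origin root, so drop path, query, and fragment.
    return f"{parts.scheme}://{parts.netloc}{SDF_WELL_KNOWN_PATH}"


print(sdf_discovery_url("https://example.com/blog/post?id=1"))
```

An agent would fetch this URL before requesting individual documents. How it interprets the response depends on the discovery schema, which this page does not define.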
What SDF is NOT
These boundaries are intentional and will not change in future protocol versions.
Not an agent framework
SDF contains no agent instructions, tool definitions, function calls, or execution semantics. It does not tell agents what to do; it gives them structured data to reason about. How an agent uses SDF is entirely outside the protocol’s scope.
Not a UI specification
SDF documents contain no layout hints, rendering instructions, CSS classes, or display metadata. SDF is machine-first by design. If you need to render content for humans, use the original HTML.
Not a replacement for HTML
SDF is a complementary layer, not a competitor to HTML, RSS, Atom, or any existing web standard. Publishers continue to serve HTML to browsers. SDF serves the same content in a pre-extracted, machine-optimized form alongside it.
Not a business logic layer
SDF does not encode pricing rules, access control, rate limits, authentication, or API behavior. These are transport-layer concerns handled by the server, not the document format.
Not a model specification
SDF does not require any particular LLM, embedding model, or NLP pipeline. The protocol is model-agnostic. Any system capable of producing valid SDF JSON, whether an LLM, a rule-based extractor, or a human, can be an SDF converter.
Not a proprietary format
SDF Core will never contain vendor-specific fields, required proprietary extensions, or features that only work with a particular provider’s tooling. The extensions namespace exists for vendor-specific additions, but Core remains neutral.
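To make the split between Core and vendor additions concrete, here is a minimal sketch of a check that flags vendor fields placed outside the extensions namespace. It assumes the namespace is a top-level `extensions` key. The Core field names, the sample document, and the function name `stray_vendor_fields` are all hypothetical and stand in for whatever the actual schema defines.

```python
from typing import Any

# Hypothetical Core field names, for illustration only; the real list
# comes from the SDF schema, which this page does not reproduce.
ASSUMED_CORE_FIELDS = frozenset({"type", "title", "provenance"})

# Assumed top-level key for the extensions namespace.
EXTENSIONS_KEY = "extensions"


def stray_vendor_fields(document: dict[str, Any]) -> list[str]:
    """Return top-level keys that are neither assumed-Core nor the extensions key."""
    return sorted(
        key
        for key in document
        if key not in ASSUMED_CORE_FIELDS and key != EXTENSIONS_KEY
    )


# Hypothetical document: one vendor field is namespaced, one is not.
doc = {
    "type": "article",
    "title": "Example",
    "extensions": {"acme": {"score": 0.9}},
    "acme_score": 0.9,
}

print(stray_vendor_fields(doc))  # ['acme_score']
```

In practice the schema itself would enforce this boundary. The sketch only shows the intent: vendor data belongs under the extensions namespace, and Core stays unchanged.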
Design principles
| Principle | Implication |
|---|---|
| Machine-first | Every design decision optimizes for programmatic consumption, not human reading |
| Schema-validated | If it doesn’t pass JSON Schema validation, it’s not SDF |
| Transport-agnostic | SDF is a JSON payload; how you deliver it (HTTP, file, message queue) is orthogonal |
| Model-agnostic | No assumption about which LLM or extractor produces the document |
| Minimal Core | New fields require clear justification; the default answer to “should we add X?” is no |