
Well-Known Discovery

Publishers that serve SDF documents advertise their support via a well-known endpoint:

https://example.com/.well-known/sdf.json

This allows agents to discover SDF support, locate endpoints, and understand publisher policies without prior configuration.

The well-known endpoint must:

  1. Be served at exactly /.well-known/sdf.json
  2. Return Content-Type: application/json
  3. Return HTTP 200 with a valid configuration object
  4. Be publicly accessible (no authentication required for discovery)
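As a rough illustration, the requirements that can be checked from the response itself (status code, media type, and a JSON object body) might be applied like this in Python. The function name `check_discovery_response` is hypothetical. Tolerating media-type parameters such as `; charset=utf-8` is an assumption, because the requirements above do not address them.

```python
import json


def check_discovery_response(status: int, headers: dict, body: bytes) -> dict:
    """Check a raw well-known response and return the parsed configuration.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation.
    """
    # Requirement 3: HTTP 200. A 401/403 also lands here, which would
    # violate requirement 4 (discovery must not need authentication).
    if status != 200:
        raise ValueError(f"expected HTTP 200, got {status}")

    # Requirement 2: Content-Type application/json. Header names are
    # case-insensitive; parameters such as "; charset=utf-8" are ignored
    # (an assumption, not stated by the requirements).
    content_type = next(
        (value for name, value in headers.items() if name.lower() == "content-type"),
        "",
    )
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        raise ValueError(f"expected application/json, got {content_type!r}")

    # Requirement 3 (continued): the body must be a JSON object.
    config = json.loads(body)
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    return config
```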
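
The configuration object supports the following top-level fields:

| Field | Type | Required | Description |
|---|---|---|---|
| sdf_version | string | Yes | Supported SDF protocol version |
| publisher | object | Yes | Publisher identity |
| endpoints | array<object> | Yes | Available SDF endpoints |
| types_supported | array<string> | No | Parent types this publisher serves |
| resolutions | array<string> | No | Supported resolution levels |
| policies | object | No | Publisher policies |
| extensions | array<string> | No | Supported extension namespaces |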
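A minimal sketch of checking a parsed configuration against this table, assuming the JSON has already been decoded into Python values. `validate_config` is a hypothetical helper. The table does not say how unknown fields should be treated, so the sketch simply ignores them.

```python
# Expected Python types after json.loads, per the field table.
REQUIRED_FIELDS = {"sdf_version": str, "publisher": dict, "endpoints": list}
OPTIONAL_FIELDS = {
    "types_supported": list,
    "resolutions": list,
    "policies": dict,
    "extensions": list,
}
STRING_ARRAYS = ("types_supported", "resolutions", "extensions")


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the shape looks valid.

    Unknown top-level fields are ignored (their handling is not specified).
    """
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in config:
            problems.append(f"missing required field: {name}")
        elif not isinstance(config[name], expected):
            problems.append(f"{name} must be a {expected.__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in config and not isinstance(config[name], expected):
            problems.append(f"{name} must be a {expected.__name__}")
    # array<string>: every element must be a string.
    for name in STRING_ARRAYS:
        value = config.get(name)
        if isinstance(value, list) and not all(isinstance(v, str) for v in value):
            problems.append(f"{name} must contain only strings")
    # array<object>: every endpoint must be a JSON object.
    endpoints = config.get("endpoints")
    if isinstance(endpoints, list) and not all(isinstance(e, dict) for e in endpoints):
        problems.append("endpoints must contain only objects")
    return problems
```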
Identify the publisher by name, domain, and contact address:

{
  "publisher": {
    "name": "Example News",
    "domain": "example.com",
    "contact": "sdf@example.com"
  }
}

Declare the available SDF endpoints and their capabilities:

{
  "endpoints": [
    {
      "path": "/api/sdf/{url}",
      "method": "GET",
      "description": "Convert any page on this domain to SDF",
      "rate_limit": "100/hour",
      "auth_required": false
    },
    {
      "path": "/api/sdf/batch",
      "method": "POST",
      "description": "Batch conversion of multiple URLs",
      "rate_limit": "10/hour",
      "auth_required": true
    }
  ]
}
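The sketch below shows one way an agent might choose an endpoint and expand its path template. It assumes that `{url}` is replaced by the fully percent-encoded target URL and that a missing `method` or `auth_required` defaults to `GET` and `false`; neither detail is stated in this section, and both helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def pick_endpoint(endpoints: list, method: str, have_credentials: bool) -> Optional[dict]:
    """Return the first declared endpoint with this method that the agent may call."""
    for endpoint in endpoints:
        # Assumed defaults when a key is absent: GET, no authentication.
        if endpoint.get("method", "GET").upper() != method.upper():
            continue
        if endpoint.get("auth_required", False) and not have_credentials:
            continue
        return endpoint
    return None


def build_request_url(origin: str, endpoint: dict, target_url: str) -> str:
    """Expand a path template such as "/api/sdf/{url}" for one target page.

    Assumption: {url} is replaced by the fully percent-encoded target URL;
    safe="" also encodes "/" and ":" so it occupies a single path segment.
    """
    encoded = quote(target_url, safe="")
    return origin.rstrip("/") + endpoint["path"].replace("{url}", encoded)
```

Applied to the example above, `pick_endpoint(endpoints, "GET", have_credentials=False)` returns the `/api/sdf/{url}` entry, while the batch endpoint is only returned when the agent holds credentials.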

Publisher policies govern caching, authentication, and usage:

{
  "policies": {
    "cache_ttl": 3600,
    "require_auth": false,
    "rate_limit": "100/hour",
    "max_batch_size": 50,
    "attribution_required": true,
    "commercial_use": true
  }
}

The policy fields are defined as follows:

| Field | Type | Description |
|---|---|---|
| cache_ttl | integer | Recommended cache TTL in seconds |
| require_auth | boolean | Whether authentication is required for SDF endpoints |
| rate_limit | string | Rate limit description, e.g. "100/hour" |
| max_batch_size | integer | Maximum URLs per batch request |
| attribution_required | boolean | Whether consumers must attribute the source |
| commercial_use | boolean | Whether commercial use is permitted |
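Because `rate_limit` is only described as a string, any parsing of it is an assumption. This sketch accepts the `N/unit` form used in the examples (with a guessed unit vocabulary of second, minute, hour, and day), turns it into an even spacing between requests, and splits a URL list into chunks of at most `max_batch_size`. All function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed unit vocabulary; the table only says rate_limit is a string.
PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate_limit(text: str) -> Optional[Tuple[int, int]]:
    """Parse "100/hour" into (100, 3600); return None for other formats."""
    match = re.fullmatch(r"\s*(\d+)\s*/\s*([a-z]+)\s*", text.lower())
    if match is None or match.group(2) not in PERIOD_SECONDS:
        return None  # unrecognized: treat the value as free-form text
    return int(match.group(1)), PERIOD_SECONDS[match.group(2)]


def min_interval_seconds(text: str) -> Optional[float]:
    """Even spacing between requests that stays within the parsed limit."""
    parsed = parse_rate_limit(text)
    if parsed is None:
        return None
    count, period = parsed
    return float("inf") if count == 0 else period / count


def batches(urls: list, max_batch_size: int) -> list:
    """Split urls into consecutive chunks of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be positive")
    return [urls[i:i + max_batch_size] for i in range(0, len(urls), max_batch_size)]
```

With the example policies, `100/hour` works out to one request every 36 seconds if spread evenly, and 120 URLs with `max_batch_size` 50 become three batch requests of 50, 50, and 20 URLs.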

A complete configuration, served at /.well-known/sdf.json:

{
  "sdf_version": "0.2.0",
  "publisher": {
    "name": "Example News",
    "domain": "example.com",
    "contact": "sdf@example.com"
  },
  "endpoints": [
    {
      "path": "/api/sdf/{url}",
      "method": "GET",
      "description": "Convert any page on this domain to SDF",
      "rate_limit": "100/hour",
      "auth_required": false
    },
    {
      "path": "/api/sdf/batch",
      "method": "POST",
      "description": "Batch conversion (authenticated)",
      "rate_limit": "10/hour",
      "auth_required": true
    }
  ],
  "types_supported": [
    "article",
    "media",
    "profile"
  ],
  "resolutions": [
    "compact",
    "standard",
    "full"
  ],
  "policies": {
    "cache_ttl": 3600,
    "require_auth": false,
    "rate_limit": "100/hour",
    "max_batch_size": 50,
    "attribution_required": true,
    "commercial_use": true
  },
  "extensions": [
    "x-example-analytics",
    "x-example-internal"
  ]
}

A typical discovery flow:

  1. An agent wants to consume https://example.com/some-article
  2. The agent fetches https://example.com/.well-known/sdf.json
  3. If the response is HTTP 200, the agent parses the configuration and uses a declared endpoint to request the SDF document
  4. If the response is HTTP 404, the publisher does not support SDF, and the agent must fall back to HTML extraction
  5. The agent respects the declared policies (rate limits, authentication, attribution)
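The flow above can be sketched as a single function. The HTTP client is injected as a `fetch` callable so the logic can run without network access. The endpoint-selection rule, the `{url}` encoding, raising on statuses other than 200 and 404, and treating "no usable endpoint" like missing SDF support are all assumptions; policy enforcement (step 5) is left out.

```python
import json
from typing import Callable, Optional, Tuple
from urllib.parse import quote, urlsplit

# A fetcher takes a URL and returns (status, body). It is injected so the flow
# can run without network access; a real agent would wrap an HTTP client.
Fetcher = Callable[[str], Tuple[int, bytes]]


def discover(page_url: str, fetch: Fetcher) -> Optional[str]:
    """Return the SDF request URL for page_url, or None to fall back to HTML."""
    # Steps 1 and 2: derive the page's origin and fetch its well-known file.
    parts = urlsplit(page_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    status, body = fetch(origin + "/.well-known/sdf.json")

    # Step 4: 404 means the publisher does not support SDF.
    if status == 404:
        return None
    # Assumption: any other non-200 status is treated as an error.
    if status != 200:
        raise RuntimeError(f"unexpected discovery status {status}")

    # Step 3: parse the configuration and use a declared endpoint.
    config = json.loads(body)
    for endpoint in config.get("endpoints", []):
        # Hypothetical selection rule: first unauthenticated GET endpoint
        # whose path has a {url} placeholder (percent-encoded target URL).
        if (
            endpoint.get("method") == "GET"
            and not endpoint.get("auth_required", False)
            and "{url}" in endpoint.get("path", "")
        ):
            return origin + endpoint["path"].replace("{url}", quote(page_url, safe=""))
    # Assumption: no usable endpoint is handled like missing SDF support.
    return None
```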

Agents should cache the well-known configuration. Recommended behavior:

  • Cache for the number of seconds given by policies.cache_ttl, or for 1 hour (3600 seconds) if it is not specified
  • Re-fetch the configuration when an SDF request returns HTTP 410 (Gone), to detect that SDF support has been removed
  • Respect standard HTTP cache headers (Cache-Control, ETag) if present
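A minimal per-origin cache following these recommendations might look like this. It assumes that a `Cache-Control` `max-age` directive takes precedence over `policies.cache_ttl` and that `no-store` or `no-cache` means the cached copy should not be reused without re-fetching; the recommendations above do not specify precedence. `effective_ttl` and `ConfigCache` are hypothetical names, and the clock is injectable for testing.

```python
import re
import time
from typing import Callable, Optional

DEFAULT_TTL = 3600  # 1 hour when policies.cache_ttl is not specified


def effective_ttl(config: dict, cache_control: Optional[str] = None) -> int:
    """Seconds a well-known configuration may be reused.

    Assumption: Cache-Control max-age overrides policies.cache_ttl, and
    no-store / no-cache mean the cached copy must not be reused as-is.
    """
    if cache_control:
        directives = [d.strip().lower() for d in cache_control.split(",")]
        if "no-store" in directives or "no-cache" in directives:
            return 0
        for directive in directives:
            match = re.fullmatch(r"max-age=(\d+)", directive)
            if match:
                return int(match.group(1))
    policies = config.get("policies")
    ttl = policies.get("cache_ttl") if isinstance(policies, dict) else None
    if isinstance(ttl, int) and not isinstance(ttl, bool) and ttl >= 0:
        return ttl
    return DEFAULT_TTL


class ConfigCache:
    """Per-origin cache of well-known configurations (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries = {}  # origin -> (expires_at, config, etag)

    def put(self, origin: str, config: dict,
            cache_control: Optional[str] = None, etag: Optional[str] = None) -> None:
        expires_at = self._clock() + effective_ttl(config, cache_control)
        self._entries[origin] = (expires_at, config, etag)

    def get(self, origin: str) -> Optional[dict]:
        """Return a fresh configuration, or None if missing or expired."""
        entry = self._entries.get(origin)
        if entry is None or self._clock() >= entry[0]:
            return None
        return entry[1]

    def etag(self, origin: str) -> Optional[str]:
        """Stored ETag, to send as If-None-Match when re-fetching."""
        entry = self._entries.get(origin)
        return entry[2] if entry else None

    def invalidate(self, origin: str) -> None:
        """Forget an origin, e.g. after an HTTP 410 (Gone) response."""
        self._entries.pop(origin, None)
```

Expired entries are kept rather than deleted, so their ETag remains available for a conditional re-fetch; only `invalidate` removes an origin entirely.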