
Type Taxonomy

SDF classifies every document using a two-level hierarchy written as parent_type.subtype (for example, article.news). There are 10 parent types and 58 subtypes covering the full range of web content.
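
As an illustration, a label can be split into its two levels with a short Python helper. `parse_type` and `DocType` are hypothetical names, not part of SDF. The sketch only checks the parent level against the 10 parent types listed below and rejects labels that do not have exactly one dot.

```python
from typing import NamedTuple

# The 10 parent types from the taxonomy on this page.
PARENT_TYPES = frozenset({
    "article", "documentation", "commerce", "discussion", "reference",
    "data", "code", "profile", "event", "media",
})


class DocType(NamedTuple):
    """A parsed two-level type label (hypothetical helper, not an SDF API)."""
    parent: str
    subtype: str


def parse_type(label: str) -> DocType:
    """Split 'parent_type.subtype' on its single dot and check the parent level."""
    parent, sep, subtype = label.partition(".")
    # Require exactly one dot with non-empty text on both sides.
    if not sep or not parent or not subtype or "." in subtype:
        raise ValueError(f"expected 'parent_type.subtype', got {label!r}")
    if parent not in PARENT_TYPES:
        raise ValueError(f"unknown parent type {parent!r}")
    return DocType(parent, subtype)
```

For example, `parse_type("article.news")` returns `DocType(parent='article', subtype='news')`, while `parse_type("blog.post")` raises `ValueError` because `blog` is a subtype, not a parent type.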

In a production deployment across 2,335 documents, 74 unique type combinations were observed. The type normalization cascade corrected 63 invented, non-canonical types, achieving 100% taxonomy conformance.
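
The actual normalization cascade is specified in Type System; the Python sketch below is only a hypothetical illustration of the general idea. It assumes two steps, canonicalizing case and separators and then looking up invented subtypes in an alias table, over an abbreviated two-parent taxonomy. The `ALIASES` entries and the name `normalize_type` are invented for the example. A production cascade needs further fallback steps to reach 100% conformance; returning `None` here simply marks where such a step would go.

```python
import re
from typing import Optional

# Abbreviated taxonomy for the sketch: two parents only.
TAXONOMY = {
    "article": {"news", "blog", "opinion", "review", "analysis", "press_release"},
    "documentation": {"tutorial", "api_reference", "guide", "faq",
                      "changelog", "troubleshooting"},
}

# Hypothetical alias table: invented subtypes -> canonical subtypes.
ALIASES = {
    "howto": "guide",
    "news_article": "news",
    "api_docs": "api_reference",
}


def normalize_type(raw: str) -> Optional[str]:
    """Map a raw label to a canonical 'parent.subtype', or None if unrecoverable."""
    # Step 1: canonicalize case and separators, e.g. "News Article" -> "news_article".
    label = re.sub(r"[\s\-]+", "_", raw.strip().lower())
    parent, _, subtype = label.partition(".")
    if parent not in TAXONOMY:
        return None
    # Step 2: rewrite known invented subtypes to their canonical names.
    subtype = ALIASES.get(subtype, subtype)
    return f"{parent}.{subtype}" if subtype in TAXONOMY[parent] else None
```

With these tables, `normalize_type("Documentation.HowTo")` yields `"documentation.guide"`, and `normalize_type("article.News Article")` yields `"article.news"`.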

article
├── news
├── blog
├── opinion
├── review
├── analysis
└── press_release
documentation
├── tutorial
├── api_reference
├── guide
├── faq
├── changelog
└── troubleshooting
commerce
├── product
├── category
├── comparison
├── marketplace
├── service
└── pricing
discussion
├── forum
├── qa
├── comment_thread
├── review_thread
└── poll
reference
├── encyclopedia
├── dictionary
├── legal
├── academic
├── specification
└── wiki
data
├── dataset
├── statistics
├── report
├── dashboard
├── financial
└── scientific
code
├── repository
├── package
├── snippet
├── gist
├── notebook
└── documentation
profile
├── person
├── organization
├── place
├── product_profile
└── portfolio
event
├── conference
├── meetup
├── webinar
├── concert
├── sports
└── workshop
media
├── video
├── podcast
├── music
├── image_gallery
├── livestream
└── animation

| Parent Type | Subtypes | Description | type_data Reference |
|---|---|---|---|
| article | news, blog, opinion, review, analysis, press_release | Authored content for publication | article |
| documentation | tutorial, api_reference, guide, faq, changelog, troubleshooting | Technical / instructional content | documentation |
| commerce | product, category, comparison, marketplace, service, pricing | Product and service listings | commerce |
| discussion | forum, qa, comment_thread, review_thread, poll | Community Q&A and forums | discussion |
| reference | encyclopedia, dictionary, legal, academic, specification, wiki | Encyclopedic and reference material | reference |
| data | dataset, statistics, report, dashboard, financial, scientific | Data-centric content | data |
| code | repository, package, snippet, gist, notebook, documentation | Software and code content | code |
| profile | person, organization, place, product_profile, portfolio | Entity profiles | profile |
| event | conference, meetup, webinar, concert, sports, workshop | Event listings and schedules | event |
| media | video, podcast, music, image_gallery, livestream, animation | Audio/video/multimedia | media |
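
For tooling, the taxonomy can be mirrored as a plain mapping. This Python sketch transcribes the parent types and subtypes above and derives the set of canonical labels; `TAXONOMY`, `CANONICAL`, and `is_canonical` are illustrative names, not an SDF API. Note that `code.documentation` is a subtype of `code` and is distinct from the `documentation` parent type.

```python
# Parent types and subtypes transcribed from the taxonomy above.
TAXONOMY: dict[str, tuple[str, ...]] = {
    "article": ("news", "blog", "opinion", "review", "analysis", "press_release"),
    "documentation": ("tutorial", "api_reference", "guide", "faq",
                      "changelog", "troubleshooting"),
    "commerce": ("product", "category", "comparison", "marketplace",
                 "service", "pricing"),
    "discussion": ("forum", "qa", "comment_thread", "review_thread", "poll"),
    "reference": ("encyclopedia", "dictionary", "legal", "academic",
                  "specification", "wiki"),
    "data": ("dataset", "statistics", "report", "dashboard", "financial",
             "scientific"),
    "code": ("repository", "package", "snippet", "gist", "notebook",
             "documentation"),
    "profile": ("person", "organization", "place", "product_profile", "portfolio"),
    "event": ("conference", "meetup", "webinar", "concert", "sports", "workshop"),
    "media": ("video", "podcast", "music", "image_gallery", "livestream",
              "animation"),
}

# Every canonical 'parent_type.subtype' label.
CANONICAL = frozenset(f"{p}.{s}" for p, subs in TAXONOMY.items() for s in subs)


def is_canonical(label: str) -> bool:
    """True if the label is one of the canonical parent_type.subtype pairs."""
    return label in CANONICAL
```

With this transcription, `len(TAXONOMY)` is 10 and `len(CANONICAL)` is 58.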

From a production deployment over 2,335 documents:

| Metric | Value |
|---|---|
| Total documents processed | 2,335 |
| Unique parent types observed | 10 (all) |
| Unique type combinations observed | 74 |
| Non-canonical types corrected | 63 |
| Taxonomy conformance after normalization | 100% |
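
Assuming each processed document carries one final label, these metrics can be recomputed from the labels alone. The `summarize` helper below is a hypothetical sketch, and its three-entry `CANONICAL` set is a stand-in for the full list of canonical labels. Conformance is taken to be the fraction of documents whose label is canonical.

```python
from collections import Counter

# Stand-in for the full set of canonical labels (abbreviated for the sketch).
CANONICAL = frozenset({"article.news", "article.blog", "code.repository"})


def summarize(labels: list[str], canonical: frozenset[str]) -> dict[str, float]:
    """Recompute the production metrics from one final label per document."""
    if not labels:
        raise ValueError("no documents to summarize")
    counts = Counter(labels)
    # Documents whose label appears in the canonical set.
    conforming = sum(n for label, n in counts.items() if label in canonical)
    return {
        "documents": len(labels),
        "unique_parent_types": len({label.split(".", 1)[0] for label in counts}),
        "unique_type_combinations": len(counts),
        "conformance": conforming / len(labels),
    }
```

Applied to the production labels after normalization, the figures in the table correspond to 2,335 documents, 10 parent types, 74 combinations, and a conformance of 1.0.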

The most frequently observed types in production:

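| Type | Count | Percentage |
|---|---|---|
| article.news | 487 | 20.9% |
| article.blog | 312 | 13.4% |
| documentation.guide | 198 | 8.5% |
| commerce.product | 176 | 7.5% |
| reference.wiki | 154 | 6.6% |
| code.repository | 143 | 6.1% |
| discussion.qa | 128 | 5.5% |
| Other types | 737 | 31.6% |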
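
Each percentage is a count divided by the 2,335 processed documents. The small Python check below (names are illustrative) recomputes the shares from the raw counts, deriving the "Other types" count as the total minus the listed types. Because it rounds each share to one decimal place independently, the shares need not sum to exactly 100%.

```python
TOTAL = 2335  # total documents processed

# Counts for the most frequent types, copied from the table above.
TOP_TYPES = {
    "article.news": 487,
    "article.blog": 312,
    "documentation.guide": 198,
    "commerce.product": 176,
    "reference.wiki": 154,
    "code.repository": 143,
    "discussion.qa": 128,
}


def share(count: int, total: int = TOTAL) -> float:
    """Percentage of all documents, rounded to one decimal place."""
    return round(100 * count / total, 1)


# "Other types" covers every document not in the listed types.
other = TOTAL - sum(TOP_TYPES.values())

if __name__ == "__main__":
    for label, count in {**TOP_TYPES, "other": other}.items():
        print(f"{label:<22}{count:>5}{share(count):>6.1f}%")
```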

Type System Spec

See Type System for the normalization cascade and aspect system.

Document Model

See Document Model for how type_data fits into the document structure.