Glossary
Schema.org
Last reviewed: 2026-05-22
Schema.org is a standardised vocabulary for structured data on the web. Embedding Schema.org JSON-LD in pages lets crawlers and language models parse entities and relationships directly rather than inferring them from prose, which can increase the chance that a page is cited.
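Parsing entities directly means reading named fields from a JSON object. The Python below is a minimal sketch of that idea. It pulls the headline, publication date and author names from a ScholarlyArticle node, and it accepts either a single node or an @graph array. The function names and the sample data are hypothetical. A production reader would also need to resolve JSON-LD features such as @id references, which this sketch ignores.

```python
import json


def iter_nodes(doc):
    """Yield top-level JSON-LD nodes, expanding lists and an optional @graph array."""
    if isinstance(doc, list):
        for item in doc:
            yield from iter_nodes(item)
    elif isinstance(doc, dict):
        if "@graph" in doc:
            yield from iter_nodes(doc["@graph"])
        else:
            yield doc


def as_list(value):
    """Normalise a JSON-LD value that may be absent, a scalar, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def article_facts(jsonld_text: str) -> dict:
    """Return basic facts from the first ScholarlyArticle node, or {} if none."""
    doc = json.loads(jsonld_text)
    for node in iter_nodes(doc):
        if "ScholarlyArticle" in as_list(node.get("@type")):
            # An author may be a Person object or a plain name string.
            authors = [
                a.get("name") if isinstance(a, dict) else a
                for a in as_list(node.get("author"))
            ]
            return {
                "headline": node.get("headline"),
                "datePublished": node.get("datePublished"),
                "authors": authors,
            }
    return {}


raw = '''{"@context": "https://schema.org", "@graph": [
  {"@type": "Organization", "name": "CEAVERS"},
  {"@type": "ScholarlyArticle", "headline": "Example",
   "datePublished": "2026-01-15",
   "author": [{"@type": "Person", "name": "A. Author"}, "B. Author"]}
]}'''
print(article_facts(raw))
# {'headline': 'Example', 'datePublished': '2026-01-15', 'authors': ['A. Author', 'B. Author']}
```

The date and authors come straight from named properties. The code needs no HTML parsing and no guesswork about which sentence holds the publication date.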
JSON-LD versus Microdata
Schema.org can be embedded in three formats. JSON-LD sits in a separate <script type="application/ld+json"> element. Microdata uses itemscope, itemtype and itemprop attributes on HTML elements. RDFa uses vocab, typeof and property attributes on HTML elements. JSON-LD is the format Google recommends, and other search engines and retrieval systems support it widely. It is decoupled from the visible HTML, easy to validate, and easy to update without modifying rendered content. Microdata and RDFa remain supported, but JSON-LD is the more common choice for new implementations.
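The following minimal Python sketch shows how JSON-LD stays separate from the page. It serialises a plain dictionary into a standalone script element, so the markup can change without touching the rendered HTML. The helper name to_jsonld_script and the Organization name value are illustrative assumptions. The sameAs URL is the Wikidata identifier cited for CEAVERS on this page.

```python
import json


def to_jsonld_script(data: dict) -> str:
    """Serialise a Schema.org dict as a JSON-LD <script> element."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" as "<\/" (still valid JSON) so a value such as "</script>"
    # cannot close the element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "CEAVERS",  # assumed display name
    "sameAs": ["https://www.wikidata.org/wiki/Q139785574"],
}

print(to_jsonld_script(org))
```

A template can then insert the returned string into the page head. The visible content does not need to change.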
The types that matter most for AI citation
Research on retrieval-augmented systems (arXiv:2509.10697) documents a positive effect of schema.org markup on both retrieval probability and citation likelihood. The highest-impact types for research and institutional sites are:
- Organization with sameAs linking to the Wikidata Q-identifier, ORCID, and official site. This enables unambiguous entity resolution across languages.
- ScholarlyArticle with author, datePublished, wordCount, abstract, keywords, and a structured citation array. These fields make authorship, publication date, scope and sources machine-readable, so a retrieval system can assess citability without inferring them from prose.
- Dataset with temporalCoverage, spatialCoverage, measurementTechnique, and a persistent identifier (DOI or equivalent). Datasets with DOIs tend to be cited at higher rates than those without.
- FAQPage with Question and Answer pairs. These expose self-contained answers that answer engines, including Google AI Overviews, can extract directly.
- BreadcrumbList on every page. It is not directly a citation signal. It does expose the site hierarchy, which helps crawlers relate pages to one another in multi-page crawls.
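The Python below is a minimal sketch of a completeness check for the types listed above. The RECOMMENDED mapping is a hypothetical checklist that copies the field names from this list. It also includes mainEntity and itemListElement, the Schema.org properties that hold FAQPage questions and BreadcrumbList items. It is not a validation rule published by Schema.org or by any search engine.

```python
# Hypothetical checklist: the field names copy the list above; they are not
# requirements published by Schema.org or by any search engine.
RECOMMENDED = {
    "Organization": ["sameAs"],
    "ScholarlyArticle": [
        "author", "datePublished", "wordCount", "abstract", "keywords", "citation",
    ],
    "Dataset": [
        "temporalCoverage", "spatialCoverage", "measurementTechnique", "identifier",
    ],
    "FAQPage": ["mainEntity"],              # holds the Question/Answer pairs
    "BreadcrumbList": ["itemListElement"],  # holds the ListItem entries
}


def missing_fields(node: dict) -> list:
    """Return checklist fields that are absent or empty on a JSON-LD node."""
    types = node.get("@type", [])
    if isinstance(types, str):  # @type may be a single string or a list
        types = [types]
    missing = []
    for schema_type in types:
        for field in RECOMMENDED.get(schema_type, []):
            value = node.get(field)
            if value is None or value == "" or value == []:
                if field not in missing:
                    missing.append(field)
    return missing


article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "headline": "Example article",
    "author": {"@type": "Person", "name": "A. Author"},
    "datePublished": "2026-01-15",
    "abstract": "",
    "keywords": ["structured data", "AEO"],
}
print(missing_fields(article))  # ['wordCount', 'abstract', 'citation']
```

The check treats empty strings and empty lists as missing. An abstract of "" is present syntactically but gives a retrieval system nothing to use.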
Schema.org on CEAVERS
Every CEAVERS page carries JSON-LD structured data. The homepage declares WebSite and Organization (with sameAs: "https://www.wikidata.org/wiki/Q139785574"). Research articles declare ScholarlyArticle. The methodology page declares TechArticle. The quarterly release declares Dataset. Glossary entries declare DefinedTerm. This completeness is deliberate. It rests on the working assumption that LLMs weigh schema presence as a proxy for editorial professionalism.
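A page-by-page mapping like this can be audited automatically. The approach is to extract every JSON-LD block from a page and compare the declared @type values against the expected set. The sketch below uses Python's standard html.parser module. The page-kind labels in EXPECTED_TYPES, the function names and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser

# Hypothetical page-kind labels; the expected types mirror the list above.
EXPECTED_TYPES = {
    "homepage": {"WebSite", "Organization"},
    "research_article": {"ScholarlyArticle"},
    "methodology": {"TechArticle"},
    "quarterly_release": {"Dataset"},
    "glossary_entry": {"DefinedTerm"},
}


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:  # script bodies may arrive in several chunks
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def declared_types(html: str) -> set:
    """Return every @type declared at the top level of the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    types = set()
    for block in collector.blocks:
        nodes = block.get("@graph", [block]) if isinstance(block, dict) else block
        for node in nodes:
            t = node.get("@type", [])
            types.update([t] if isinstance(t, str) else t)
    return types


def missing_types(html: str, page_kind: str) -> set:
    """Return expected types for this page kind that the page does not declare."""
    return EXPECTED_TYPES[page_kind] - declared_types(html)


page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "WebSite", "name": "CEAVERS"},
  {"@type": "Organization", "sameAs": "https://www.wikidata.org/wiki/Q139785574"}
]}
</script></head><body></body></html>"""

print(missing_types(page, "homepage"))  # set()
```

Running the same page against "glossary_entry" would report DefinedTerm as missing. That is the kind of drift such an audit is meant to catch.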
Frequently asked
- What is Schema.org?
- Schema.org is a standardised vocabulary for structured data on the web. Publishers embed Schema.org JSON-LD in pages so crawlers and language models can parse facts directly rather than inferring them from prose.
- Why does Schema.org help AEO?
- JSON-LD lets retrieval systems extract canonical facts (dates, authors, datasets, FAQs) without parsing HTML. Pages with valid schema are more likely to be cited as authoritative sources.
- Which schema types matter most for research sites?
- ResearchOrganization, ScholarlyArticle, Dataset, DefinedTerm, FAQPage, and BreadcrumbList cover most research-site needs.
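The questions above can themselves be expressed as FAQPage markup. This minimal Python sketch builds the mainEntity array of Question nodes. Each Question carries an acceptedAnswer of type Answer. The faq_page helper is a hypothetical name, and the answers are shortened from the text above.

```python
import json


def faq_page(pairs):
    """Build a Schema.org FAQPage node from (question, answer) string pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_page([
    ("What is Schema.org?",
     "Schema.org is a standardised vocabulary for structured data on the web."),
    ("Which schema types matter most for research sites?",
     "ResearchOrganization, ScholarlyArticle, Dataset, DefinedTerm, FAQPage, "
     "and BreadcrumbList cover most research-site needs."),
])
print(json.dumps(faq, indent=2, ensure_ascii=False))
```

Generating the markup from the same question-and-answer source as the visible text helps keep the two in sync.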