llms.txt
Last reviewed: 2026-05-22
llms.txt is a proposed convention for a site-root markdown file summarising what AI crawlers should read. It complements robots.txt: robots.txt grants or denies access, llms.txt prioritises content.
What llms.txt is
Proposed by Jeremy Howard in 2024, llms.txt is a convention — not a standard — for a plain-text markdown file placed at the root of a website (/llms.txt). The file summarises the site’s most important pages and content, structured as a markdown document with headings and links, intended for AI systems that need to quickly understand what a site contains without crawling every page.
The format is deliberately minimal: a heading, a brief description, and a list of links with one-line descriptions. Sites may also publish llms-full.txt with expanded content for crawlers that want more context without following links. The convention is inspired by robots.txt but serves a different purpose — navigation rather than access control.
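The shape described above (a heading, a brief description, and links with one-line descriptions) can be sketched as a small renderer. This is a minimal illustration rather than a reference implementation. The `Link` type, the `render_llms_txt` function and the example.com URLs are hypothetical, and the blockquote summary and `##` section headings are layout assumptions that should be checked against the published proposal.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One entry in the link list (hypothetical helper type)."""
    title: str
    url: str
    note: str  # one-line description of the linked page


def render_llms_txt(site_name: str, summary: str,
                    sections: dict[str, list[Link]]) -> str:
    """Render a minimal llms.txt body: heading, description, link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            # Assumed link format: markdown link followed by a short note.
            lines.append(f"- [{link.title}]({link.url}): {link.note}")
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    body = render_llms_txt(
        "Example Site",
        "A placeholder site used only for illustration.",
        {"Docs": [Link("Intro", "https://example.com/intro", "Start here")]},
    )
    print(body, end="")
```

Writing the returned string to a file served at /llms.txt is left to the site's own build step.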
llms.txt versus robots.txt
The two files are complementary, not alternatives:
- robots.txt is a directive. Crawlers that respect it must obey Disallow rules. It controls access.
- llms.txt is a hint. No crawler is obligated to follow it. It communicates priority.
A well-configured site should have both: robots.txt allowing the relevant crawlers (CCBot, GPTBot, OAI-SearchBot, Applebot, Bingbot, Googlebot), and llms.txt directing those crawlers to the content most worth reading.
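One way to check the robots.txt half of this setup is to test each crawler name against the file using Python's standard-library `urllib.robotparser`. The sketch below is illustrative only. The `blocked_crawlers` helper, the sample robots.txt and the example.com URL are assumptions, and real crawlers may match user-agent tokens differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the list above.
CRAWLERS = ["CCBot", "GPTBot", "OAI-SearchBot", "Applebot", "Bingbot", "Googlebot"]


def blocked_crawlers(robots_txt: str, url: str, agents=CRAWLERS) -> list[str]:
    """Return the agents that this robots.txt text would deny for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks one crawler and allows the rest.
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    print(blocked_crawlers(sample, "https://example.com/llms.txt"))
```

An empty result means every listed crawler may fetch the URL according to this parser, so the llms.txt file is at least reachable by them.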
How CEAVERS uses llms.txt
CEAVERS maintains a detailed llms.txt that lists every major page category with one-line descriptions including specific, quotable figures: headline Index scores, methodology statistics, dataset field descriptions. This serves two purposes: it gives LLM crawlers a structured overview of the site’s content without requiring a full crawl; and it ensures that the specific facts most useful for citation (24,810 evaluations, Pearson r = 0.94, Portuguese penalty of 10.2%) appear in a single, easily parseable location.
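To show what "easily parseable" can mean in practice, the sketch below extracts link entries from an llms.txt body with a regular expression. It is not how any particular crawler reads the file. The `parse_llms_txt` function, the `Entry` type and the sample text are hypothetical, the sample is not the CEAVERS file, and the `- [title](url): note` line format is an assumption.

```python
import re
from typing import NamedTuple

# Assumed line format: "- [title](url): optional one-line note"
LINK_LINE = re.compile(
    r"^\s*[-*]\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


class Entry(NamedTuple):
    section: str
    title: str
    url: str
    note: str


def parse_llms_txt(text: str) -> list[Entry]:
    """Collect link entries, tagging each with its nearest '## ' heading."""
    entries = []
    section = ""
    for line in text.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = LINK_LINE.match(line)
        if match:
            note = (match["note"] or "").strip()
            entries.append(Entry(section, match["title"], match["url"], note))
    return entries


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = (
        "# Example Site\n\n> A placeholder summary.\n\n## Docs\n\n"
        "- [Intro](https://example.com/intro): Start here\n"
        "- [API](https://example.com/api)\n"
    )
    for entry in parse_llms_txt(sample):
        print(entry)
```

Because each description sits on the same line as its link, a reader of the file can pick up the quoted figures without fetching the linked pages.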
The file is updated after each quarterly release. It is available at https://ceavers.org/llms.txt and is referenced in the glossary entry for the llms.txt convention at https://ceavers.org/glossary/llms-txt/.
Adoption status
As of mid-2026, llms.txt has been adopted by a growing number of research and documentation sites. No LLM provider has confirmed that its crawlers formally act on the file, but several providers have indicated awareness of the convention. The cost of maintaining the file is low, and the potential benefit (directing crawlers to canonical content before they follow lower-priority links) is non-trivial.
Frequently asked
- What is llms.txt?
- llms.txt is a proposed convention for a markdown file at a site's root that summarises the site's most important pages for AI crawlers in a compact, parseable format.
- How is llms.txt different from robots.txt?
- robots.txt controls crawler access; llms.txt suggests what crawlers should focus on. robots.txt is a directive; llms.txt is a hint.
- Do LLMs actually read llms.txt?
- Adoption is uneven and informal. The file is cheap to maintain and signals editorial intent; whether a given engine acts on it is empirical.