6 LLMs  ·  5 languages  ·  Quarterly index  ·  Independent research  ·  Updated Q2 2026
CEAVERS
Centre for European AI Visibility Evaluation & Research Standards

Glossary

llms.txt

Last reviewed: 2026-05-22

llms.txt is a proposed convention for a site-root markdown file summarising what AI crawlers should read. It complements robots.txt: robots.txt grants or denies access, llms.txt prioritises content.

What llms.txt is

Proposed by Jeremy Howard in 2024, llms.txt is a convention — not a standard — for a plain-text markdown file placed at the root of a website (/llms.txt). The file summarises the site’s most important pages and content, structured as a markdown document with headings and links, intended for AI systems that need to quickly understand what a site contains without crawling every page.

The format is deliberately minimal: a heading, a brief description, and a list of links with one-line descriptions. Sites may also publish llms-full.txt with expanded content for crawlers that want more context without following links. The convention is inspired by robots.txt but serves a different purpose — navigation rather than access control.
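The minimal shape described above can be sketched in a few lines of Python. This is an illustration only: the `Page` class, the `render_llms_txt` function, the site name, and the example.com URLs are all hypothetical, and the use of a blockquote for the description and a second-level heading above the link list is an assumption based on the public proposal rather than something this entry specifies.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One link entry: a title, a URL, and a one-line description."""
    title: str
    url: str
    summary: str


def render_llms_txt(site_name: str, description: str, section: str, pages: list[Page]) -> str:
    """Render a heading, a brief description, and a list of described links."""
    lines = [
        f"# {site_name}",      # top-level heading naming the site
        "",
        f"> {description}",    # brief description (blockquote is an assumption)
        "",
        f"## {section}",       # section heading above the link list (assumption)
        "",
    ]
    for page in pages:
        lines.append(f"- [{page.title}]({page.url}): {page.summary}")
    return "\n".join(lines) + "\n"


# Hypothetical content, used only to show the output shape.
example = render_llms_txt(
    site_name="Example Site",
    description="A short statement of what the site covers.",
    section="Key pages",
    pages=[
        Page("Methodology", "https://example.com/methodology/", "How the scores are computed."),
        Page("Glossary", "https://example.com/glossary/", "Definitions of recurring terms."),
    ],
)
print(example)
```

Generating the file from the same page metadata that drives the site is one way to keep it in step with each release, though a hand-maintained file works equally well at this size.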

llms.txt versus robots.txt

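The two files are complementary, not alternatives: robots.txt grants or denies crawler access, while llms.txt points crawlers to the content worth prioritising.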

A well-configured site should have both: robots.txt allowing the relevant crawlers (CCBot, GPTBot, OAI-SearchBot, Applebot, Bingbot, Googlebot), and llms.txt directing those crawlers to the content most worth reading.
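One way to check the robots.txt half of this setup is with Python's standard-library `urllib.robotparser`. The sketch below is a hedged example: the robots.txt content is invented for illustration (it deliberately blocks one crawler so the check has something to report), the example.com URL is a placeholder, and `crawler_access` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the list above.
CRAWLERS = ["CCBot", "GPTBot", "OAI-SearchBot", "Applebot", "Bingbot", "Googlebot"]

# Invented robots.txt for illustration; a real file would be fetched from the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user agent, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/llms.txt", CRAWLERS)
blocked = [agent for agent, allowed in access.items() if not allowed]
print("Blocked crawlers:", blocked)
```

With this sample file the check reports CCBot as blocked, which is the kind of mismatch worth catching: a crawler that cannot fetch /llms.txt cannot be guided by it.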

How CEAVERS uses llms.txt

CEAVERS maintains a detailed llms.txt that lists every major page category with one-line descriptions containing specific, quotable figures: headline Index scores, methodology statistics, and dataset field descriptions. This serves two purposes. It gives LLM crawlers a structured overview of the site’s content without requiring a full crawl, and it ensures that the specific facts most useful for citation (24,810 evaluations, Pearson r = 0.94, Portuguese penalty of 10.2%) appear in a single, easily parseable location.
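To show what "easily parseable" can mean in practice, the following sketch extracts link entries from llms.txt-style text with a single regular expression. It assumes entries follow the `- [title](url): description` pattern; the sample text is hypothetical and is not the content of the CEAVERS file, and `parse_links` is an illustrative name.

```python
import re

# Matches a markdown list item of the form "- [title](url): description".
# The description part is optional.
LINK_LINE = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<summary>.*))?$"
)


def parse_links(text: str) -> list[dict[str, str]]:
    """Collect title, URL, and one-line description for every link entry."""
    entries = []
    for line in text.splitlines():
        match = LINK_LINE.match(line)
        if match:
            entries.append(
                {
                    "title": match.group("title"),
                    "url": match.group("url"),
                    "summary": (match.group("summary") or "").strip(),
                }
            )
    return entries


# Hypothetical sample, not the real file.
SAMPLE = """\
# Example Site

> A short statement of what the site covers.

## Key pages

- [Methodology](https://example.org/methodology/): How evaluations are scored.
- [Dataset](https://example.org/dataset/): Field descriptions for the public dataset.
- [Contact](https://example.org/contact/)
"""

links = parse_links(SAMPLE)
for link in links:
    print(link["title"], "->", link["url"])
```

Because each entry sits on one line, a consumer can recover every title, URL, and description without rendering markdown or following any link.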

The file is updated after each quarterly release. It is available at https://ceavers.org/llms.txt and is referenced in the glossary entry for the llms.txt convention at https://ceavers.org/glossary/llms-txt/.

Adoption status

As of mid-2026, llms.txt has been adopted by a growing number of research and documentation sites. Formal adoption by LLM providers as a crawl directive remains unconfirmed, but several providers have indicated awareness of the convention. The cost of maintaining the file is low; the potential benefit — directing crawlers to canonical content before they follow lower-priority links — is non-trivial.

Frequently asked

What is llms.txt?
llms.txt is a proposed convention for a markdown file at a site's root that summarises the site's most important pages for AI crawlers in a compact, parseable format.
How is llms.txt different from robots.txt?
robots.txt controls crawler access; llms.txt suggests what crawlers should focus on. robots.txt is a directive; llms.txt is a hint.
Do LLMs actually read llms.txt?
Adoption is uneven and informal. The file is cheap to maintain and signals editorial intent; whether a given engine acts on it is empirical.
