6 LLMs  ·  5 languages  ·  Quarterly index  ·  Independent research  ·  Updated Q2 2026
CEAVERS
Centre for European AI Visibility Evaluation & Research Standards


Retrieval-Augmented Generation

Last reviewed: 2026-05-22

Retrieval-Augmented Generation (RAG) is an architecture in which the system retrieves documents from an external index at inference time and the language model conditions its response on those documents. Most production AI search products, including Perplexity, ChatGPT Search, Microsoft Copilot, and Google AI Overviews, are RAG-based.

How RAG works

A RAG system operates in two sequential stages: retrieval and generation. At the retrieval stage, the user’s query is matched against a pre-indexed document store. Dense retrieval encodes the query as a vector and runs approximate nearest-neighbour search over document embeddings (commonly with a library such as FAISS); lexical retrieval scores documents by term overlap with a ranking function such as BM25; many systems combine the two in a hybrid. The top-ranked document chunks are passed into the model’s context window. At the generation stage, the model produces its response conditioned on those retrieved chunks, grounding its answer in real text rather than relying solely on parametric memory.
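The two stages above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not how any named product works: `embed` is a toy bag-of-words encoder standing in for a learned embedding model, `retrieve` uses exact brute-force cosine search rather than an approximate nearest-neighbour index, and the chunk texts, the brand "Acme Bikes", and the prompt wording are all hypothetical. The generation call itself is omitted; the sketch stops at the prompt a model would be conditioned on.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: term counts (assumption: stands in for a learned encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Retrieval stage: rank every chunk by similarity and keep the top k.

    Exact search over all chunks; a production system would use an
    approximate index instead.
    """
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, retrieved):
    """Generation stage input: retrieved chunks placed in the context window."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved))
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {query}"


# Hypothetical document store.
CHUNKS = [
    "Acme Bikes sells electric cargo bikes with a 90 km range.",
    "Our office is closed on public holidays.",
    "Electric cargo bikes replace short car trips in cities.",
]

top = retrieve("best electric cargo bikes", CHUNKS, k=2)
prompt = build_prompt("best electric cargo bikes", top)
print(prompt)
```

The office-hours chunk shares no terms with the query, so it never reaches the prompt. Content that is not retrieved cannot influence the generated answer.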

The boundary between retrieval and generation is the primary leverage point for visibility: a brand that is not retrieved cannot be cited. Retrieval is governed by semantic relevance to the query, so page structure, topical focus, and indexing status matter more than overall domain authority.

What makes a page retrievable

Retrieval probability is higher for pages that address a single, well-defined topic (multi-topic pages dilute relevance signals), use a clear heading structure that enables chunked indexing, contain explicit, quotable claims with specific figures, carry schema.org markup (particularly ScholarlyArticle, Dataset, or FAQPage), and are indexed by the engine’s underlying crawl. Which crawl matters depends on the product: ChatGPT Search and Perplexity primarily use Bing, Gemini uses Google, and open-source RAG deployments typically use Common Crawl.
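To illustrate why heading structure helps chunked indexing, here is a minimal sketch of a heading-based splitter. It assumes, purely for illustration, that headings are marked with a `## ` prefix; the function name `chunk_by_heading` and the sample page are hypothetical, and real indexing pipelines use their own parsing and chunk-size rules.

```python
def chunk_by_heading(text, marker="## "):
    """Split a page into one chunk per heading.

    Assumption: a heading is any line starting with `marker`. Text before
    the first heading becomes a chunk with heading None.
    """
    chunks = []
    title, body = None, []
    for line in text.splitlines():
        if line.startswith(marker):
            # Close the previous chunk before starting a new one.
            if title is not None or body:
                chunks.append({"heading": title, "text": " ".join(body).strip()})
            title, body = line[len(marker):].strip(), []
        elif line.strip():
            body.append(line.strip())
    if title is not None or body:
        chunks.append({"heading": title, "text": " ".join(body).strip()})
    return chunks


# Hypothetical page with two single-topic sections.
PAGE = """## Range
The cargo bike covers 90 km per charge.
## Warranty
The frame carries a five-year warranty."""

chunks = chunk_by_heading(PAGE)
for chunk in chunks:
    print(chunk["heading"], "->", chunk["text"])
```

Each chunk carries one topic and one quotable claim, so a query about range can match the first chunk without being diluted by warranty text.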

RAG and brand visibility

A brand’s RAG visibility is determined by whether its owned content, Wikipedia article, or press coverage is retrieved when a query is semantically close to its products, category, or sector. The CEAVERS measurement methodology issues 827 prompt templates per LLM-language cell and measures whether the brand appears in the generated response, which captures the end-to-end RAG output.
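The idea of measuring whether a brand appears in generated responses can be sketched as a simple appearance rate. This is an illustrative assumption about how such a check might look, not the CEAVERS implementation: the matching rule (case-insensitive whole-phrase match), the function names, the brand "Acme Bikes", and the sample responses are all hypothetical.

```python
import re


def brand_mentioned(response, brand):
    """True if the brand name occurs as a whole phrase, ignoring case."""
    pattern = r"\b" + re.escape(brand) + r"\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is not None


def appearance_rate(responses, brand):
    """Share of responses that mention the brand (0.0 for an empty list)."""
    if not responses:
        return 0.0
    hits = sum(brand_mentioned(r, brand) for r in responses)
    return hits / len(responses)


# Hypothetical generated responses for one LLM-language cell.
RESPONSES = [
    "Popular options include Acme Bikes and two regional makers.",
    "Several brands sell cargo bikes in this category.",
    "acme bikes is often recommended for city use.",
    "Consider range and payload before choosing.",
]

rate = appearance_rate(RESPONSES, "Acme Bikes")
print(f"Appearance rate: {rate:.2f}")
```

A check like this observes only the final response, so it cannot say by itself whether a miss came from the retrieval stage or the generation stage.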

Improving RAG visibility requires work on both stages: retrievability (indexing, structure, freshness) and the probability of being cited once retrieved (specificity, cross-corroboration, structured metadata).

Frequently asked

What is Retrieval-Augmented Generation?
Retrieval-Augmented Generation (RAG) is an architecture where the language model first retrieves relevant documents from an external index and then generates a response grounded in those documents.

Why does RAG matter for AEO?
RAG is the dominant architecture behind AI search products. Whether a brand appears in a RAG-generated response depends on whether its content was retrieved at query time.

What signals make a page more retrievable?
Clear topical focus, primary-source citations, freshness, schema.org markup, and inclusion in the engine's underlying index (typically Bing, Google, Brave, or Common Crawl).
