RAG

This is how you give a model your data without retraining it — and it costs a fraction of fine-tuning

What is RAG

Think of it as the difference between an open-book exam and a closed-book exam.

In a closed-book exam, you answer from memory. Everything you know, you learned before you walked in. A language model answering from its training weights is a closed-book exam. It knows what it was trained on. Nothing else.

In an open-book exam, you bring your notes. You look things up during the exam. You answer using the information in front of you, not just what you memorised.

RAG — Retrieval-Augmented Generation — turns a closed-book model into an open-book one. Before the model answers, a retrieval system finds the most relevant documents from your data. Those documents go into the context window. The model reads them and answers from what is in front of it, not what it memorised in training.
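The retrieve-then-answer pattern above can be sketched in a few lines. This is a toy: it uses word-overlap scoring in place of a real embedding search, and the document store and prompt template are illustrative, not any particular library's API.

```python
# Minimal sketch of the RAG pattern. A real system would use embedding
# similarity and a vector store; here a toy word-overlap score stands in
# for the retriever so the example runs on its own.

def score(query: str, doc: str) -> int:
    """Count query words that also appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, store: list[str], k: int = 2) -> list[str]:
    """Return the k documents that best match the query."""
    return sorted(store, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str, store: list[str]) -> str:
    """Put the retrieved documents into the context, then ask the question."""
    context = "\n\n".join(retrieve(query, store))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# A hypothetical document store standing in for your data.
store = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is closed on public holidays.",
    "Shipping to the EU takes 3 to 5 business days.",
]

prompt = build_prompt("How long do refunds take?", store)
# The prompt now carries the refund policy; the model answers from it,
# not from whatever its training data happened to say about refunds.
```

The model itself never changes. Everything it "knows" about your data arrives through the prompt at answer time.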

The number that makes it real

A model's training data has a cutoff. It does not know what happened last week. It does not know what is in your internal documentation, your product knowledge base, your customer records. RAG solves all of these with the same pattern.

You can pair a model with a 200,000-token context window, like Claude 3.5 Sonnet's, with a retrieval system that searches across millions of documents. The model does not need to fit everything in context at once. It gets the right pages at the right time.

Why this matters to you

RAG is the first thing to try when you want a model to answer questions about your own data. It is faster to implement than fine-tuning, cheaper to maintain, and easier to update — you update the document store, not the model.
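The "update the document store, not the model" claim can be made concrete with the same toy word-overlap idea: teaching the system a new fact is a list append, not a training run. All names here are illustrative.

```python
# Sketch of why RAG is cheap to update: new knowledge is just a new
# document in the store. Toy word-overlap retrieval stands in for a
# real embedding search.

def retrieve(query: str, store: list[str]) -> str:
    """Return the single best-matching document by shared words."""
    q = set(query.lower().split())
    return max(store, key=lambda doc: len(q & set(doc.lower().split())))

store = ["Refunds are processed within 14 days."]

# The system knows nothing about pricing until this append.
# No retraining, no redeploy: the next retrieval sees it immediately.
store.append("Pro plan pricing is 20 dollars per month.")

best = retrieve("what is the pro plan pricing", store)
```

With fine-tuning, the same change would mean preparing training data and running a new training job; here it is one write to the store.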

Customer support that answers from your documentation. An assistant that knows your internal policies. A search that understands meaning, not just keywords. All of these are RAG implementations.

When RAG is not enough

RAG works when the answer exists in a document you can retrieve. It does not help when you need the model to behave differently — to adopt a specific tone, follow specific workflows, or reason in a domain-specific way that goes beyond what you can put in a prompt.

That is when fine-tuning becomes relevant. See [Fine-tuning](/glossary/fine-tuning).

Verified March 2026