A quick introduction to RAG-related terms

Written by
Iris Vance
Updated on: June 18, 2025
Recommendation

Quickly master the core terminology of RAG and gain an in-depth understanding of the architecture and applications of retrieval-augmented generation.

Core content:
1. The core components of the RAG architecture and their meanings
2. Detailed explanation of embedding and vector retrieval technology
3. Key terms of generation and context control
4. Introduction to related technologies and patterns

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)

RAG (Retrieval-Augmented Generation)

1. Core components of RAG architecture


Retriever: Finds documents or snippets relevant to the user's question in an external knowledge base (e.g., Top-k retrieval from a vector database).
Generator: Usually a large language model (e.g., GPT, T5) that uses the retrieved information to generate the final answer.
Index: The core data structure of the retrieval system, used to look up documents quickly; usually a vector index.
Knowledge Base / Corpus: A collection of structured or unstructured content from which the RAG system retrieves relevant information.
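The retriever-plus-generator split above can be sketched in a few lines. This is a toy illustration, not a real LLM stack: the retriever ranks documents by simple word overlap, and the "generator" only assembles the augmented prompt an LLM would receive.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    return sorted(corpus,
                  key=lambda d: len(tokenize(query) & tokenize(d)),
                  reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt a generator (LLM) would be given."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RAG combines retrieval with generation.",
    "BM25 is a sparse retrieval algorithm.",
    "Paris is the capital of France.",
]
docs = retrieve("How does RAG generation work", corpus)
prompt = build_prompt("How does RAG generation work", docs)
```

In a real system the retriever would query a vector index and the prompt would be sent to an LLM, but the control flow (retrieve, then generate from retrieved context) is the same.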


2. Embedding and vector retrieval


Embedding: Converts text into vectors so that texts can be compared and retrieved semantically.
Dense Retrieval: Retrieval using semantic vectors (e.g., DPR, BERT-based encoders); generally outperforms traditional sparse methods such as TF-IDF.
Vector Store: A database for storing document vectors, such as FAISS, Pinecone, Milvus, or Weaviate.
ANN (Approximate Nearest Neighbor): A family of algorithms for efficiently finding similar vectors, widely used in large-scale vector retrieval.
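The core mechanic of embedding-based retrieval is simple: documents are stored as vectors, and the one closest to the query vector (here by cosine similarity) wins. The 3-dimensional "embeddings" below are invented for the example; real embeddings come from a model and have hundreds or thousands of dimensions, and a vector store would use ANN search instead of this exact scan.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical tiny vector store: doc id -> made-up embedding.
vector_store = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.2, 0.9, 0.1],
    "doc_tax":  [0.0, 0.1, 0.95],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of a question about cats
best = max(vector_store, key=lambda doc: cosine(query_vec, vector_store[doc]))
```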


3. Search Technology


DPR (Dense Passage Retrieval): A dense retrieval method proposed by Facebook that trains a query encoder and a passage encoder.
BM25: A classic sparse retrieval algorithm based on term frequency, commonly used in traditional search engines.
Hybrid Retrieval: Combines the results of sparse retrieval (e.g., BM25) and dense retrieval (e.g., DPR) to improve recall.
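One common way to combine sparse and dense results is Reciprocal Rank Fusion (RRF), which merges ranked lists using only each document's rank. A minimal sketch, with hypothetical document ids:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists into one.

    Each document scores 1 / (k + rank) in every list it appears in;
    k=60 is the value commonly used in the RRF literature.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

sparse_order = ["d1", "d3", "d2"]  # e.g., BM25 ranking
dense_order  = ["d2", "d1", "d4"]  # e.g., dense-retrieval ranking
fused = rrf([sparse_order, dense_order])
```

Documents ranked highly by either method surface near the top of the fused list, which is why hybrid retrieval tends to improve recall over either method alone.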


4. Generation and Context Control


Context Window: The maximum input length an LLM can handle; input beyond this length is truncated.
Chunking: Splitting long documents into smaller chunks that fit retrieval and context-window constraints.
Top-k Retrieval: Returns the k document chunks or passages most relevant to the query.
Prompt Engineering: Designing prompts that guide the language model to answer using the retrieved content.
Grounding: Ensuring the generated content is based on real retrieval results rather than hallucinations.
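Chunking is the most mechanical of these steps and easy to show concretely. A minimal character-based sketch with overlapping windows (real pipelines often split on tokens, sentences, or paragraphs instead, but the sliding-window idea is the same):

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks with some overlap.

    The overlap keeps sentences that straddle a boundary from being
    lost to both neighboring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_text("a" * 120, chunk_size=50, overlap=10)
```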


5. Related technologies and models


Reranking: Re-scores and re-orders preliminary retrieval results to improve quality.
Query Expansion: Enriches the query with synonyms, hyponyms, and related terms to improve retrieval results.
Multi-hop Retrieval: Supports complex question answering that spans multiple documents or query steps.
Fusion-in-Decoder (FiD): A generative architecture proposed by Google that fuses multiple retrieved documents in the decoder.
Retriever-Reader Architecture: The traditional question-answering architecture of a retriever plus a reader; a predecessor of RAG.
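Reranking is just a second, more careful scoring pass over the retriever's candidates. In practice the scorer is usually a cross-encoder model; the sketch below substitutes a toy word-overlap scorer to keep the example self-contained.

```python
from typing import Callable

def rerank(query: str,
           candidates: list[str],
           score_fn: Callable[[str, str], float]) -> list[str]:
    """Re-score preliminary candidates and return them best-first."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)

def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: count shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

candidates = [
    "weather today",
    "retrieval augmented generation",
    "generation of retrieval systems",
]
ranked = rerank("retrieval generation", candidates, overlap_score)
```

Because `score_fn` is a parameter, the same function works whether the scorer is this toy heuristic or a real model's relevance score.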


6. RAG deployment related


Cold Start: The lack of useful retrieval results or embedding representations when the system first runs.
Latency: The total time for retrieval plus generation; a key target of RAG system optimization.
Caching: Caching frequent retrieval or generation results to improve performance.
Incremental Indexing: A mechanism for adding new documents without rebuilding the entire index.
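For caching, the simplest practical pattern is memoizing retrieval by query string, e.g. with the standard library's `functools.lru_cache`. The retrieval function below is a stub; the point is that a repeated query never pays the retrieval cost twice.

```python
from functools import lru_cache

calls = {"count": 0}  # track how often real retrieval actually runs

@lru_cache(maxsize=256)
def cached_retrieve(query: str) -> tuple[str, ...]:
    """Stand-in for an expensive retrieval call, cached by query string.

    Returns a tuple (not a list) because lru_cache results should be
    immutable to avoid callers mutating the shared cached value.
    """
    calls["count"] += 1
    return (f"top result for: {query}",)

first = cached_retrieve("what is rag")
second = cached_retrieve("what is rag")  # served from the cache
```

Production systems typically use an external cache (e.g., Redis) with explicit expiry so that incremental index updates can invalidate stale entries, but the lookup-before-compute pattern is the same.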