What is RAG and why should we use it?

Written by
Silas Grey
Updated on: June 30, 2025
Recommendation

Get an in-depth understanding of the importance and application scenarios of Retrieval-Augmented Generation (RAG) in the field of AI.

Core content:
1. The definition of RAG technology and its role in enhancing LLM responses
2. The three key steps of RAG: retrieval, augmentation, and generation
3. Application examples of RAG in AI search engines and customer service chatbots



Retrieval-augmented generation (RAG) is a popular technique that enhances the responses of LLMs by retrieving relevant external knowledge from a knowledge base before generating an answer. RAG improves accuracy, reduces hallucinations, and enables the model to provide more contextual and up-to-date information.

RAG consists of three steps: retrieval, augmentation, and generation.

Retrieval -  In this step, the system searches external knowledge sources (e.g., vector databases) for information relevant to the user query.

Augmentation -  The retrieved information is then combined with the original user query to form the prompt for the LLM.

Generation -  The LLM processes the prompt and generates a response, integrating its pre-trained knowledge with the retrieved information. This makes the response more accurate and contextual.
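As a concrete illustration, here is a minimal sketch of the three steps in Python. It assumes the sentence-transformers and openai packages are installed, an OPENAI_API_KEY is set, and that a tiny in-memory document list stands in for a real knowledge base; the model names are illustrative choices, and any vector database or LLM provider could be swapped in.

```python
# Minimal RAG sketch (assumed setup: sentence-transformers + openai installed,
# OPENAI_API_KEY set, two toy documents standing in for a real knowledge base).
from sentence_transformers import SentenceTransformer, util
from openai import OpenAI

documents = [
    "The 2025 ICC Champions Trophy was held in Pakistan and the UAE, "
    "with India defeating New Zealand in the final in Dubai.",
    "The 2023 Cricket World Cup final was won by Australia.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 1) -> list[str]:
    # Step 1 - retrieval: rank documents by cosine similarity to the query.
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]
    top = scores.topk(k=min(top_k, len(documents)))
    return [documents[i] for i in top.indices]

def augment(query: str, context: list[str]) -> str:
    # Step 2 - augmentation: combine instructions, retrieved context, and query.
    return (
        "Answer the query based only on the context. If you cannot find the "
        "answer in the context, reply: I cannot answer that query.\n\n"
        "Context:\n" + "\n".join(context) + "\n\nQuery: " + query
    )

def generate(prompt: str) -> str:
    # Step 3 - generation: the LLM answers using the augmented prompt.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

query = "Who won the ICC Champions Trophy 2025?"
print(generate(augment(query, retrieve(query))))
```

The three functions map one-to-one onto the retrieval, augmentation, and generation steps described above; a production system would mainly replace the in-memory list with a proper vector database.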

Let us understand RAG through a simple example.

1 -  User asks a query

Example: Who will be the winner of the ICC Champions Trophy 2025?

2 -  The retriever searches in a knowledge source (e.g. Wikipedia or the Internet) and returns relevant context.

Example search context: "The 2025 ICC Champions Trophy was held in Pakistan and the United Arab Emirates from February 19 to March 9, with India emerging as champions, winning their third title in the tournament's history. Hosted primarily by Pakistan - their first global cricket tournament since 1996 - the tournament implemented a hybrid format, with all of India's matches being played in Dubai due to geopolitical considerations. The final was a tense contest at the Dubai International Cricket Stadium, with India defeating New Zealand by four wickets, chasing down a target of 252 with an over to spare."

3 -  The query, related context, and instructions are combined into a single prompt (a small template function for this step is sketched after this example).

Example prompt:

"Answer queries only based on context. If you cannot find an answer to your query in context, reply - I cannot answer that query.

Query: Who will be the winner of the ICC Champions Trophy 2025?

Context: The 2025 ICC Champions Trophy was held in Pakistan and the United Arab Emirates from February 19 to March 9, with India emerging as champions, winning their third title in the tournament's history. Hosted primarily by Pakistan - their first global cricket tournament since 1996 - the tournament implemented a hybrid format, with all of India's matches being played in Dubai due to geopolitical considerations. The final was a tense contest at the Dubai International Cricket Stadium, with India defeating New Zealand by four wickets, chasing down a target of 252 with an over to spare."

4 -  The prompt is fed into a Large Language Model (LLM) which generates an answer to the user query based on the provided context.

Sample output: "India won the 2025 ICC Champions Trophy by defeating New Zealand by four wickets in the final held at the Dubai International Cricket Stadium."
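The prompt assembly in step 3 can be implemented as a small template function. The sketch below rebuilds the prompt from this example and adds a commonly used (and here assumed, not taken from the article) safeguard: if retrieval returns nothing sufficiently similar, the system returns the refusal message directly instead of calling the LLM. The 0.3 threshold and the llm callable are illustrative assumptions.

```python
# Sketch of step 3 (prompt assembly) plus a hypothetical relevance threshold.
REFUSAL = "I cannot answer that query."

def build_prompt(query: str, context: str) -> str:
    # Combine the instructions, the user query, and the retrieved context.
    return (
        "Answer the query based only on the context. "
        f"If you cannot find the answer in the context, reply: {REFUSAL}\n\n"
        f"Query: {query}\n\n"
        f"Context: {context}"
    )

def answer(query: str, context: str, similarity: float, llm) -> str:
    # If retrieval was not confident enough, refuse without calling the LLM.
    if similarity < 0.3:
        return REFUSAL
    return llm(build_prompt(query, context))

prompt = build_prompt(
    "Who will be the winner of the ICC Champions Trophy 2025?",
    "The 2025 ICC Champions Trophy was held in Pakistan and the United Arab "
    "Emirates ... India defeating New Zealand by four wickets ...",
)
print(prompt)
```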

RAG Applications

AI Search Engine

AI search engines use RAG to enhance search results by combining large language models with real-time data retrieval, providing accurate and contextual answers. They excel at understanding natural language queries and extracting information from massive data sets, making search more intuitive and efficient.
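A rough sketch of that flow follows, assuming a hypothetical search_web() helper that wraps whichever real-time search API is available (here it only returns canned snippets) and an assumed OpenAI chat model.

```python
# Hypothetical AI-search-style RAG: search_web() is a stand-in for any
# real-time search API; snippets from the top results become the context.
from openai import OpenAI

def search_web(query: str, max_results: int = 3) -> list[str]:
    # Placeholder: call a real search API here and return text snippets.
    return ["(snippet 1 about the query)", "(snippet 2 about the query)"]

def ai_search(query: str) -> str:
    snippets = search_web(query)
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Using only the numbered search results below, answer the question "
        "and cite the snippet number that supports each claim.\n\n"
        f"{numbered}\n\nQuestion: {query}"
    )
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```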

Customer Service Chatbot

Customer service chatbots leverage RAG to provide personalized and accurate responses by retrieving company-specific data (such as FAQs or product manuals) and generating human-like responses. This reduces response time, increases customer satisfaction, and handles complex queries that go beyond simple scripted answers.

Legal Document Analysis

Legal Document Analysis uses RAG to sift through large volumes of legal texts, contracts, or case law, retrieve relevant clauses or precedents, and summarize them in plain language. It helps lawyers by accelerating research, ensuring accuracy, and identifying key insights from dense documents.

Scientific Research Assistance

Scientific research assistance uses RAG to provide researchers with concise summaries or hypotheses by retrieving and synthesizing information from scientific papers, datasets, or experiments. It simplifies literature review, fact checking, and the exploration of complex topics across large research repositories.

Medical Decision Support

Medical decision support integrates RAG with patient data, medical literature, and treatment guidelines to assist physicians in making evidence-based recommendations or diagnoses. It enhances decision-making by providing up-to-date, context-specific insights while prioritizing patient privacy and accuracy.

Personalized Education

Personalized education applications use RAG to customize the learning experience, retrieving relevant educational resources and generating explanations suited to a student's progress and level of understanding. They support tutors and self-learners by adapting to individual needs and effectively filling knowledge gaps.

Technical Document Search

Technical Documentation Search uses RAG to navigate complex manuals, code bases, or troubleshooting guides, retrieve precise solutions, and explain them clearly. It saves time for developers and engineers by quickly resolving technical queries and providing detailed, context-aware responses.


Why do we need RAG?

Large Language Models (LLMs) are typically trained on huge datasets, which include text from books, Wikipedia, websites, and code from GitHub repositories. This training data is collected up to a specific date, which means that the knowledge of the LLM has a cutoff point related to when the training data was last updated. For example, if the LLM is trained only until December 2023, it will know nothing about what happened after that.

Without RAG, when users ask about events, developments, or information beyond that cutoff, the LLM faces a problem: it either fails to provide an answer (leaving the query unresolved) or, worse, it may “hallucinate,” generating responses that sound plausible but are incorrect. This is because LLMs are designed to predict and generate text based on patterns in their training data, rather than being able to inherently distinguish between what they know and what they don’t know.

With RAG, this limitation is addressed through an integrated retrieval mechanism. When a query is posed—especially queries related to recent events—the retriever fetches relevant and up-to-date context in real time from external sources such as web data, databases, or posts on platforms like X.

The retrieved information is then provided as additional context to the LLM, enabling it to generate accurate and informed responses based on the latest available data, rather than relying solely on its static, pre-trained knowledge. In essence, RAG bridges the gap between the LLM’s fixed training cutoffs and the ever-changing world, ensuring more reliable and current answers.
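The contrast is easy to demonstrate: the sketch below sends the same post-cutoff question to an LLM twice, once bare and once with a freshly retrieved passage prepended. The OpenAI client, model name, and hard-coded passage are illustrative assumptions; in a real system the passage would come from the retriever.

```python
# Same question, asked with and without retrieved context.
from openai import OpenAI

client = OpenAI()

def ask_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

question = "Who won the 2025 ICC Champions Trophy?"

# Without RAG: the model can only rely on its (possibly outdated) training data.
print(ask_llm(question))

# With RAG: a retrieved, up-to-date passage is supplied as context.
retrieved = (
    "The 2025 ICC Champions Trophy final was won by India, who beat "
    "New Zealand by four wickets in Dubai."
)
print(ask_llm(
    f"Context: {retrieved}\n\nAnswer based only on the context.\n\n"
    f"Question: {question}"
))
```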

To summarize:

LLMs are trained on large collections of books, Wikipedia, website text, and GitHub repositories. However, their training data is limited to information available before a certain date, which means their knowledge is cut off at that point.

Problems without RAG

  • LLMs are unable to answer questions about events or facts that occurred after their training cutoff.
  • They may generate incorrect or hallucinated responses, making them unreliable for up-to-date information.

Solutions with RAG

  • Retrieve - fetch relevant content from external knowledge sources such as databases, APIs, or private documents.
  • Supply - provide the retrieved content as context along with the query to the LLM, enabling it to generate factually accurate answers.
  • Ground - ensure responses are based on the retrieved information, reducing hallucinations.

Therefore, RAG enhances LLMs by keeping them updated without frequent retraining.