Understanding RAG Part 1: Why do we need it?

Written by
Jasper Cole
Updated on: June 27, 2025

Explore the revolutionary breakthrough in natural language processing and learn how RAG overcomes the limitations of large language models.

Core content:
1. The dominance of large language models (LLMs) in the field of NLP and their capabilities
2. The main challenges facing LLMs: data obsolescence, expensive retraining, and the "hallucination" problem
3. How RAG (retrieval-augmented generation) overcomes these LLM limitations

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)

Natural language processing (NLP) is a field of artificial intelligence (AI) that aims to teach computers to understand written and spoken human language and to interact with humans in that language. While traditional NLP methods have been studied for decades, large language models (LLMs) have come to dominate nearly all recent developments in the field. LLMs have revolutionized NLP, and AI more broadly, by combining deep learning architectures with self-attention mechanisms that can model complex patterns and interdependencies in language. They handle a wide range of language generation and understanding tasks and power applications such as conversational chatbots, deep document analysis, and translation.

Capabilities and limitations of LLM

General-purpose large language models (LLMs) released by major AI companies, such as OpenAI's ChatGPT, focus mainly on language generation: given a prompt (a query, question, or request expressed in human language), the LLM must generate a natural language response to that prompt word by word. To accomplish this seemingly daunting task, LLMs are trained on extremely large datasets containing millions to billions of text documents covering virtually any topic. This is how LLMs learn the nuances of human language, imitate the way we communicate, and use what they have learned to generate their own "human-like" language, enabling unprecedentedly fluent human-computer communication.

There is no doubt that LLMs represent a major step forward for artificial intelligence, but they are not without limitations. If a user asks an LLM for a precise answer in a specific context (for example, about the latest news), the model may be unable to provide one. The reason is that LLMs are limited by the data they were exposed to during training. Unless they are retrained frequently, an extremely costly process, LLMs are generally unaware of recent events.

Worse, when LLMs lack the underlying information needed to provide precise, relevant, or truthful answers, they are likely to generate responses that appear convincing even when built entirely on made-up information. This common failure mode is called "hallucination": generating inaccurate, unfounded text that misleads users.

The Birth of RAG

Even the largest LLMs on the market suffer to some degree from stale data, expensive retraining, and hallucination. Tech giants are well aware of the risks these problems pose when their models are used by millions of people around the world. For example, the hallucination rate of early ChatGPT models was estimated at around 15%, which damaged the reputations of organizations relying on them and undermined trust in AI systems as a whole.

This is where RAG (retrieval-augmented generation) comes in. RAG is one of the most significant breakthroughs in natural language processing since the emergence of LLMs, because it directly addresses the limitations described above. The core idea of RAG is to combine the precision and search capabilities of the information retrieval techniques used by search engines with the deep language understanding and generation capabilities of LLMs.

Broadly speaking, a RAG system enhances an LLM by injecting up-to-date, factual contextual information into the user's query or prompt. This information is obtained during a retrieval phase that runs before the LLM performs its language understanding and response generation.
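The retrieve-then-augment flow described above can be sketched in a few lines of plain Python. This is a toy illustration only: the document store, the word-overlap scoring, and the prompt template are hypothetical stand-ins (a real RAG system would use a search index or vector database and an actual LLM call), but the shape of the pipeline is the same.

```python
# Toy RAG flow: retrieve the most relevant document for a query,
# then prepend it to the prompt before it reaches the LLM.
# All names and documents here are illustrative, not a real library's API.

documents = [
    "The 2024 summit concluded with a new climate agreement.",
    "Transformers use self-attention to model word dependencies.",
    "RAG combines retrieval with generation to ground LLM answers.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for a real search index)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user's query with retrieved context; the result goes to the LLM."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{ctx}\nQuestion: {query}"

query = "How does RAG ground LLM answers?"
prompt = build_prompt(query, retrieve(query, documents))
print(prompt)
```

Because the retrieved context is fetched at query time, it can come from sources far fresher than the model's training data, which is exactly what lets RAG sidestep the staleness and hallucination problems discussed next.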

RAG addresses each of these common LLM problems as follows:

  • Outdated data:
    RAG helps overcome data obsolescence by retrieving and integrating the latest information from external sources, so that responses reflect the most recent available knowledge.
  • Retraining costs:
    By dynamically retrieving relevant information, RAG reduces the need for frequent and expensive retraining, allowing LLMs to stay current without being fully retrained.
  • Hallucinations:
    RAG helps mitigate hallucinations by grounding responses in factual information retrieved from real documents, minimizing the generation of false or fabricated answers.