Mastering Agent-Based RAGs: 5 Steps to Implementing a Self-Assessment Mechanism to Enhance Information Retrieval

Master Agentic RAG and improve your information retrieval ability!
Core content:
1. Understand the development background of LLM and RAG systems
2. How Agentic RAG enhances information retrieval through self-assessment
3. Detailed steps to implement Agentic RAG using GPT-4
End-to-end tutorial for developing Agentic RAG
As LLMs have evolved, models have become able to understand large amounts of data and perform logical reasoning. One of the most important advances that came with this development is the Retrieval-Augmented Generation (RAG) system.
LLMs are trained on very large datasets, but they are limited to what that training data contains.
Let's say you run a company and you have a set of policy documents. For your employees to find the right answer, they either need to know these documents very well or search through them manually. You want to make this easier with a chatbot. With LLMs now widely available you could use one, but the LLM does not know your data: ask it a question about these topics and it will most likely hallucinate. You could teach it your data through fine-tuning, which is a long and expensive process, or you can use a RAG system. With RAG, the LLM can search your documents and base its answers on them.
Sounds great, right? Of course, it’s not that simple. Traditional RAG systems retrieve relevant documents or pieces of information based on the user’s query and provide answers by forwarding this information to the LLM. However, this approach may not work for complex information needs or ambiguous queries. This is where the concept of *“Agentic RAG”* comes into play.
Agentic RAG is an advanced system that adds autonomous decision-making, planning, and self-evaluation capabilities to traditional RAG systems, making the information retrieval process more intelligent and flexible. These systems can reformulate the user's query, evaluate different information sources, and measure the accuracy and quality of their own answers.
In this article, we will build an example Agentic RAG system that can provide comprehensive and accurate answers to user questions about a specific article. In our example we use OpenAI's GPT-4 model, and the system will be able to understand the given article, perform web searches when necessary, rephrase queries, and evaluate its own answers. I prepared this example so that everyone can understand the core ideas and methods of Agentic RAG; it can be extended and improved with different methods.
RAG System Basics
Retrieval-augmented generation (RAG) is an approach that allows LLMs to access information sources other than the training data. A traditional RAG system consists of three basic components:
• Retriever: retrieves relevant documents or pieces of information from a database based on the user query.
• Generator: usually an LLM; generates answers using the retrieved documents and the user query.
• Indexer: preprocesses documents and stores them in a vector database for efficient access.
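To make the flow concrete, here is a minimal sketch of how these three components fit together. It is illustrative only: the vectorstore index and the llm client are placeholders here, and we build real ones with FAISS and GPT-4 later in this article.

def traditional_rag_answer(query, vectorstore, llm):
    # Retriever: find the chunks most similar to the query
    docs = vectorstore.similarity_search(query, k=3)
    context = "\n\n".join(doc.page_content for doc in docs)
    # Generator: answer using only the retrieved context
    prompt = f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return llm.predict(prompt)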
Limitations of Traditional RAG Systems
As we mentioned at the outset, traditional RAG systems have clear advantages, but they also come with some important limitations:
1. They use the user query as-is. Because the query is never improved, it can be hard to get good answers.
2. They may not handle complex questions well.
3. They have no control over the quality or accuracy of the answers they produce.
Basic knowledge of Agent
An agent is a system that can make decisions on its own, execute those decisions, and evaluate their correctness.
• They can make autonomous decisions
• They can plan
• They can use different tools for different tasks and choose which tool to use
• They solve problems step by step and can change strategy when necessary
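To give a rough idea of how this looks in code, the sketch below shows an agent as a loop in which the LLM repeatedly decides whether to call a tool or to give a final answer. This is purely illustrative: llm, tools, and the prompt format are placeholders, and later in this article we use LangChain's ready-made agent instead of writing such a loop by hand.

def simple_agent_loop(query, llm, tools, max_steps=5):
    # tools: a dict mapping tool name -> callable
    history = ""
    for _ in range(max_steps):
        decision = llm.predict(
            f"Question: {query}\n"
            f"Steps so far: {history}\n"
            f"Available tools: {list(tools)}\n"
            "Reply with 'TOOL <name>: <input>' or 'ANSWER: <final answer>'."
        )
        if decision.startswith("ANSWER:"):
            # The agent is confident enough to answer directly
            return decision[len("ANSWER:"):].strip()
        # Otherwise, run the chosen tool and feed the result back into the loop
        name, tool_input = decision[len("TOOL "):].split(":", 1)
        history += f"\n{name.strip()} -> {tools[name.strip()](tool_input.strip())}"
    return "No answer found within the step limit."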
Basics of Agentic RAG
Agentic RAG uses agents to overcome the limitations of traditional RAG systems.
• More accurate answers: multiple verification steps and self-assessment reduce the risk of hallucinations.
• Comprehensive information access: by combining different sources and strategies, more comprehensive information can be accessed.
• Adaptability: the approach is adjusted based on the quality or complexity of the user's query.
• Transparency: thought processes and sources of information can be clearly explained.
Practical Projects
In the previous chapters, we discussed the basics of Agentic RAG. Now it's time to put them into practice.
System Architecture
• LLM engine: a language model based on GPT-4.
• Document access chain: a RetrievalQA chain that retrieves relevant content from the article database.
• Web search tool: a component that uses the Tavily API to perform web searches.
• Query reformulation module: a function that makes vague queries more specific.
• Self-evaluation mechanism: a module that evaluates the quality of the generated answers.
• Agent orchestrator: a LangChain agent that orchestrates the different tools.
• Memory system: a ConversationBufferMemory that stores the conversation history.
Data preparation and indexing
First, let's import all the libraries we'll be using.
import os
import fitz  # PyMuPDF, used to extract text from the PDF
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool, AgentType
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain.memory import ConversationBufferMemory

os.environ["OPENAI_API_KEY"] = "your-api-key"  # needed by ChatOpenAI and OpenAIEmbeddings
os.environ["TAVILY_API_KEY"] = "your-api-key"  # needed by the Tavily web search tool
The foundation of our Agentic RAG example is an accurate and efficient data preparation and indexing process. This process involves reading our PDF article, splitting it into chunks, and storing the chunks in a vector database.
First, we will use the PyMuPDF library to extract content from documents in PDF format.
def extract_text_from_pdf(pdf_path):
    doc = fitz.open(pdf_path)
    text = "\n".join([page.get_text() for page in doc])
    return text

text = extract_text_from_pdf("pdf_files/article_2.pdf")
This function extracts text from all pages of a given PDF file and merges them.
The text in a PDF can be very long. Processing such long data in its entirety is also inefficient. Therefore, we use the RecursiveCharacterTextSplitter class of LangChain to split the text into smaller, more manageable parts:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.create_documents([text])
Here:
• chunk_size=500: each chunk contains at most 500 characters.
• chunk_overlap=50: consecutive chunks overlap by 50 characters, which prevents loss of meaning when sentences or paragraphs are split.
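If you want to check what the splitter produced, a quick optional inspection like the following prints the number of chunks and the beginning of the first one:

print(len(docs))                   # number of chunks created from the article
print(docs[0].page_content[:200])  # first 200 characters of the first chunk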
We also use the OpenAIEmbeddings() class to convert text into embeddings.
embeddings = OpenAIEmbeddings()
After all these operations are completed, we create the FAISS vector database by converting each text chunk into a vector representation so that the chunks can be queried efficiently:
vectorstore = FAISS.from_documents(docs, embeddings)
vectorstore.save_local("faiss_index")
FAISS (Facebook AI Similarity Search) is a library designed to perform efficient similarity search on high-dimensional vectors.
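Once the index is built, you can query it directly. For example, similarity_search returns the chunks closest to the query in the embedding space (the query string here is just an illustration):

results = vectorstore.similarity_search("What is a multi agent system?", k=3)
for doc in results:
    print(doc.page_content[:100])  # preview of each matching chunk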
When the application runs, we load the FAISS index created earlier as follows:
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever()
By converting the loaded vector database into a retriever object, we can efficiently find text snippets relevant to the user query.
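As a quick sanity check, you can call the retriever directly; get_relevant_documents returns the matching chunks for a query (again, the query is only an example):

relevant_docs = retriever.get_relevant_documents("What is a multi agent system?")
print(len(relevant_docs))  # how many chunks the retriever returned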
This data preparation and indexing process forms the information retrieval basis of our Agentic RAG system.
Agentic RAG
First, we initialize the selected LLM model. In our case, it is gpt-4.
llm = ChatOpenAI(model="gpt-4")
Next, we need to create the RetrievalQA chain so that we can retrieve information from the article database.
retrieval_qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
The chain receives the user query, retrieves the relevant documents, and generates a response through the LLM.
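You can also call the chain on its own, without the agent, to see what it returns. Because we set return_source_documents=True, the result contains both the generated answer and the chunks it was based on (the query is illustrative):

result = retrieval_qa_chain({"query": "What is a multi agent system?"})
print(result["result"])                 # the generated answer
print(len(result["source_documents"]))  # how many chunks were used as sources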
We also use TavilySearchResults to search for information not found in the article. This tool integrates with the Tavily API and returns the two most relevant results for each query. To do this, you must first visit the Tavily website, create an account, and obtain an API key. Otherwise, it will report an error.
search = TavilySearchResults(max_results=2)
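Called on its own, the tool returns a short list of search results (up to max_results). The exact fields depend on the Tavily API, but each result typically includes the page content and its URL:

web_results = search.run("current weather in Istanbul")
print(web_results)  # a list of result entries returned by Tavily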
Next, we add a custom function that rewrites vague or general input to be more specific.
def query_reformulation(query):
    response = llm.predict("Rewrite this query to be more specific: " + query)
    return response
This function sends a request to the LLM to refine the query. For example, it can transform a general query such as "What is artificial intelligence?" into a more specific query such as "What are the main features and applications of the AI technology described in the article?".
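A quick way to see it in action (the output will vary from run to run, since it comes from the LLM):

print(query_reformulation("What is artificial intelligence?"))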
We add another function that allows the system to evaluate the answers it generates.
def self_evaluate(input_text):
    # The input packs the query, the response, and the sources into one string separated by "|||"
    parts = input_text.split("|||")
    query = parts[0]
    response = parts[1]
    sources = parts[2] if len(parts) > 2 else ""
    evaluation_prompt = f"""
    Evaluate the following response to the query:
    QUERY: {query}
    RESPONSE: {response}
    SOURCES: {sources}

    Assessment based on:
    1. Factual accuracy (Does it match the sources?)
    2. Completeness (Does it address all aspects of the query?)
    3. Relevance (Is the information relevant to the query?)
    4. Hallucination (Does it contain information not supported by sources?)

    Return a confidence score from 0-10 and an explanation.
    """
    evaluation = llm.predict(evaluation_prompt)
    return evaluation
This function evaluates answers for factual accuracy, completeness, relevance, and hallucination. The evaluation includes a 0–10 confidence score and an explanation.
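Note the expected input format: the query, the answer, and the sources are passed as a single string separated by "|||". For example (the answer and sources below are placeholders):

example_input = (
    "What is a multi agent system?"
    " ||| A multi agent system is a system composed of multiple interacting agents."
    " ||| (retrieved article excerpts would go here)"
)
print(self_evaluate(example_input))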
Now that we have all of our functions defined, we can gather our tools together.
tools = [
    Tool(
        name="Article Retrieval",
        func=lambda q: retrieval_qa_chain({"query": q})["result"],
        description="Retrieve knowledge from the article database."
    ),
    Tool(
        name="Web search",
        func=search.run,
        description="If the requested information cannot be found in the documents, it states this and performs a web search."
    ),
    Tool(
        name="Query reformulation",
        func=query_reformulation,
        description="Reformulate a query to be more specific and targeted."
    )
]
Our tools include article retrieval, web search, and query reformulation. The agent will decide which tool to use in which situation.
We also add ConversationBufferMemory from the LangChain library to persist the chat history with the user.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
Now that we have created all the components, it is time to start the agent.
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    memory=memory
)
Here, the agent type STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION creates an agent that can reason step by step and use the tools as appropriate.
Now it's time to run our agent. To do this, we write a helper function that runs the agent and then applies the self-evaluation.
def get_evaluated_response(query):
    response = agent.run(query)
    try:
        result = retrieval_qa_chain({"query": query})
        sources = [doc.page_content for doc in result.get("source_documents", [])]
        sources_text = "\n".join(sources)
    except Exception:
        sources_text = "No sources available"
    evaluation = self_evaluate(f"{query} ||| {response} ||| {sources_text}")
    return {
        "query": query,
        "response": response,
        "evaluation": evaluation,
        "sources": sources_text
    }
This function gets the response from the agent, evaluates it, and returns both together.
We can display the agent's response together with the self-assessment using the function below.
def transparent_response(query):
    result = get_evaluated_response(query)
    return f"""
    Response: {result['response']}
    Confidence assessment: {result['evaluation']}
    """
Let's try it now.
The article I provided is a survey on multi-agent systems.
print(transparent_response("What is multi agent system?"))
When we run the system as above, we get the following output. I can't show the full output here because it would be too long, but the agent understood that it needed to answer our question from the article and used the "Article Retrieval" tool to do so. It then produced a final answer for us. The self-assessment evaluated the generated result and gave it a score of 9.75 out of 10.
Let's try a different example that is not in the article.
print(transparent_response("How is the current weather in Istanbul?"))
As you can see, this time the agent understood that it needed to use the web search tool and fetched the answer from there. However, our confidence score was low because no sources were passed to the evaluator. This is a part that could be optimized.
Now let's ask something that LLM already knows.
print(transparent_response("When was YouTube founded?"))
The system tells us it already has this information and gives us the answer directly, without any tools involved. Our confidence score is very high.
Our last example is to try out query reformulation. For this, I have entered a very ambiguous input.
print(transparent_response("bake cake"))
The agent first rewrites the input to make it more specific and then uses the web search tool.