Improving RAG: Using hybrid search and re-ranking to optimize retrieval results (with code examples)

Written by

Jasper Cole

Updated on: June 27, 2025

RAG (retrieval-augmented generation) combines information retrieval with language generation so that the model can draw on relevant external knowledge when it generates text, producing more accurate and targeted content. In practice, RAG faces several challenges, and retrieving the right information accurately is one of the most important. This article focuses on improving RAG retrieval through hybrid search and re-ranking.

1. Challenges of RAG Search

In real RAG applications, the retrieval step runs into several difficulties. A typical one is the exact matching of specific identifiers (IDs). In many scenarios each record is assigned a unique ID that mixes digits and letters and is often longer than 10 characters. If retrieval relies only on vector-based semantic search, for example similarity search over embeddings stored with PGVector, the results for such IDs are often inaccurate.

Semantic search ranks results by the soft similarity between vectors: it computes a similarity score (such as cosine similarity) between the query vector and each document vector and returns the top k results with the highest scores. This works well for texts with similar meaning, but it is poorly suited to tasks that require strict equality, such as exact ID matching. For example, a query for "1234567890XYZ" may return a text block containing "1234567890ABC", because the two IDs share most of their characters and therefore tend to receive very similar vector representations. The block that actually contains "1234567890XYZ" may be ranked lower, or fall outside the top k entirely, because of small differences in the vectors. This seriously affects the accuracy of a RAG system wherever exact ID matching matters, such as medical record lookup (retrieving a record by patient ID) or financial transaction lookup (retrieving transaction details by transaction ID), where incorrect results can have serious consequences.
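To make the ranking step concrete, here is a minimal sketch of top-k retrieval by cosine similarity. The three-dimensional vectors are made-up toy values rather than real embeddings, and the names cosine_similarity and top_k are illustrative only. Under the assumption that two IDs differing only in their suffix get almost identical vectors, the wrong ID can outrank the right one:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k):
    """Return the k (text, score) pairs with the highest similarity."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors (hypothetical): the two IDs differ only in their suffix,
# so their embeddings are assumed to be almost identical.
chunks = [
    ("Record 1234567890XYZ", [0.90, 0.10, 0.42]),
    ("Record 1234567890ABC", [0.90, 0.10, 0.40]),
    ("Unrelated note", [0.05, 0.95, 0.10]),
]
# Assumed embedding of the query "1234567890XYZ"; it happens to sit
# slightly closer to the ABC vector than to the XYZ vector.
query_vec = [0.90, 0.10, 0.40]

results = top_k(query_vec, chunks, k=2)
for text, score in results:
    print(f"{score:.4f}  {text}")
```

In this toy setup the "ABC" record is ranked first even though the query asked for "XYZ". Nothing in the similarity score rewards an exact string match.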

2. Hybrid Search Solution

Hybrid search addresses these problems by combining the strengths of several search techniques to improve both the accuracy and the coverage of retrieval. It consists of the following steps:

(I) Semantic search based on PGVector

PGVector is an extension for storing and querying vector data in PostgreSQL. Text or other data is converted into vectors, and related documents are found through vector similarity search. As the first step of hybrid search, semantic search with PGVector quickly finds text blocks that are semantically related to the query. Documents and queries are converted into vectors by a pre-trained embedding model (such as JinaAI's "jina-embeddings-v2-base-en"), and similarity retrieval is then performed in the database. For example, in a library of news articles, a query for "latest developments in the technology industry" returns article fragments about the technology industry that are semantically close to the query. As noted above, this works well for semantically similar text but is insufficient for exact ID matching, so it has to be combined with other search methods.

(II) PostgreSQL full-text search

PostgreSQL provides powerful full-text search capabilities, including exact matching and partial matching. In hybrid search, this step is used to make up for the shortcomings of semantic search in exact matching.

Exact match
The to_tsvector and plainto_tsquery functions check that the queried ID appears in the text block as a complete token. For example, when searching for "1234567890XYZ", the condition to_tsvector('english', content) @@ plainto_tsquery('english', '1234567890XYZ') searches the content column of the documents table for records that contain "1234567890XYZ" as a whole token (the comparison ignores case). A block that contains only "1234567890ABC" does not match, which avoids the misjudgments that semantic search can make.
Partial match
If the exact match finds nothing, partial matching takes over. The ILIKE operator performs a case-insensitive substring match. For example, content ILIKE '%1234567890XYZ%' finds text blocks in which "1234567890XYZ" appears anywhere, including inside a longer string such as an ID with a prefix or suffix. This helps with variations in how an ID is written. ILIKE does not correct misspellings, however: a mistyped ID matches only if the typed string itself occurs in the text.
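One practical detail of the partial match is that ILIKE treats "%" and "_" as wildcards, so an ID that contains an underscore would match more than intended. The sketch below escapes those characters before building the pattern. The helper names escape_like and build_partial_match are hypothetical, the documents table and content column are the ones used in this article, and the code relies on backslash being PostgreSQL's default LIKE escape character:

```python
def escape_like(term, escape_char="\\"):
    """Escape LIKE/ILIKE wildcards so the term is matched literally."""
    return (
        term.replace(escape_char, escape_char * 2)  # escape the escape character first
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )

def build_partial_match(term):
    """Return (sql, params) for a literal, case-insensitive substring search."""
    sql = "SELECT * FROM documents WHERE content ILIKE %s;"
    return sql, (f"%{escape_like(term)}%",)

sql, params = build_partial_match("INV_2024%A")
print(sql)
print(params)
```

The returned pair can be passed directly to a psycopg2 cursor as cursor.execute(sql, params), which keeps the user-supplied ID out of the SQL string itself.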

(III) Merging results

After the semantic search and the full-text search have run, their results need to be merged. Because the two methods may return overlapping text blocks, the merged list must be deduplicated. In Python this can be done with lists and sets, removing duplicates by comparing a unique identifier of each text block (such as the document ID) or the content itself. The merged result set contains text blocks that are relevant to the query from both the semantic and the text-matching perspective, which gives the subsequent re-ranking step a more complete set of candidates.
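The merge step can be sketched as follows. This is a minimal illustration that assumes each text block is a dict with an "id" key; the function name merge_results and the sample data are hypothetical:

```python
def merge_results(semantic_chunks, fulltext_chunks):
    """Merge two result lists, keeping the first occurrence of each chunk ID."""
    merged = []
    seen_ids = set()
    for chunk in list(semantic_chunks) + list(fulltext_chunks):
        if chunk["id"] in seen_ids:
            continue  # already collected from the other search
        seen_ids.add(chunk["id"])
        merged.append(chunk)
    return merged

semantic_chunks = [
    {"id": 7, "text": "Order 1234567890ABC shipped."},
    {"id": 3, "text": "Order 1234567890XYZ is pending."},
]
fulltext_chunks = [
    {"id": 3, "text": "Order 1234567890XYZ is pending."},
    {"id": 9, "text": "Refund issued for 1234567890XYZ."},
]

merged = merge_results(semantic_chunks, fulltext_chunks)
print([chunk["id"] for chunk in merged])  # [7, 3, 9]
```

The order of first appearance is preserved here, but it carries no meaning at this stage, since the re-ranking step decides the final order.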

(IV) Re-ranking with Flash Re-Ranker

Flash Re-Ranker is a tool for improving the relevance ordering of search results. The results of hybrid search contain relevant text blocks, but their order may not be optimal, that is, they are not necessarily sorted by descending relevance to the query. Flash Re-Ranker scores each merged text block against the query and reorders the list so that the most relevant blocks come first. Its models look at the query and the text together, so the score reflects semantics, vocabulary overlap, and context rather than vector distance alone. For example, in a library of product descriptions, when a user queries a specific product ID, Flash Re-Ranker can reorder the hybrid search results so that the description that contains the matching ID and best fits the query is ranked at the top.

3. Implementation based on LangChain

LangChain is a framework for developing applications built on language models, and it makes it easier to integrate retrieval and text processing components. The steps below use LangChain to implement the hybrid search and re-ranking described above:

(I) Implementation of semantic search based on PGVector

    # Assumes the langchain-community package; import paths differ between LangChain versions.
    from langchain_community.vectorstores import PGVector
    from langchain_community.embeddings import JinaEmbeddings

    # Embedding function (JinaEmbeddings calls the Jina API and requires a Jina API key)
    embeddings = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")

    # Initialize PGVector
    vector_store = PGVector(
        connection_string="postgresql://myuser:mypassword@localhost/mydb",
        embedding_function=embeddings,
    )

    # Perform semantic search
    query = "1234567890XYZ"
    semantic_results = vector_store.similarity_search(query, k=10)
    print(semantic_results)

This code first initializes a vector store object with the PGVector class, specifying the database connection string and the embedding function. It then calls the similarity_search method to perform the semantic search, passing in the query and the number of results to return, k.

(II) PostgreSQL full-text search implementation

    import psycopg2

    # Define custom stop words
    STOP_WORDS = {
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
        "their", "then", "there", "these", "they", "this", "to", "was", "will",
        "with", "me", "my", "you", "your", "we", "our", "us", "he", "him", "his",
        "she", "her", "hers", "its", "them", "so", "too",
    }

    def filter_stop_words(query):
        """Remove stop words from the input query."""
        words = query.split()  # Split the query into words
        filtered_words = [word for word in words if word.lower() not in STOP_WORDS]
        # Fall back to the original query if every word was a stop word
        return " ".join(filtered_words) if filtered_words else query

    def perform_full_text_search(query):
        # Filter stop words from the query
        filtered_query = filter_stop_words(query)
        print(f"Original query: {query}")
        print(f"Filtered query (without stop words): {filtered_query}")

        conn = psycopg2.connect("dbname=mydb user=myuser password=mypassword host=localhost")
        try:
            with conn.cursor() as cursor:
                # Exact match: full-text search on the filtered query
                cursor.execute(
                    "SELECT * FROM documents "
                    "WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s);",
                    (filtered_query,),
                )
                results = cursor.fetchall()

                # Fallback: partial match with ILIKE, only when the exact match found nothing
                if not results:
                    cursor.execute(
                        "SELECT * FROM documents WHERE content ILIKE %s;",
                        (f"%{filtered_query}%",),
                    )
                    results = cursor.fetchall()
        finally:
            # Always release the connection, even if a query fails
            conn.close()
        return results

    query = "I want to see details of 1234567890XYZ"
    retrieved_chunks = perform_full_text_search(query)
    print("result:", retrieved_chunks)

This code first defines a set of stop words, STOP_WORDS, and a filter_stop_words function that removes them from the query. The perform_full_text_search function runs the PostgreSQL full-text search: it tries the exact match first, falls back to the partial match only when the exact match returns no rows, and returns the resulting rows.

(III) Merge and reorder results

    from flashrank import Ranker, RerankRequest

    def re_rank_results(query, semantic_results, text_results):
        ranker = Ranker()  # Loads FlashRank's default re-ranking model

        # Extract text from the semantic results (LangChain Document objects)
        semantic_texts = [doc.page_content for doc in semantic_results]
        # Extract text from the full-text result tuples
        text_texts = [row[1] for row in text_results]  # Assume text is in the second column

        # Merge and drop duplicate text blocks while preserving order
        unique_texts = list(dict.fromkeys(semantic_texts + text_texts))
        passages = [{"id": i, "text": text} for i, text in enumerate(unique_texts)]

        # Score every passage against the query and sort by relevance
        request = RerankRequest(query=query, passages=passages)
        return ranker.rerank(request)

    final_results = re_rank_results(query, semantic_results, retrieved_chunks)
    print(final_results)

In this part of the code, the re_rank_results function uses Flash Re-Ranker (the flashrank package) to merge and re-rank the results of the semantic search and the full-text search. It first extracts the text content from both result sets, merges them into one deduplicated list, and then passes the query together with the passages to ranker.rerank, which returns the passages sorted by their relevance to the query.

4. Important Considerations

(I) Custom stop words

In full-text search, the handling of stop words is crucial. Stop words are words that appear frequently in text but contribute little to a search, such as "a", "the", and "and". With a custom stop word list you can add or exclude specific words to suit the application. In scientific and technical literature retrieval, for example, a word that counts as a stop word elsewhere may carry real meaning in the field, and it should then be removed from the stop word list. Conversely, when processing text in a specific format (such as code comments), some special symbols or keywords may need to be added to the stop word list to improve the accuracy and efficiency of the search.
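A customizable stop word list might look like the sketch below. The base list is deliberately short, and the helper names build_stop_words and filter_query, the "please"/"it" example, and the rule that tokens containing a digit are never dropped (so that IDs survive filtering) are all illustrative assumptions rather than part of the implementation above:

```python
BASE_STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "it", "in"}

def build_stop_words(base, extra=(), keep=()):
    """Return base plus extra words, minus words that must stay searchable."""
    return (set(base) | {w.lower() for w in extra}) - {w.lower() for w in keep}

def filter_query(query, stop_words):
    """Drop stop words, but never drop tokens that contain a digit (likely IDs)."""
    kept = [
        word
        for word in query.split()
        if any(ch.isdigit() for ch in word) or word.lower() not in stop_words
    ]
    # Fall back to the original query if everything was filtered out
    return " ".join(kept) if kept else query

# Hypothetical domain: "IT" is meaningful (information technology), "please" is noise.
stop_words = build_stop_words(BASE_STOP_WORDS, extra=["please"], keep=["it"])
filtered = filter_query("Please find the IT ticket 1234567890XYZ", stop_words)
print(filtered)  # find IT ticket 1234567890XYZ
```

Keeping the additions and exclusions as explicit arguments makes it easy to maintain one base list and derive a separate list per application.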

(II) PostgreSQL configuration

PostgreSQL ships with built-in stop word lists for multiple languages and supports custom dictionaries for more advanced stop word handling. The 'english' configuration used in the queries above already applies its built-in stop word list inside to_tsvector and plainto_tsquery. Configuring these settings sensibly can further improve full-text search: choose a built-in configuration that suits the language and the characteristics of the text, and for more complex requirements create a custom dictionary that marks specific words as stop words or applies special word-form normalization.
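As a rough sketch of what such a configuration could involve, the helper below builds the DDL statements for a text search configuration that copies the built-in English one but uses its own stop word file. The names custom_english and custom_config_statements are placeholders, and the sketch assumes that a file called custom_english.stop has already been placed in PostgreSQL's tsearch_data directory:

```python
def custom_config_statements(config_name="custom_english", stop_file="custom_english"):
    """Build DDL for a text search configuration with its own stop word file.

    Both names are placeholders and must be trusted identifiers, because
    they are interpolated directly into the SQL text.
    """
    dictionary = f"{config_name}_stem"
    return [
        # Snowball stemmer that reads <stop_file>.stop instead of the default list
        f"CREATE TEXT SEARCH DICTIONARY {dictionary} "
        f"(TEMPLATE = snowball, Language = english, StopWords = {stop_file});",
        # Start from the built-in English configuration
        f"CREATE TEXT SEARCH CONFIGURATION {config_name} (COPY = pg_catalog.english);",
        # Route word-like tokens through the new dictionary
        f"ALTER TEXT SEARCH CONFIGURATION {config_name} "
        "ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part "
        f"WITH {dictionary};",
    ]

statements = custom_config_statements()
for statement in statements:
    print(statement)
```

Each statement could then be run once through a psycopg2 cursor, after which queries would pass 'custom_english' instead of 'english' to to_tsvector and plainto_tsquery.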

(III) Performance optimization

Stop word removal can improve search accuracy and can also improve search performance, especially on large data sets. With stop words removed, fewer terms have to be processed per search, which lightens the query load on the database. During index construction, removing stop words also reduces the size of the index and makes it cheaper to build and store. How much this helps depends on the data, so in practice the complexity of stop word handling has to be weighed against the performance gain, based on the size and characteristics of the data set, before choosing a stop word strategy.

In RAG applications, accurate retrieval of relevant information is the key to system performance. Combining PGVector semantic search, PostgreSQL full-text search, and Flash Re-Ranker re-ranking into a hybrid search pipeline addresses the inaccurate retrieval of specific IDs and improves the retrieval accuracy of the RAG system.