Dify's Second Brain | Deep Analysis of Hybrid Retrieval and Rerank | RAG Accuracy Optimization | Rerank Model | Cross Encoder

An in-depth look at how Dify's "second brain" optimizes RAG accuracy, and the secrets of hybrid retrieval and Rerank.
Core content:
1. The key role of the RAG knowledge base in agents, and how to configure it
2. How chunking parameter settings affect text-processing quality, with test results
3. How the choice of embedding model affects RAG performance, and where different models fit
A week ago, I shared Andrew Ng's article "When should an LLM be fine-tuned, and when should it not? | Don't fine-tune for its own sake". The best-practice advice Professor Andrew Ng gives is: "Before officially starting fine-tuning, confirm whether the potential of prompt engineering, the RAG knowledge base, and agent workflows has been fully explored."
The RAG knowledge base has long been called the agent's "second brain", and it has a significant impact on the quality of the agent's output.
This article explores how to optimize the accuracy of the RAG pipeline in Dify: knowledge base configuration tips and the AI principles behind them, why retrieval uses a two-stage design, and what makes hybrid retrieval plus Rerank work so well. The answers are given one by one below.
1. Knowledge base creation and segmentation parameter setting
We start from the very beginning of creating a knowledge base: open the Dify page, click [Create Knowledge Base], and upload your files. Markdown (.md) files are recommended, since their structure helps RAG "understand" your notes.
Detailed explanation of key segmentation parameters:
Chunk size is one of the most important parameters. The best length depends on your data, but published tests are a useful reference. For example, a test on Azure AI Search compares recall at four chunk lengths: 512, 1024, 4096, and 8191 tokens:
- 512 tokens: 42.4% recall
- 1024 tokens: 41.7% recall
- 4096 tokens: 40.2% recall
- 8191 tokens: 39.8% recall
The results show that 512-token chunks perform best, and the recall differences between 1024, 4096, and 8191 are not significant.
Recall calculation method:
Suppose 10 of the first 50 retrieved documents are high-quality, and 20 high-quality documents exist for this query in total; the recall rate is then 10/20 = 0.5 (50%).
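As a rough sketch, recall@K can be computed like this (plain Python; the names are illustrative, not a Dify API):

```python
def recall_at_k(retrieved_ids, relevant_ids, k=50):
    """Recall@K: fraction of all relevant documents that appear in the top K retrieved results."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Matching the example above: 10 of the 20 relevant documents appear in the top 50,
# so recall@50 = 10 / 20 = 0.5
```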
The chunk overlap setting lets adjacent chunks share content, so that meaning is not lost at chunk boundaries.
Key point: the same tests show that with 512-token chunks and 25% overlap, the recall rate can reach 43.9%.
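For intuition, here is a minimal sketch of fixed-size chunking with overlap over a pre-tokenized document. Dify's own splitter is more sophisticated (it also respects separators), so this only illustrates the sliding-window idea:

```python
def split_with_overlap(tokens, chunk_size=512, overlap_ratio=0.25):
    """Split a token list into fixed-size chunks where each chunk overlaps the previous one."""
    step = int(chunk_size * (1 - overlap_ratio))  # 384 new tokens per chunk at 25% overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # the last chunk reached the end of the document
            break
    return chunks
```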
2. Embedding model selection
In the text vectorization step, the choice of embedding model also affects RAG accuracy.
There are many options on the market, which can be divided into two main categories:
High-end paid models: represented by OpenAI's text-embedding-3-large. These models usually offer higher accuracy and stronger semantic understanding, and suit projects with strict accuracy requirements and sufficient budget.
Free open-source models: for example, the free embedding models available through the Dify platform. Their performance may trail the paid models slightly, but they are a very practical choice for budget-constrained projects.
Since I deployed Dify locally, I took the lazy route and used Tongyi Qianwen's "Universal Text Vector-v3" embedding model.
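As an illustration of the vectorization step (which Dify performs for you once a provider is configured), here is a minimal sketch using the text-embedding-3-large model mentioned above; swap in whichever provider you actually use:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    """Return one embedding vector per input text."""
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return [item.embedding for item in resp.data]

vectors = embed(["How does hybrid retrieval work in Dify?"])
print(len(vectors[0]))  # 3072 dimensions for text-embedding-3-large
```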
1. Advantages of vector search
Vector search is a standard feature of RAG and excels at semantic understanding:
- Returns relevant information based on the meaning of the query, even if the exact words do not appear in the document
- Is tolerant of spelling errors, synonyms, and wording differences
- Supports cross-language understanding
For example, if you search for "latest iPhone flagship", vector search can return results about the iPhone 16 Pro/Pro Max even though you never typed a specific model.
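Under the hood, vector search ranks chunks by the similarity between the query embedding and each chunk embedding; a minimal sketch with cosine similarity:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def vector_search(query_vec, chunk_vecs, top_k=5):
    """Return the indices of the chunks most semantically similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
```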
2. Advantages of keyword search
Keyword search (full-text search) is good at exact matching:
- Quickly finds content containing exact keywords
- Well suited to specific product models, technical terms, and the like
- Based on the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm
When searching for "iPhone 16 Pro", keyword search can precisely locate documents containing that exact phrase, which is hard for vector search to do.
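To see why exact terms matter, here is a hedged TF-IDF sketch using scikit-learn; a real full-text engine uses an inverted index and more refined scoring (e.g. BM25), but the idea is similar:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "iPhone 16 Pro review: camera and battery life",
    "Android flagship comparison for 2024",
    "iPhone 15 teardown notes",
]
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)  # term-frequency / inverse-document-frequency weights

query_vec = vectorizer.transform(["iPhone 16 Pro"])
scores = cosine_similarity(query_vec, doc_matrix).ravel()
print(scores.argmax())  # index 0: the document containing the exact model name scores highest
```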
3. Best Practices for Hybrid Search
Hybrid retrieval uses the vector and keyword methods together.
Key point: in the Azure AI Search tests, hybrid retrieval scores higher than either method alone, especially when combined with the Semantic Ranker reranking model.
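One common way to merge the two result lists is Reciprocal Rank Fusion (RRF); note this is a generic sketch, not necessarily the exact fusion formula Dify or Azure AI Search uses:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document ids; earlier ranks contribute larger scores."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # ranked by semantic similarity
keyword_hits = ["doc1", "doc9", "doc3"]  # ranked by exact term matching
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # doc1 and doc3 rise to the top
```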
4. Rerank improves retrieval accuracy
Rerank is a technique that scores retrieved documents by their relevance to the query and reorders them. It plays a key role in retrieval systems: the returned results are ranked more accurately, which improves the user experience.
2. The difference between Rerank and Embedding Model
Although Rerank looks similar to the cosine-similarity calculation done with embedding vectors, the two differ clearly in how they work:
Bi-Encoder Model
- Structural features: the Bi-Encoder uses two encoders to generate vector representations of the query and the documents separately.
- Implementation: usually the two encoders share the same architecture; you can think of it as one embedding model producing independent vectors for the query and each document.
- Advantages:
  - Document vectors can be pre-computed and indexed
  - At query time only a cosine similarity is computed, which is very fast
  - Well suited to fast retrieval over large-scale datasets
- Limitations:
  - All document information is compressed into a single vector
  - The representation cannot adapt to the specific query at search time
  - This compression may lose information and hurt retrieval accuracy
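A minimal Bi-Encoder sketch using the sentence-transformers library (the model name is just a common example, not what Dify ships): documents are encoded once and cached, and only the query is encoded at search time.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["iPhone 16 Pro launch details", "Gardening tips for early spring"]
doc_embeddings = model.encode(docs, convert_to_tensor=True)   # pre-computed offline and indexed

query_embedding = model.encode("latest iPhone flagship", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)        # query time: just a cosine similarity
print(scores)  # the iPhone document should score higher
```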
Cross Encoder
- How it works: the query and the document are concatenated and fed into the same Transformer model together.
- Features: deep interaction between query and document produces higher-quality relevance scores.
- Advantages:
  - Can re-analyze the document content in the context of the specific query
  - The resulting relevance scores are more precise
- Limitations:
  - Must be recomputed for every query
  - Slow, so unsuitable for scoring large datasets directly
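And a minimal Cross Encoder sketch, again with sentence-transformers (the model name is only an example): the query and each candidate document pass through the model together, so every query triggers fresh computation.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "latest iPhone flagship"
candidates = ["iPhone 16 Pro launch details", "Gardening tips for early spring"]

scores = reranker.predict([(query, doc) for doc in candidates])  # one forward pass per (query, doc) pair
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # the iPhone document should come first
```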
3. Best Practices for Optimizing Search Systems
In real retrieval systems, the strengths of the Bi-Encoder and the Cross Encoder are usually combined to balance speed and accuracy. The typical implementation is a two-stage retrieval design, sketched below:
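In code terms, the two-stage design looks roughly like this; retrieve_fn and rerank_fn are placeholders for the retrieval and rerank steps (for example the Bi-Encoder and Cross Encoder sketches above), not Dify functions:

```python
def two_stage_search(query, retrieve_fn, rerank_fn, recall_k=50, top_k=5):
    """Stage 1 casts a wide, cheap net; stage 2 reorders only those candidates precisely."""
    candidates = retrieve_fn(query, k=recall_k)   # fast bi-encoder / keyword / hybrid retrieval
    reranked = rerank_fn(query, candidates)       # slower but more accurate cross-encoder scoring
    return reranked[:top_k]
```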
Four typical retrieval configurations in Dify
- Vector-only retrieval: sorts by cosine similarity; set Top K to control how many results are returned.
- Vector retrieval + Rerank: the rerank model refines the first-stage results, and may, for example, move a document originally ranked fifth up to second place.
- Full-text search + Rerank: like the previous configuration, but the first stage uses keyword search.
- Hybrid retrieval + Rerank: the most powerful configuration; you can also set a Score Threshold to filter out low-quality results.
When you need to search across multiple knowledge bases, you can follow this flow (a code sketch follows the list):
1. Perform hybrid search on each knowledge base
2. Aggregate all the results
3. Run a multi-level knowledge rerank over the aggregated results
4. Filter the best results with Top K and Score Threshold
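A hedged sketch of that flow; hybrid_search and rerank are assumed helper functions (for example built from the sketches above), not Dify APIs:

```python
def search_multiple_kbs(query, knowledge_bases, top_k=5, score_threshold=0.5):
    """Hybrid-search every knowledge base, rerank the merged candidates, then filter."""
    candidates = []
    for kb in knowledge_bases:
        candidates.extend(hybrid_search(query, kb))            # step 1: per-knowledge-base hybrid search
    scored = rerank(query, candidates)                         # steps 2-3: aggregate, then rerank everything
    kept = [(s, doc) for s, doc in scored if s >= score_threshold]  # step 4a: Score Threshold filter
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in kept[:top_k]]                    # step 4b: keep only the Top K
```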
Summary: by properly configuring hybrid retrieval and the rerank model in Dify, the accuracy of the RAG system can be greatly improved, helping us build better agents.