Dify's Second Brain | Deep Analysis of Hybrid Retrieval and Rerank | RAG Accuracy Optimization | Rerank Model | Cross Encoder

An in-depth look at how Dify's "second brain" optimizes RAG accuracy, and the secrets of hybrid retrieval and Rerank.
Core content:
1. The key role of the RAG knowledge base in agents, and how to configure it
2. How chunking parameter settings affect text-processing quality, with test results
3. How the choice of embedding model affects RAG performance, and where different models fit
A week ago, I shared Andrew Ng's article "When should an LLM be fine-tuned, and when should it not? | Don't fine-tune for its own sake". The best-practice advice Professor Andrew Ng gives is: "Before officially starting fine-tuning, confirm whether the potential of prompt engineering, the RAG knowledge base, and agent workflows has been fully explored."
The RAG knowledge base has long been called the agent's "second brain", and it has a significant impact on the quality of the agent's output.
This article explores how to optimize the accuracy of the RAG pipeline in Dify: knowledge base configuration tips and the AI principles behind them, why retrieval uses a two-stage design, and what makes hybrid retrieval plus Rerank work so well. The answers are given one by one below.
1. Knowledge base creation and segmentation parameter setting
We start from the very beginning of creating a knowledge base: open the Dify page, click [Create Knowledge Base], and upload your files. Markdown (.md) files are recommended, since their structure helps RAG "understand" your notes.
Detailed explanation of key segmentation parameters:
Chunk size is one of the most important parameters. The best length depends on your data, but published tests are a useful reference. For example, a test on Azure AI Search compares recall at four chunk lengths: 512, 1024, 4096, and 8191 tokens:
- 512 tokens: 42.4% recall
- 1024 tokens: 41.7% recall
- 4096 tokens: 40.2% recall
- 8191 tokens: 39.8% recall
The results show that 512-token chunks perform best, and the recall differences between 1024, 4096, and 8191 are not significant.
Recall calculation method:
Suppose 10 of the first 50 retrieved documents are high-quality, and 20 high-quality documents exist for this query in total; the recall rate is then 10/20 = 0.5 (50%).
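As a rough sketch, recall@K can be computed like this (plain Python; the names are illustrative, not a Dify API):

```python
def recall_at_k(retrieved_ids, relevant_ids, k=50):
    """Recall@K: fraction of all relevant documents that appear in the top K retrieved results."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Matching the example above: 10 of the 20 relevant documents appear in the top 50,
# so recall@50 = 10 / 20 = 0.5
```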
The chunk overlap setting lets adjacent chunks share content, so that meaning is not lost at chunk boundaries.
Key point: the same tests show that with 512-token chunks and 25% overlap, the recall rate can reach 43.9%.
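For intuition, here is a minimal sketch of fixed-size chunking with overlap over a pre-tokenized document. Dify's own splitter is more sophisticated (it also respects separators), so this only illustrates the sliding-window idea:

```python
def split_with_overlap(tokens, chunk_size=512, overlap_ratio=0.25):
    """Split a token list into fixed-size chunks where each chunk overlaps the previous one."""
    step = int(chunk_size * (1 - overlap_ratio))  # 384 new tokens per chunk at 25% overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # the last chunk reached the end of the document
            break
    return chunks
```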
2. Embedding model selection
In the text vectorization step, the choice of embedding model also affects RAG accuracy.
There are many options on the market, which can be divided into two main categories:
High-end paid models: represented by OpenAI's text-embedding-3-large. These models usually offer higher accuracy and stronger semantic understanding, and suit projects with strict accuracy requirements and sufficient budget.
Free open-source models: for example, the free embedding models available through the Dify platform. Their performance may trail the paid models slightly, but they are a very practical choice for budget-constrained projects.
Since I deployed Dify locally, I took the lazy route and used Tongyi Qianwen's "Universal Text Vector-v3" embedding model.
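As an illustration of the vectorization step (which Dify performs for you once a provider is configured), here is a minimal sketch using the text-embedding-3-large model mentioned above; swap in whichever provider you actually use:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    """Return one embedding vector per input text."""
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return [item.embedding for item in resp.data]

vectors = embed(["How does hybrid retrieval work in Dify?"])
print(len(vectors[0]))  # 3072 dimensions for text-embedding-3-large
```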
1. Advantages of vector search
Vector search is a standard feature of RAG and excels at semantic understanding:
- Returns relevant information based on the meaning of the query, even if the exact words do not appear in the document
- Is tolerant of spelling errors, synonyms, and wording differences
- Supports cross-language understanding
For example, if you search for "latest iPhone flagship", vector search can return results about the iPhone 16 Pro/Pro Max even though you never typed a specific model.
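Under the hood, vector search ranks chunks by the similarity between the query embedding and each chunk embedding; a minimal sketch with cosine similarity:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def vector_search(query_vec, chunk_vecs, top_k=5):
    """Return the indices of the chunks most semantically similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
```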
2. Advantages of keyword search
Keyword search (full-text search) is good at exact matching:
- Quickly finds content containing exact keywords
- Well suited to specific product models, technical terms, and the like
- Based on the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm
When searching for "iPhone 16 Pro", keyword search can precisely locate documents containing that exact phrase, which is hard for vector search to do.
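To see why exact terms matter, here is a hedged TF-IDF sketch using scikit-learn; a real full-text engine uses an inverted index and more refined scoring (e.g. BM25), but the idea is similar:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "iPhone 16 Pro review: camera and battery life",
    "Android flagship comparison for 2024",
    "iPhone 15 teardown notes",
]
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)  # term-frequency / inverse-document-frequency weights

query_vec = vectorizer.transform(["iPhone 16 Pro"])
scores = cosine_similarity(query_vec, doc_matrix).ravel()
print(scores.argmax())  # index 0: the document containing the exact model name scores highest
```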
3. Best Practices for Hybrid Search
Hybrid retrieval uses the vector and keyword methods together.
Key point: in the Azure AI Search tests, hybrid retrieval scores higher than either method alone, especially when combined with the Semantic Ranker reranking model.
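One common way to merge the two result lists is Reciprocal Rank Fusion (RRF); note this is a generic sketch, not necessarily the exact fusion formula Dify or Azure AI Search uses:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document ids; earlier ranks contribute larger scores."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # ranked by semantic similarity
keyword_hits = ["doc1", "doc9", "doc3"]  # ranked by exact term matching
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # doc1 and doc3 rise to the top
```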
4. Rerank improves retrieval accuracy
Rerank is a technique that scores retrieved documents by their relevance to the query and reorders them. It plays a key role in retrieval systems: the returned results are ranked more accurately, which improves the user experience.
2. The difference between Rerank and Embedding Model
Although Rerank looks similar to the cosine-similarity calculation done with embedding vectors, the two differ clearly in how they work:
Bi-Encoder Model
- Structural features: the Bi-Encoder uses two encoders to generate vector representations of the query and the documents separately.
- Implementation: usually the two encoders share the same architecture; you can think of it as one embedding model producing independent vectors for the query and each document.
- Advantages:
  - Document vectors can be pre-computed and indexed
  - At query time only a cosine similarity is computed, which is very fast
  - Well suited to fast retrieval over large-scale datasets
- Limitations:
  - All document information is compressed into a single vector
  - The representation cannot adapt to the specific query at search time
  - This compression may lose information and hurt retrieval accuracy
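A minimal Bi-Encoder sketch using the sentence-transformers library (the model name is just a common example, not what Dify ships): documents are encoded once and cached, and only the query is encoded at search time.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["iPhone 16 Pro launch details", "Gardening tips for early spring"]
doc_embeddings = model.encode(docs, convert_to_tensor=True)   # pre-computed offline and indexed

query_embedding = model.encode("latest iPhone flagship", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)        # query time: just a cosine similarity
print(scores)  # the iPhone document should score higher
```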
Cross Encoder
- How it works: the query and the document are concatenated and fed into the same Transformer model together.
- Features: deep interaction between query and document produces higher-quality relevance scores.
- Advantages:
  - Can re-analyze the document content in the context of the specific query
  - The resulting relevance scores are more precise
- Limitations:
  - Must be recomputed for every query
  - Slow, so unsuitable for scoring large datasets directly
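And a minimal Cross Encoder sketch, again with sentence-transformers (the model name is only an example): the query and each candidate document pass through the model together, so every query triggers fresh computation.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "latest iPhone flagship"
candidates = ["iPhone 16 Pro launch details", "Gardening tips for early spring"]

scores = reranker.predict([(query, doc) for doc in candidates])  # one forward pass per (query, doc) pair
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # the iPhone document should come first
```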
3. Best Practices for Optimizing Search Systems
In real retrieval systems, the strengths of the Bi-Encoder and the Cross Encoder are usually combined to balance speed and accuracy. The typical implementation is a two-stage retrieval design, sketched below:
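In code terms, the two-stage design looks roughly like this; retrieve_fn and rerank_fn are placeholders for the retrieval and rerank steps (for example the Bi-Encoder and Cross Encoder sketches above), not Dify functions:

```python
def two_stage_search(query, retrieve_fn, rerank_fn, recall_k=50, top_k=5):
    """Stage 1 casts a wide, cheap net; stage 2 reorders only those candidates precisely."""
    candidates = retrieve_fn(query, k=recall_k)   # fast bi-encoder / keyword / hybrid retrieval
    reranked = rerank_fn(query, candidates)       # slower but more accurate cross-encoder scoring
    return reranked[:top_k]
```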
Four typical retrieval configurations in Dify
- Vector-only retrieval: sorts by cosine similarity; set Top K to control how many results are returned.
- Vector retrieval + Rerank: the rerank model refines the first-stage results, and may, for example, move a document originally ranked fifth up to second place.
- Full-text search + Rerank: like the previous configuration, but the first stage uses keyword search.
- Hybrid retrieval + Rerank: the most powerful configuration; you can also set a Score Threshold to filter out low-quality results.
When you need to search across multiple knowledge bases, you can follow this flow (a code sketch follows the list):
1. Perform hybrid search on each knowledge base
2. Aggregate all the results
3. Run a multi-level knowledge rerank over the aggregated results
4. Filter the best results with Top K and Score Threshold
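A hedged sketch of that flow; hybrid_search and rerank are assumed helper functions (for example built from the sketches above), not Dify APIs:

```python
def search_multiple_kbs(query, knowledge_bases, top_k=5, score_threshold=0.5):
    """Hybrid-search every knowledge base, rerank the merged candidates, then filter."""
    candidates = []
    for kb in knowledge_bases:
        candidates.extend(hybrid_search(query, kb))            # step 1: per-knowledge-base hybrid search
    scored = rerank(query, candidates)                         # steps 2-3: aggregate, then rerank everything
    kept = [(s, doc) for s, doc in scored if s >= score_threshold]  # step 4a: Score Threshold filter
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in kept[:top_k]]                    # step 4b: keep only the Top K
```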
Summary: by properly configuring hybrid retrieval and the rerank model in Dify, the accuracy of the RAG system can be greatly improved, helping us build better agents.