Recommendation
Master AI-programming secrets, improve the accuracy of RAG tasks, and explore the infinite possibilities of AI.
Core content:
1. How multi-level retrieval strategies and semantic association techniques are applied in RAG tasks
2. The optimization layers of knowledge preprocessing and the query reconstruction engine
3. Implementation steps for the zero-shot reasoning mode and the dual-mode response mechanism
Yang Fangxian
Founder of 53AI · Tencent Cloud Most Valuable Expert (TVP)
Running a RAG (Retrieval-Augmented Generation) task is like taking a sophisticated open-book exam: before answering a question, the system consults a pre-prepared knowledge base (structured documents, papers in the field, and continuously updated industry reports) through a multi-level retrieval strategy. This dynamic enhancement mechanism not only improves answer accuracy by 35-50%, but also builds a knowledge graph automatically through semantic association techniques, giving the generated answers both professional depth and scalability.
The complete RAG pipeline includes five optimization layers. The knowledge preprocessing module cleans and annotates the raw data and encodes it into vectors with a hybrid embedding model (such as BAAI/bge or text2vec). When a user query arrives, the system identifies the intent of the question with a query reconstruction engine and runs a multi-way recall algorithm against the vector database at millisecond latency; the similarity threshold can be adjusted dynamically through a feedback-learning mechanism.
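As a concrete illustration of the threshold-gated retrieval step, here is a minimal sketch in Python. The function names and the in-memory numpy arrays are illustrative assumptions (a production system would query a real vector database); the 0.82 cut-off is the relevance threshold discussed in the next paragraph.

```python
import numpy as np

def cosine_scores(query_vec: np.ndarray, chunk_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of chunk vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    return m @ q

def retrieve(query_vec, chunk_vecs, chunk_texts, threshold=0.82, top_k=5):
    """Return the top-k chunks whose similarity clears the (tunable) threshold.

    The threshold is the knob that a feedback-learning mechanism would adjust
    dynamically; here it is simply a parameter.
    """
    scores = cosine_scores(query_vec, chunk_vecs)
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(chunk_texts[i], float(scores[i])) for i in ranked if scores[i] > threshold]

# Toy demo: random vectors stand in for real BAAI/bge or text2vec embeddings.
rng = np.random.default_rng(0)
texts = ["chunk A", "chunk B", "chunk C"]
vecs = rng.normal(size=(3, 768))
query = vecs[1] + 0.05 * rng.normal(size=768)  # a query close to chunk B
print(retrieve(query, vecs, texts))            # only chunk B clears the 0.82 bar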
When a document fragment with a relevance score above 0.82 is retrieved, the system starts the evidence-weighted fusion module, ranks core paragraphs and auxiliary materials by information entropy, and generates a context window through adaptive compression. When no valid information is retrieved, the large model (such as DeepSeek-128k or GPT-4-turbo) switches to a zero-shot reasoning mode and marks the knowledge boundary in its response. This dual-mode response mechanism ensures reliability on professional-domain questions while preserving the creativity of open-domain dialogue.
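A minimal sketch of that dual-mode switch, assuming `llm` is a placeholder callable for whatever completion API the deployment uses; the prompt wording is illustrative, not a template from the article.

```python
def answer(query: str, hits: list[tuple[str, float]], llm) -> str:
    """Dual-mode response: evidence-grounded if retrieval succeeded,
    zero-shot with an explicit knowledge-boundary marker otherwise."""
    if hits:
        # Evidence mode: order retrieved passages by relevance score and
        # pack them into the prompt as restrictive context.
        context = "\n\n".join(text for text, _ in sorted(hits, key=lambda h: -h[1]))
        return llm(
            f"Answer using ONLY the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )
    # Zero-shot mode: nothing cleared the relevance threshold, so answer from
    # parametric knowledge and mark the knowledge boundary in the response.
    return llm(
        f"Question: {query}\n"
        "No supporting documents were found. Answer from general knowledge and "
        "state explicitly that the answer is not grounded in the knowledge base."
    )
```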
The specific implementation steps are as follows:

Phase 1: Data preprocessing
The original knowledge documents are intelligently segmented based on semantic coherence and parsed into discrete text blocks with natural language processing techniques. A paragraph-boundary detection algorithm based on the attention mechanism, combined with dependency parsing and semantic role labeling, ensures that each text block carries a complete knowledge unit. After quality verification, the blocks are structured and stored in a relational database that supports multi-dimensional indexing. If a deep representation model such as BERT-Whitening or Sentence-Transformers is also used to generate a 768-dimensional embedding vector for each text block, a hybrid vector database with hierarchical navigation can be built, enabling similarity retrieval with sub-second response times. At this stage, special attention should be paid to data cleaning: removing format noise, unifying terminology, and building a synonym mapping table.
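A compact sketch of this phase, assuming the sentence-transformers library and the BAAI/bge-base-zh-v1.5 model (a member of the bge family mentioned above that outputs 768-dimensional vectors). The greedy paragraph packing is a deliberately simple stand-in for the attention-based boundary detection described here, and `policy_document.txt` is a hypothetical input file.

```python
from sentence_transformers import SentenceTransformer

def split_into_chunks(doc: str, max_chars: int = 500) -> list[str]:
    """Greedy paragraph packing: keeps paragraphs intact so each chunk
    carries a reasonably complete knowledge unit."""
    chunks, current = [], ""
    for para in (p.strip() for p in doc.split("\n\n") if p.strip()):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

# bge-base models output 768-dimensional embeddings, matching the text above.
model = SentenceTransformer("BAAI/bge-base-zh-v1.5")
with open("policy_document.txt", encoding="utf-8") as f:
    chunks = split_into_chunks(f.read())
embeddings = model.encode(chunks, normalize_embeddings=True)  # (n_chunks, 768)
```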
Phase 2: Retrieval and response generation
When a user initiates a query, the system first converts the natural-language question into a dense vector representation via the semantic parsing engine, then runs a multi-way recall strategy in parallel: 1) vector similarity search based on Faiss or HNSW, to capture deep semantic associations; 2) full-text matching over an inverted index, to guarantee exact keyword hits; 3) multimodal retrieval that integrates knowledge graphs, for cross-modal semantic alignment. A hybrid ranking algorithm based on attention weights then fuses these paths into a relevance-scored candidate queue. After the top-K matching fragments are located, a dynamic context-window mechanism feeds the best paragraphs and their associated metadata into a domain-fine-tuned large language model as restrictive context. A response verification module built with knowledge distillation cuts inference time to 40% of the traditional solution while preserving answer accuracy, and a contrastive-learning strategy effectively suppresses model hallucination. The output stage adds a reinforcement-learning reward model so that the generated content combines standardized domain terminology, logical coherence, and factual consistency with the source documents, and it also supports traceability annotations.
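The first two recall paths can be sketched as follows, assuming the faiss and rank_bm25 packages are installed; the fixed fusion weight `alpha` stands in for the learned attention weights in the hybrid ranking described above, and all names are illustrative.

```python
import faiss
import numpy as np
from rank_bm25 import BM25Okapi

def hybrid_recall(query_text: str, query_vec: np.ndarray,
                  chunks: list[str], chunk_vecs: np.ndarray,
                  top_k: int = 5, alpha: float = 0.7) -> list[tuple[str, float]]:
    """Multi-way recall: dense (Faiss inner product) plus sparse (BM25),
    fused with a fixed weight as a stand-in for learned attention weights."""
    n, d = chunk_vecs.shape

    # Path 1: dense vector similarity over (assumed normalized) embeddings.
    index = faiss.IndexFlatIP(d)
    index.add(chunk_vecs.astype(np.float32))
    dense, ids = index.search(query_vec.astype(np.float32).reshape(1, -1), n)
    dense_scores = np.empty(n)
    dense_scores[ids[0]] = dense[0]

    # Path 2: BM25 keyword matching (inverted-index-style recall).
    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    sparse_scores = np.asarray(bm25.get_scores(query_text.lower().split()))

    # Min-max normalize each path so the scores are comparable, then fuse.
    def norm(x: np.ndarray) -> np.ndarray:
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    fused = alpha * norm(dense_scores) + (1 - alpha) * norm(sparse_scores)
    top = np.argsort(fused)[::-1][:top_k]
    return [(chunks[i], float(fused[i])) for i in top]
```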
Core Challenges
For enterprise and individual users, policies, regulations, and literature are typically long texts. Feeding such long texts directly into a large-model system makes it difficult to process the complete text in one pass, given the capacity limits of existing large models, so a chunking mechanism must be implemented. It should be pointed out, however, that a larger chunk size is not automatically better: long texts face several technical bottlenecks during vectorization.
Chunking methodology
As the preceding analysis shows, chunking plays a central role in the RAG architecture, yet it also faces real technical challenges. Because it directly affects the accuracy of knowledge retrieval, choosing a sound chunking method matters. Effective text segmentation must balance information integrity against processing efficiency. In practice, several techniques can be combined: fixed-size chunking, semantic chunking, recursive chunking, document-structure chunking, and LLM-based chunking.
The summary table is as follows:
| Chunking method | Advantages | Disadvantages | Typical application scenarios |
|---|---|---|---|
| 1. Fixed-size chunking | ✅ Simple and fast to implement<br>✅ Extremely low computational cost | ❌ May sever semantic associations<br>❌ Chunk size must be tuned repeatedly<br>❌ Unfriendly to long texts | Basic question answering systems; documents with a uniform format |
| 2. Semantic chunking | ✅ Preserves complete semantic units<br>✅ Improves contextual relevance<br>✅ Adapts dynamically to content | ❌ Depends on NLP model quality<br>❌ High compute consumption | Professional-domain analysis |
| 3. Recursive chunking | ✅ Multi-granularity content coverage<br>✅ Strong at capturing redundant information<br>✅ Layer depth can be adjusted flexibly | ❌ High implementation complexity<br>❌ Information duplication may occur<br>❌ Requires multi-layer index management | Academic literature processing |
| 4. Document-structure chunking | ✅ Accurately locates chapter and section information<br>✅ Preserves the original logical structure<br>✅ Supports cross-chunk references | ❌ Depends on document format conventions<br>❌ Struggles with unstructured data<br>❌ Requires pre-defined parsing rules | Technical manual processing |
| 5. LLM-based chunking | ✅ Understands deep semantic intent<br>✅ Dynamically generates an optimal chunk structure<br>✅ Suited to complex tasks | ❌ Significant inference latency<br>❌ Risk of model hallucination | High-precision question answering systems |
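To make strategies 1 and 3 from the table concrete, here is a dependency-free sketch; the sizes and separator cascade are illustrative defaults, and the recursive variant follows the common separator-cascade idea rather than any particular library's exact behavior.

```python
def fixed_size_chunks(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Strategy 1: fixed-size chunking; the overlap softens severed semantics."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def recursive_chunks(text: str, size: int = 400,
                     seps: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Strategy 3: recursive chunking -- split on the coarsest separator first,
    then recurse into any piece that is still too long."""
    if len(text) <= size:
        return [text] if text.strip() else []
    for i, sep in enumerate(seps):
        if sep in text:
            out: list[str] = []
            for part in text.split(sep):
                out.extend(recursive_chunks(part, size, seps[i + 1:]))
            return out
    return fixed_size_chunks(text, size, overlap=0)  # last resort: hard cut
```

Real implementations also merge adjacent small pieces back toward the size limit; that bookkeeping is part of the "high implementation complexity" noted in the table.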
Document chunking methods supported by RAGFlow
The better approach is to choose a segmentation strategy that fits the actual business scenario and the characteristics of the text. RAGFlow supports a variety of document segmentation methods; the table below describes each method, the document formats it applies to, and how it is processed in the RAG system.
| Chunking method | Description | Supported formats |
|---|---|---|
| General | Supports many file formats; the segmentation method must be configured manually, which is hard to control and works best combined with a natural language processing model. | DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML, HTML, JPEG, JPG, PNG, TIF, GIF |
| Resume | | |
| Q&A | Question descriptions paired with answers; well suited to customer-service Q&A. | |
| Manual | Uses OCR to segment documents. | |
| Table (tabular files) | | |
| Paper | | |
| Book | | |
| Laws | | |
| Presentation | | |
| Picture | | |
| One (whole file) | The file is not split; the entire file is handed to the large model in one piece, which gives good context, but the usable file length depends on the context length supported by the configured model. | |
| Tag | Tag descriptions and tags must be set up in advance, similar to Q&A; once the tag library is configured, matching chunks are hit automatically and tagged. | |
The recall mechanism is the core factor
Although the document segmentation techniques above provide a working foundation, the recall mechanism over the segmented data remains the critical link: effectively improving recall accuracy is the technical problem that most urgently needs to be solved.
Limited recall accuracy is the product of multiple interacting factors, and single-dimension optimization rarely fixes such a systemic problem at its root. A multi-dimensional, comprehensive optimization strategy is therefore needed to improve retrieval accuracy.
RAGFlow enhances its data recall mechanism along the lines of this methodology. In practice, it is recommended to systematically adjust and validate combinations of parameters across several dimensions to reach better recall metrics, as sketched below.
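As one way to run that adjustment systematically, the sketch below grid-searches three typical knobs (similarity threshold, top-k, and the dense/sparse fusion weight) against a small labeled query set. `retrieve_fn` and the candidate values are assumptions for illustration, not RAGFlow APIs, though RAGFlow exposes knobs of this kind in its retrieval configuration.

```python
from itertools import product

def tune_recall(eval_queries: dict, retrieve_fn) -> tuple[tuple, float]:
    """Grid-search a multi-dimensional parameter combination for recall.

    eval_queries maps each query string to the set of chunk ids that a human
    judged relevant; retrieve_fn(query, threshold, top_k, alpha) returns the
    ids actually recalled under those parameters.
    """
    best, best_recall = None, -1.0
    for threshold, top_k, alpha in product((0.75, 0.82, 0.90),
                                           (3, 5, 10),
                                           (0.5, 0.7, 0.9)):
        hit = total = 0
        for query, relevant in eval_queries.items():
            recalled = set(retrieve_fn(query, threshold, top_k, alpha))
            hit += len(recalled & relevant)
            total += len(relevant)
        recall = hit / total if total else 0.0
        if recall > best_recall:
            best, best_recall = (threshold, top_k, alpha), recall
    return best, best_recall
```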