RAG Tuning Guide: Spring AI Alibaba Modular RAG Principles and Usage

Master RAG technology and build an intelligent document retrieval system.
Core content:
1. RAG technical principles and core design concepts
2. Analysis of the four core steps of RAG
3. Spring AI implementation of RAG process and code examples
About RAG
Cloud Native
RAG (Retrieval Augmented Generation) is a technical paradigm that combines information retrieval and text generation.
RAG technology is like equipping AI with a "real-time encyclopedia brain". By using a mechanism that searches for information before answering, it allows AI to escape the "knowledge forgetting" dilemma of traditional models.
1. Document cutting → Building a smart archive
Core mission: Convert massive documents into easily searchable knowledge fragments.
Implementation:
- Like breaking a thick dictionary down into word cards
- Use an intelligent chunking algorithm to maintain semantic coherence
- Label each piece of knowledge (e.g. "Technical Specifications", "Operation Guide")
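To make the chunking idea concrete, here is a minimal plain-Java sketch. It is not Spring AI's splitter; the class name SentenceChunker and the maxChars parameter are hypothetical. It assumes that keeping whole sentences together is an acceptable stand-in for "semantic coherence".

```java
import java.util.ArrayList;
import java.util.List;

final class SentenceChunker {

    // Packs whole sentences into chunks of at most maxChars characters.
    // A single sentence longer than maxChars becomes its own chunk.
    static List<String> chunk(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Split after sentence-ending punctuation that is followed by whitespace.
        for (String sentence : text.trim().split("(?<=[.!?])\\s+")) {
            if (sentence.isEmpty()) {
                continue;
            }
            // Flush the current chunk if adding this sentence would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + sentence.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(sentence);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String manual = "The robot navigates automatically. It can grab objects. It can place items.";
        chunk(manual, 60).forEach(System.out::println);
    }
}
```

In a real system each chunk would also carry a label (metadata) so it can be filtered later.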
2. Vector encoding → Building semantic maps
Core conversion:
- Use AI models to convert text into mathematical vectors
- Make semantically similar content produce similar mathematical features
Data storage:
- All vectors are stored in a dedicated database
- Create a quick search index (similar to a library catalog search system)
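"Similar mathematical features" is usually measured with cosine similarity between two embedding vectors. The following is a small illustrative sketch; the class name CosineSimilarity is hypothetical, and real embeddings have hundreds or thousands of dimensions rather than two.

```java
final class CosineSimilarity {

    // Returns the cosine of the angle between a and b: 1.0 for the same direction,
    // 0.0 for orthogonal vectors. Zero vectors are treated as similarity 0.0.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosine(new double[] {1, 0}, new double[] {1, 0}));
        System.out.println(cosine(new double[] {1, 0}, new double[] {0, 1}));
    }
}
```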
3. Similarity search → Intelligent data hunter
Response trigger process:
- Convert the user's question into a "question vector"
- Search the knowledge base through multi-dimensional matching strategies: semantic similarity, keyword matching, timeliness weight
- Output the specified number of most relevant document fragments
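The search step can be sketched as a brute-force scan: score every stored vector against the question vector and keep the best k. This sketch covers only the semantic-similarity strategy, and the names TopKSearch and topK are hypothetical. A production vector database would use an index instead of a full scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TopKSearch {

    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k fragments most similar to the query, best first.
    static List<String> topK(Map<String, double[]> index, double[] query, int k) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> e : index.entrySet()) {
            scored.add(Map.entry(e.getKey(), cosine(e.getValue(), query)));
        }
        // Highest similarity first.
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            ids.add(scored.get(i).getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, double[]> index = Map.of(
                "navigation", new double[] {1.0, 0.0},
                "grasping", new double[] {0.0, 1.0},
                "placement", new double[] {0.7, 0.7});
        System.out.println(topK(index, new double[] {1.0, 0.1}, 2));
    }
}
```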
4. Augmented generation → Professional report writing
Response building process:
- Supply the search results to the model as reference material
- The AI automatically associates relevant knowledge fragments while generating
- Output formats can include a natural-language answer and an attached traceability path to the reference material
Example output:
"According to Chapter 5 of the Product Manual v2.3: The battery life of this device is..."
Spring AI implements basic RAG process
Configuration Class
@Configuration
public class RagConfig {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder.defaultSystem("As an expert in robot products, you will answer users' usage needs")
                .build();
    }

    @Bean
    VectorStore vectorStore(EmbeddingModel embeddingModel) {
        SimpleVectorStore simpleVectorStore = SimpleVectorStore.builder(embeddingModel)
                .build();
        // Generate a robot product manual document
        List<Document> documents = List.of(
                new Document("Product Manual: Product Name: Intelligent Robot\n" +
                        "Product Description: An intelligent robot is an intelligent device that can automatically complete various tasks.\n" +
                        "Function:\n" +
                        "1. Automatic navigation: The robot can automatically navigate to the specified location.\n" +
                        "2. Automatic Grasping: The robot can automatically grab objects.\n" +
                        "3. Automatic placement: The robot can automatically place items.\n"));
        // Embed the documents and store the resulting vectors
        simpleVectorStore.add(documents);
        return simpleVectorStore;
    }
}
This configuration class does two things:
1. Registers ChatClient as a bean. Its default system prompt sets the role to a robot product expert; the client handles user queries and generates answers.
2. Configures vector storage. SimpleVectorStore keeps vectors in an in-memory ConcurrentHashMap. Spring AI also provides other vector store implementations, such as Redis and MongoDB, so you can choose the one that fits your situation.
Retrieval Augmentation Service
@RestController
@RequestMapping("/ai")
public class RagController {

    @Autowired
    private ChatClient chatClient;

    @Autowired
    private VectorStore vectorStore;

    @PostMapping(value = "/chat", produces = "text/plain; charset=UTF-8")
    public String generation(@RequestParam("userInput") String userInput) {
        // Initiate a chat request; the advisor retrieves matching documents
        // from the vector store and adds them to the prompt
        return chatClient.prompt()
                .user(userInput)
                .advisors(new QuestionAnswerAdvisor(vectorStore))
                .call()
                .content();
    }
}
POST http://localhost:8080/spring-ai/ai/chat?userInput=What functions does the robot have?
HTTP/1.1 200
Content-Type: text/plain;charset=UTF-8
According to the intelligent robot product manual you provided, the main functions of the robot include:
1. Automatic navigation: The robot can automatically navigate to the specified location.
2. Automatic grasping: The robot can grasp objects automatically.
3. Automatic placement: The robot can automatically place items.
If you need more detailed information or have questions about other features, please provide specific requirements and I will do my best to help you.
Spring AI modular RAG enhancement
// Create a chat client instance
// Set system prompt information and define the AI assistant as a professional interior design consultant
ChatClient chatClient = builder
.defaultSystem("You are a professional interior design consultant who is proficient in various decoration styles, material selection and space layout. Please provide users with professional, detailed and practical suggestions based on the reference materials provided. When answering, please note:\n" +
"1. Accurately understand the specific needs of users\n" +
"2. Combined with actual cases in reference materials\n" +
"3. Provide professional design concepts and principle explanations\n" +
"4. Consider practicality, aesthetics and cost-effectiveness\n" +
"5. Alternative solutions can be provided if necessary")
.build();
// Build the query expander
// Used to generate multiple related query variations to get more comprehensive search results
MultiQueryExpander queryExpander = MultiQueryExpander.builder()
.chatClientBuilder(builder)
.includeOriginal(false) // Do not include the original query
.numberOfQueries(3) // Generate 3 query variants
.build();
// Perform query expansion
// Expand the original question "Please provide several recommended decoration styles?" into multiple related queries
List<Query> queries = queryExpander.expand(
new Query("Please provide several recommended decoration styles?"));
In this process, the system automatically generates multiple relevant query variants. For example, when a user queries "Please provide several recommended decoration styles?", the system generates multiple queries from different angles. This approach not only improves the comprehensiveness of the search, but also captures the user's potential query intent.
Expanded query content:
1. Which decoration styles are the most popular? Please recommend some.
2. Can you recommend some popular home decoration styles?
3. I want to know different decoration styles, which ones are worth recommending?
Key benefits of multi-query expansion:
- Improve recall: increase the chances of retrieving relevant documents through multiple query variations
- Cover different angles: understand and expand the user's original query from different dimensions
- Enhance semantic understanding: capture multiple possible meanings of a query and related concepts
- Improve search quality: combine multiple query results to obtain more comprehensive information
// Create a query scenario that simulates a user learning AI
Query query = new Query("I am learning artificial intelligence, what is a large language model?");
// Create a query rewrite converter
QueryTransformer queryTransformer = RewriteQueryTransformer.builder()
.chatClientBuilder(builder)
.build();
// Perform query rewrite
Query transformedQuery = queryTransformer.transform(query);
// Output the rewritten query
System.out.println(transformedQuery.text());
The rewritten query might become:
What is a large language model?
The main advantage of query rewriting is query clarification: vague or wordy questions are converted into specific query points.
// Create an English query
Query query = new Query("What is LLM?");
// Create a query translation converter and set the target language to Chinese
QueryTransformer queryTransformer = TranslationQueryTransformer.builder()
.chatClientBuilder(builder)
.targetLanguage("chinese") // Set the target language to Chinese
.build();
// Perform query translation
Query transformedQuery = queryTransformer.transform(query);
// Output the translated query
System.out.println(transformedQuery.text());
Translated query results:
What is a large language model?
Key benefits of query translation:
- Multi-language support: support query conversion between different languages
- Localization: convert queries into natural expressions in the target language
- Cross-language search: support searching documents in different languages
- User-friendly: allow users to query in a language they are familiar with
// Build a query with historical context
// This example simulates a real estate consultation scenario, where the user first asks about the location of the community and then about the price of the house
Query query = Query.builder()
.text("What is the average price of second-hand houses in this community?") // Current user's question
.history(
new UserMessage("Where is the Bihaiwan community in Nanshan District, Shenzhen?"), // User's question in the historical conversation
new AssistantMessage("Bihaiwan community is located in the Houhai Central District, Nanshan District, Shenzhen, close to the Houhai subway station.")) // AI's answer
.build();
In this example:
- The user first asked about the location of the Bihaiwan community (historical dialogue)
- The system answered with the specific location of the community (historical answer)
- The user then asked, "What is the average price of second-hand houses in this community?" (current query)
Without the context, the system cannot tell which community "this community" refers to. To solve this problem, we use CompressionQueryTransformer to process the context information:
// Create a query converter
// QueryTransformer is used to transform a query with context into a complete independent query
QueryTransformer queryTransformer = CompressionQueryTransformer.builder()
.chatClientBuilder(builder)
.build();
// Perform query conversion
// Convert the ambiguous pronoun reference ("this community") to a clear entity name ("Bihaiwan Community")
Query transformedQuery = queryTransformer.transform(query);
The converted query will become more specific, such as "What is the average price of second-hand houses in Bihaiwan Community, Nanshan District, Shenzhen?" This conversion has the following advantages:
- Disambiguation: the query target (Bihaiwan Community) is clearly specified
- Preserve context: the geographic location (Nanshan District, Shenzhen) is included
- Improve accuracy: the system can retrieve relevant information more precisely
Output result: What is the average price of second-hand houses in Bihaiwan Community, Nanshan District, Shenzhen?
After several queries have been run, their results need to be merged into one document list. ConcatenationDocumentJoiner does this and has the following features:
- Intelligent deduplication: when there are duplicate documents, only the first occurrence is retained
- Score preservation: the original relevance score of each document is maintained during the merge
- Multi-source support: documents from different queries and different data sources can be processed together
- Order maintenance: the original retrieval order of documents is kept
Here is an example of usage:
// A collection of documents obtained from multiple queries or data sources
Map<Query, List<List<Document>>> documentsForQuery = ...
// Create a document merger instance
DocumentJoiner documentJoiner = new ConcatenationDocumentJoiner();
// Perform document merging
List<Document> documents = documentJoiner.join(documentsForQuery);
This merging mechanism is particularly useful in the following scenarios:
- Multiple query rounds: you need to merge the document results returned by multiple queries
- Cross-source retrieval: documents are retrieved from different data sources (such as databases and file systems)
- Query expansion: when query expansion generates multiple related queries, all results need to be merged
- Incremental update: new search results are added to an existing document collection
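The "keep the first occurrence, preserve order" behavior described above can be sketched in a few lines of plain Java. This illustrates the idea only and is not the ConcatenationDocumentJoiner source; the class name FirstOccurrenceJoiner is hypothetical, and documents are reduced to their ids for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FirstOccurrenceJoiner {

    // Flattens the result lists in iteration order and drops later duplicates.
    // LinkedHashSet keeps the position of the first insertion of each id.
    static List<String> join(List<List<String>> resultLists) {
        Set<String> seen = new LinkedHashSet<>();
        for (List<String> results : resultLists) {
            seen.addAll(results);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<List<String>> perQueryResults = List.of(
                List.of("doc-a", "doc-b"),
                List.of("doc-b", "doc-c"),
                List.of("doc-a", "doc-d"));
        System.out.println(join(perQueryResults));
    }
}
```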
Basic usage
// 1. Initialize the vector store
SimpleVectorStore vectorStore = SimpleVectorStore.builder(embeddingModel)
.build();
// 2. Add documents to vector storage
List<Document> documents = List.of(
new Document("Product Manual: Product Name: Intelligent Robot\n" +
"Product Description: An intelligent robot is an intelligent device that can automatically complete various tasks.\n" +
"Function:\n" +
"1. Automatic navigation: The robot can automatically navigate to the specified location.\n" +
"2. Automatic Grasping: The robot can automatically grab objects.\n" +
"3. Automatic placement: The robot can automatically place items.\n"));
vectorStore.add(documents);
// 3. Create a retrieval augmentation advisor
Advisor advisor = RetrievalAugmentationAdvisor.builder()
.documentRetriever(VectorStoreDocumentRetriever.builder()
.vectorStore(vectorStore)
.build())
.build();
// 4. Using advisors in chat clients
String response = chatClient.prompt()
.user("What functions does the robot have?")
.advisors(advisor) // Add search enhancement advisor
.call()
.content();
This base implementation provides the following functionality:
- Automatic document retrieval: automatically retrieve relevant documents based on the user's question
- Contextual integration: integrate the retrieved document content into the answer
- Intelligent answer generation: generate accurate answers based on the retrieved information
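Contextual integration essentially means placing the retrieved fragments in the prompt ahead of the question. The sketch below shows one possible layout. The wording of the template and the class name ContextPrompt are assumptions for illustration, not the template Spring AI actually uses.

```java
import java.util.List;

final class ContextPrompt {

    // Builds a prompt that lists numbered reference fragments before the question,
    // so the model can ground its answer and cite fragments by number.
    static String augment(String question, List<String> fragments) {
        StringBuilder prompt = new StringBuilder("Answer using only the references below.\n");
        for (int i = 0; i < fragments.size(); i++) {
            prompt.append('[').append(i + 1).append("] ").append(fragments.get(i)).append('\n');
        }
        prompt.append("Question: ").append(question);
        return prompt.toString();
    }

    public static void main(String[] args) {
        System.out.println(augment(
                "What functions does the robot have?",
                List.of("Automatic navigation: The robot can automatically navigate to the specified location.",
                        "Automatic Grasping: The robot can automatically grab objects.")));
    }
}
```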
Advanced Configuration Options
Advisor advisor = RetrievalAugmentationAdvisor.builder()
// Configure the query augmenter
.queryAugmenter(ContextualQueryAugmenter.builder()
.allowEmptyContext(true) // Allow empty context queries
// .maxTokens(300) // Limit query length (not a confirmed builder option; verify against your Spring AI version)
// .temperature(0.7) // Control the creativity of query expansion (not a confirmed builder option)
.build())
// Configure the document retriever
.documentRetriever(VectorStoreDocumentRetriever.builder()
.vectorStore(vectorStore)
.similarityThreshold(0.5) // Similarity threshold
.topK(3) // Number of documents returned
// .minScore(0.1) // Minimum match score (not a confirmed builder option)
// .maxDistance(0.8) // Maximum vector distance (not a confirmed builder option)
.build())
.build();
The main configuration options are:
Query augmenter configuration:
- Context processing strategy: defines how conversation history and context information are processed, including context window size, historical message weight, etc.
- Empty value handling: specifies what happens when a query lacks certain parameters, such as using default values or throwing exceptions
- Query conversion rules: set how the original query is converted into a more effective retrieval form, including synonym expansion, keyword extraction, etc.
Document retriever configuration:
- Similarity threshold: determines the minimum similarity required for a document to match. Documents below this threshold are filtered out.
- Result limit: controls the maximum number of documents returned per search to avoid returning too many irrelevant results
- Document filtering rules: define filtering conditions based on metadata, such as time range, document type, tags, etc.
Document structure design
// Generate interior design case document
List<Document> documents = new ArrayList<>();
// Modern minimalist style living room case
documents.add(new Document(
"Case Number: LR-2023-001\n" +
"Project Overview: 180 square meters of modern minimalist style living room renovation\n" +
"Design points:\n" +
"1. 5.2-meter-high floor-to-ceiling windows are used to maximize natural light\n" +
"2. Main color: Cloud White (matte, NCS S0500-N) with Morandi Gray\n" +
"3. Furniture selection: Italian B&B brand leather sofa, Nordic white oak coffee table\n" +
"4. Lighting design: Recessed downlights with Italian Flos pendant lights\n" +
"5. Soft furnishings: imported black walnut TV wall, geometric pattern carpet\n" +
"Space effect: transparent and grand, suitable for business reception and family daily life",
Map.of(
"type", "interior", // Document type
"year", "2023", // year
"month", "06", // month
"location", "indoor", // location type
"style", "modern", // Decoration style
"room", "living_room" // Room type
)));
Each document contains two main parts:
- Document content: structured text description, including project number, overview, details, etc.
- Metadata: key-value pairs for quick filtering and sorting, such as type, year, location, etc.
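To see why metadata makes filtering cheap, the following sketch evaluates eq and in conditions directly against a metadata map. It mirrors the idea of a filter expression but is not Spring AI's FilterExpressionBuilder; the class MetadataFilter and its helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class MetadataFilter {

    // Matches when the metadata value for `key` equals `value`.
    static Predicate<Map<String, Object>> eq(String key, Object value) {
        return metadata -> value.equals(metadata.get(key));
    }

    // Matches when the metadata value for `key` is one of `values`.
    static Predicate<Map<String, Object>> in(String key, Object... values) {
        Set<Object> allowed = new HashSet<>(Arrays.asList(values));
        return metadata -> allowed.contains(metadata.get(key));
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = Map.of(
                "type", "interior", "year", "2023", "room", "living_room");
        Predicate<Map<String, Object>> filter =
                eq("year", "2023").and(in("room", "living_room", "study", "kitchen"));
        System.out.println(filter.test(metadata));
    }
}
```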
Advanced search implementation
// 1. Initialize the vector store
SimpleVectorStore vectorStore = SimpleVectorStore.builder(embeddingModel)
.build();
// 2. Configure the AI assistant role
ChatClient chatClient = builder
.defaultSystem("You are a professional interior design consultant who is proficient in various decoration styles, material selection and space layout. Please provide users with professional, detailed and practical suggestions based on the reference materials provided. When answering, please note:\n" +
"1. Accurately understand the specific needs of users\n" +
"2. Combined with actual cases in reference materials\n" +
"3. Provide professional design concepts and principle explanations\n" +
"4. Consider practicality, aesthetics and cost-effectiveness\n" +
"5. Alternative solutions can be provided if necessary")
.build();
// 3. Construct complex document filtering conditions
var b = new FilterExpressionBuilder();
var filterExpression = b.and(
b.and(
b.eq("year", "2023"), // Filter cases in 2023
b.eq("location", "indoor")), // Select only indoor cases
b.and(
b.eq("type", "interior"), // Type is interior design
b.in("room", "living_room", "study", "kitchen") // Specify room type
));
// 4. Configure the document retriever
DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
.vectorStore(vectorStore)
.similarityThreshold(0.5) // Set the similarity threshold
.topK(3) // Return the top 3 most relevant documents
.filterExpression(filterExpression.build())
.build();
// 5. Create a context-aware query enhancer
Advisor advisor = RetrievalAugmentationAdvisor.builder()
.queryAugmenter(ContextualQueryAugmenter.builder()
.allowEmptyContext(true)
.build())
.documentRetriever(retriever)
.build();
// 6. Execute the query and get the response
String userQuestion = "Based on the information provided, please describe all relevant scenario styles, output the case number, and describe its content as detailed as possible.";
String response = chatClient.prompt()
.user(userQuestion)
.advisors(advisor)
.call()
.content();
This implementation includes the following key features:
Metadata filtering:
- Use FilterExpressionBuilder to build complex filter conditions
- Supports multiple filtering methods such as exact match (eq) and set membership (in)
- Multiple conditions (and/or) can be combined for precise filtering
Similarity control:
- Set the similarity threshold (0.5) through similarityThreshold
- Use topK to limit the number of results returned (3)
- Ensure that only the most relevant documents are returned
Context-aware:
- Integrate ContextualQueryAugmenter to achieve context awareness
- Allow empty context queries (allowEmptyContext)
- Automatically associate related documents with the query context
Smart Advisor Integration:
- Use RetrievalAugmentationAdvisor to enhance query performance
- Automatically integrate document retrieval and query processing
- Provide smarter response generation
Through this multi-level document selection mechanism, the system can:
- Quickly locate relevant documents
- Accurately assess document relevance
- Intelligently combine multiple information sources
- Generate high-quality answers
// 1. Build a search enhancement advisor
Advisor advisor = RetrievalAugmentationAdvisor.builder()
.queryAugmenter(ContextualQueryAugmenter.builder()
.allowEmptyContext(true) // Allow empty context to avoid NPE
.build())
.documentRetriever(retriever)
.build();
// 2. Execute the query and handle possible exceptions
return chatClient.prompt()
.user(query)
.advisors(advisor)
.call()
.content();
Comparison of operation effects:
AI Answer: I'm sorry, but it appears that the specific details or references you mentioned for your interior design query are not included in my current knowledge base. To provide you with the best possible advice, I would need more information about your project, such as the style you're aiming for, the size of the space, your budget, and any specific elements you want to include or avoid. If you can provide more details, I would be more than happy to offer tailored advice on interior design, space planning, material selection, and more.
The modified results are shown below:
AI Answer: I'm sorry, you didn't provide specific references or case numbers. In order to provide a detailed description of the scene style, I need you to provide specific case numbers or relevant materials. Once you provide this information, I will be able to accurately describe the relevant scene style, including the following:
1. Design style and theme
2. Spatial layout and functional planning
3. Material selection and color matching
4. Lighting design and atmosphere creation
5. Furniture configuration and soft furnishings matching
By using ContextualQueryAugmenter, we achieved the following improvements:
Friendly error message:
- Reply in Chinese, which is more in line with user habits
- Provide clear follow-up instructions
- Describe the specific information required
Structured response format:
- Clearly list the types of information that can be provided
- Use numbered lists to improve readability
- Maintain professionalism and integrity
Context-aware processing:
- Automatically handle empty context cases
- Keep the conversation going
- Guide users to provide necessary information
Best Practices for Structured RAGs
When actually deploying and operating a RAG system, we need to consider the best practices of the system from multiple dimensions. The following is a complete practice guide:
Document structure design
- Structured content: the document should have a clear structure, such as case number, project overview, and design points
- Metadata annotation: add rich metadata to each document, such as:
Map.of(
"type", "interior", // document type
"year", "2023", // year
"style", "modern" // style type
)
Document cutting strategy
- Use an intelligent chunking algorithm to maintain semantic coherence
- Label each piece of knowledge
- Keep chunks an appropriate size, neither too long nor too short
Multi Query Expansion
- Enable multi-query expansion to improve retrieval accuracy
- Set an appropriate number of queries (3-5 is recommended)
- Preserve the core semantics of the original query
Query Rewriting and Translation
- Use RewriteQueryTransformer to optimize query structure
- Configure TranslationQueryTransformer to support multiple languages
- Maintain the semantic integrity of queries
Vector storage configuration
SimpleVectorStore vectorStore = SimpleVectorStore.builder(embeddingModel).build();
- Choose the right vector storage solution
- Choose a storage method based on data size (memory/Redis/MongoDB)
Retriever Configuration
DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
.vectorStore(vectorStore)
.similarityThreshold(0.5) // Similarity threshold
.topK(3) // Number of documents returned
.build();
- Set a reasonable similarity threshold
- Control the number of documents returned
- Configure document filtering rules
Exception handling
- Allow empty context queries
- Provide friendly error messages
- Guide users to provide necessary information
Boundary Case Handling
ContextualQueryAugmenter.builder().allowEmptyContext(true).build()
- Handle the case where no document is found
- Handle low-similarity results
- Handle query timeouts
AI Assistant Configuration
ChatClient chatClient = builder
.defaultSystem("You are a professional consultant, please note:\n" +
"1. Accurately understand user needs\n" +
"2. Combine reference materials\n" +
"3. Provide professional explanation\n" +
"4. Consider practicality\n" +
"5. Provide alternative solutions")
.build();
- Set clear roles
- Define answer specifications
- Ensure professionalism and practicality
Query Optimization
- Use document filter expressions
- Set a reasonable search threshold
- Optimize the number of query expansions
Resource Management
- Control the number of documents loaded
- Optimize memory usage
- Set a reasonable cache strategy