RAG Tuning Guide: Spring AI Alibaba Modular RAG Principles and Usage

Written by Jasper Cole
Updated on: July 11, 2025

Master RAG technology and build an intelligent document retrieval system.
Core content:
1. RAG technical principles and core design concepts
2. Analysis of the four core steps of RAG
3. Spring AI implementation of RAG process and code examples


About RAG






What is RAG (Retrieval-Augmented Generation)

RAG (Retrieval Augmented Generation) is a technical paradigm that combines information retrieval and text generation.


Core design concept

RAG technology is like equipping AI with a "real-time encyclopedia brain". By using a mechanism that searches for information before answering, it allows AI to escape the "knowledge forgetting" dilemma of traditional models.


Four core steps

1. Document cutting → Building a smart archive

  • Core mission: Convert massive documents into easily searchable knowledge fragments
  • Implementation:
    • Just like breaking down a thick dictionary into word cards
    • Use an intelligent chunking algorithm to maintain semantic coherence (see the sketch below)
    • Label each piece of knowledge (e.g. "Technical Specifications", "Operation Guide")
Key value: High-quality knowledge segmentation is like a library classification system: it determines the efficiency of subsequent retrieval
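To make this concrete, here is a minimal chunking sketch using Spring AI's TokenTextSplitter with default settings; the manualText variable and the "category" label are illustrative assumptions, not part of the original example.

// Minimal chunking sketch (assumes Spring AI's TokenTextSplitter; manualText and the "category" label are illustrative)
Document manual = new Document(manualText, Map.of("category", "Operation Guide"));
TokenTextSplitter splitter = new TokenTextSplitter();          // token-based splitter, default chunk sizes
List<Document> fragments = splitter.apply(List.of(manual));    // each fragment inherits the metadata label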

2. Vector encoding → Building semantic maps

  • Core conversion:
    • Use AI models to convert text into mathematical vectors
    • Make semantically similar content produce similar mathematical features
  • Data Storage:
    • All vectors are stored in a dedicated database
    • Create a quick search index (similar to a library catalog search system)
Example effect: "Battery life" and "battery capacity" will be encoded as similar vectors, as sketched below
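A minimal sketch of this encoding step, assuming an injected EmbeddingModel bean; note that embed(String) returns float[] in Spring AI 1.0, while older versions differ.

// Encode two phrases with the injected EmbeddingModel (sketch only)
float[] v1 = embeddingModel.embed("Battery life");
float[] v2 = embeddingModel.embed("battery capacity");
// Semantically close phrases should produce vectors with a high cosine similarity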

3. Similarity search → Intelligent data hunter

Response trigger process:

  1. Convert the user question into a "question vector"
  2. Search the knowledge base through multi-dimensional matching strategies:
    • Semantic similarity
    • Keyword matching
    • Timeliness weight
  3. Output the specified number of most relevant document fragments (see the sketch below)
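A hedged sketch of this retrieval step, assuming the SearchRequest.builder() API of Spring AI 1.0 and an existing vectorStore bean; userQuestion is an assumed variable.

// Sketch of the similarity search step
SearchRequest request = SearchRequest.builder()
        .query(userQuestion)            // the question is embedded into a "question vector" internally
        .topK(3)                        // number of fragments to return
        .similarityThreshold(0.5)       // drop weak matches below this similarity
        .build();
List<Document> fragments = vectorStore.similaritySearch(request);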

4. Generate Enhancements → Professional Report Writing

Response building process:

  1. Provide the search results as reference material
  2. Automatically associate relevant knowledge fragments when AI generates
  3. Output formats can include:
    • Natural language answer
    • Attached reference material traceability path

Example output:

"According to Chapter 5 of the Product Manual v2.3: The battery life of this device is..."

Spring AI implements basic RAG process






Core implementation code

Configuration Class

@Configuration
public class RagConfig {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder.defaultSystem("As an expert in robot products, you will answer users' usage needs")
                .build();
    }

    @Bean
    VectorStore vectorStore(EmbeddingModel embeddingModel) {
        SimpleVectorStore simpleVectorStore = SimpleVectorStore.builder(embeddingModel).build();

        // Generate a robot product manual document
        List<Document> documents = List.of(new Document(
                "Product Manual: Product Name: Intelligent Robot\n" +
                "Product Description: An intelligent robot is an intelligent device that can automatically complete various tasks.\n" +
                "Function:\n" +
                "1. Automatic navigation: The robot can automatically navigate to the specified location.\n" +
                "2. Automatic grasping: The robot can automatically grab objects.\n" +
                "3. Automatic placement: The robot can automatically place items.\n"));
        simpleVectorStore.add(documents);
        return simpleVectorStore;
    }
}

This configuration class does the following:

1. Configures ChatClient as a bean, with the default system role set to a robot product expert, responsible for processing user queries and generating answers.

2. Initializes SimpleVectorStore, loads the robot product manual document, and converts the document into vector form for storage.

SimpleVectorStore stores vectors in an in-memory ConcurrentHashMap. Spring AI provides a variety of storage backends, such as Redis, MongoDB, etc.; you can choose a suitable storage method according to your actual situation.

Search Enhancement Service

@RestController
@RequestMapping("/ai")
public class RagController {

    @Autowired
    private ChatClient chatClient;

    @Autowired
    private VectorStore vectorStore;

    @PostMapping(value = "/chat", produces = "text/plain; charset=UTF-8")
    public String generation(String userInput) {
        // Initiate a chat request and process the response
        return chatClient.prompt()
                .user(userInput)
                .advisors(new QuestionAnswerAdvisor(vectorStore))
                .call()
                .content();
    }
}
By adding a QuestionAnswerAdvisor backed by the corresponding vector store, the documents loaded earlier are used as reference material to generate enhanced answers.
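If finer control over the retrieval is needed, the advisor can be tuned with a custom SearchRequest. The sketch below assumes the QuestionAnswerAdvisor.builder(...) and SearchRequest.builder() APIs of Spring AI 1.0.

// Hedged sketch: tune how many fragments the advisor retrieves and how strict the match must be
Advisor qaAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder()
                .topK(3)                    // limit the fragments passed to the model
                .similarityThreshold(0.5)   // filter out weak matches
                .build())
        .build();

String answer = chatClient.prompt()
        .user(userInput)
        .advisors(qaAdvisor)
        .call()
        .content();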

Run the program
Start the Spring Boot application and access the /ai/chat interface to pass in user questions to get enhanced answers. As follows:
POST http://localhost:8080/spring-ai/ai/chat?userInput=What functions does the robot have?

HTTP/1.1 200
Content-Type: text/plain;charset=UTF-8

According to the intelligent robot product manual you provided, the main functions of the robot include:
1. Automatic navigation: The robot can automatically navigate to the specified location.
2. Automatic grasping: The robot can grasp objects automatically.
3. Automatic placement: The robot can automatically place items.
If you need more detailed information or have questions about other features, please provide specific requirements and I will do my best to help you.
From this test result, we can clearly see that the AI-generated answer draws on the relevant information from the robot product manual.

Spring AI modular RAG enhancement






Multi Query Expansion
Multi-query expansion is a key technology to improve the retrieval effect of RAG systems. In practical applications, users' queries are often short and incomplete, which may lead to inaccurate or incomplete retrieval results. Spring AI provides a powerful multi-query expansion mechanism that can automatically generate multiple related query variants, thereby improving the accuracy and recall of retrieval.
// Create a chat client instance
// Set the system prompt and define the AI assistant as a professional interior design consultant
ChatClient chatClient = builder.defaultSystem(
        "You are a professional interior design consultant who is proficient in various decoration styles, material selection and space layout. " +
        "Please provide users with professional, detailed and practical suggestions based on the reference materials provided. When answering, please note:\n" +
        "1. Accurately understand the specific needs of users\n" +
        "2. Combine actual cases in the reference materials\n" +
        "3. Provide professional design concepts and principle explanations\n" +
        "4. Consider practicality, aesthetics and cost-effectiveness\n" +
        "5. Alternative solutions can be provided if necessary")
        .build();

// Build the query expander
// Used to generate multiple related query variants to get more comprehensive search results
MultiQueryExpander queryExpander = MultiQueryExpander.builder()
        .chatClientBuilder(builder)
        .includeOriginal(false)   // Do not include the original query
        .numberOfQueries(3)       // Generate 3 query variants
        .build();

// Perform query expansion
// Expand the original question "Please provide several recommended decoration styles?" into multiple related queries
List<Query> queries = queryExpander.expand(new Query("Please provide several recommended decoration styles?"));

In this process, the system automatically generates multiple relevant query variants. For example, when a user queries "Please provide several recommended decoration styles?", the system generates multiple queries from different angles. This approach not only improves the comprehensiveness of the search, but also captures the user's potential query intent.

The effect is as follows:

Expanded query content:
1. Which decoration styles are the most popular? Please recommend some.
2. Can you recommend some popular home decoration styles?
3. I want to know about different decoration styles. Which ones are worth recommending?
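Each expanded query can then be run against a document retriever and the results collected for merging. The sketch below is hedged: the retriever variable is an assumed VectorStoreDocumentRetriever (configured as in later sections), and the map type matches the DocumentJoiner API described further down.

// Hedged sketch: retrieve documents for every expanded query and collect the results
Map<Query, List<List<Document>>> documentsForQuery = new LinkedHashMap<>();
for (Query q : queries) {
    List<Document> retrieved = retriever.retrieve(q);   // retriever: an assumed VectorStoreDocumentRetriever
    documentsForQuery.put(q, List.of(retrieved));
}
// The collected results can be merged with a DocumentJoiner (see the DocumentJoiner section below)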

Key benefits of multi-query expansion:

  1. Improve recall: Increase the chances of retrieving relevant documents through multiple query variations
  2. Covering different angles: understanding and expanding the user’s original query from different dimensions
  3. Enhanced semantic understanding: capturing multiple possible meanings of a query and related concepts
  4. Improve search quality: combine multiple query results to obtain more comprehensive information

Query Rewrite
Query rewriting is an important optimization technology in the RAG system, which can transform the user's original query into a more structured and clear form. This transformation can improve the accuracy of retrieval and help the system better understand the user's true intention.
Spring AI provides RewriteQueryTransformer to implement query rewriting. The following is a specific example:
// Create a query that simulates a user learning AI
Query query = new Query("I am learning artificial intelligence, what is a large language model?");

// Create a query rewrite transformer
QueryTransformer queryTransformer = RewriteQueryTransformer.builder()
        .chatClientBuilder(builder)
        .build();

// Perform query rewriting
Query transformedQuery = queryTransformer.transform(query);

// Output the rewritten query
System.out.println(transformedQuery.text());

The rewritten query might become:

What is a large language model?

The main advantage of query rewriting is query clarification: converting vague questions into specific, focused query points.

This transformation not only helps the system retrieve more relevant documents, but also helps generate more comprehensive and professional answers.

Query Translation
Query translation is a useful feature in the RAG system that can translate a user's query from one language to another. This is particularly useful for multilingual support and cross-language retrieval. Spring AI provides TranslationQueryTransformer to implement this feature.
// Create an English query
Query query = new Query("What is LLM?");

// Create a query translation transformer and set the target language to Chinese
QueryTransformer queryTransformer = TranslationQueryTransformer.builder()
        .chatClientBuilder(builder)
        .targetLanguage("chinese")   // Set the target language to Chinese
        .build();

// Perform query translation
Query transformedQuery = queryTransformer.transform(query);

// Output the translated query
System.out.println(transformedQuery.text());

Translated query results:

What is a large language model?

Key benefits of query translation:

  1. Multi-language support: support query conversion between different languages
  2. Localization: Converting queries into natural expressions in the target language
  3. Cross-language search: supports searching documents in different languages
  4. User-friendly: allows users to query in a language they are familiar with

Context-aware Queries
In actual conversations, users' questions often depend on the previous conversation context. The following example uses a real estate consulting scenario to illustrate the implementation of context-aware query:
// Build a query with historical context
// This example simulates a real estate consultation scenario: the user first asks about the
// location of the community and then about house prices
Query query = Query.builder()
        .text("What is the average price of second-hand houses in this community?")   // Current user question
        .history(
                new UserMessage("Where is the Bihaiwan community in Nanshan District, Shenzhen?"),   // User's question in the historical conversation
                new AssistantMessage("Bihaiwan community is located in the Houhai Central District, Nanshan District, Shenzhen, close to the Houhai subway station."))   // AI's answer
        .build();

In this example:

  1. The user first asked about the location of the Bihaiwan community (historical dialogue)
  2. The system answered the specific location information of the community (historical answer)
  3. The user then asked, "What is the average price of second-hand houses in this community?" (Current query)

Without considering the context, the system cannot understand which community "this community" refers to. To solve this problem, we use CompressionQueryTransformer to process the context information:

// Create a query transformer
// CompressionQueryTransformer is used to transform a query with context into a complete, self-contained query
QueryTransformer queryTransformer = CompressionQueryTransformer.builder()
        .chatClientBuilder(builder)
        .build();

// Perform the query transformation
// The ambiguous pronoun reference ("this community") is resolved to a concrete entity name ("Bihaiwan Community")
Query transformedQuery = queryTransformer.transform(query);

The converted query will become more specific, such as "What is the average price of second-hand houses in Bihaiwan Community, Nanshan District, Shenzhen?" This conversion has the following advantages:

  1. Disambiguation: The query target (Bihaiwan Community) is clearly specified
  2. Preserve context: Contains geographic location information (Nanshan District, Shenzhen)
  3. Improve accuracy: Enable the system to retrieve relevant information more precisely

Output result: What is the average price of second-hand houses in Bihaiwan Community, Nanshan District, Shenzhen?

DocumentJoiner
In practical applications, we often need to obtain documents from multiple queries or multiple data sources. In order to effectively manage and integrate these documents, Spring AI provides the ConcatenationDocumentJoiner document merger. This tool can intelligently merge documents from multiple sources into a unified document collection.
Main features of Document Merger:
  1. Intelligent deduplication: When there are duplicate documents, only the first occurrence is retained
  2. Score Preservation: The original relevance score of each document is maintained during the merge process
  3. Multi-source support: Supports simultaneous processing of documents from different queries and different data sources
  4. Order maintenance: keep the original retrieval order of documents

Here is an example of usage:

// A collection of documents obtained from multiple queries or data sources
Map<Query, List<List<Document>>> documentsForQuery = ...

// Create a document joiner instance
DocumentJoiner documentJoiner = new ConcatenationDocumentJoiner();

// Perform document merging
List<Document> documents = documentJoiner.join(documentsForQuery);
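The documentsForQuery map above is left elided. A hedged sketch of how it might be assembled from two data sources follows; manualRetriever and faqRetriever are illustrative names, not part of the original example.

// Hedged sketch: collect results for one query from two different data sources and merge them
Query query = new Query("What functions does the robot have?");
Map<Query, List<List<Document>>> documentsForQuery = Map.of(
        query, List.of(
                manualRetriever.retrieve(query),    // e.g. documents from a product-manual vector store
                faqRetriever.retrieve(query)));     // e.g. documents from an FAQ vector store
List<Document> merged = new ConcatenationDocumentJoiner().join(documentsForQuery);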

This merging mechanism is particularly useful in the following scenarios:

  1. Multiple query rounds: you need to merge the document results returned by multiple queries
  2. Cross-source retrieval: Retrieve documents from different data sources (such as databases, file systems)
  3. Query expansion: When using query expansion to generate multiple related queries, all results need to be merged
  4. Incremental update: adding new search results to an existing document collection


RetrievalAugmentationAdvisor
RetrievalAugmentationAdvisor is a powerful tool provided by Spring AI that automates the document retrieval and query augmentation process. This advisor component seamlessly integrates document retrieval with query processing, enabling AI assistants to provide more accurate answers based on the retrieved relevant documents.

Basic usage

Here is a basic usage example of RetrievalAugmentationAdvisor:
// 1. Initialize the vector store
SimpleVectorStore vectorStore = SimpleVectorStore.builder(embeddingModel).build();

// 2. Add documents to the vector store
List<Document> documents = List.of(new Document(
        "Product Manual: Product Name: Intelligent Robot\n" +
        "Product Description: An intelligent robot is an intelligent device that can automatically complete various tasks.\n" +
        "Function:\n" +
        "1. Automatic navigation: The robot can automatically navigate to the specified location.\n" +
        "2. Automatic grasping: The robot can automatically grab objects.\n" +
        "3. Automatic placement: The robot can automatically place items.\n"));
vectorStore.add(documents);

// 3. Create a retrieval augmentation advisor
Advisor advisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .vectorStore(vectorStore)
                .build())
        .build();

// 4. Use the advisor in the chat client
String response = chatClient.prompt()
        .user("What functions does the robot have?")
        .advisors(advisor)   // Add the retrieval augmentation advisor
        .call()
        .content();

This base implementation provides the following functionality:

  1. Automatic document retrieval: Automatically retrieve relevant documents based on user questions
  2. Contextual integration: Integrate the retrieved document content into the answer
  3. Intelligent answer generation: Generate accurate answers based on the retrieved information

Advanced Configuration Options

RetrievalAugmentationAdvisor supports several advanced configurations:
Advisor advisor = RetrievalAugmentationAdvisor.builder()
        // Configure the query augmenter
        .queryAugmenter(ContextualQueryAugmenter.builder()
                .allowEmptyContext(true)   // Allow empty-context queries
                .maxTokens(300)            // Limit query length
                .temperature(0.7)          // Control the creativity of query expansion
                .build())
        // Configure the document retriever
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .vectorStore(vectorStore)
                .similarityThreshold(0.5)  // Similarity threshold
                .topK(3)                   // Number of documents returned
                .minScore(0.1)             // Minimum match score
                .maxDistance(0.8)          // Maximum vector distance
                .build())
        .build();

The main configuration options are:

  1. Query augmenter configuration:
    • Context processing strategy: defines how to process conversation history and context information, including context window size, historical message weight, etc.
    • Empty-value handling: specifies the handling strategy when a query lacks context, such as using default values or throwing an exception
    • Query conversion rules: set how to convert the original query into a more effective retrieval form, including synonym expansion, keyword extraction, etc.
  2. Document retriever configuration:
    • Similarity threshold setting: determines the minimum similarity requirement for document matching; documents below this threshold are filtered out
    • Limit on the number of returned results: controls the maximum number of documents returned per search to avoid returning too many irrelevant results
    • Document filtering rules: define filtering conditions based on metadata, such as time range, document type, tags, etc.

Document Selection
With an understanding of RAG basics, let's look at a more complex document selection mechanism. Document selection is one of the core components of the RAG system; it determines how accurate and relevant the information the system provides to users can be.

Document structure design

First, let's look at an example of a well-structured document:
// Generate interior design case documents
List<Document> documents = new ArrayList<>();

// Modern minimalist style living room case
documents.add(new Document(
        "Case Number: LR-2023-001\n" +
        "Project Overview: 180 square meters of modern minimalist style living room renovation\n" +
        "Design points:\n" +
        "1. 5.2-meter-high floor-to-ceiling windows are used to maximize natural light\n" +
        "2. Main color: Cloud White (matte, NCS S0500-N) with Morandi Gray\n" +
        "3. Furniture selection: Italian B&B brand leather sofa, Nordic white oak coffee table\n" +
        "4. Lighting design: Recessed downlights with Italian Flos pendant lights\n" +
        "5. Soft furnishings: imported black walnut TV wall, geometric pattern carpet\n" +
        "Space effect: transparent and grand, suitable for business reception and family daily life",
        Map.of("type", "interior",      // Document type
               "year", "2023",          // Year
               "month", "06",           // Month
               "location", "indoor",    // Location type
               "style", "modern",       // Decoration style
               "room", "living_room"    // Room type
        )));

Each document contains two main parts:

  1. Document content: structured text description, including project number, overview, details, etc.
  2. Metadata: key-value pairs for quick filtering and sorting, such as type, year, location, etc.

Advanced search implementation

Here is a complete advanced search example:
// 1. Initialize the vector store
SimpleVectorStore vectorStore = SimpleVectorStore.builder(embeddingModel).build();

// 2. Configure the AI assistant role
ChatClient chatClient = builder.defaultSystem(
        "You are a professional interior design consultant who is proficient in various decoration styles, material selection and space layout. " +
        "Please provide users with professional, detailed and practical suggestions based on the reference materials provided. When answering, please note:\n" +
        "1. Accurately understand the specific needs of users\n" +
        "2. Combine actual cases in the reference materials\n" +
        "3. Provide professional design concepts and principle explanations\n" +
        "4. Consider practicality, aesthetics and cost-effectiveness\n" +
        "5. Alternative solutions can be provided if necessary")
        .build();

// 3. Construct complex document filtering conditions
var b = new FilterExpressionBuilder();
var filterExpression = b.and(
        b.and(
                b.eq("year", "2023"),          // Filter cases from 2023
                b.eq("location", "indoor")),   // Select only indoor cases
        b.and(
                b.eq("type", "interior"),      // Type is interior design
                b.in("room", "living_room", "study", "kitchen")   // Specify room types
        ));

// 4. Configure the document retriever
DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
        .vectorStore(vectorStore)
        .similarityThreshold(0.5)   // Set the similarity threshold
        .topK(3)                    // Return the top 3 most relevant documents
        .filterExpression(filterExpression.build())
        .build();

// 5. Create a context-aware query augmenter
Advisor advisor = RetrievalAugmentationAdvisor.builder()
        .queryAugmenter(ContextualQueryAugmenter.builder()
                .allowEmptyContext(true)
                .build())
        .documentRetriever(retriever)
        .build();

// 6. Execute the query and get the response
String userQuestion = "Based on the information provided, please describe all relevant scenario styles, output the case number, and describe its content as detailed as possible.";
String response = chatClient.prompt()
        .user(userQuestion)
        .advisors(advisor)
        .call()
        .content();

This implementation includes the following key features:

  1. Metadata filtering:
    • Use FilterExpressionBuilder to build complex filter conditions
    • Supports multiple filtering methods such as exact match (eq), range query (in), etc.
    • Multiple conditions (and/or) can be combined to achieve accurate screening
  2. Similarity control:
    • Set the similarity threshold (0.5) through similarityThreshold
    • Use topK to limit the number of results returned (3)
    • Ensure that only the most relevant documents are returned
  3. Context awareness:
    • Integrate ContextualQueryAugmenter to achieve context awareness
    • Allow empty-context queries (allowEmptyContext)
    • Automatically associate related documents with the query context
  4. Smart advisor integration:
    • Use RetrievalAugmentationAdvisor to enhance query performance
    • Automatically integrate document retrieval and query processing
    • Provide smarter response generation

Through this multi-level document selection mechanism, the system can:

  1. Quickly locate relevant documents
  2. Accurately assess document relevance
  3. Intelligent combination of multiple information sources
  4. Generate high-quality answers

Error Handling and Edge Cases
In a production environment, the RAG system needs to gracefully handle various edge cases, especially when document retrieval fails or relevant documents are not found. By using ContextualQueryAugmenter, we can implement a more friendly error handling mechanism:
// 1. Build a retrieval augmentation advisor
Advisor advisor = RetrievalAugmentationAdvisor.builder()
        .queryAugmenter(ContextualQueryAugmenter.builder()
                .allowEmptyContext(true)   // Allow empty context to avoid NPE
                .build())
        .documentRetriever(retriever)
        .build();

// 2. Execute the query and handle possible exceptions
return chatClient.prompt()
        .user(query)
        .advisors(advisor)
        .call()
        .content();

Comparison of operation effects:

Results before modification:
AI Answer: I'm sorry, but it appears that the specific details or references you mentioned for your interior design query are not included in my current knowledge base. To provide you with the best possible advice, I would need more information about your project, such as the style you're aiming for, the size of the space, your budget, and any specific elements you want to include or avoid. If you can provide more details, I would be more than happy to offer tailored advice on interior design, space planning, material selection, and more.

The modified results are shown below:

AI Answer: I'm sorry, you didn't provide specific references or case numbers. In order to provide a detailed description of the scene style, I need you to provide specific case numbers or relevant materials. Once you provide this information, I will be able to accurately describe the relevant scene style, including the following:
1. Design style and theme
2. Spatial layout and functional planning
3. Material selection and color matching
4. Lighting design and atmosphere creation
5. Furniture configuration and soft furnishings matching

By using ContextualQueryAugmenter, we achieved the following improvements:

  1. Friendly error messages:
    • Replies in the user's language (Chinese in the original example), which better matches user habits
    • Provides clear follow-up instructions
    • Describes the specific information required
  2. Structured response format:
    • Clearly lists the types of information that can be provided
    • Uses numbered lists to improve readability
    • Maintains professionalism and completeness
  3. Context-aware processing:
    • Automatically handles empty-context cases
    • Keeps the conversation going
    • Guides users to provide the necessary information
This style of error handling not only provides a better user experience, but also helps to collect more complete information about user needs, thereby providing more accurate responses.
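The wording of the fallback answer can also be customized. The sketch below assumes that ContextualQueryAugmenter exposes an emptyContextPromptTemplate option (as in Spring AI 1.0); the prompt text itself is illustrative.

// Hedged sketch: customize the message used when no relevant documents are found
ContextualQueryAugmenter augmenter = ContextualQueryAugmenter.builder()
        .allowEmptyContext(true)
        .emptyContextPromptTemplate(new PromptTemplate(
                "The knowledge base has no material on this question. " +
                "Politely ask the user for a case number or more details."))
        .build();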

Best Practices for Structured RAGs





When actually deploying and operating a RAG system, we need to consider the best practices of the system from multiple dimensions. The following is a complete practice guide:


Documentation Best Practices

Document structure design

  • Structured content: The document should contain a clear structure, such as case number, project overview, design points, etc.
  • Metadata annotation: Add rich metadata to each document, such as:
Map.of("type", "interior",   // document type
       "year", "2023",       // year
       "style", "modern"     // style type
)

Document cutting strategy

  • Use an intelligent chunking algorithm to maintain semantic coherence
  • Label each fragment with metadata
  • Keep fragment sizes appropriate, avoiding chunks that are too long or too short (see the sketch below)
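A hedged sketch of a configurable splitter for this purpose; the constructor parameters follow TokenTextSplitter(chunkSize, minChunkSizeChars, minChunkLengthToEmbed, maxNumChunks, keepSeparator) and should be verified against your Spring AI version.

// Keep fragment sizes in a sensible range (sketch only; parameter values are examples)
TokenTextSplitter splitter = new TokenTextSplitter(
        800,     // target tokens per fragment
        350,     // minimum characters per fragment
        5,       // discard fragments shorter than this before embedding
        10000,   // safety cap on the number of fragments
        true);   // keep separators so fragments stay readable
List<Document> fragments = splitter.apply(documents);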

Search Enhancement Strategy

Multi Query Expansion

  • Enable multiple query expansion mechanism to improve retrieval accuracy
  • Set an appropriate number of queries (3-5 is recommended)
  • Preserve the core semantics of the original query

Query Rewriting and Translation

  • Use RewriteQueryTransformer to optimize query structure
  • Configure TranslationQueryTransformer to support multiple languages
  • Maintaining the semantic integrity of queries

System Configuration Best Practices

Vector storage configuration

SimpleVectorStore vectorStore = SimpleVectorStore.builder(embeddingModel).build();
  • Choose the right vector storage solution

  • Choose a storage method based on data size (memory/Redis/MongoDB)

Retriever Configuration

DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
        .vectorStore(vectorStore)
        .similarityThreshold(0.5)   // Similarity threshold
        .topK(3)                    // Number of documents returned
        .build();
  • Set a reasonable similarity threshold

  • Controlling the number of documents returned
  • Configuring document filtering rules

Error handling mechanism

Exception handling

  • Allow empty context queries
  • Provide friendly error messages
  • Guide users to provide necessary information

Boundary Case Handling

ContextualQueryAugmenter.builder().allowEmptyContext(true).build()
  • Handling Document Not Found

  • Dealing with low similarity
  • Handling query timeouts

System role settings

AI Assistant Configuration

ChatClient chatClient = builder.defaultSystem(
        "You are a professional consultant, please note:\n" +
        "1. Accurately understand user needs\n" +
        "2. Combine reference materials\n" +
        "3. Provide professional explanations\n" +
        "4. Consider practicality\n" +
        "5. Provide alternative solutions")
        .build();
  • Set clear roles

  • Define answer specifications
  • Ensure professionalism and practicality

Performance optimization suggestions

Query Optimization

  • Using document filter expressions
  • Set a reasonable search threshold
  • Optimize the number of query expansions

Resource Management

  • Control the number of documents loaded
  • Optimizing memory usage
  • Reasonable cache strategy setting
By following the above best practices, you can build an efficient and reliable RAG system that provides users with accurate and professional answers. These practices cover all aspects from document processing to system configuration, which can help developers build better RAG applications.