Use the powerful combination of local Ollama models and Spring AI Alibaba to build the next generation of RAG applications

Written by
Clara Bennett
Updated on: July 12, 2025
Recommendation

Master the next generation of RAG technology and build efficient AI applications.

Core content:
1. Introduction to RAG application architecture and core components
2. Environment preparation and Ollama service startup method
3. Model download and Elasticsearch deployment guide


01

RAG Application Architecture Overview

 1.1 Core Components

  • Spring AI: a Java AI development framework for the Spring ecosystem that provides a unified API for accessing AI infrastructure such as large models and vector databases.

  • Ollama: a local large-model runtime engine, often described as "Docker for the LLM era", that makes it easy to try out and deploy models locally.

  • Spring AI Alibaba: an enhancement of Spring AI that integrates with the DashScope model platform for quickly building large-model applications.

  • Elasticsearch: a vector database that stores vectorized text and supports semantic retrieval.

 1.2 Model selection

  1. Embedding model: nomic-embed-text:latest, used to vectorize text data.
  2. Ollama Chat model: deepseek-r1:8b, generates the final answer.

02

Environment Preparation

 2.1 Start Ollama service

Start Ollama with Docker Compose (the compose file also starts Open WebUI, a front-end for interacting with the Ollama models):

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    ports:
      - 11434:11434
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - 3005:8080
    environment:
      - 'OLLAMA_BASE_URL=http://host.docker.internal:11434'
    # Allow the container to access the host network
    extra_hosts:
      - host.docker.internal:host-gateway
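
Bring the stack up in the background; Open WebUI is then reachable at http://localhost:3005:

docker compose up -d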

 2.2 Download the model

Execute the following commands:


docker exec -it ollama ollama pull deepseek-r1:8b
docker exec -it ollama ollama pull nomic-embed-text:latest
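
After the pulls complete, you can verify that both models are available, and optionally confirm that nomic-embed-text really emits 768-dimensional vectors (the dimension we will configure for the ES index later). The jq filter is just a convenience and assumes jq is installed:

docker exec -it ollama ollama list

curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text:latest", "prompt": "hello world"}' | jq '.embedding | length'
# prints 768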


You can then call the deepseek-r1:8b model from open-webui.

 2.3 Deploy Elasticsearch


services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.16.1
    container_name: elasticsearch
    privileged: true
    environment:
      - "cluster.name=elasticsearch"
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms512m -Xmx1096m"
      - bootstrap.memory_lock=true
    volumes:
      - ./config/es.yaml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
      - "9300:9300"
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 1000M
        reservations:
          memory: 200M

Prepare the configuration file (./config/es.yaml) mounted by the compose file:

cluster.name: docker-es
node.name: es-node-1
network.host: 0.0.0.0
network.publish_host: 0.0.0.0
http.port: 9200
http.cors.enabled: true
http.cors.allow-origin: "*"
bootstrap.memory_lock: true
# Disable authentication and authorization; in es 8.x security is enabled by default
xpack.security.enabled: false
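
Once the container is up, a quick check confirms that Elasticsearch is reachable and that security is indeed disabled (no credentials needed):

docker compose up -d
curl http://127.0.0.1:9200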

At this point, all the environment preparation steps for building a simple RAG application have been completed. Now let's start building the project.

03

Project Configuration

 3.1 Dependency Introduction

<!-- Spring Boot Web Starter -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>3.3.4</version>
</dependency>

<!-- Spring AI Ollama Starter -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M5</version>
</dependency>

<!-- Vector storage -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store</artifactId>
    <version>1.0.0-M5</version>
</dependency>

<!-- PDF Parsing -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pdf-document-reader</artifactId>
    <version>1.0.0-M5</version>
</dependency>

 3.2 Core Configuration

spring:
  ai:
    # ollama configuration
    ollama:
      base-url: http://127.0.0.1:11434
      chat:
        model: deepseek-r1:8b
      embedding:
        model: nomic-embed-text:latest
    # Vector database configuration
    vectorstore:
      elasticsearch:
        index-name: ollama-rag-embedding-index
        similarity: cosine
        dimensions: 768
  elasticsearch:
    uris: http://127.0.0.1:9200


where:

  • index-name is the name of the ES vector index;
  • dimensions is the dimensionality of the vectors produced by the embedding model (it must match the model's output; the default is 1536);
  • similarity is the metric used to measure similarity between vectors; cosine similarity is used here, which is well suited to high-dimensional embedding vectors.
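
Once the application has created the index, you can inspect its mapping to confirm that the dense_vector field carries the expected dims and similarity, for example:

curl "http://127.0.0.1:9200/ollama-rag-embedding-index/_mapping?pretty"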

If you want to customize how the ES vector store is instantiated, introduce spring-ai-elasticsearch-store directly:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store</artifactId>
    <version>1.0.0-M5</version>
</dependency>

The store is then defined through a custom configuration bean in the project.
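
A minimal sketch of such a bean is shown below. Note that the vector-store API moved between Spring AI milestones (the package names and builder used here follow the newer style; some milestones construct the store directly), so treat it as a starting point and check it against the version you actually use:

import org.elasticsearch.client.RestClient;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.elasticsearch.ElasticsearchVectorStore;
import org.springframework.ai.vectorstore.elasticsearch.ElasticsearchVectorStoreOptions;
import org.springframework.ai.vectorstore.elasticsearch.SimilarityFunction;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EsVectorStoreConfig {

    @Bean
    public ElasticsearchVectorStore vectorStore(RestClient restClient, EmbeddingModel embeddingModel) {
        ElasticsearchVectorStoreOptions options = new ElasticsearchVectorStoreOptions();
        options.setIndexName("ollama-rag-embedding-index"); // must match the index configured in application.yaml
        options.setSimilarity(SimilarityFunction.cosine);
        options.setDimensions(768);                          // must match the embedding model output
        return ElasticsearchVectorStore.builder(restClient, embeddingModel)
                .options(options)
                .initializeSchema(true)                      // create the index on startup if absent
                .build();
    }
}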

 3.3 Prompt Template

You are a MacOS expert, please answer based on the following context:

---------------------
{question_answer_context}
---------------------

Please answer in Chinese, in Markdown format, based on the given context and the historical information provided. If the answer is not in the context, please clearly state so.


04

Core Implementation

 4.1 Text Vectorization

In Spring AI and Spring AI Alibaba, you can use almost any data source as the source of the knowledge base. In this example, PDF is used as the knowledge base document.

Spring AI Alibaba provides 40+ document-reader and parser plugins, which are used to load data into RAG applications.

public class KnowledgeInitializer implements ApplicationRunner {

    // Inject the VectorStore instance, responsible for writing and querying vectorized data
    private final VectorStore vectorStore;

    // Vector database client; es is used here
    private final ElasticsearchClient elasticsearchClient;

    // .....

    @Override
    public void run(ApplicationArguments args) {
        // 1. Load pdf resources.
        List<Resource> pdfResources = loadPdfResources();
        // 2. Parse pdf resources to Documents.
        List<Document> documents = parsePdfResource(pdfResources);
        // 3. Import to ES.
        importToES(documents);
    }

    private List<Document> parsePdfResource(List<Resource> pdfResources) {
        List<Document> resList = new ArrayList<>();
        // Split the text according to the specified strategy and convert it into Document objects
        for (Resource springAiResource : pdfResources) {
            // 1. Parse the document
            DocumentReader reader = new PagePdfDocumentReader(springAiResource);
            List<Document> documents = reader.get();
            logger.info("{} documents loaded", documents.size());
            // 2. Split into chunks
            List<Document> splitDocuments = new TokenTextSplitter().apply(documents);
            logger.info("{} documents split", splitDocuments.size());
            // 3. Collect the split chunks
            resList.addAll(splitDocuments);
        }
        return resList;
    }

    // ......
}
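
The importToES step elided above reduces to a single VectorStore call: add() embeds each chunk with the configured embedding model and writes text plus vector into the index. A minimal sketch:

private void importToES(List<Document> documents) {
    // Embeds each Document with the configured embedding model
    // and stores the text and vector in the ES index.
    vectorStore.add(documents);
}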

At this point, the process of converting text data into vector data is completed.

 4.2 RAG Service Layer

Next, we will use Ollama Starter in Spring AI to interact with the model and build a RAG application.

AIRagService.java

@Service
public class AIRagService {

    // Import the system prompt tmpl
    @Value("classpath:/prompts/system-qa.st")
    private Resource systemResource;

    // Inject related bean instances
    private final ChatModel ragChatModel;

    private final VectorStore vectorStore;

    // Text field used for filtering to enhance vector retrieval accuracy
    private static final String textField = "content";

    // ......

    public Flux<String> retrieve(String prompt) {
        // Load the prompt tmpl
        String promptTemplate = getPromptTemplate(systemResource);

        // Enable hybrid search, including embedding and full-text search
        SearchRequest searchRequest = SearchRequest.builder()
                .topK(4)
                .similarityThresholdAll()
                .build();

        // Build the ChatClient and initiate a streaming model call
        return ChatClient.builder(ragChatModel)
                .build()
                .prompt()
                .advisors(new QuestionAnswerAdvisor(vectorStore, searchRequest, promptTemplate))
                .user(prompt)
                .stream()
                .content();
    }
}
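
The getPromptTemplate helper is not shown in the original; a minimal sketch that simply reads the .st resource into a String (Resource#getContentAsString is available since Spring Framework 6):

private String getPromptTemplate(Resource systemResource) {
    try {
        return systemResource.getContentAsString(StandardCharsets.UTF_8);
    } catch (IOException e) {
        throw new UncheckedIOException("Failed to read system prompt template", e);
    }
}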

 4.3 RAG Service Interface Layer

Write the user-facing request interface, which processes the user's request and calls the service to obtain the large model's response:

@RestController
@RequestMapping("/rag/ai")
public class AIRagController {

    @Resource
    public AIRagService aiRagService;

    @GetMapping("/chat/{prompt}")
    public Flux<String> chat(@PathVariable("prompt") String prompt,
                             HttpServletResponse response) {
        // Set UTF-8 to avoid garbled characters in streaming responses.
        response.setCharacterEncoding("UTF-8");
        if (!StringUtils.hasText(prompt)) {
            return Flux.just("prompt is null.");
        }
        return aiRagService.retrieve(prompt);
    }
}
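
With the application running on Spring Boot's default port 8080, the endpoint can be exercised directly; the question travels in the path, so it must be URL-encoded:

curl "http://localhost:8080/rag/ai/chat/How%20do%20I%20configure%20the%20trackpad%3F"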
05


Request Example

Here we take the question "I am a newbie to Mac, I want to configure the trackpad of Mac to make it more useful, do you have any suggestions?" as an example. As the comparison below shows, the answer from calling the model directly is rather generic and not very practical.


 5.1 Call directly from open-webui



 5.2 Calling the RAG application interface


It can be seen that the output of the RAG application is more accurate and better matches the user's needs.

06

RAG Optimization

 6.1 Using the DashScope Platform Model

When model services are deployed locally with Ollama, inference speed is limited by local resources, and the model's thinking process takes considerable time. We can therefore use models hosted on cloud platforms to improve the user experience.

Modify application.yaml to:

spring:
  application:
    name: ollama-rag
  ai:
    dashscope:
      api-key: ${AI_DASHSCOPE_API_KEY}
      chat:
        options:
          model: deepseek-r1
      embedding:
        enabled: false
    ollama:
      base-url: http://127.0.0.1:11434
      chat:
        model: deepseek-r1:8b
        enabled: false
      embedding:
        model: nomic-embed-text:latest
    vectorstore:
      elasticsearch:
        index-name: ollama-rag-embedding-index
        similarity: cosine
        dimensions: 768
  elasticsearch:
    uris: http://127.0.0.1:9200
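
The api-key placeholder is resolved from the environment, so export your DashScope key before starting the application (the value below is a placeholder):

export AI_DASHSCOPE_API_KEY=sk-xxxxxxxx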

Here, the Chat function of Ollama is disabled, and the DeepSeek-R1 model on the DashScope platform is used instead through the Spring AI Alibaba Starter dependency.

Add dependencies:

<!-- Spring AI Alibaba DashScope -->
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter</artifactId>
    <version>1.0.0-M6.1</version>
</dependency>

Modify AIRagService.java:

public Flux<String> retrieve(String prompt) {
    // Get the vector store prompt tmpl.
    String promptTemplate = getPromptTemplate(systemResource);

    // Enable hybrid search, both embedding and full-text search.
    SearchRequest searchRequest = SearchRequest.builder()
            .topK(4)
            .similarityThresholdAll()
            .build();

    // Build a ChatClient with the Spring AI Alibaba retrieval + rerank advisor.
    ChatClient runtimeChatClient = ChatClient.builder(chatModel)
            .defaultAdvisors(new RetrievalRerankAdvisor(vectorStore, rerankModel,
                    searchRequest, promptTemplate, 0.1))
            .build();

    // Alternatively, Spring AI's RetrievalAugmentationAdvisor with query rewriting.
    Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
            .queryTransformers(RewriteQueryTransformer.builder()
                    .chatClientBuilder(ChatClient.builder(ragChatModel).build().mutate())
                    .build())
            .documentRetriever(VectorStoreDocumentRetriever.builder()
                    .similarityThreshold(0.50)
                    .vectorStore(vectorStore)
                    .build())
            .build();

    // Retrieve and generate with the LLM.
    return runtimeChatClient.prompt()
            .advisors(retrievalAugmentationAdvisor)
            .user(prompt)
            .stream()
            .content();
}

 6.2 Search Optimization

Spring AI Alibaba RAG documentation:

https://java2ai.com/docs/1.0.0-M5.1/tutorials/rag/


When building a RAG application with Spring AI, we can set some custom parameters when constructing QuestionAnswerAdvisor so that vector retrieval performs at its best.
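
For example, a sketch of tuning retrieval through SearchRequest, in the same builder style the service code above already uses; the metadata key in the filter expression is hypothetical and depends on what you attach during ingestion:

// Retrieve more candidates, drop weakly related ones, and pre-filter by metadata.
SearchRequest tunedRequest = SearchRequest.builder()
        .topK(6)
        .similarityThreshold(0.7)
        .filterExpression("source == 'mac-handbook.pdf'") // hypothetical metadata key
        .build();

new QuestionAnswerAdvisor(vectorStore, tunedRequest, promptTemplate);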

 6.3 Data Preprocessing Optimization

During data preprocessing, you can (a sketch of points 1 and 2 follows the list):

  1. Clean up the data text by removing irrelevant documents, noise data, special characters, etc.
  2. Add some metadata information to improve the quality of index data;
  3. Optimize index structure, etc.
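
A minimal sketch of points 1 and 2, normalizing text and attaching metadata before splitting. Note that Document's accessor is getContent() in the M5 milestones used here (renamed getText() in 1.0), and the metadata key is hypothetical:

// Clean each parsed Document and enrich its metadata before TokenTextSplitter runs.
List<Document> cleaned = rawDocuments.stream()
        .map(doc -> {
            // 1. Strip noise: collapse whitespace and trim special characters.
            String text = doc.getContent().replaceAll("\\s+", " ").trim();
            // 2. Attach metadata to improve index quality and enable filtering.
            Map<String, Object> metadata = new HashMap<>(doc.getMetadata());
            metadata.put("source", "mac-handbook.pdf"); // hypothetical key/value
            return new Document(text, metadata);
        })
        .toList();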
07

Troubleshooting

Q: Vector import failed.
A: Check whether the ES index dimensions match the embedding model's output.

Q: The search results are irrelevant.
A: Check whether the embedding model matches the text type.

Q: Slow responses.
A: Adjust the computing resources allocated to Ollama.

Q: Failed to pull the spring-ai-alibaba-starter dependency.
A: Configure the Spring milestone and snapshot Maven repositories:
<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>

08

Summary

The whole process of building a RAG application is divided into the following three steps:

  1. Data loading and cleaning: load data from external knowledge bases, vectorize it, and store it in Elasticsearch.
  2. Model call optimization: provide contextual information to the large model through retrieval-augmented generation (RAG).
  3. Interactive service construction: build a REST API for efficient interaction between the application and its users.

Through RAG's retrieval augmentation, model answers become grounded in the retrieved context, ultimately improving the user experience.