Use the powerful combination of local Ollama models and Spring AI Alibaba to build next-generation RAG applications

Master the next generation of RAG technology and build efficient AI applications.
Core content:
1. Introduction to RAG application architecture and core components
2. Environment preparation and Ollama service startup method
3. Model download and Elasticsearch deployment guide
RAG Application Architecture Overview
Cloud Native
1.1 Core Components
Spring AI: a Java AI development framework for the Spring ecosystem that provides a unified API to access AI infrastructure such as large models and vector databases.
Ollama: a local large-model runtime engine, the "Docker of the large-model era," that supports quickly trying out and deploying large models.
Spring AI Alibaba: enhances Spring AI and integrates the DashScope model platform to quickly build large-model applications.
Elasticsearch: a vector database that stores vectorized text data and supports semantic retrieval.
1.2 Model selection
Embedding model: nomic-embed-text:latest, used to vectorize text data.
Chat model: deepseek-r1:8b (served by Ollama), generates the final answer.
Environment Preparation
2.1 Start Ollama service
services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    ports:
      - 11434:11434
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - 3005:8080
    environment:
      - 'OLLAMA_BASE_URL=http://host.docker.internal:11434'
    # Allow the container to access the host network
    extra_hosts:
      - host.docker.internal:host-gateway
2.2 Download the model
docker exec -it ollama ollama pull deepseek-r1:8b
docker exec -it ollama ollama pull nomic-embed-text:latest
2.3 Deploy Elasticsearch
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.16.1
    container_name: elasticsearch
    privileged: true
    environment:
      - "cluster.name=elasticsearch"
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms512m -Xmx1096m"
      - bootstrap.memory_lock=true
    volumes:
      - ./config/es.yaml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
      - "9300:9300"
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 1000M
        reservations:
          memory: 200M
The mounted config/es.yaml:

cluster.name: docker-es
node.name: es-node-1
network.host: 0.0.0.0
network.publish_host: 0.0.0.0
http.port: 9200
http.cors.enabled: true
http.cors.allow-origin: "*"
bootstrap.memory_lock: true
# Disable authentication and authorization; security is enabled by default in es 8.x.
xpack.security.enabled: false
Project Configuration
3.1 Dependency Introduction
<!-- Spring Boot Web Starter -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>3.3.4</version>
</dependency>
<!-- Spring AI Ollama Starter -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M5</version>
</dependency>
<!-- Vector storage -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store</artifactId>
    <version>1.0.0-M5</version>
</dependency>
<!-- PDF Parsing -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pdf-document-reader</artifactId>
    <version>1.0.0-M5</version>
</dependency>
3.2 Core Configuration
spring:
  ai:
    # ollama configuration
    ollama:
      base-url: http://127.0.0.1:11434
      chat:
        model: deepseek-r1:8b
      embedding:
        model: nomic-embed-text:latest
    # Vector database configuration
    vectorstore:
      elasticsearch:
        index-name: ollama-rag-embedding-index
        similarity: cosine
        dimensions: 768
  elasticsearch:
    uris: http://127.0.0.1:9200
index-name is the ES vector index name. dimensions is the dimension of the vectors produced by the embedding model and must match that model's output (the default is 1536; nomic-embed-text produces 768-dimensional vectors, hence 768 above). similarity selects the metric used to measure similarity between vectors; cosine similarity is used here, a common choice for high-dimensional embedding vectors.
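As a point of reference for the cosine metric configured above, it can be sketched in plain Java. This is purely illustrative; Elasticsearch computes the similarity internally, and the class and method names here are made up for the example.

```java
public class CosineSimilarity {

    // Cosine similarity: dot(a, b) / (|a| * |b|); near 1.0 for vectors
    // pointing in the same direction, 0.0 for orthogonal vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1f, 0f, 1f};
        float[] v2 = {1f, 0f, 1f};
        float[] v3 = {0f, 1f, 0f};
        System.out.println(cosine(v1, v2)); // same direction, prints a value near 1.0
        System.out.println(cosine(v1, v3)); // orthogonal, prints 0.0
    }
}
```

In the real application this computation runs over the 768-dimensional nomic-embed-text vectors stored in the index.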
3.3 Prompt Template
You are a MacOS expert, please answer based on the following context:
---------------------
{question_answer_context}
---------------------
Please answer in Chinese Markdown format based on the given context and historical information provided. If the answer is not in the context, please clearly state so.
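At call time, the {question_answer_context} placeholder is filled with the retrieved document chunks. The .st file is rendered by a template engine; the sketch below approximates that step with plain string replacement, and the class and method names are invented for illustration only.

```java
import java.util.List;

public class PromptRenderer {

    // Hypothetical stand-in for the rendering step: join the retrieved
    // chunks and substitute them into the template's placeholder.
    static String render(String template, List<String> retrievedChunks) {
        String context = String.join("\n", retrievedChunks);
        return template.replace("{question_answer_context}", context);
    }

    public static void main(String[] args) {
        String template = "Answer based on the following context:\n{question_answer_context}";
        String prompt = render(template, List.of("chunk A", "chunk B"));
        System.out.println(prompt);
    }
}
```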
Core Implementation
4.1 Text Vectorization
@Component
public class KnowledgeInitializer implements ApplicationRunner {

    // Inject the VectorStore instance, responsible for writing and querying vectorized data
    private final VectorStore vectorStore;

    // Vector database client; Elasticsearch is used here
    private final ElasticsearchClient elasticsearchClient;

    // .....

    @Override
    public void run(ApplicationArguments args) {
        // 1. Load pdf resources.
        List<Resource> pdfResources = loadPdfResources();
        // 2. Parse pdf resources into Documents.
        List<Document> documents = parsePdfResource(pdfResources);
        // 3. Import into ES.
        importToES(documents);
    }

    private List<Document> parsePdfResource(List<Resource> pdfResources) {
        List<Document> resList = new ArrayList<>();
        // Split the text according to the specified strategy and convert it into Document objects
        for (Resource springAiResource : pdfResources) {
            // 1. Parse the document
            DocumentReader reader = new PagePdfDocumentReader(springAiResource);
            List<Document> documents = reader.get();
            logger.info("{} documents loaded", documents.size());
            // 2. Split into chunks
            List<Document> splitDocuments = new TokenTextSplitter().apply(documents);
            logger.info("{} documents split", splitDocuments.size());
            // 3. Add to the result list
            resList.addAll(splitDocuments);
        }
        return resList;
    }

    // ......
}
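TokenTextSplitter splits documents into token-sized chunks before vectorization. As a rough illustration of the chunking idea only, the sketch below uses fixed character windows with overlap; the real splitter works on tokens and the names here are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleChunker {

    // Naive fixed-size chunking with overlap between adjacent chunks.
    // Purely illustrative; TokenTextSplitter's actual logic is token-based.
    static List<String> chunk(String text, int size, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 10 characters, chunk size 4, overlap 1
        System.out.println(chunk("abcdefghij", 4, 1)); // prints [abcd, defg, ghij]
    }
}
```

Overlap between chunks helps preserve context that would otherwise be cut at chunk boundaries.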
4.2 RAG Service Layer
@Service
public class AIRagService {

    // System prompt template
    @Value("classpath:/prompts/system-qa.st")
    private Resource systemResource;

    // Inject related bean instances
    private final ChatModel ragChatModel;
    private final VectorStore vectorStore;

    // Text field filtering to improve vector retrieval accuracy
    private static final String textField = "content";

    // ......

    public Flux<String> retrieve(String prompt) {
        // Load the prompt template
        String promptTemplate = getPromptTemplate(systemResource);
        // Enable hybrid search, including embedding and full-text search
        SearchRequest searchRequest = SearchRequest.builder()
                .topK(4)
                .similarityThresholdAll()
                .build();
        // Build a ChatClient and call the large model service.
        return ChatClient.builder(ragChatModel)
                .build().prompt()
                .advisors(new QuestionAnswerAdvisor(
                        vectorStore,
                        searchRequest,
                        promptTemplate)
                ).user(prompt)
                .stream()
                .content();
    }
}
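Conceptually, the vector store scores every stored chunk against the query embedding and keeps only the topK(4) best matches. The self-contained sketch below shows just that selection step, assuming similarity scores have already been computed; the class and method names are invented for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopKRetrieval {

    // Keep the k highest-scoring documents; mirrors the topK(4) setting above.
    static List<String> topK(Map<String, Double> scoredDocs, int k) {
        return scoredDocs.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> scores = Map.of("a", 0.91, "b", 0.40, "c", 0.75);
        System.out.println(topK(scores, 2)); // prints [a, c]
    }
}
```

The selected chunks are what end up in the {question_answer_context} slot of the prompt template.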
4.3 RAG Service Interface Layer
@RestController
public class AIRagController {

    @Autowired
    public AIRagService aiRagService;

    // The mapping path here is illustrative; it was not preserved in the original.
    @GetMapping("/rag/chat")
    public Flux<String> chat(
            @RequestParam("prompt") String prompt,
            HttpServletResponse response
    ) {
        // Avoid garbled characters in streaming responses.
        response.setCharacterEncoding("UTF-8");
        if (!StringUtils.hasText(prompt)) {
            return Flux.just("prompt is null.");
        }
        return aiRagService.retrieve(prompt);
    }
}
Request a Demo
Here, we take the question "I am a newbie to Mac, I want to configure the trackpad of Mac to make it more useful, do you have any suggestions?" as an example. As we can see, the answer from calling the model directly is generic and not very practical.
5.1 Call directly from open-webui
5.2 Calling the RAG application interface
RAG Optimization
6.1 Using the DashScope Platform Model
spring:
  application:
    name: ollama-rag
  ai:
    dashscope:
      api-key: ${AI_DASHSCOPE_API_KEY}
      chat:
        options:
          model: deepseek-r1
      embedding:
        enabled: false
    ollama:
      base-url: http://127.0.0.1:11434
      chat:
        model: deepseek-r1:8b
        enabled: false
      embedding:
        model: nomic-embed-text:latest
    vectorstore:
      elasticsearch:
        index-name: ollama-rag-embedding-index
        similarity: cosine
        dimensions: 768
  elasticsearch:
    uris: http://127.0.0.1:9200
<!-- Spring AI Alibaba DashScope -->
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter</artifactId>
    <version>1.0.0-M6.1</version>
</dependency>
public Flux<String> retrieve(String prompt) {
    // Get the vector store prompt template.
    String promptTemplate = getPromptTemplate(systemResource);
    // Enable hybrid search, both embedding and full-text search
    SearchRequest searchRequest = SearchRequest.builder()
            .topK(4)
            .similarityThresholdAll()
            .build();
    // Option 1: build a ChatClient with the retrieval-rerank advisor
    // (chatModel and rerankModel are injected beans).
    ChatClient runtimeChatClient = ChatClient.builder(chatModel)
            .defaultAdvisors(new RetrievalRerankAdvisor(
                    vectorStore,
                    rerankModel,
                    searchRequest,
                    promptTemplate,
                    0.1)
            ).build();
    // Option 2: Spring AI RetrievalAugmentationAdvisor with query rewriting
    Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
            .queryTransformers(RewriteQueryTransformer.builder()
                    .chatClientBuilder(ChatClient.builder(ragChatModel).build().mutate())
                    .build())
            .documentRetriever(VectorStoreDocumentRetriever.builder()
                    .similarityThreshold(0.50)
                    .vectorStore(vectorStore)
                    .build())
            .build();
    // Retrieve and generate with the LLM
    return ragClient.prompt()
            .advisors(retrievalAugmentationAdvisor)
            .user(prompt)
            .stream()
            .content();
}
6.2 Search Optimization
For more retrieval optimization techniques, see the official tutorial: https://java2ai.com/docs/1.0.0-M5.1/tutorials/rag/
6.3 Data Preprocessing Optimization
Clean the text data by removing irrelevant documents, noise data, and special characters; add metadata to improve the quality of the index data; and optimize the index structure.
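A minimal cleaning pass of the kind described might look like the sketch below. The cleaning rules (strip control characters and stray symbols, collapse whitespace) are assumptions for illustration; real pipelines tune them per corpus.

```java
public class TextCleaner {

    // Replace control characters and miscellaneous symbols with spaces,
    // then collapse runs of whitespace. Rules are corpus-specific assumptions.
    static String clean(String raw) {
        return raw
                .replaceAll("[\\p{Cntrl}\\p{So}]", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("Hello\n\t  world\r\n test")); // prints Hello world test
    }
}
```

Cleaning before chunking keeps noise out of the embeddings and improves retrieval precision.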
Troubleshooting
Milestone (M-series) artifacts are not published to Maven Central; if dependency resolution fails, add the Spring repositories to pom.xml:

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>
Summary
Data loading and cleaning: load data from external knowledge bases, vectorize it, and store it in Elasticsearch.
Model call optimization: provide contextual information to the large model through retrieval-augmented generation (RAG).
Interactive service construction: build a REST API for efficient interaction between the application and its users.