Full-Text Search with Milvus's Built-in Sparse-BM25 Algorithm and Hybrid Search for RAG Systems

Written by
Silas Grey
Updated on: July 14, 2025
Recommendation

The new full-text search and hybrid search features brought by Milvus 2.5 help enterprises achieve efficient AI vector data retrieval.

Core content:
1. Milvus 2.5 integrates Tantivy to support native full-text search
2. Built-in word segmenter and real-time BM25 statistics to improve search efficiency
3. Hybrid search performance is enhanced and compatible with dense vector queries

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)
As data volumes grow, information retrieval plays an increasingly important role across many fields. Alibaba Cloud Vector Retrieval Service Milvus is a high-performance vector retrieval engine that is 100% compatible with open source Milvus. With out-of-the-box deployment, flexible scaling, and end-to-end alerting, it is well suited to large-scale similarity retrieval over AI vector data in enterprises. Version 2.5 significantly enhances full-text search, keyword matching, and hybrid search, so that retrieval in multimodal search, RAG, and similar scenarios can balance recall and precision. This article details how to use these features in Milvus 2.5 and explains best practices for the Retrieve stage of RAG applications.

01

Background Information
Milvus 2.5 integrates the high-performance search engine library Tantivy and a built-in Sparse-BM25 algorithm, providing native full-text search for the first time. This capability complements the existing semantic search and gives users a more powerful search experience.
  • Built-in Analyzer: Milvus accepts raw text input without additional preprocessing. The built-in Analyzer and sparse vector extraction automatically handle tokenization, stop word filtering, and sparse vector generation.
  • Real-time BM25 statistics: Term frequency (TF) and inverse document frequency (IDF) are updated dynamically as data is inserted, which keeps search results current and accurate.
  • Enhanced hybrid search performance: Sparse vector retrieval based on approximate nearest neighbor (ANN) algorithms performs far better than traditional keyword systems, supports millisecond-level responses over billions of records, and can be combined with dense vectors in hybrid queries.
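To make the analyzer and the real-time BM25 statistics more concrete, the following is a minimal pure-Python sketch. It is not Milvus's implementation: the `analyze` function, the tiny stop word list, the `ToyBM25` class, and the `k1` and `b` defaults are all assumptions chosen for illustration. It only shows the idea that document frequencies and lengths are updated on every insert, so scores always reflect the current corpus.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the Milvus list).
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "in", "so"}


def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class ToyBM25:
    """Keeps corpus statistics up to date as documents are inserted."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.docs = []             # per-document term frequencies (TF)
        self.doc_freq = Counter()  # number of documents containing each term
        self.total_len = 0         # total token count, used for average length

    def insert(self, text):
        tokens = analyze(text)
        self.docs.append(Counter(tokens))
        self.doc_freq.update(set(tokens))
        self.total_len += len(tokens)

    def score(self, query, doc_id):
        tf = self.docs[doc_id]
        doc_len = sum(tf.values())
        avg_len = self.total_len / len(self.docs)
        n = len(self.docs)
        total = 0.0
        for term in analyze(query):
            if term not in tf:
                continue
            df = self.doc_freq[term]
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = tf[term] + self.k1 * (1 - self.b + self.b * doc_len / avg_len)
            total += idf * tf[term] * (self.k1 + 1) / norm
        return total

    def search(self, query, limit=3):
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        return sorted((s for s in scored if s[0] > 0), reverse=True)[:limit]


index = ToyBM25()
index.insert("Milvus is a fast vector database")
index.insert("Tantivy is a full-text search engine library")
index.insert("Hybrid search combines dense and sparse vectors")
results = index.search("fast vector search")
print(results)  # list of (score, doc_id) pairs, best match first
```

In this toy corpus the first document matches two rare query terms, so it ranks first. In Milvus the equivalent statistics are maintained by the engine itself, and the client only sends raw text.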

02

Prerequisites
  • A Milvus instance with kernel version 2.5 has been created. For details, see Quickly Create a Milvus Instance.
  • The service has been activated and the API-KEY has been obtained. For specific operations, see Obtaining and configuring the API-KEY.

03

Usage Limitations
  • Applicable to Milvus instances with kernel version 2.5 or later.
  • The pymilvus Python SDK must be version 2.5 or later.
You can execute the following command to check the currently installed version.
pip3 show pymilvus
If the version is lower than 2.5, please update it using the following command.
pip3 install --upgrade pymilvus
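The same check can be done from Python, which is convenient at the top of a script. This is a minimal sketch using only the standard library; the helper names `parse_version`, `meets_minimum`, and `check_pymilvus` are hypothetical, and the version parsing is deliberately simple (it only reads the leading digits of each dotted component).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '2.5.3' or '2.5.0rc1' into a tuple of leading integers, e.g. (2, 5, 3)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="2.5"):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_pymilvus(minimum="2.5"):
    """Report whether the installed pymilvus satisfies the minimum version."""
    try:
        installed = version("pymilvus")
    except PackageNotFoundError:
        return "pymilvus is not installed"
    status = "OK" if meets_minimum(installed, minimum) else "too old, please upgrade"
    return f"pymilvus {installed}: {status}"


print(check_pymilvus())
```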

04

Procedure

Step 1: Install dependencies

pip3 install pymilvus langchain langchain-community dashscope beautifulsoup4

Step 2: Data preparation

This article uses the official Milvus documentation as sample data. The text is split into chunks with the LangChain SDK, the chunks are embedded with the text-embedding-v2 model, and the resulting embeddings are inserted into Milvus together with the original text.
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import DashScopeEmbeddings
from pymilvus import MilvusClient, DataType, Function, FunctionType

dashscope_api_key = "<YOUR_DASHSCOPE_API_KEY>"
milvus_url = "<YOUR_MILVUS_URL>"
user_name = "root"
password = "<YOUR_PASSWORD>"
collection_name = "milvus_overview"
dense_dim = 1536

# Load the source document.
loader = WebBaseLoader([
    'https://raw.githubusercontent.com/milvus-io/milvus-docs/refs/heads/v2.5.x/site/en/about/overview.md'
])
docs = loader.load()

# Use LangChain to split the input document according to chunk_size.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=256)
all_splits = text_splitter.split_documents(docs)

# Embed each chunk with text-embedding-v2.
embeddings = DashScopeEmbeddings(model="text-embedding-v2", dashscope_api_key=dashscope_api_key)
text_contents = [doc.page_content for doc in all_splits]
vectors = embeddings.embed_documents(text_contents)

client = MilvusClient(
    uri=f"http://{milvus_url}:19530",
    token=f"{user_name}:{password}",
)

schema = MilvusClient.create_schema(enable_dynamic_field=True)
analyzer_params = {"type": "english"}

# Add fields to the schema.
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True, auto_id=True)
schema.add_field(
    field_name="text",
    datatype=DataType.VARCHAR,
    max_length=65535,
    enable_analyzer=True,
    analyzer_params=analyzer_params,
    enable_match=True,
)
schema.add_field(field_name="sparse_bm25", datatype=DataType.SPARSE_FLOAT_VECTOR)
schema.add_field(field_name="dense", datatype=DataType.FLOAT_VECTOR, dim=dense_dim)

# The BM25 function converts the text field into the sparse_bm25 field automatically.
bm25_function = Function(
    name="bm25",
    function_type=FunctionType.BM25,
    input_field_names=["text"],
    output_field_names=["sparse_bm25"],
)
schema.add_function(bm25_function)

# Add indexes.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="dense",
    index_name="dense_index",
    index_type="IVF_FLAT",
    metric_type="IP",
    params={"nlist": 128},
)
index_params.add_index(
    field_name="sparse_bm25",
    index_name="sparse_bm25_index",
    index_type="SPARSE_WAND",
    metric_type="BM25",
)

# Create the collection.
client.create_collection(
    collection_name=collection_name,
    schema=schema,
    index_params=index_params,
)

# Insert data. Only the dense vector and the raw text are supplied;
# the sparse_bm25 field is generated by the BM25 function.
data = [
    {"dense": vectors[idx], "text": doc}
    for idx, doc in enumerate(text_contents)
]
res = client.insert(collection_name=collection_name, data=data)

print(f"Generated {len(vectors)} vectors, dimension: {len(vectors[0])}")
The examples in this article use the following parameters. Replace them according to your actual environment.

Parameter: Description
  • dashscope_api_key: Bailian's API-KEY.
  • milvus_url: The intranet address or public network address of the Milvus instance. You can view it on the instance details page of the Milvus instance.
      • If you use an intranet address, make sure that the client and the Milvus instance are in the same VPC.
      • If you use a public network address, enable public network access and make sure that the security group rules allow traffic on the corresponding port. For details, see Network Access Type. (Link: https://x.sm.cn/J5cFtoI)
  • user_name, password: The user name and password that you set when creating the Milvus instance.
  • collection_name: The name of the collection. You can customize it. This article uses milvus_overview as an example.
  • dense_dim: The dense vector dimension. The text-embedding-v2 model generates 1536-dimensional vectors, so dense_dim is set to 1536.
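To illustrate what the chunk_size and chunk_overlap settings in the splitter mean, here is a simplified sketch. It is an assumption-laden stand-in, not LangChain's algorithm: RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and sentences, whereas the hypothetical `split_with_overlap` function below uses a plain fixed-size sliding window over characters.

```python
def split_with_overlap(text, chunk_size=1024, chunk_overlap=256):
    """Fixed-window chunking: each chunk repeats the last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


# A 2000-character sample made of repeating digits.
sample = "".join(str(i % 10) for i in range(2000))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])  # prints [1024, 1024, 464]
```

The overlap means that a sentence cut at a chunk boundary still appears whole in one of the neighboring chunks, at the cost of storing and embedding some text twice.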


This example uses the latest capabilities of Milvus 2.5. Once a bm25_function object is created and added to the schema, Milvus automatically converts the text column into sparse vectors.
Similarly, when processing Chinese documents, Milvus 2.5 supports specifying the corresponding Chinese analyzer.
Important: After the Analyzer is set in the Schema, the setting applies to the Collection permanently. To use a different Analyzer, you must recreate the Collection.
# Define the analyzer parameters.
analyzer_params = {
    "type": "chinese"  # Specify the analyzer type as Chinese.
}

# Add a text field to the Schema and enable the analyzer.
schema.add_field(
    field_name="text",                # Field name
    datatype=DataType.VARCHAR,        # Data type: string (VARCHAR)
    max_length=65535,                 # Maximum length: 65535
    enable_analyzer=True,             # Enable the analyzer
    analyzer_params=analyzer_params,  # Analyzer parameters
)
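Because the Analyzer cannot be changed in place, switching analyzers means dropping and recreating the Collection. The following sketch shows one way to wrap that sequence. The helper name `recreate_collection` is hypothetical; it assumes a client object that exposes `has_collection`, `drop_collection`, and `create_collection` as MilvusClient does. The `FakeClient` class is only a stand-in so the call order can be demonstrated without a running instance. Note that dropping a Collection deletes all of its data, so the data must be inserted again afterwards.

```python
def recreate_collection(client, collection_name, schema, index_params):
    """Drop the collection if it exists, then create it with the new schema.

    WARNING: dropping a collection deletes all of its data.
    Returns True if an existing collection was dropped.
    """
    existed = client.has_collection(collection_name=collection_name)
    if existed:
        client.drop_collection(collection_name=collection_name)
    client.create_collection(
        collection_name=collection_name,
        schema=schema,
        index_params=index_params,
    )
    return existed


class FakeClient:
    """Stand-in used only to demonstrate the call order without a server."""

    def __init__(self, existing):
        self.collections = set(existing)
        self.calls = []

    def has_collection(self, collection_name):
        self.calls.append("has_collection")
        return collection_name in self.collections

    def drop_collection(self, collection_name):
        self.calls.append("drop_collection")
        self.collections.discard(collection_name)

    def create_collection(self, collection_name, schema=None, index_params=None):
        self.calls.append("create_collection")
        self.collections.add(collection_name)


fake = FakeClient(existing=["milvus_overview"])
dropped = recreate_collection(fake, "milvus_overview", schema=None, index_params=None)
print(dropped, fake.calls)
```

With a real instance, you would pass the MilvusClient object together with the schema and index_params built for the new Analyzer.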

Step 3: Full-text search

In Milvus 2.5, you can easily use the latest full-text search capabilities through the relevant APIs. The code example is shown below.
from pymilvus import MilvusClient

# Create the Milvus client.
client = MilvusClient(
    uri="http://c-xxxx.milvus.aliyuncs.com:19530",  # The public network address of the Milvus instance.
    token="<yourUsername>:<yourPassword>",  # Username and password for logging in to the Milvus instance.
    db_name="default",  # The name of the database to connect to. This article uses the default database.
)

search_params = {
    'params': {'drop_ratio_search': 0.2},
}

# Pass the raw query text; Milvus converts it to a sparse vector with the BM25 function.
full_text_search_res = client.search(
    collection_name='milvus_overview',
    data=['what makes milvus so fast?'],
    anns_field='sparse_bm25',
    limit=3,
    search_params=search_params,
    output_fields=["text"],
)

for hits in full_text_search_res:
    for hit in hits:
        print(hit)
        print("\n")
"""{'id': 456165042536597485, 'distance': 6.128782272338867, 'entity': {'text': '## What Makes Milvus so Fast? \n\nMilvus was designed from day one to be a highly efficient vector database system. In most cases, Milvus outperforms other vector databases by 2-5x (see the VectorDBBench results). This high performance is the result of several key design decisions:\n\n**Hardware-aware Optimization**: To accommodate Milvus in various hardware environments, we have optimized its performance specifically for many hardware architectures and platforms, including AVX512, SIMD, GPUs, and NVMe SSD.\n\n**Advanced Search Algorithms**: Milvus supports a wide range of in-memory and on-disk indexing/search algorithms, including IVF, HNSW, DiskANN, and more, all of which have been deeply optimized. Compared to popular implementations like FAISS and HNSWLib, Milvus delivers 30%-70% better performance.'}}
{'id': 456165042536597487, 'distance': 4.760214805603027, 'entity': {'text': "## What Makes Milvus so Scalable\n\nIn 2022, Milvus supported billion-scale vectors, and in 2023, it scaled up to tens of billions with consistent stability, powering large-scale scenarios for over 300 major enterprises, including Salesforce, PayPal, Shopee, Airbnb, eBay, NVIDIA, IBM, AT&T, LINE, ROBLOX, Inflection, etc.\n\nMilvus's cloud-native and highly decoupled system architecture ensures that the system can continuously expand as data grows:\n\n![Highly decoupled system architecture of Milvus](../../../assets/highly-decoupled-architecture.png)"}}"""
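The drop_ratio_search parameter used in the search above tells Milvus to ignore a proportion of the smallest values in the sparse query vector. The sketch below illustrates that idea only. The function name `drop_small_weights`, the term weights, and the exact cut-off rule are assumptions for illustration; the way Milvus selects its threshold internally may differ.

```python
def drop_small_weights(query_weights, drop_ratio):
    """Remove the drop_ratio fraction of terms with the smallest absolute weights.

    query_weights maps a term (or term id) to its weight in the sparse query vector.
    """
    if not 0.0 <= drop_ratio < 1.0:
        raise ValueError("drop_ratio must be in [0, 1)")
    n_drop = int(len(query_weights) * drop_ratio)
    by_weight = sorted(query_weights, key=lambda term: abs(query_weights[term]))
    dropped = set(by_weight[:n_drop])
    return {t: w for t, w in query_weights.items() if t not in dropped}


# Illustrative weights only; real BM25 weights depend on the corpus statistics.
query_weights = {"what": 0.1, "makes": 0.9, "milvus": 1.4, "so": 0.2, "fast": 2.0}
kept = drop_small_weights(query_weights, drop_ratio=0.2)
print(kept)  # the lowest-weight term ("what") is removed
```

Dropping low-weight terms trades a small amount of recall for less work per query, since fewer posting lists need to be scanned.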

Step 4: Keyword matching

Keyword matching is a new feature in Milvus 2.5. It can be combined with vector similarity search to narrow the search scope and improve search performance. To use keyword matching, set both enable_analyzer and enable_match to True on the field when defining the schema.
Important: Setting enable_match to True creates an inverted index for the field, which consumes additional storage resources.

Example 1: Keyword matching combined with vector search

In this example, a filter expression restricts the search to documents that contain both of the specified terms, "query" and "node". The vector similarity search is then performed on the filtered subset of documents.
# query_embeddings holds the dense query vectors, generated as in Step 5:
# query_embeddings = embeddings.embed_documents([query])
filter = "TEXT_MATCH(text, 'query') and TEXT_MATCH(text, 'node')"

text_match_res = client.search(
    collection_name="milvus_overview",
    anns_field="dense",
    data=query_embeddings,
    filter=filter,
    search_params={"params": {"nprobe": 10}},
    limit=2,
    output_fields=["text"],
)

Example 2: Scalar filter query

Keyword matching can also be used for scalar filtering in query operations. By specifying a TEXT_MATCH expression in query(), you can retrieve documents that match the given terms. In this example, the filter expression limits the results to documents that contain "scalable" or "fast".
filter = "TEXT_MATCH(text, 'scalable fast')"

text_match_res = client.query(
    collection_name="milvus_overview",
    filter=filter,
    output_fields=["text"],
)
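The two filter styles above differ in how terms are combined: separate TEXT_MATCH expressions joined with "and" require every term, while several terms inside one TEXT_MATCH match documents containing any of them. The following toy inverted index sketches that behavior in plain Python. The function names and the sample documents are hypothetical, and the tokenizer is far simpler than a Milvus Analyzer.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def text_match(index, terms):
    """Ids of documents containing ANY of the terms, like TEXT_MATCH(text, 'a b')."""
    matched = set()
    for term in tokenize(terms):
        matched |= index.get(term, set())
    return matched


docs = {
    1: "The query node loads segments",
    2: "Milvus is scalable",
    3: "A fast query engine",
}
index = build_inverted_index(docs)

# Example 1: two TEXT_MATCH expressions joined with "and" (set intersection).
both = text_match(index, "query") & text_match(index, "node")
# Example 2: one TEXT_MATCH with two terms (set union).
either = text_match(index, "scalable fast")

print(sorted(both), sorted(either))  # prints [1] [2, 3]
```

Because the lookup is a set operation on the inverted index, the candidate set is narrowed before any vector distance is computed, which is where the performance benefit of filtering comes from.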

Step 5: Hybrid search and RAG

Hybrid search combines vector search with full-text search. The RRF (Reciprocal Rank Fusion) algorithm fuses the results of the two searches and re-ranks them, which improves both recall and precision.
The code example is shown below.
from pymilvus import MilvusClient, AnnSearchRequest, RRFRanker
from langchain_community.embeddings import DashScopeEmbeddings
from dashscope import Generation

# Create the Milvus client.
client = MilvusClient(
    uri="http://c-xxxx.milvus.aliyuncs.com:19530",  # The public network address of the Milvus instance.
    token="<yourUsername>:<yourPassword>",  # Username and password for logging in to the Milvus instance.
    db_name="default",  # The name of the database to connect to. This article uses the default database.
)

collection_name = "milvus_overview"

# Replace with your DashScope API-KEY.
dashscope_api_key = "<YOUR_DASHSCOPE_API_KEY>"

# Initialize the Embedding model (text-embedding-v2).
embeddings = DashScopeEmbeddings(
    model="text-embedding-v2",
    dashscope_api_key=dashscope_api_key,
)

# Define the query.
query = "Why does Milvus run so scalable?"

# Embed the query to generate its vector representation.
query_embeddings = embeddings.embed_documents([query])

# Number of results to return: the top 5 documents related to the query.
top_k = 5

# Define the parameters for the dense vector search.
search_params_dense = {
    "metric_type": "IP",
    "params": {"nprobe": 2},
}

# Create the dense vector search request.
request_dense = AnnSearchRequest([query_embeddings[0]], "dense", search_params_dense, limit=top_k)

# Define the parameters for the BM25 text search.
search_params_bm25 = {
    "metric_type": "BM25",
}

# Create the BM25 text search request. The raw query text is passed directly.
request_bm25 = AnnSearchRequest([query], "sparse_bm25", search_params_bm25, limit=top_k)

# Combine the two requests.
reqs = [request_dense, request_bm25]

# Initialize the RRF ranking algorithm with k = 100.
ranker = RRFRanker(100)

# Perform the hybrid search.
hybrid_search_res = client.hybrid_search(
    collection_name=collection_name,
    reqs=reqs,
    ranker=ranker,
    limit=top_k,
    output_fields=["text"],
)

# Extract the context from the hybrid search results.
context = []
print("Top K Results: ")
for hits in hybrid_search_res:
    for hit in hits:
        context.append(hit['entity']['text'])  # Add the text content to the context list.
        print(hit['entity']['text'])  # Output each retrieved document.


# Define a function that answers the query based on the retrieved context.
def get_answer(query, context):
    prompt = f'''Please answer my question based on the content within: ```{context}``` My question is: {query}.'''
    # Call the generation model to get an answer.
    rsp = Generation.call(model='qwen-turbo', prompt=prompt, api_key=dashscope_api_key)
    return rsp.output.text


# Get the answer.
answer = get_answer(query, context)
print(answer)

# Expected output excerpt
"""Milvus is highly scalable due to its cloud-native and highly decoupled system architecture. This architecture allows the system to continuously expand as data grows. Additionally, Milvus supports three deployment modes that cover a wide..."""
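For reference, the fusion step performed by RRFRanker(100) in the hybrid search can be sketched in a few lines. This is an illustration of the general Reciprocal Rank Fusion formula, score(d) = sum of 1 / (k + rank(d)) over all result lists, and not the Milvus source code. The function name `rrf_fuse`, the 1-based rank convention, the tie-breaking rule, and the document ids are assumptions.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=100, limit=5):
    """Fuse several ranked id lists with Reciprocal Rank Fusion.

    Each list is ordered best-first. A document's fused score is the sum of
    1 / (k + rank) over every list in which it appears (rank starts at 1).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ordered[:limit]


# Hypothetical ids returned by the dense request and the BM25 request.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([dense_hits, bm25_hits], k=100)
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents that appear in both lists (doc_b and doc_a here) rise to the top, because RRF uses only ranks and therefore does not need the dense and BM25 scores to be on the same scale. A larger k flattens the difference between adjacent ranks.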