Milvus Lite Quick Practice - Understand the mechanism behind RAG implementation

Written by
Iris Vance
Updated on: July 13, 2025
Recommendation

Milvus Lite makes RAG system implementation easier and helps you quickly master the practical skills of vector databases.

Core content:
1. Milvus Lite and its application value in RAG systems
2. Milvus Lite installation and deployment prerequisites
3. Detailed steps for text vectorization and creating Collections

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

Milvus Lite is a lightweight version of Milvus, an open-source vector database that supports artificial intelligence applications through vector embeddings and similarity search. Its most typical application is RAG (Retrieval-Augmented Generation), where it provides the vector storage and retrieval layer of the RAG system. Working through the exercise below shows the general process of text vectorization and similarity (semantic) matching, and with it the mechanism behind a RAG implementation.
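
To make "similarity matching" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embedding vectors. The vectors are made-up three-dimensional values rather than real model output, and the cosine_similarity helper is a hypothetical name used only for illustration; it is not part of pymilvus.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (invented for this illustration).
query = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # points in the same direction as the query
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

score_close = cosine_similarity(query, doc_close)
score_far = cosine_similarity(query, doc_far)
print(round(score_close, 6), round(score_far, 6))
```

A vector pointing in the same direction as the query scores close to 1, and an orthogonal one scores 0. A vector database performs this kind of comparison across many stored vectors.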

Install Milvus Lite

Milvus Lite is included in the Milvus Python SDK (pymilvus), so it can be installed with pip install pymilvus.

Deployment prerequisites:

  • Python 3.8+
  • Ubuntu >= 20.04 (x86_64 or arm64)
  • macOS >= 11.0 (Apple M1/M2 or x86_64)

The installation command is as follows:

pip install -U pymilvus

Text vectorization

Creating a vector database

Instantiate MilvusClient with a file name to create a local Milvus vector database; all data is stored in that file. For example:

from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")

Creating Collections

Collections are similar to tables in traditional SQL databases. They store vectors and their associated metadata. When creating a Collection, you can define schema and index parameters to configure the vector specification, such as the dimension, index type, and distance metric.

When creating a Collection, you must set at least its name and the dimension of the vector field. Unspecified parameters use default values.

if client.has_collection(collection_name="demo_collection"):
    client.drop_collection(collection_name="demo_collection")
client.create_collection(
    collection_name="demo_collection",
    dimension=768,  # The vectors in this demo have 768 dimensions
)

Text vectorization

Download an embedding model to generate vectors for the test text.

First, install the model library, which contains basic ML tools such as PyTorch. If PyTorch has never been installed in your local environment, the package download may take some time.

pip install "pymilvus[model]"

Generate vector embeddings using the default model. Milvus expects data to be inserted as a list of dictionaries, where each dictionary represents one data record, called an entity.

from pymilvus import model

# If https://huggingface.co/ is unreachable, uncomment the following lines
# import os
# os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

# This will download a small embedding model "paraphrase-albert-small-v2" (~50MB).
embedding_fn = model.DefaultEmbeddingFunction()

# The text strings to search from
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

vectors = embedding_fn.encode_documents(docs)

# The output vectors have 768 dimensions, matching the collection created above.
print("Dim:", embedding_fn.dim, vectors[0].shape)  # Dim: 768 (768,)

# Each entity has an id, a vector representation, the raw text, and a subject label
# that is used later to demonstrate metadata filtering.
data = [
    {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"}
    for i in range(len(vectors))
]

print("Data has", len(data), "entities, each with fields: ", data[0].keys())
print("Vector dim:", len(data[0]["vector"]))

Output:

Dim: 768 (768,)
Data has 3 entities, each with fields:  dict_keys(['id', 'vector', 'text', 'subject'])
Vector dim: 768
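
Before inserting, it can help to confirm that every entity has the shape the Collection expects. The sketch below is a hypothetical pre-insert check, not a pymilvus feature: it assumes the field names used in this walkthrough ("id" and "vector") and uses 4-dimensional toy vectors in place of the 768-dimensional ones.

```python
def validate_entities(entities, dim):
    """Check that each entity has an id and a vector of length dim.

    Returns the number of entities checked, or raises ValueError on the
    first entity that does not match.
    """
    for index, entity in enumerate(entities):
        for field in ("id", "vector"):
            if field not in entity:
                raise ValueError(f"entity {index} is missing field {field!r}")
        if len(entity["vector"]) != dim:
            raise ValueError(
                f"entity {index} has {len(entity['vector'])} dimensions, expected {dim}"
            )
    return len(entities)


# Toy entities with 4-dimensional vectors (invented values).
toy_data = [
    {"id": 0, "vector": [0.1, 0.2, 0.3, 0.4], "text": "first", "subject": "history"},
    {"id": 1, "vector": [0.5, 0.6, 0.7, 0.8], "text": "second", "subject": "history"},
]

checked = validate_entities(toy_data, dim=4)
print("validated", checked, "entities")
```

In the walkthrough itself, the equivalent call would pass the real data list and dim=768.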

Inserting Data

Let's insert the data into the Collection:

res = client.insert(collection_name="demo_collection", data=data)

print(res)

Output:

{'insert_count': 3, 'ids': [0, 1, 2], 'cost': 0}

Semantic Search

Now we can perform a semantic search by converting the query text into a vector and running a vector similarity search in Milvus.

Vector Search

Milvus can accept one or more vector search requests at the same time. The value of the query_vectors variable is a list of vectors, where each vector is an array of floating-point numbers.

query_vectors = embedding_fn.encode_queries(["Who is Alan Turing?"])

res = client.search(
    collection_name="demo_collection",  # target collection
    data=query_vectors,  # query vectors
    limit=2,  # The number of entities returned
    output_fields=["text", "subject"],  # Specify the entity fields returned by the query
)

print(res)

Output:

data: [ "[{'id': 2, 'distance': 0.5859944820404053, 'entity': {'text': 'Born in Maida Vale, London, Turing was raised in southern England.', 'subject': 'history'}}, {'id': 1, 'distance': 0.5118255615234375, 'entity': {'text': 'Alan Turing was the first person to conduct substantial research in AI.', 'subject': 'history'}}]" ] , extra_info: { 'cost' : 0}

The output is a list of results, one for each vector search query. Each result is itself a list of hits, and each hit contains the entity's primary key, its distance to the query vector, and the entity fields specified in output_fields.

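The distance value can be used to evaluate the relevance of search results. How to read it depends on the metric: with a similarity metric such as cosine or inner product, a larger value means a closer match, while with a distance metric such as L2, a value closer to 0 means a closer match. The results above are listed from the highest value to the lowest, which is consistent with a similarity metric. For more information, see Similarity Metrics in the Milvus documentation.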
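
In a RAG system, the retrieved text is then passed to a language model as context. The sketch below shows one way to walk a result of this shape and assemble a prompt. It assumes the result behaves like nested lists of dictionaries, as the printed output suggests, and uses a literal copy of that output so it runs without Milvus. The build_prompt helper and the prompt wording are hypothetical and not part of pymilvus.

```python
# A result shaped like the printed output above: one inner list of hits per query vector.
res = [
    [
        {
            "id": 2,
            "distance": 0.5859944820404053,
            "entity": {
                "text": "Born in Maida Vale, London, Turing was raised in southern England.",
                "subject": "history",
            },
        },
        {
            "id": 1,
            "distance": 0.5118255615234375,
            "entity": {
                "text": "Alan Turing was the first person to conduct substantial research in AI.",
                "subject": "history",
            },
        },
    ]
]


def build_prompt(question, hits):
    """Join the text of each hit into a context block and append the question."""
    context = "\n".join(f"[{hit['id']}] {hit['entity']['text']}" for hit in hits)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# res[0] holds the hits for the first (and here only) query vector.
prompt = build_prompt("Who is Alan Turing?", res[0])
print(prompt)
```

The resulting string would be sent to whichever language model the RAG system uses; that step is outside the scope of this walkthrough.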

Vector search with metadata filtering

You can also run a vector search that takes metadata values into account (Milvus calls these "scalar" fields, since scalar refers to non-vector data). This is done by specifying a filter expression with the conditions to match. Let's see how to search and filter using the subject field.

# Insert more documents under another subject.
docs = [
    "Machine learning has been used for drug design.",
    "Computational synthesis with AI algorithms predicts molecular properties.",
    "DDR1 is involved in cancers and fibrosis.",
]
vectors = embedding_fn.encode_documents(docs)
data = [
    {"id": 3 + i, "vector": vectors[i], "text": docs[i], "subject": "biology"}
    for i in range(len(vectors))
]

client.insert(collection_name="demo_collection", data=data)

# Filter by subject. This excludes any text whose subject is "history", even if it is very close to the query vector.
res = client.search(
    collection_name="demo_collection",
    data=embedding_fn.encode_queries(["tell me AI related information"]),
    filter="subject == 'biology'",
    limit=2,
    output_fields=["text", "subject"],
)

print(res)

Output:

data: [ "[{'id': 4, 'distance': 0.27030569314956665, 'entity': {'text': 'Computational synthesis with AI algorithms predicts molecular properties.', 'subject': 'biology'}}, {'id': 3, 'distance': 0.16425910592079163, 'entity': {'text': 'Machine learning has been used for drug design.', 'subject': 'biology'}}]" ] , extra_info: { 'cost' : 0}

By default, scalar fields are not indexed. If you need to perform metadata filtering searches on large datasets, consider using a fixed schema and turning on indexing to improve search performance.
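
Conceptually, a filtered search keeps only the entities that satisfy the scalar condition and ranks those by similarity. The following brute-force sketch illustrates that idea in plain Python with invented 2-dimensional vectors. It is not how Milvus implements filtering or search internally; the filtered_search and cosine_similarity helpers are hypothetical, and cosine similarity is assumed as the metric.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(entities, query_vector, subject, limit):
    """Keep entities with the given subject, then return the top hits by similarity."""
    candidates = [e for e in entities if e["subject"] == subject]
    hits = [
        {
            "id": e["id"],
            # Named "distance" to mirror the search output; here it is a similarity score.
            "distance": cosine_similarity(query_vector, e["vector"]),
            "entity": {"text": e["text"], "subject": e["subject"]},
        }
        for e in candidates
    ]
    hits.sort(key=lambda hit: hit["distance"], reverse=True)
    return hits[:limit]


# Toy entities (invented values). The history entity matches the query exactly.
entities = [
    {"id": 0, "vector": [1.0, 0.0], "text": "history doc", "subject": "history"},
    {"id": 3, "vector": [0.6, 0.8], "text": "biology doc A", "subject": "biology"},
    {"id": 4, "vector": [0.0, 1.0], "text": "biology doc B", "subject": "biology"},
]

hits = filtered_search(entities, [1.0, 0.0], subject="biology", limit=2)
print([hit["id"] for hit in hits])
```

Even though the history entity is the closest match to the query vector, it never appears in the hits, because the filter removes it before ranking.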

In addition to vector searches, other types of searches can be performed.

Query

query() is an operation that retrieves all entities matching a condition, such as a filter expression or a list of ids.

For example, to retrieve all entities where a scalar field has a specific value:

res = client.query(
    collection_name="demo_collection",
    filter="subject == 'history'",
    output_fields=["text", "subject"],
)

Retrieve entities directly by primary key:

res = client.query(
    collection_name="demo_collection",
    ids=[0, 2],
    output_fields=["vector", "text", "subject"],
)

Deleting an Entity

If you want to clear data, you can delete entities with a specified primary key, or delete all entities that match a specific filter expression.

# Delete entities by primary key.
res = client.delete(collection_name="demo_collection", ids=[0, 2])

print(res)

# Use a filter expression to delete all entities whose subject is "biology".
res = client.delete(
    collection_name="demo_collection",
    filter="subject == 'biology'",
)

print(res)

Output:

[0, 2]
[3, 4, 5]

Loading existing data

Since all Milvus Lite data is stored in a local file, you can load it back into memory even after the program has terminated by creating a MilvusClient with the same file. For example, the following restores the Collections from the "milvus_demo.db" file and lets you continue writing data to them.

from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")

Deleting Collections

If you want to delete all the data in a Collection, you can drop the Collection using the following method:

# Drop the collection
client.drop_collection(collection_name="demo_collection")

References

  1. https://milvus.io/docs/zh/quickstart.md

  2. https://milvus.io/docs/en/milvus_lite.md


Complete example code:
from pymilvus import model
from pymilvus import MilvusClient

# 1. Create vector database and collection
# By instantiating `MilvusClient`, specify a file name to store all data
client = MilvusClient("milvus_demo.db")

# Create Collections
if client.has_collection(collection_name="demo_collection"):
    client.drop_collection(collection_name="demo_collection")
client.create_collection(
    collection_name="demo_collection",
    dimension=768,  # The vector specification in this demo is 768 dimensions
)

# 2. Text vectorization
# If accessing https://huggingface.co/ fails, uncomment the following lines
# import os
# os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

# This will download a smaller embedding model "paraphrase-albert-small-v2" (~50MB).
embedding_fn = model.DefaultEmbeddingFunction()

# Text strings to search from
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

vectors = embedding_fn.encode_documents(docs)

# The output vector has 768 dimensions, matching the collection created.
print("Dim:", embedding_fn.dim, vectors[0].shape)  # Dim: 768 (768,)

# Each entity has an id, vector representation, original text, and topic tags, to demonstrate metadata filtering
data = [
    {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"}
    for i in range(len(vectors))
]
print("Data has", len(data), "entities, each with fields: ", data[0].keys())
print("Vector dim:", len(data[0]["vector"]))

# Insert data into the collection
res = client.insert(collection_name="demo_collection", data=data)
print("----------------------Insert data into the collection-----------------------------")
print(res)

# 3. Vector search
print("-------------------Vector Search-----------------------------------------")
query_vectors = embedding_fn.encode_queries(["Who is Alan Turing?"])
res = client.search(
    collection_name="demo_collection",  # Target collection
    data=query_vectors,  # Query vectors
    limit=2,  # The number of entities returned
    output_fields=["text", "subject"],  # Specify the entity fields returned by the query
)
print(res)

# Search with metadata filtering
docs = [
    "Machine learning has been used for drug design.",
    "Computational synthesis with AI algorithms predicts molecular properties.",
    "DDR1 is involved in cancers and fibrosis.",
]
vectors = embedding_fn.encode_documents(docs)
data = [
    {"id": 3 + i, "vector": vectors[i], "text": docs[i], "subject": "biology"}
    for i in range(len(vectors))
]
client.insert(collection_name="demo_collection", data=data)

# Filter subject by filter. This will exclude any text with the subject "history", despite being very close to the query vector.
print("----------------Vector search, print search results for subject biology----------------------")
res = client.search(
    collection_name="demo_collection",
    data=embedding_fn.encode_queries(["tell me AI related information"]),
    filter="subject == 'biology'",
    limit=2,
    output_fields=["text", "subject"],
)
print(res)

# 4. Query
# Retrieve all entities whose scalar field has a specific value:
print("--------------Query, retrieve all entities whose scalar field has a specific value of history:---------------")
res = client.query(
    collection_name="demo_collection",
    filter="subject == 'history'",
    output_fields=["text", "subject"],
)
print(res)

# Retrieve entities by primary key value:
print("--------------------Query, retrieve entities by primary key value:--------------------------------")
res = client.query(
    collection_name="demo_collection",
    ids=[0, 2],
    output_fields=["vector", "text", "subject"],
)
print(res)

# Delete entities using primary key.
print("-------------------------- Use primary key to delete entities: ------------------------------")
res = client.delete(collection_name="demo_collection", ids=[0, 2])
print(res)

# Use filter expression to delete all entities whose subject is "biology".
print("-----------filter expression deletes all entities whose subject is biology. :----------------")
res = client.delete(
    collection_name="demo_collection",
    filter="subject == 'biology'",
)
print(res)

# 5. Delete collection
client.drop_collection(collection_name="demo_collection")