From deployment to migration: our experience summary on how to use Milvus well

The Milvus open source community has accumulated six years of experience, from deployment to migration.
Core content:
1. Milvus version selection guide: determine the best deployment plan based on actual needs
2. Performance optimization and data management: distilled from user feedback to address developer pain points
3. From Lite to Distributed: different versions adapt to vector database applications of different scales and needs
Today we checked Milvus's GitHub star count: it is close to 35,000, a new milestone.
Looking back, from 0 to 100, from 100 to 10,000, and on to today, almost six years have passed. During this period, we released several important versions and welcomed usage and feedback from more developers around the world.
Throughout, countless questions arrived through comments, issues, and meetups: technical bugs, product usage questions, and plenty of friends simply showing care and support. From many seemingly "repetitive" questions, we distilled three keywords:
Deployment
Performance
Data management
These are the areas we need to do better in the next product iterations, and they are also the problems troubling many developers today. So today, we offer a systematic Q&A.
01
Which version should I choose?
Deployment is the starting point of the entire system design. The most frequently asked questions in our community are:
“Which version should I choose?”
Should I choose the lightweight Milvus Lite, or the locally deployed Standalone? Should I go straight to Distributed and build a cluster? Is the fully managed Zilliz Cloud really worth it?
Lite is easy to use, but not "complete" enough; Standalone is "autonomous" enough, but has limited scalability; Distributed is powerful, but has a high cost and many adjustable parameters.
Therefore, we started to talk about usage scenarios instead of just features in our documentation.
For the first time, we also clearly distinguished:
Lite is for small-team prototyping. Use Milvus Lite when you are storing up to a few million vectors locally (for prototyping) or need an embedded vector database for unit testing and CI/CD. Note that Milvus Lite does not yet support advanced features such as full-text search, but that support is coming soon. (A connection sketch follows this list.)
Standalone is for production deployments at the million-to-hundred-million scale (image search, product retrieval). If your system serves production traffic, or you need to store millions to hundreds of millions of vectors, use Milvus Standalone, which packages all Milvus components into a single Docker image. There is also a standalone mode deployed with Docker Compose, which runs persistent storage (MinIO) and metadata storage (etcd) as separate containers.
Distributed is recommended for hundreds of millions of vectors or thousands of QPS. Milvus Distributed suits any large-scale deployment serving production traffic (such as large knowledge-base retrieval or recommendation systems). For users who want offline batch processing of massive data (such as deduplication), the upcoming Milvus 3.0 will address this through vector lakes.
If you don't want to configure a cluster, just go to Zilliz Cloud. For developers who want to focus on application development without worrying about DevOps, Zilliz Cloud provides fully managed Milvus.
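To make the difference concrete, here is a minimal sketch (assuming pymilvus 2.4+; the local file path and server address below are placeholders). Switching between Lite and a server deployment is mostly a matter of which URI you hand to MilvusClient:

```python
from pymilvus import MilvusClient

# Milvus Lite: point the client at a local file and the whole database
# runs embedded in your Python process. Good for prototyping and CI.
lite_client = MilvusClient("./milvus_demo.db")

# Standalone / Distributed: point the client at the server instead.
# The rest of your application code stays the same.
server_client = MilvusClient(uri="http://localhost:19530")
```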
Going further, the essence of deployment is judgment: judging what your current challenges are, and how much you are willing to pay for future resilience.
02
How much memory, storage, and computing power do I need?
Another common question in the community is:
“How much memory and CPU should I allocate to Milvus?”
Those asking are not only existing Milvus users, but also people still evaluating whether Milvus fits their business scenario.
But how much memory, storage, and computing power a deployment needs depends on the interplay of several factors: embedding dimensionality varies with the model used; some vector indexes are stored entirely in memory, while others keep data on disk; and many indexes actually store compressed (quantized) copies of the embeddings.
Of course, there are technical solutions to this. We built a tool for estimating resources automatically: https://milvus.io/tools/sizing. Enter the vector dimension, number of entities, index type, deployment options, and so on, and it estimates the CPU, memory, and storage required for each type of Milvus node and its dependent components.
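If you just want a back-of-envelope number before opening the tool, the arithmetic below is a rough sketch; the HNSW overhead figure is an illustrative assumption (M=16), not a measured value:

```python
# Back-of-envelope memory estimate for float32 vectors, assuming an
# HNSW index with M=16 (so roughly 2*M int32 neighbor links per vector
# at the base layer). Real usage also includes scalar fields, metadata,
# and runtime buffers, which this ignores.
num_vectors = 10_000_000
dim = 768

raw_bytes = num_vectors * dim * 4      # 4 bytes per float32 component
link_bytes = num_vectors * 2 * 16 * 4  # assumed HNSW graph overhead

print(f"raw vectors: {raw_bytes / 2**30:.1f} GiB")   # ~28.6 GiB
print(f"HNSW links : {link_bytes / 2**30:.1f} GiB")  # ~1.2 GiB
```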
That said, every workload is different, so it is best to test with your own data in a test environment in advance.
03
Which vector indexing should I choose?
Of all the questions we receive, “Which vector index should I choose?” is one of the most common.
After years of deep work on vector indexes, we understand each one inside out. But for countless everyday developers, every index is a black box that is hard to observe and compare.
Still, we have always believed that "starting early beats tuning slowly". That is why Milvus provides AutoIndex. Its intent is to let the machine decide and run things for you when you don't want to choose, don't know what to choose, or find the selection process too cumbersome, so you can focus on more critical questions, such as whether the data itself has problems and whether the embedding model is effective.
Sometimes, "low recall" is not an index problem at all; the model's accuracy may not be high enough, or the embeddings may have been wrong from the start.
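If you want to try it, the sketch below shows AutoIndex through pymilvus (assuming pymilvus 2.4+, a Milvus server at the default address, and an existing collection named "docs" with a vector field; both names are placeholders):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# AUTOINDEX lets Milvus pick the index type and tuning parameters,
# so you only declare the field and the distance metric.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="AUTOINDEX",
    metric_type="COSINE",
)
client.create_index(collection_name="docs", index_params=index_params)
```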
However, if you want to pick a specific index type yourself, there are some rules of thumb. We usually divide indexes into memory, disk, and GPU types; let's take them in turn (a combined sketch follows these three paragraphs).
About memory indexes: they offer the fastest retrieval but at a high memory cost. Milvus currently supports all the common indexes, such as IVF_FLAT and HNSW. Most indexes quantize vectors to reduce memory usage, yet still need memory for auxiliary data structures. Non-vector (scalar) data and its indexes take up memory as well.
About disk indexes: if you need to handle billions of vectors without massive memory, use DiskANN or MMap. DiskANN places the uncompressed vectors plus the graph search structure on disk and keeps only highly compressed copies in memory. Of course, "low latency" comes with a prerequisite: you need an NVMe drive (SATA performance will make you question your life choices). MMap uses the virtual-memory mechanism to swap index pages between disk and memory on demand, so an index larger than RAM can still be served when only a small portion of the data is touched at a time, but frequent page swaps will cause excessive latency. Many users doing log replay and long-tail analysis prefer this load-on-demand approach.
About GPU indexes: GPUs bring parallelism, multithreading, and high throughput, but the disadvantages are obvious too: complex scheduling, higher cost, and harder-to-maintain code paths. The GPU indexes Milvus supports are provided by the NVIDIA RAPIDS team and can run at lower latency than CPU under high concurrency. However, they only become cost-effective when the query volume is large, on the order of hundreds or thousands of QPS, enough to "squeeze the GPU dry". After all, GPU memory is usually smaller than CPU RAM, and the operating cost is higher.
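Here is the combined sketch of the three categories (same assumptions as before: pymilvus 2.4+, placeholder collection and field names; GPU_CAGRA additionally requires a GPU build of Milvus). Only one index can exist per vector field, so pick one:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
index_params = client.prepare_index_params()

# Memory: HNSW keeps the graph and vectors in RAM; fast but memory-hungry.
index_params.add_index(field_name="vector", index_type="HNSW",
                       metric_type="COSINE",
                       params={"M": 16, "efConstruction": 200})

# Disk: DISKANN keeps full vectors and the graph on NVMe,
# with compressed copies in RAM.
# index_params.add_index(field_name="vector", index_type="DISKANN",
#                        metric_type="COSINE")

# GPU: CAGRA from NVIDIA RAPIDS, for high-concurrency workloads.
# index_params.add_index(field_name="vector", index_type="GPU_CAGRA",
#                        metric_type="L2")

client.create_index(collection_name="docs", index_params=index_params)
```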
04
How to choose distance metric and embedding model
An interesting phenomenon with distance metrics: many users ask, right after choosing an index, "Should I use L2 or cosine?"
In fact, this was already decided when the embedding model was trained.
If you are doing text tasks, Cosine or the equivalent Inner Product (IP) is the most natural choice. If you are doing image tasks, then L2 may be more suitable.
In other words, this is not a question of "what should Milvus support", but of asking yourself at training time: do I want the model to care about direction, or about position?
Zilliz has also compiled a table of parameters for various embedding models in the docs, for your reference:
https://zilliz.com/ai-models
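In Milvus itself, the metric is fixed when the index is built and must match at query time. A minimal sketch (placeholder collection, field, and query vector; COSINE shown as the typical text-model choice):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Declare the metric when building the index...
index_params = client.prepare_index_params()
index_params.add_index(field_name="vector", index_type="AUTOINDEX",
                       metric_type="COSINE")  # "L2" suits many image models
client.create_index(collection_name="docs", index_params=index_params)

# ...and use the same metric when searching.
query_vector = [0.1] * 768  # placeholder embedding
results = client.search(
    collection_name="docs",
    data=[query_vector],
    limit=10,
    search_params={"metric_type": "COSINE"},
)
```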
Since the distance metric depends on the embedding model, the next question is inevitable: how do I choose the embedding model?
It's hard to give a clear answer to this question because it depends on your performance requirements and what stage you are in.
Similar to index selection, embedding models also need to balance computation, storage, and recall. Under the same conditions, embeddings with larger output dimensions require more storage space, but the recall rate may be higher. For a fixed dimension, larger embedding models usually outperform smaller embedding models in terms of recall, but at the cost of increased computation and time.
Leaderboards such as MTEB and OpenCompass are useful references. However, the model that tops a leaderboard is not necessarily the best for you, because leaderboards run on standard datasets, while real-world data often contains noise, industry jargon, colloquial expressions, and even spelling errors.
Therefore, when you really don't know how to choose, we generally recommend starting with a basic model such as all-MiniLM-L6-v2. Not because it is the best, but because it "just works": its dimensionality is modest, its speed is manageable, and it is resource-friendly. Launch a model that works first, then use data to prove whether it is worth switching.
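A sketch of that "launch something that works" path, assuming sentence-transformers is installed and using Milvus Lite with placeholder documents (all-MiniLM-L6-v2 outputs 384-dimensional embeddings):

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is small, 384-dimensional, and runs well on CPU.
model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Milvus is a vector database.", "HNSW is a graph index."]
vectors = model.encode(docs)

client = MilvusClient("./milvus_demo.db")  # Milvus Lite, embedded
client.create_collection(collection_name="docs", dimension=384,
                         metric_type="COSINE")
client.insert(
    collection_name="docs",
    data=[{"id": i, "vector": vec.tolist(), "text": text}
          for i, (vec, text) in enumerate(zip(vectors, docs))],
)
```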
05
What to do if Milvus Distributed deployment fails
When many users deploy Milvus Distributed for the first time, the first problem they hit is not a performance bottleneck or an API failure, but that the system will not start. Or, more precisely: the system starts, but "I'm not sure whether it's running healthily."
These problems are normal, because the essence of a distributed architecture is collaboration among complex components: latency, configuration, network, storage, indexing, data synchronization. Any slight change can make the system behave "unexpectedly".
But it's hard to come up with a single solution because each problem seems different from the last.
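One generic starting point, whatever the symptom, is a connectivity smoke test from the client side (the address below is a placeholder for however your cluster exposes the proxy):

```python
from pymilvus import MilvusClient

# If the proxy is reachable and healthy, this returns quickly;
# if it hangs or raises, start debugging from the network and proxy logs.
client = MilvusClient(uri="http://localhost:19530")
print("collections:", client.list_collections())
```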
06
How to deploy Milvus on Windows?
There is another question we see a lot: "Can I run Milvus on Windows?"
First of all, running Milvus on Windows is not the optimal choice, but sometimes what we need is not the optimum, because reality may look like this: local verification is faster, the team only has Windows laptops, or the CI environment has restricted permissions...
We believe that any "best practice" that boxes users in with rules and restrictions is laziness in product and engineering. Users deserve a workable path at every point where they think "I'm not sure this can run". So we simply rewrote the "Deploy Milvus on Windows" documentation; please refer to it for details.
07
How to perform performance and status monitoring
Once deployment is done, many people's questions shift from "how to deploy" to "how well does it run". But then another question arises: "How do I view performance (average query latency, latency distribution, query volume, memory usage, disk storage)?"
Because developers cannot see every step of the system's behavior, when faced with issues such as system freezes they often don't know which node is processing which piece of data, why a search is hanging, or whether replica synchronization is blocked, and may even worry: "Wait, did I deploy it wrong?"
That is why Milvus 2.5 provides a new WebUI to answer these seemingly "simple" but very practical questions. We hope you can check your system status the way you refresh a dashboard, instead of SSHing in and grepping logs.
The core purpose of the WebUI is to give you the possibility of observation. It may not solve the problem immediately, but it at least lets you see where the problem is and how it happened.
Of course, there is also a class of advanced users who will not stop at “running”, but will continue to ask:
When will a Milvus segment be sealed?
Why does memory usage fluctuate?
How are queries routed across nodes?
To serve these deeply curious users, we have continuously improved the concept documentation, architecture diagrams, and compaction mechanism descriptions, and even turned some of the engineers' internal notes into public reference material.
For details, please refer to these two pages:
https://milvus.io/docs/architecture_overview.md
https://milvus.io/docs/clustering-compaction.md
We will continue to improve Milvus internal documentation to make it easier to understand, and welcome any feedback or requests from everyone. We believe that the depth of understanding determines the boundaries of ability.
08
How to do the migration
Thanks to the trust of so many developers, more and more friends have started migrating from other products to Milvus, or upgrading from lower versions to higher ones. The questions we encounter include, but are not limited to:
“Can I migrate my Elasticsearch data here?”
“Are the data formats of Milvus 2.4 and 2.5 compatible?”
“Can I move my Standalone data to Zilliz Cloud?”
Many users arrive with old systems, historical data, and existing model versions. Without a smooth migration, there is no good user experience and no real upgrade. To meet this demand, we provide two core tools:
VTS (Vector Transfer Service): lets you migrate vector data from other platforms (Pinecone, Elasticsearch, Qdrant...) without manual re-indexing.
https://github.com/zilliztech/vts
Zilliz Cloud Managed Migration Service: helps you smoothly transition an existing self-hosted deployment to the cloud, handling version differences and supporting resumable transfers.
https://docs.zilliz.com/docs/migrations
With these two tools, we hope that no one who chooses Milvus and Zilliz Cloud has to give up their previous investment; instead, it becomes infrastructure that can be accessed and switched freely at any time, with you always in control of where your data flows.
Why do we record this?
Because the voice of the community is the most honest mirror in which we see ourselves. Milvus is the result of everyone's joint effort. Every user who speaks up is not just a "user" but a partner. That is deep trust in us, and a rare relationship.
Moreover, Milvus was never designed to be "big and all-encompassing", but to be "conversational" (P.S.: we hope not only that the community stays conversational, but that Milvus itself will support natural language queries in the future, making the product conversational too).
We are willing to admit it: Milvus is not perfect yet. Some documents are not clear enough, some configurations are not smart enough, and some indexes are still being improved. But it has one defining trait:
The more you use it, the better it understands you. When you run your first query, it tries to choose an index for you by default. When you move to production, it exposes more parameters for you to control. When you migrate data, it stays compatible with old formats as much as possible, lowering the cost of your choices.
This is the kind of infrastructure the AI era deserves: the longer you use it, the more you trust it; an old friend that can talk with you and keep evolving.