Solving the Dify + Milvus integration problem: a practical guide to avoiding pitfalls from zero to one

Master the skills of integrating Dify with Milvus to strengthen large-scale data processing.
Core content:
1. Deploying stand-alone Milvus on WSL Linux
2. Modifying the Milvus configuration and verifying service startup
3. Deploying Dify on WSL Linux and configuring the basic environment
1. Guide to deploying stand-alone Milvus on WSL Linux
1. Environment preparation and hardware verification
Hardware requirements (per Milvus's official prerequisites, the CPU must support at least one of the following instruction sets): SSE4.2, AVX, AVX2, AVX-512. A quick check follows below.
Software dependencies:
Docker 19.03+ and Docker Compose 1.25.1+
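A minimal pre-flight check, assuming a standard WSL Ubuntu/Debian environment with the Docker CLI already installed:

# List which SIMD instruction sets the CPU exposes (at least one is needed)
lscpu | grep -oE 'sse4_2|avx512[a-z_]*|avx2|avx' | sort -u
# Confirm the Docker and Compose versions meet the minimums
docker --version          # expect 19.03 or newer
docker compose version    # or: docker-compose --version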
2. Download the Milvus docker-compose.yml file
# 1. Download the official stand-alone deployment file
$ wget https://github.com/milvus-io/milvus/releases/download/v2.5.6/milvus-standalone-docker-compose.yml -O docker-compose.yml
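To confirm the download succeeded, a quick inspection of the file (nothing Milvus-specific, just a sanity check):

# Verify the file arrived and list the images it will pull
ls -lh docker-compose.yml
grep 'image:' docker-compose.yml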
3. Modify the docker-compose.yml configuration
services:
  etcd:
    restart: always    # ensure the container comes back up automatically after Docker restarts
    ....
  minio:
    restart: always    # ensure the container comes back up automatically after Docker restarts
    ports:
      - "19001:9001"   # remapped so MinIO won't conflict on ports if RAGFlow is installed later
      - "19000:9000"
    ....
  standalone:
    restart: always
    ....
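Before starting anything, Compose itself can validate the edited file; the --quiet flag limits output to errors:

# Parses docker-compose.yml and reports any syntax or indentation mistakes
docker-compose config --quiet && echo "docker-compose.yml OK"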
4. Modify milvus.yaml in the container
# The containers must already be running (docker-compose up -d) before you can exec into them.
# Enter the Milvus container:
docker exec -it milvus-standalone /bin/bash
# Inside the container: enable authentication, then leave the shell
sed -i 's/authorizationEnabled: false/authorizationEnabled: true/g' /milvus/configs/milvus.yaml
exit
# Back on the host: confirm the change took effect (should print: authorizationEnabled: true)
docker exec -it milvus-standalone grep authorizationEnabled /milvus/configs/milvus.yaml
# Restart the container so the new setting takes effect
docker restart milvus-standalone
5. Start the service and verify the service status
docker-compose up -d
Check that the connection works with Attu, Milvus's visual admin interface (a Windows desktop installer is available):
https://github.com/zilliztech/attu.git
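Beyond Attu, the deployment can be verified from the command line. This assumes the official compose file's default mapping of the monitoring port 9091; the healthz endpoint is standard Milvus behavior, but adjust the port if your mapping differs:

# All three containers (etcd, minio, standalone) should show as Up
docker-compose ps
# The health endpoint on the monitoring port should answer OK
curl http://127.0.0.1:9091/healthz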
2. Guide to deploying Dify on WSL Linux
1. Basic environment configuration
# Step 1. Clone the repository (users in mainland China may want to use a mirror source)
git clone https://github.com/langgenius/dify.git
# Step 2. Configure the env environment variable
cd dify/docker
cp .env.example .env
sudo vim .env
---------------------------------------------
# The type of vector store to use.
# VECTOR_STORE=weaviate  # comment out the default vector store configuration
VECTOR_STORE=milvus
# The Milvus URI. Dify's containers cannot reach a host-published port via 127.0.0.1,
# so use the Docker network gateway address instead (see the check after this block).
MILVUS_URI=http://172.18.0.1:19530
MILVUS_TOKEN=
# With authentication freshly enabled, Milvus's initial account is root / Milvus;
# replace the placeholders below with your actual credentials.
MILVUS_USER=your_user
MILVUS_PASSWORD=your_pass
MILVUS_ENABLE_HYBRID_SEARCH=True
--------------------------------------------
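The 172.18.0.1 address above is only an example; the correct value is the gateway of the Docker network as seen from Dify's containers. A quick way to find and test it (standard Docker tooling; the network name varies per setup, and nc can be swapped for telnet if netcat is not installed):

# Gateway of the default bridge network (often 172.17.0.1; compose networks often use 172.18.x.x)
docker network inspect bridge -f '{{(index .IPAM.Config 0).Gateway}}'
docker network ls    # find the network the Milvus stack runs on, then inspect it the same way
# Confirm Milvus answers on 19530 at that address
nc -zv 172.18.0.1 19530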
Step 3: Modify the docker-compose.yaml configuration
# Comment out Dify's bundled Milvus services to avoid pulling duplicate images
# and conflicting with the Milvus instance installed earlier:

#  Milvus vector database services
#  etcd:
#    container_name: milvus-etcd
#    ....
#  minio:
#    container_name: milvus-minio
#    ....
#  milvus-standalone:
#    container_name: milvus-standalone
#    ....
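A simple grep confirms nothing Milvus-related will be started by Dify's compose file:

# Every line defining a Milvus service should now start with '#'
grep -n 'milvus' docker-compose.yaml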
2. Startup and Integration
docker-compose up -d
# This pulls and starts Dify's services: nginx, api, worker, web, redis, postgres (db), sandbox, etc.
If all containers come up as running/healthy, the initial Milvus + Dify configuration has succeeded.
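A command-line spot check, assuming Dify's web entry is on the default port 80 (substitute your NGINX_PORT if you changed it in .env):

# All Dify containers should be Up / healthy
docker-compose ps
# Dify's web entry point should answer through nginx
curl -sI http://127.0.0.1:80 | head -n 1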
3. Start Dify → Create a knowledge base
In Attu you should now see the corresponding collection generated, which confirms that the Milvus + Dify deployment and integration has succeeded. A command-line alternative follows below.
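If you prefer the command line to Attu, Milvus also serves a REST API on port 19530. A sketch using the v2 endpoint, assuming a recent 2.x release and that the default root credentials from enabling authentication are still in place:

# List all collections; Dify's knowledge-base collection should appear in the result
curl -s -X POST http://127.0.0.1:19530/v2/vectordb/collections/list \
  -H 'Authorization: Bearer root:Milvus' \
  -H 'Content-Type: application/json' \
  -d '{}'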
Tips to avoid pitfalls:
Port conflict: if port 8080 is occupied, change NGINX_PORT and EXPOSE_NGINX_PORT in .env.
Vector store connection failed: check that Milvus port 19530 is reachable (telnet 127.0.0.1 19530).
GPU support: for GPU acceleration, install the NVIDIA Container Toolkit and add a deploy.resources.reservations.devices section to docker-compose.yml; a verification sketch follows below.
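A minimal way to confirm the NVIDIA Container Toolkit works before touching the compose file; this follows NVIDIA's standard smoke test, and the CUDA image tag is only an example:

# If the toolkit is set up correctly, this prints the host's GPU table
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi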
3. Typical Problem Solution Library
| Problem | Troubleshooting | Solution |
| --- | --- | --- |
| Milvus fails to start | 1. Check docker logs milvus-standalone 2. Verify CPU instruction-set support 3. Inspect /var/lib/milvus/logs | |
| Dify reports storage permission errors | Check permissions on Dify's storage directory | chmod -R 777 ./storage |
| Retrieval results are not relevant | Review the knowledge-base retrieval settings | Adjust similarity_score_threshold into the 0.75-0.85 range |
| Containers crash or memory keeps climbing | 1. Watch docker stats 2. Analyze the OOM Killer log 3. Check for thread deadlock | Configure memory limits in docker-compose.yml |
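The memory-related row above can be worked through with a few standard commands (a sketch; OOM Killer messages live in the kernel log on most distributions):

# Live per-container CPU/memory usage, printed once
docker stats --no-stream
# Look for OOM Killer events in the kernel log
dmesg | grep -iE 'oom|killed process'
# Tail Milvus logs for errors
docker logs --tail 100 milvus-standalone 2>&1 | grep -i error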
4. Performance Optimization Suggestions
Caching strategy: configure a Redis L2 cache for high-frequency queries.
Batch processing: for large batches of documents, enable batch_size=500 to reduce IO overhead.
Hardware acceleration: use GPUs with Tensor Core support (such as T4/A10) to run the BGE-M3 vector model.
Cluster deployment: when the data volume exceeds 100 million, a Milvus distributed cluster is recommended (requires a Kubernetes environment).