The new king of open-source embeddings has arrived! Qwen3-Embedding local deployment guide + Dify recall test record

The Qwen3-Embedding series surpasses mainstream competitors in multilingual tasks and long-context processing, setting a new benchmark for open-source embedding models.
Core content:
1. Benchmark performance and version lineup of the Qwen3-Embedding series
2. Head-to-head comparison of Qwen3-Embedding and BGE-M3
3. Qwen3-Embedding local deployment guide and Dify recall test record
A few days ago, Tongyi Qianwen (Qwen) released the Qwen3-Embedding series (three sizes: 8B, 4B, and 0.6B). The models turned in striking results on authoritative benchmarks, surpassing mainstream competitors in multilingual tasks and long-context processing and taking the crown among open-source embedding models.
All-round players at every size, and they completely crush BGE-M3!
Performance dominance, leading in all sizes
Qwen3-8B topped the leaderboard with an overall score of 70.58 (ahead of Gemini-001's 68.37) and ranked first on 12 of the 16 evaluation tasks, leading in key areas such as retrieval accuracy (MSMARCO 57.65) and question answering (NQ 10.06).
Even the smallest Qwen3-0.6B (only 595M parameters) scores 64.34 overall, still clearly ahead of 7B-class competitors such as SFR-Mistral (60.9). Small models pack a big punch!
Comparison with BGE-M3: All-round generational advantage
Metric | Qwen3-8B | BGE-M3 | Advantage |
---|---|---|---|
Overall score | 70.58 | 59.56 | +11.02 points |
Context length | 32K | 8K | 4× |
Retrieval (MSMARCO) | 57.65 | 40.88 | +41% |
Open-domain QA (NQ) | 10.06 | -3.11 | negative score flipped positive |
Multilingual understanding | 28.66 | 20.10 | +42% |
(Percentage gains are relative, e.g. retrieval: (57.65 - 40.88) / 40.88 ≈ 41%.)
While staying at or near the top across virtually the entire leaderboard, Qwen3 redraws the performance boundary of embedding models with a far larger parameter budget (8B vs. 568M) and 4× the context length!
Same-size comparisons: crushing peers at every level
In the 7B class, Qwen3-8B outscores Linq-Embed-Mistral (61.47) and SFR-Mistral (60.9) by roughly 15%.
On the lightweight battlefield, Qwen3-0.6B (64.34) leads comparable small models such as multilingual-e5-large (63.22) and BGE-M3 (59.56), a testament to the efficiency of the Tongyi Qianwen architecture.
Local deployment of Qwen3-Embedding
GPUStack Local Deployment
First deploy GPUStack yourself by following the official documentation; an official Docker image is provided for quick deployment.
In the GPUStack model interface, click Deploy Model -> ModelScope and search for qwen3-embedding. The platform automatically detects your hardware and recommends quantized versions that will fit.
We chose the Q8_0 quantized version of qwen3-embedding-8b and waited for the download to finish; once the status shows "running", the model is deployed.
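Before wiring it into Dify, you can sanity-check the endpoint directly. GPUStack exposes an OpenAI-compatible API, so a minimal Python sketch looks like the following; the server address, API path, key, and model name here are assumptions that must be adapted to your deployment:

```python
from openai import OpenAI  # pip install openai

# Assumptions: server address, API path, key, and model name all depend on
# your GPUStack deployment -- adjust them accordingly.
client = OpenAI(
    base_url="http://your-gpustack-server/v1-openai",  # OpenAI-compatible path; confirm in your GPUStack version's docs
    api_key="YOUR_GPUSTACK_API_KEY",
)

resp = client.embeddings.create(
    model="qwen3-embedding-8b",  # the name you gave the deployed model
    input=[
        "What is Qwen3-Embedding?",
        "Qwen3-Embedding is an open-source embedding model series from Qwen.",
    ],
)

for item in resp.data:
    print(len(item.embedding))  # vector dimensionality, one line per input
```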
Testing in Dify
Next, find GPUStack in Dify's plugin marketplace and install it. Once the plugin is installed, proceed to model configuration.
Create a knowledge base and select our self-deployed model as the Embedding model.
For test data, import the official account's historical articles into the knowledge base.
Select Dify's parent-child segmentation strategy. Since the articles are in Markdown, each major section is expected to become a parent block, so set the delimiter to "#".
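To make the strategy concrete, here is a rough Python sketch of parent-child splitting on "#" headings. This is an illustration of the idea only, not Dify's actual implementation; the child chunk size is an arbitrary placeholder:

```python
def parent_child_split(markdown_text: str, child_size: int = 200):
    """Split Markdown into parent blocks on '#' headings, then cut each
    parent into fixed-size child chunks. Illustrative only -- Dify's own
    parent-child segmentation offers more options (overlap, cleaning, etc.)."""
    parents, current = [], []
    for line in markdown_text.splitlines():
        # Any heading level ('#', '##', ...) closes the previous parent block.
        if line.startswith("#") and current:
            parents.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        parents.append("\n".join(current))

    # Retrieval matches child chunks, but the whole parent block is
    # returned as context for the LLM.
    return [
        {"parent": p, "children": [p[i:i + child_size] for i in range(0, len(p), child_size)]}
        for p in parents
    ]
```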
Test the recall
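Dify's recall testing panel lets you type a query and inspect which chunks come back with their scores. To reproduce the idea outside the UI, the core of a recall test is just cosine similarity between the query embedding and the chunk embeddings. A minimal sketch under the same endpoint assumptions as above, with placeholder chunk texts:

```python
import numpy as np
from openai import OpenAI

client = OpenAI(
    base_url="http://your-gpustack-server/v1-openai",  # same assumptions as above
    api_key="YOUR_GPUSTACK_API_KEY",
)

MODEL = "qwen3-embedding-8b"

def embed(texts):
    """Embed a list of strings with the deployed model; returns an (n, d) array."""
    resp = client.embeddings.create(model=MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

# Placeholder chunk texts -- in practice these would be the child chunks
# produced by the segmentation step.
chunks = [
    "Qwen3-Embedding comes in 8B, 4B, and 0.6B sizes.",
    "GPUStack can deploy quantized models from ModelScope.",
    "Dify's parent-child segmentation returns parent blocks as context.",
]
chunk_vecs = embed(chunks)
query_vec = embed(["How do I deploy Qwen3-Embedding locally?"])[0]

# Cosine similarity = dot product of L2-normalized vectors.
chunk_norm = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
query_norm = query_vec / np.linalg.norm(query_vec)
scores = chunk_norm @ query_norm

for idx in np.argsort(scores)[::-1][:3]:  # top-3 recall
    print(f"{scores[idx]:.3f}  {chunks[idx]}")
```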