Is DeepSeek+RAG Still Worth Doing?

Written by
Clara Bennett
Updated on: July 8, 2025
Recommendation

Explore the potential of combining RAG with DeepSeek, with an in-depth look at technical fit and practical cases.

Core content:
1. The core logic of RAG and its challenges, and DeepSeek's strengths and limitations
2. The current state of DeepSeek+RAG integration and directions for optimization
3. An experimental case study in the legal domain probing technical fit


Sparks from the Collision of RAG and DeepSeek

In recent years, Retrieval-Augmented Generation (RAG) has become a popular technique in large language model (LLM) applications: it compensates for a model's knowledge limitations by drawing on external knowledge bases. DeepSeek, an emerging model known for strong reasoning, looks like a promising partner for RAG. But can the combination really hold up, or is it merely icing on the cake? This article explores the question from three angles: the current state of the technology, practical cases, and future trends.

1. Technical Status: Compatibility Analysis of DeepSeek and RAG

  1. The basic logic and challenges of RAG. The core of RAG is to combine retrieval and generation: first recall relevant documents from the knowledge base, then feed them to the generation model to produce the output. This approach performs well in scenarios that demand high accuracy and traceability (such as law and medicine). The challenge is that retrieval accuracy (recall and precision) directly determines the quality of generation, while the reasoning ability of the generation stage determines the logic and usefulness of the final answer.
  2. DeepSeek's strengths and weaknesses. DeepSeek (especially R1) is known for strong reasoning: through chain-of-thought it can generate logically clear answers with fewer hallucinations. But it is no panacea. DeepSeek performs poorly at generating embeddings: it is slow, and its divergent reasoning style produces vectors that match poorly against a vector database. By contrast, specialized embedding models (such as the Qwen2-based gte-Qwen2-7B-instruct) are better suited to retrieval tasks.
  3. The current state of the combination and directions for optimization. Current DeepSeek+RAG practice mostly adopts a division of labor: an embedding model such as Qwen2 handles retrieval, and DeepSeek handles generation. This split looks reasonable, but it exposes a tension: if retrieval is too conservative, DeepSeek's reasoning ability goes underused; if retrieval is too divergent, it introduces noise that degrades generation quality. Balancing retrieval breadth against generation complexity is therefore the key to making the technique work in practice.
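As a rough illustration of this division of labor, the sketch below wires a retriever to a generator stub. The retriever is a toy keyword-overlap scorer standing in for an embedding search (such as Qwen2), and `generate()` merely assembles a prompt where a real system would call DeepSeek R1; all names and the `top_k`/`min_score` knobs are hypothetical, but those knobs are exactly where the conservative-vs-divergent retrieval trade-off lives.

```python
# Minimal sketch of the division-of-labor RAG pipeline described above.
# The retriever is a toy keyword-overlap scorer standing in for an embedding
# search; generate() assembles a prompt where a real system would call an LLM.

def retrieve(query: str, docs: list[str], top_k: int = 3, min_score: int = 1) -> list[str]:
    """Score each document by word overlap with the query; keep up to top_k
    documents whose score meets min_score. Raising min_score makes retrieval
    more conservative; raising top_k makes it more divergent."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(d.lower().split())), d) for d in docs]
    scored = [sd for sd in scored if sd[0] >= min_score]
    scored.sort(key=lambda sd: sd[0], reverse=True)
    return [d for _, d in scored[:top_k]]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for the LLM call: fold the retrieved context into a prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Question: {query}\nContext:\n{joined}\nAnswer: ..."

docs = [
    "Small claims court handles disputes under a monetary limit.",
    "Dogs must be leashed in public parks.",
]
prompt = generate("how does small claims court work",
                  retrieve("how does small claims court work", docs))
```

In a real deployment, `retrieve()` would be backed by a vector store and `generate()` by the reasoning model; the point of the sketch is only that the two stages are cleanly separable and independently tunable.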

2. Practical case analysis: DeepSeek+RAG experiment in the legal field

To more intuitively understand the effect of this combination, let’s look at a specific experiment: the SkyPilot team’s RAG attempt in the legal field.

  1. Experimental design
  • Dataset: a subset of the Pile-of-Law dataset, focused on legal advice.
  • Technology stack: ChromaDB for vector storage, Qwen2 embeddings for retrieval, DeepSeek R1 for generation, vLLM and SkyPilot for performance optimization.
  • Goal: provide accurate, traceable answers to complex legal questions.

Both models (the Qwen2 embedding model and DeepSeek R1) are used to generate embeddings for the dataset, producing two vector databases. The same query is then embedded with each model and matched against its corresponding database to find the top 5 most similar entries.
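The top-5 lookup step can be made concrete with plain cosine similarity over an in-memory list. The experiment itself used ChromaDB; this toy version (hypothetical IDs, 2-D vectors instead of real high-dimensional embeddings) only illustrates the ranking mechanics:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, db, k=5):
    """db is a list of (doc_id, vector) pairs; return ids of the k nearest."""
    ranked = sorted(db, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 2-D "vector database"; real embeddings have hundreds of dimensions.
db = [("court_prep", [1.0, 0.0]),
      ("filing_fees", [0.9, 0.1]),
      ("dog_leash_law", [0.0, 1.0])]
```

Whether the right documents land in the top 5 depends entirely on the geometry of the embedding space, which is where the two models diverge, as the comparison below shows.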

In the experiment's comparison table, the retrieval results from DeepSeek R1 are clearly worse. Why is that?

We believe the root of the problem lies in how DeepSeek-R1 is trained. DeepSeek-R1 is designed primarily as a reasoning engine, focused on sequential thinking and logical connections, which means it does not map documents into a semantic space optimized for similarity search.

In contrast, the Qwen2 model variant (gte-Qwen2-7B-instruct) is trained specifically for the semantic similarity task, creating a high-dimensional space where conceptually similar documents are closely clustered together, regardless of the specific wording.

This difference in training means that Qwen excels at capturing the intent behind a query, whereas DeepSeek-R1 sometimes follows reasoning paths that lead to results that look topically related but are ultimately irrelevant.

Unless DeepSeek-R1 is fine-tuned for embeddings, it should not be used as a retrieval embedding model for RAG.

  2. Key Findings
  • Don't use DeepSeek for retrieval: the experiment compared the embeddings of DeepSeek R1 and Qwen2, and DeepSeek's retrieval results drifted off topic (e.g. the query "small claims court preparation" returned "is it legal to release dogs"). The cause is that its reasoning-oriented training is not suited to semantic-space mapping.
  • DeepSeek is very strong at generation: in the generation phase, DeepSeek R1 cites documents clearly and reasons rigorously. For example, for "How to prepare for small claims court", it extracted key points from multiple documents and produced a structured answer.
  • Engineering optimization is essential: carefully designed prompts are crucial for reducing hallucinations and increasing citation rates, and document chunking and parallel computing also significantly improve efficiency (vLLM delivered a 5.5× speedup).
  3. This experiment shows that DeepSeek+RAG does have potential in scenarios requiring reasoning and traceability, but success hinges on playing to each component's strengths: let a dedicated embedding model handle retrieval, let DeepSeek focus on generation, and back both with engineering optimization.
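Of the engineering levers above, document chunking is the simplest to illustrate. A common baseline (not necessarily what the SkyPilot team used) is fixed-size chunks with a small overlap, so that content straddling a boundary still appears whole in at least one chunk:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap`
    characters so content at a boundary appears in both. Requires overlap < size,
    otherwise the window would never advance."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

Chunk size trades recall granularity against context: smaller chunks retrieve more precisely but can strip away surrounding context the generator needs for a well-cited answer.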

3. Future Development: Potential and Boundaries of DeepSeek+RAG

  1. Where is the boundary between reasoning and retrieval? The current trend is to put more reasoning ("think") capacity into the generation phase rather than the retrieval phase. For example, O1 Embedder attempts to add a "thinking" step to the embedding model, but the gains are limited and it is slow. By contrast, DeepSeek R1 can filter and synthesize information for complex questions through recursive reasoning during generation (see [r1-reasoning-rag]), which may be the more efficient path.
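The recursive pattern referenced above (as in r1-reasoning-rag) can be sketched as a loop in which the model either answers from the accumulated context or emits a follow-up query that triggers another retrieval round. The callback signatures and toy stand-ins here are illustrative, not the repository's actual API:

```python
def reasoning_rag(question, retrieve, think, max_rounds=3):
    """Alternate retrieval and reasoning: `think` inspects the accumulated
    context and either returns a final answer or a follow-up query that
    triggers another retrieval round."""
    context, query = [], question
    for _ in range(max_rounds):
        context.extend(retrieve(query))
        step = think(question, context)
        if step.get("answer") is not None:
            return step["answer"]
        query = step["followup"]
    # Out of rounds: force a best-effort answer from whatever was gathered.
    return think(question, context).get("answer")

# Toy callbacks to exercise the loop.
def toy_retrieve(query):
    return [f"doc about {query}"]

def toy_think(question, context):
    if len(context) < 2:                      # not enough evidence yet
        return {"answer": None, "followup": "filing deadlines"}
    return {"answer": f"answered using {len(context)} documents"}
```

The key design choice is that the reasoning model, not the retriever, decides when the evidence is sufficient, which is exactly the "put the thinking in generation" trend the section describes.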

  2. A shift in user experience: users are increasingly accepting of test-time computation. As a recent article put it (https://mp.weixin.qq.com/s/-pPhHDi2nz8hp5R3Lm_mww), people are willing to wait longer for high-quality results. This "delayed gratification" mindset may open new room for the deliberate, compute-heavy reasoning of DeepSeek+RAG, especially in professional domains.

  3. The direction of technological evolution

  • Model fine-tuning: fine-tune DeepSeek specifically for the embedding task.
  • Agent-based trend: integrating DeepSeek's reasoning into an Agent framework is more flexible than plain RAG.

Conclusion: It Works, If You Choose the Right Scenario

DeepSeek+RAG is no panacea, but neither is it a dead end. It still has room to grow in tasks demanding strong reasoning and high traceability (such as legal consulting). The keys are:

  • Clear division of labor: leave retrieval to specialized embedding models and generation to DeepSeek.
  • Scenario fit: avoid speed-sensitive simple tasks; focus on complex reasoning scenarios.
  • Continuous optimization: improve overall performance through prompt engineering, document processing, and hardware acceleration.

In the future, as model capabilities improve and user needs change, the combination of DeepSeek+RAG may find a broader stage. What other possibilities do you think this technology has? Welcome to leave a message to discuss!

References:

https://blog.skypilot.co/deepseek-rag/
https://arxiv.org/pdf/2502.07555
https://github.com/deansaco/r1-reasoning-rag
