The "hallucination" problem of DeepSeek R1 model and the solution for enterprise-level applications

Written by
Silas Grey
Updated on: July 16, 2025

Explore the challenges and opportunities of DeepSeek R1 model in enterprise applications.

Core content:
1. The reasoning ability of the DeepSeek R1 model and its "hallucination" phenomenon
2. How its hallucination rate compares with that of the DeepSeek V3 model
3. The balance between creativity and accuracy in model training


Since the beginning of the year, the DeepSeek R1 model has attracted much attention for its excellent reasoning ability. However, "hallucination" remains an unavoidable problem for current large language models.


People once believed that as the reasoning capabilities of large models improved, the accuracy of their answers would improve substantially as well, thereby reducing the "hallucination" phenomenon.


Contrary to expectations, although the DeepSeek R1 model performs impressively on reasoning, it exhibits a higher hallucination rate.


According to the Vectara team's HHEM AI hallucination test, the hallucination rate of DeepSeek R1 is as high as 14.3%, roughly four times that of its general-purpose sibling, DeepSeek V3.


Source: https://www.vectara.com/blog/deepseek-r1-hallucinates-more-than-deepseek-v3
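For context, HHEM measures hallucination by scoring model-generated summaries for factual consistency against their source documents. The sketch below shows how one might score a single summary with Vectara's open HHEM model on Hugging Face; the predict() call follows the model card's custom code, so treat the exact API as an assumption to verify against the current card.

```python
# A minimal sketch of scoring one summary for factual consistency with
# Vectara's open HHEM model (vectara/hallucination_evaluation_model).
# predict() comes from the model card's trust_remote_code helpers;
# verify the API against the current model card before relying on it.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

source = "DeepSeek R1 scored 14.3% on Vectara's HHEM hallucination test."
summary = "DeepSeek R1 scored 2% on the test."  # deliberately unfaithful

# predict() returns a factual-consistency score in [0, 1]; lower scores
# indicate the summary is more likely hallucinated relative to its source.
score = model.predict([(source, summary)])
print(float(score[0]))
```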


This phenomenon is not unique to DeepSeek. In the Vectara team's tests, OpenAI's reasoning model o1 likewise showed a higher hallucination rate than its general-purpose model GPT-4o.


This suggests that while DeepSeek R1's enhanced reasoning improves performance on complex tasks, it also increases the risk of generating fabricated yet plausible-sounding content.


This may not be accidental.


According to analysis, DeepSeek R1's reasoning gains rely on training with high-quality long chain-of-thought (CoT) data. Although this helps the model reason deeply through complex tasks, on simple tasks an overly long reasoning chain is more likely to introduce bias.


If you have used DeepSeek R1, you can see this in its chain-of-thought output: even when given a simple instruction, R1 tirelessly interprets and extends it from multiple angles.


This over-complication of simple tasks can lead the model to fill in fabricated details, aggravating the "hallucination" problem.


Second, the training process may have rewarded the model's creativity more heavily. This "creativity" is a distinct advantage in writing and creative content generation, but in tasks with strict factual requirements the model is prone to "overdoing it," producing content that deviates from the facts.


Therefore, the DeepSeek R1 reasoning model should not be used indiscriminately. For example, R1 is poorly suited to generating summaries; in tasks with high factual requirements, the DeepSeek V3 general-purpose model can be used instead to reduce "hallucination" problems.
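In practice, this choice can be encoded as a simple routing rule over task types. The following sketch assumes DeepSeek's OpenAI-compatible API, where "deepseek-reasoner" maps to R1 and "deepseek-chat" to V3; the task labels and routing table are hypothetical, so adapt them to your own workload and verify the model names against current documentation.

```python
# A minimal sketch of routing tasks to R1 or V3 by task type,
# assuming DeepSeek's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

# Hypothetical routing table: fact-sensitive tasks go to the
# general-purpose model (V3); reasoning-heavy tasks go to R1.
MODEL_BY_TASK = {
    "summarization": "deepseek-chat",       # high factual requirements
    "customer_qa": "deepseek-chat",
    "math_reasoning": "deepseek-reasoner",  # complex multi-step reasoning
    "creative_writing": "deepseek-reasoner",
}

def ask(task_type: str, prompt: str) -> str:
    # Default to the general-purpose model when the task is unknown.
    model = MODEL_BY_TASK.get(task_type, "deepseek-chat")
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

print(ask("summarization", "Summarize this quarterly report: ..."))
```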


In enterprise applications, accuracy is critical.


Whether in customer service, decision support, data analysis, or complex business problem-solving, enterprises need reliable and accurate information.


The more serious "hallucination" problem of DeepSeek R1 is a reminder that although large models show increasingly powerful understanding, generation, and reasoning capabilities, they cannot simply be dropped into enterprise-level applications as-is.


Enterprises need a solution that combines their own knowledge systems with the model to ensure the accuracy and reliability of generated content. This is why retrieval-augmented generation (RAG) has become mainstream in enterprise-level applications.


The core of RAG is connecting the enterprise's local knowledge base with the large model: grounded in the enterprise's internal knowledge bases and databases, the model retrieves relevant content and generates accurate answers from it, thereby reducing "hallucination" problems.
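In its simplest form, the loop is: embed the query, retrieve the most relevant passages from the knowledge base, and instruct the model to answer only from them. The sketch below is a minimal illustration using the sentence-transformers library and a hypothetical two-passage in-memory store; a real system would use a vector database, proper chunking, and an actual LLM call on the assembled prompt.

```python
# A minimal RAG sketch: retrieve relevant passages, then build a prompt
# that grounds the model's answer in them.
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical enterprise knowledge base, pre-chunked into passages.
passages = [
    "Refunds are processed within 7 business days of approval.",
    "Enterprise support is available 24/7 via the service hotline.",
]
passage_vecs = embedder.encode(passages, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    qv = embedder.encode([query], normalize_embeddings=True)[0]
    scores = passage_vecs @ qv                  # cosine similarity
    top = np.argsort(scores)[::-1][:k]
    return [passages[i] for i in top]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    # Instructing the model to answer only from the retrieved context
    # is the key hallucination-reduction step.
    return (f"Answer using ONLY the context below. If the answer is not "
            f"in the context, say so.\n\nContext:\n{context}\n\n"
            f"Question: {query}")

print(build_prompt("How long do refunds take?"))
```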


There are now various ways to build RAG systems with DeepSeek R1/V3. Enterprises can choose a solution that fits their needs, combining DeepSeek's reasoning ability with the accuracy of a local knowledge base to raise the intelligence level of enterprise-level applications.


Among them, the open-source project ThinkRAG provides an application architecture that can be deployed in enterprise environments.


As a local knowledge-base RAG system, ThinkRAG can run on a laptop, deploying large models such as DeepSeek locally through Ollama and storing knowledge-base data on-premises.


This not only addresses enterprises' concerns about data security, but through localized deployment also reduces dependence on the network and external resources.


https://github.com/wzdavid/ThinkRAG


Large-model RAG systems have demonstrated powerful capabilities in many enterprise-level application scenarios.


For example, employees upload internal business-process documents to form a company knowledge base. When handling customer inquiries, an employee only needs to enter a question, and within seconds the system retrieves the relevant documents from the knowledge base, generates an accurate answer, and provides the reference material. This efficient knowledge retrieval and generation not only improves employee productivity, but also ensures the accuracy and professionalism of external communication.
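Concretely, "uploading documents to form a knowledge base" comes down to chunking the files and indexing the chunks for retrieval. The sketch below illustrates that ingestion step; the folder name and fixed-size chunker are hypothetical stand-ins, and production systems would add format parsing, chunk overlap, and metadata.

```python
# A minimal sketch of ingesting internal documents into a searchable
# knowledge base: split into chunks, embed, and index.
from pathlib import Path
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real systems split on structure
    # (headings, paragraphs) and keep overlap between chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

chunks: list[str] = []
for path in Path("internal_docs").glob("*.txt"):  # hypothetical folder
    chunks.extend(chunk(path.read_text(encoding="utf-8")))

index = embedder.encode(chunks, normalize_embeddings=True)

def search(query: str, k: int = 3) -> list[str]:
    qv = embedder.encode([query], normalize_embeddings=True)[0]
    return [chunks[i] for i in np.argsort(index @ qv)[::-1][:k]]

# Employees then query the knowledge base in natural language:
for hit in search("What is the escalation process for refunds?"):
    print(hit[:80])
```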


Internal knowledge management and training is another application scenario. By importing the company's technical documents, training materials, and industry standards into the knowledge base, new employees can quickly obtain the information they need through simple question-and-answer interactions, accelerating onboarding.


We know that "enterprise-level" usually also means: locally deployable and customizable. 


Systems like ThinkRAG provide not only a technical framework but a solution tailored for enterprises. The system supports a variety of large models, and users can choose the one that fits their specific needs; for scenarios requiring strong reasoning capabilities, for example, DeepSeek R1 can be selected.


Efficient local deployment capabilities are also critical.


Through tools such as Ollama, enterprises can download large models and run them locally without relying on external networks. This deployment approach not only improves system security, but also reduces operating costs and ensures system stability and availability.
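As an illustration, the sketch below pulls a distilled DeepSeek R1 variant and queries it entirely on the local machine through Ollama's Python client; the model tag "deepseek-r1:7b" is an example, so check Ollama's model library for a size that fits your hardware.

```python
# A minimal sketch of running a local DeepSeek model through Ollama's
# Python client (pip install ollama; the Ollama server must be running).
import ollama

# One-time download of the model weights to the local machine.
# "deepseek-r1:7b" is an example tag; pick a variant that fits your hardware.
ollama.pull("deepseek-r1:7b")

resp = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Briefly explain RAG."}],
)
# No data leaves the machine: both the model weights and the inference
# run locally, which is the security property discussed above.
print(resp["message"]["content"])
```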


With the rapid development of AI technology, retrieval-augmented generation (RAG) is also continuing to evolve.


Enterprise-level multimodal RAG systems will be able to process unstructured data in various forms, such as documents, images, and videos; support the construction of multimodal knowledge bases; and perform multimodal fused retrieval, generating answers or reports enriched with images, tables, and other content.


At the same time, by automatically constructing knowledge graphs, such systems can further improve their reasoning ability and the accuracy of their answers.


By introducing intelligent-agent technology, the system can handle more complex tasks, such as automatically calling external tools and data sources and coordinating with the company's existing OA, CRM, ERP, and other systems to complete complex enterprise-level tasks.


Finally, I would like to say that although the "hallucination" problem of large models brings challenges, it also pushes us to combine a range of technologies and components in the ongoing search for better solutions.


By leveraging the power of large models, building local knowledge bases, and applying efficient multimodal knowledge retrieval and generation, we can provide enterprises with reliable, secure, and efficient intelligent solutions.