The premise of AI applications is solving DeepSeek's hallucination problem

The key to successful AI applications: solve DeepSeek's hallucination problem first.
Core content:
1. Challenges of AI application in companies with different levels of digitalization
2. The model hallucination phenomenon and its impact in DeepSeek applications
3. The causes of model hallucination and how to address it
Last year, when I was doing deep AI customization for some companies, the biggest problem I encountered was: AI was just scratching the surface!
Companies with a low level of digitalization have no real chance to use AI at all; companies with a high level of digitalization face a completely different problem:
The technical team usually has better alternatives to whatever functionality AI provides, so AI is not a must-have; and if AI does not deliver the functionality the business side actually wants, it is seen as merely repackaging something that already worked well.
Dig deeper and the business side's criterion is simple: as long as an AI application cannot completely take over the business, it is just a toy; worse, to the business side it is garbage!
This was already the case with GPT-4o-based applications, and with DeepSeek-based applications now, the problem may be even more serious!
Model Hallucination
What AI applications fear most is model hallucination. Receiving different answers to the same question is dispiriting. Imagine:
If you were a patient and the doctor gave you two different answers to the same question, would you panic? If you were a client and your lawyer gave you completely different conclusions for the same question, would you be afraid?
According to the Vectara HHEM AI hallucination test, DeepSeek-R1 showed a hallucination rate of 14.3%.
Other models do much better, but they still face the original question: can AI completely take over the business while hallucinations exist? If not, AI applications will suffer, because people judge AI more harshly.
Causes of hallucinations
The working principle of a large model is similar to the idiom-chain game, where each new word must connect to the one before it. Each word is a "high-dimensional vector" whose meaning is captured by that vector, and the connections between words correspond to the model's reasoning process.
During training, the model is first pre-trained on unlabeled data to learn how to "speak": it can intuitively produce a next word, but the continuation is not necessarily appropriate, for example chaining "colorful" to "pervert" (in the original Chinese word-chain, the next word only has to start with the character the previous one ends with);
Then the model is fine-tuned on labeled data so that it learns the correct collocations and which words may follow which. This is similar to task learning, and the model genuinely learns to connect words properly, for example chaining "colorful" to a suitable continuation;
But there are two situations that can cause model hallucinations:
First, if a word never appears in the fine-tuning data, the model's continuation is essentially random and completely unpredictable. Second, if the fine-tuning data itself contains wrong pairings, for example "pervert" as a continuation, the model will faithfully reproduce the error.
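To make these two failure modes concrete, here is a toy sketch in Python (purely illustrative; this is not how DeepSeek or any real model is trained). A lookup table of learned continuations stands in for the fine-tuned model, and the two causes above fall out directly:

```python
import random

# Toy stand-in for a fine-tuned model: a table of learned continuations.
# Purely illustrative; a real large model is not trained this way.
learned_continuations = {
    "colorful": ["splendid"],   # a correct pairing seen during fine-tuning
    "splendid": ["scenery"],
    # note: "scenery" has no entry -> never covered by the fine-tuning data
}

def next_word(word: str) -> str:
    candidates = learned_continuations.get(word)
    if not candidates:
        # Cause 1: missing coverage -> the continuation is effectively random
        return random.choice(["pervert", "banana", "quarterly report"])
    # Cause 2: if the table itself contained a wrong pairing, it would be
    # reproduced here just as faithfully as a correct one.
    return random.choice(candidates)

print(next_word("colorful"))  # grounded continuation: "splendid"
print(next_word("scenery"))   # unpredictable, "hallucinated" continuation
```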
To sum up, this is only the most simplified explanation of model hallucination. Real scenarios raise far more numerous and complex problems, all of which lead directly to hallucinations. For example, suppose the data touches on whether managing people or managing tasks matters more: some sources say people, others say tasks, so the model is pulled in both directions and led astray.
Why is R1's hallucination so severe?
The data show that R1's hallucination rate is nearly 4 times that of V3, which may be related to the characteristics of reasoning models.
As we said before, the prompting techniques of the GPT era may not carry over to DeepSeek. The reason is the difference between reasoning models and instruction models:
A reasoning model focuses more on understanding and reasoning: you only need to provide the goal or problem, and the model analyzes it and finds a solution by itself. An instruction model relies on clear instructions and steps: you need to provide detailed guidance, and the model performs the task strictly according to your instructions.
So the DeepSeek prompting technique becomes:
Formula: I want [goal], for [audience], hope to achieve [core appeal], but worry about [potential challenges].
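As a quick illustration, here is the formula filled in with a hypothetical scenario; the goal, audience, appeal, and challenge values below are invented for demonstration:

```python
# Hypothetical values plugged into the formula above.
goal = "write a study plan for the final exam"
audience = "a high-school student with weak math fundamentals"
appeal = "a realistic week-by-week schedule that builds confidence"
challenge = "the plan being too dense to actually follow"

prompt = (
    f"I want to {goal}, for {audience}, "
    f"hope to achieve {appeal}, but worry about {challenge}."
)
print(prompt)
```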
However, as we have said before, for engineering control we usually do not want the model to play freely. For example, what engineering actually expects is a kind of keyword recognition, as in the table below (a minimal sketch follows it):
| Student Expression | Type of Anxiety | Specific Category |
|---|---|---|
| I'm about to explode | Too much study pressure | Test anxiety |
| I'm mentally broken | Too much study pressure | High cognitive load |
| I reviewed for a long time, but I still can't | Too much study pressure | Perfectionism anxiety |
In this scenario, a reasoning model may not perform as well as an instruction model (though this needs to be verified by testing).
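Here is a minimal sketch of the keyword recognition described above, using the illustrative categories from the table; a real system would likely use much larger keyword dictionaries, fuzzy matching, or a lightweight classifier:

```python
# Keyword rules mirroring the table above (illustrative only).
RULES = [
    ("about to explode", ("Too much study pressure", "Test anxiety")),
    ("mentally broken",  ("Too much study pressure", "High cognitive load")),
    ("still can't",      ("Too much study pressure", "Perfectionism anxiety")),
]

def classify(student_text: str):
    text = student_text.lower()
    for keyword, (anxiety_type, category) in RULES:
        if keyword in text:
            return anxiety_type, category
    # No keyword hit: fall back to a model call or ask a clarifying question.
    return None

print(classify("I reviewed for a long time, but I still can't remember anything"))
# -> ('Too much study pressure', 'Perfectionism anxiety')
```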
Simply put, the freer the model, the more hallucinations it may have. When the model reasons through long chains of thought, it may consider problems from different perspectives that are not always consistent with reality, which leads to hallucinations.
What happened from V3 to R1?
The R1 model relies mainly on reinforcement learning applied on top of V3 to achieve its performance, in particular the GRPO algorithm.
However, this mechanism can lead to the hallucination problem: if the reward function overemphasizes creativity or fluency and ignores factual accuracy, the model will tend to generate content that seems reasonable but is not true.
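A toy illustration of that failure mode follows; the weights and scores are invented and this is not GRPO's actual reward function, but it shows how over-weighting fluency lets a confident-sounding wrong answer outscore an accurate plain one:

```python
# Invented weights and scores; not GRPO's actual reward function.
def reward(fluency: float, factuality: float,
           w_fluency: float = 0.8, w_fact: float = 0.2) -> float:
    return w_fluency * fluency + w_fact * factuality

plausible_but_wrong = reward(fluency=0.95, factuality=0.20)  # 0.80
accurate_but_plain  = reward(fluency=0.60, factuality=0.95)  # 0.67

# With these weights, training pushes the model toward the fluent,
# confident-sounding answer even though it is the less factual one.
print(plausible_but_wrong, accurate_but_plain)
```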
When dealing with complex tasks, a model trained with GRPO may reason through long chains of thought, and this process easily introduces unrealistic assumptions or logical jumps, exacerbating hallucination.
In addition, R1's self-correction mechanism is insufficient, making it hard to identify and correct hallucinations effectively. Some hallucinations look perfectly reasonable, so the model struggles to detect them, and correcting the errors may require a lot of interaction and feedback, which is costly.
How to eliminate hallucinations
Although R1's hallucination is quite severe, DeepSeek is still the best base-model choice in China, so eliminating hallucination is an issue we must focus on when building engineering applications.
Because GPT hallucinates too, this question has been answered before: knowledge graph + engineering control, the most common form being RAG (Retrieval-Augmented Generation):
RAG is a technical architecture that combines information retrieval with language generation. The model first retrieves information relevant to the user's query from external knowledge sources (document libraries, databases, etc.), then uses that information to generate a more accurate answer.
Specifically, the working principle of RAG is divided into the following steps:
1. Query processing and understanding: the user's input query is first converted into a form suitable for retrieval (usually a vector representation).
2. Information retrieval: the model uses this representation to search a pre-built knowledge base or index and find the pieces of information most relevant to the query, typically text passages or data records.
3. Answer generation: the retrieved information is passed to a language generation model to produce a more accurate, fact-based answer.
By combining external search results, RAG can effectively improve the quality of generated content and avoid generating irrelevant or erroneous information, thereby enhancing the accuracy and reliability of generated answers.
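Here is a minimal, self-contained sketch of those three steps. The knowledge base, relevance scoring, and prompt template are toy stand-ins; a real system would use an embedding model, a vector store, and a large-model API:

```python
# Toy knowledge base; a real system would index a document library or database.
KNOWLEDGE_BASE = [
    "Section 504 of the Uniform Residential Tenancy Act: landlords must keep "
    "the rental property in a habitable condition.",
    "California Civil Code Section 1941.1: landlords must ensure a safe living "
    "environment, including working water, electricity, and heating.",
    "Company holiday schedule for the year (irrelevant to the query).",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    # Steps 1 + 2: "understand" the query and find the most relevant snippets.
    # Toy relevance score: word overlap (a real system compares vectors).
    q_words = set(question.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question: str, passages: list[str]) -> str:
    # Step 3: hand the retrieved snippets to the generator as grounding material.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "You are an experienced lawyer. Answer the question using only the "
        f"material below.\nMaterial:\n{context}\n\nQuestion: {question}"
    )

question = "What are the legal obligations between landlords and tenants?"
prompt = build_prompt(question, retrieve(question))
print(prompt)  # this prompt would then be sent to the large model
```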
All of that is rather abstract, so let's walk through an example:
AI lawyer hallucination problem
For example, a user's question is: What are the legal obligations between landlords and tenants in the United States?
Model-generated answer: In the United States, obligations between landlords and tenants include rent payment, property maintenance, termination, and securing the property. However, specific legal obligations vary by state, and tenants and landlords must comply with local regulations.
This answer seems reasonable, but the model does not provide specific legal basis or accurate citations, and the generated content is likely to be wrong or based only on general knowledge rather than specific legal provisions.
Therefore, this response may be a "hallucination": it sounds plausible but lacks supporting evidence and may mislead users.
Now, let’s apply the RAG technique to the same problem. Assume that the AI lawyer model can access an external legal database and retrieve legal provisions or cases related to the obligations of landlords and tenants.
RAG enhanced answer:
Under Section 504 of the Uniform Residential Tenancy Act, landlords are responsible for ensuring that the leased property is maintained in a suitable living condition and for repairing and maintaining the property, while tenants are obligated to pay rent on time and keep the leased property clean.
Additionally, in California, under California Civil Code Section 1941.1, landlords must ensure a safe living environment, including addressing problems with water, electricity, and heating systems. If a landlord fails to meet these obligations, tenants can seek damages under the law.
Then we will discuss how RAG eliminates hallucinations.
How RAG eliminates hallucinations
The question asked by the user is: "What are the legal obligations between landlords and tenants in the United States?"
Here, engineering control of AI applications comes into play:
1. Input Processing
A user asks a question such as: "What are the legal obligations between landlords and tenants in the United States?"
At this stage, the model needs to understand the user's query and convert it into a form that can be retrieved .
Typically, the model uses a pre-trained natural language processing model to convert the input query into a high-dimensional vector.
The generated vector will be used for subsequent information retrieval. The model compares the query vector with the documents already in the database to find the content most relevant to the query.
If you don't want to rely on a vector store, you can also use keyword filtering directly; it depends on how the project is designed.
A vector store is, after all, backed by a small model. Although a small model, with its smaller scale, is less prone to hallucination, its matching is approximate and will never be as precise as database keyword retrieval; this is the core trade-off of vector-store technology.
In short, the stronger the engineering capability, the more the system tends to depend on keyword search. Which to use depends on the business scenario.
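Here is a minimal sketch of the keyword route; the documents and keywords are made up, and in production this would more likely be a database full-text query than a Python loop:

```python
# Made-up documents; in practice this would be a database full-text query.
DOCUMENTS = [
    "Section 504 of the Uniform Residential Tenancy Act: duty to maintain habitability.",
    "California Civil Code Section 1941.1: safe living environment requirements.",
    "Internal memo on office renovation (unrelated).",
]

def keyword_retrieve(keywords: list[str]) -> list[str]:
    # Exact keyword match: precise and predictable, but it misses documents
    # that express the same idea with different wording.
    return [
        doc for doc in DOCUMENTS
        if any(kw.lower() in doc.lower() for kw in keywords)
    ]

# Only the first document matches; the California provision is missed because
# it never uses these exact words -- the flip side of exact matching.
print(keyword_retrieve(["landlord", "tenancy"]))
```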
2. Information Retrieval
The goal of information retrieval is to find text snippets related to the user's query in an external knowledge base, using tools and libraries that are by now well packaged.
Some companies now rely on vector stores, while others rely on a keyword-generalization knowledge-base system; their purposes are similar:
Suppose the query is: "What are the legal obligations between landlords and tenants in the United States?"
Search results may include:
- Section 504 of the Uniform Residential Tenancy Act: a landlord has a duty to maintain the property in a habitable condition.
- California Civil Code Section 1941.1: landlords must provide a safe living environment and address problems with water, heating, and other systems.
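And here is a minimal sketch of the vector route, ranking documents by cosine similarity; the embed function is a crude bag-of-words stand-in for a real pre-trained embedding model:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Crude bag-of-words "embedding"; a real system would call a pre-trained
    # embedding model and get a dense high-dimensional vector instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

DOCUMENTS = [
    "Landlords must maintain the rental property in a habitable condition.",
    "Landlords must provide a safe living environment and fix water and heating problems.",
    "The office cafeteria menu changes weekly.",
]

query = "What are the legal obligations between landlords and tenants?"
q_vec = embed(query)
ranked = sorted(DOCUMENTS, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
print(ranked[:2])  # the two landlord/tenant snippets rank above the cafeteria note
```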
3. Prompt Assembly
In fact, this step is not difficult: retrieve the content based on the keywords, then process it with a traditional prompt.
Finally, the integrated content is used as input to the generative model. For example, the material combining multiple legal provisions and cases would read:
Under Section 504 of the Uniform Residential Tenancy Act, landlords are responsible for ensuring that rental properties are maintained in suitable living conditions and for making necessary repairs and maintenance. Additionally, in California, under California Civil Code Section 1941.1, landlords must ensure a safe living environment and address system problems such as water and heating. If a landlord fails to meet these obligations, tenants can seek damages under the law.
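A small sketch of how engineering assembles that final prompt from the retrieved snippets; the template mirrors the RAG prompt shown in the complete process below, and the variable names are illustrative:

```python
# Retrieved snippets (taken from the example above).
RETRIEVED = [
    "Under Section 504 of the Uniform Residential Tenancy Act, landlords must "
    "keep the rental property habitable and perform necessary repairs.",
    "Under California Civil Code Section 1941.1, landlords must ensure a safe "
    "living environment, including water and heating systems.",
]

def assemble_prompt(question: str, snippets: list[str]) -> str:
    context = "\n".join(snippets)
    return (
        "You are an experienced lawyer. Now a user asks you a legal question. "
        "Please answer it.\n\n"
        f"The current user's question is:\n{question}\n\n"
        "The following are the relevant legal provisions and cases retrieved "
        f"to help you answer:\n{context}\n\n"
        "Please answer the user's question based on the above legal provisions and cases."
    )

final_prompt = assemble_prompt(
    "What are the legal obligations between landlords and tenants in the United States?",
    RETRIEVED,
)
# `final_prompt` is then sent to the large model.
print(final_prompt)
```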
Finally, call the large model. Here is the complete prompt workflow for reference:
Complete Process
First, the traditional process:
Question: "What are the legal obligations between landlords and tenants in the United States?" Prompt:
You are an experienced lawyer. Now a user asks you a legal question. Please answer it.
The current user's question is:
What are the legal obligations between landlords and tenants in the United States?
Outcome: "In the United States, legal obligations between landlords and tenants include rent payment, property maintenance, termination, and securing the property. However, specific legal obligations vary by state, and tenants and landlords must comply with local regulations."
Possible issues:
There is a lack of specific legal basis. Failure to mention specific articles or state-specific legal provisions may mislead users, especially when clarity of legal basis is required.
Second, the RAG process, where the main change is in the prompt:
You are an experienced lawyer. Now a user asks you a legal question. Please answer it.
The current user's question is:
What are the legal obligations between landlords and tenants in the United States?
The following are the relevant legal provisions and cases retrieved to help you answer user questions:
- Under Section 504 of the Uniform Residential Tenancy Act, landlords have a duty to ensure that the rental property is maintained in a habitable condition and to perform necessary repairs and maintenance.
- Under California Civil Code Section 1941.1, landlords must ensure that rental properties provide a safe living environment, including addressing problems with systems such as water and heating.
- Section 227 of the New York State Tenancy Law imposes an obligation on landlords to ensure that properties comply with local building standards and provide safe living conditions for tenants.
- U.S. courts have ruled in multiple cases that when a landlord fails to fulfill the above obligations, the tenant has the right to claim compensation, including but not limited to rent reduction or relocation expenses.

Please answer the user's question based on the above legal provisions and cases.
So the answer is:
Under Section 504 of the Uniform Residential Tenancy Act, landlords are responsible for ensuring that leased properties are maintained in suitable living conditions and for repairs and maintenance. Additionally, in California, under California Civil Code Section 1941.1, landlords must ensure that leased properties provide a safe living environment, including addressing system problems such as water and heating. If a landlord fails to fulfill these obligations, tenants can seek compensation under the law. In addition, under Section 227 of the New York State Lease Law, landlords must also ensure that properties meet local building standards and provide tenants with safe living conditions. In multiple court cases, when landlords fail to fulfill the above responsibilities, tenants have the right to claim compensation, including rent reduction or relocation expenses.
In summary, the model hallucination problem is solved...
Model training techniques
RAG is a low-cost approach in the early stage. If you want to rely on model training itself to eliminate hallucinations, that is not impossible...
For example, by setting different strategies for different types of tasks during training, the model can learn when to provide creative content and when to remain faithful and accurate.
For example, when dealing with tasks such as summarization and translation, the model must pay more attention to being faithful to the content of the original text rather than freely exercising creativity.
Training for such tasks usually requires additional supervised data and constraints to ensure the accuracy of the output.
But if you really want to do this, you may spend more than 10% of resources for 1% of the effect, which may not be cost-effective...
These approaches require additional data annotation, long-cycle fine-tuning and reinforcement training, and often more computing resources and human intervention.
In addition, the meticulous adjustment of the model during the fine-tuning process will also increase the complexity of model development and training. Especially when dealing with the diversity and details of tasks, how to find the right balance so that the model can provide creative answers while maintaining high fidelity and accuracy is a complex and resource-intensive task.
In general, although refined training can reduce the occurrence of hallucinations, its cost cannot be ignored, especially in scenarios that require large-scale data and computing resources. This also provides challenges and directions for the development of large models in the future.
Conclusion
In the practice of AI engineering applications, model hallucination is always a challenge that is difficult to avoid.
Starting from the business scenario, we explored the dilemma of applying AI in enterprises: when AI cannot meet the business demand for high accuracy and reliability, hallucination often causes the AI application to be dismissed as an "ineffective tool".
This reveals a key issue: AI does not necessarily need to completely take over the business, but in certain high-demand fields (such as law, medical care, etc.), the accuracy of the model must be guaranteed, otherwise hallucinations will seriously affect the application effect.
By analyzing the causes of model hallucinations, we found that the working principles of large models, the quality of training data, and the high degree of freedom in the reasoning process are all causes of hallucinations.
RAG technology: By combining external knowledge bases, it can provide more factual support for generating answers, thereby effectively reducing the occurrence of hallucinations.
However, RAG technology is not perfect. It relies on the quality and coverage of the external knowledge base and may introduce additional system complexity and response time.
Finally, although refined training (such as fine-tuning, reinforcement learning, etc.) can further reduce hallucinations, these methods are costly. Especially in large-scale applications, how to find a balance between technical optimization and cost-effectiveness remains an important challenge in engineering practice.