Pouring cold water on small models: reflections on the limitations revealed by hands-on testing of DeepSeek-R1 7B

Written by Caleb Hayes
Updated on: July 14, 2025
Recommendation

In-depth analysis of the limitations of the DeepSeek-R1 7B model provides a rational perspective for the application of AI models.

Core content:
1. Three core problems found in actual testing: retrieval bias, frequent hallucinations, and shallow reasoning
2. Performance bottleneck: low processing efficiency and limited end-side deployment
3. Optimization paths to break through the limitations: data-level adjustment, engineering efficiency improvements, and customization for vertical scenarios


Recently, the author built a knowledge base system around the DeepSeek-R1 7B model and found that its real-world performance fell well short of expectations. Taking this practice as an example and drawing on industry research data, this article analyzes the limitations of small models, explores directions for optimization, and rationally positions their applicable scenarios.


1. Three core issues exposed by hands-on testing


The double-edged nature of knowledge retrieval and reasoning

In knowledge base question-answering tests, the model responds quickly, but the following problems occur frequently:


Retrieval bias: complex logic is often extracted out of context. The contextual associations of technical terms are severed, causing answers to deviate from the original meaning (e.g., misinterpreting "knowledge distillation" as a process for distilling liquor).

Frequent hallucinations: the model lacks fact-checking ability and invents knowledge points that do not appear in the source material (e.g., fabricating the conclusions of an academic paper).

Shallow reasoning: when faced with a 30 MB technical document, the model can only extract keywords and piece them together into simple conclusions; it cannot deeply analyze the pros and cons of competing technical approaches.


Performance bottlenecks are prominent

Low processing efficiency: importing a 30 MB document took more than 40 minutes, and peak GPU memory usage was very high, far exceeding the official "lightweight" positioning.

Limited on-device deployment: although the model has been deployed locally on Loongson CPUs, users still report that memory overflows are frequently triggered in real office scenarios (e.g., running multiple tasks in parallel).


Insufficient adaptability in professional fields

Data shows that the model's parsing accuracy on documents in vertical fields such as finance and law is below 60%, far lower than the roughly 75% achieved by comparable products such as Microsoft's Phi-3. For example, when analyzing the paper "Constructing a Cybersecurity Knowledge Base Model", the model confused the semantic boundaries between "ontology" and "entity", breaking the logical chain of the analysis.


2. Optimization paths to break through the limitations


Fine-tuning at the data level

High-quality data screening: following Microsoft Phi-3's experience, train on "textbook-quality" structured data to reduce interference from internet noise.

Dynamic knowledge injection: combine the model with a RAG (retrieval-augmented generation) architecture, vectorizing the knowledge base and using it as an external memory module to mitigate hallucinations.
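The retrieval step of the RAG idea above can be sketched in a few lines. This is a toy illustration only: the bag-of-words "embedding", the document snippets, and the prompt template are all made up for the example, and a production system would use a trained embedding model and a vector database instead.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding': token -> count (stand-in for a real vector model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the top-k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model in retrieved context instead of its own (unreliable) memory."""
    context = "\n".join(retrieve(query, docs, k=2))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Knowledge distillation transfers knowledge from a large teacher model to a small student model.",
    "Liquor distillation separates alcohol from a fermented mixture by heating.",
]
print(build_prompt("What is knowledge distillation?", docs))
```

Because the answer is constrained to retrieved text, a confusion like the "distilling liquor" misreading from the tests above becomes a retrieval-ranking problem rather than a hallucination.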


Improved engineering efficiency

Mixed-precision quantization: borrowing the expert-selection mechanism of DeepSeek-Coder-V2-Lite, quantize non-core parameters to 8 bits, reducing GPU memory usage by about 30%.

Distributed inference optimization: using the heterogeneous computing capabilities of Cambricon chips, separate document preprocessing from model inference to shorten end-to-end processing time.
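The memory saving from 8-bit quantization can be illustrated with a minimal symmetric per-tensor int8 scheme (numpy only). This is a simplification of real mixed-precision schemes, which typically use per-channel scales and keep sensitive layers in higher precision; the weight matrix here is random data for illustration.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in for a "non-core" weight block

q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32 for the same tensor
print("memory ratio:", w.nbytes / q.nbytes)  # 4.0
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```

Storing only non-core blocks this way is what keeps the accuracy loss bounded while still cutting a large share of the memory footprint.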


Customization for vertical scenarios

Domain knowledge distillation: as Andrew Ng's team has suggested, use the outputs of large models such as GPT-4 as supervision signals via transfer learning to improve small models' domain expertise.

Modular design: following the ontology layering approach of cybersecurity knowledge bases, build a two-level "atomic ontology / application ontology" architecture to improve semantic parsing accuracy.
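The "large-model output as supervision signal" idea in the first bullet is the standard distillation loss: match the student's temperature-softened distribution to the teacher's. A minimal numpy sketch, with made-up logits standing in for real teacher and student outputs:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 (as in Hinton et al.) so gradient magnitudes stay comparable."""
    p = softmax(teacher_logits, T)  # soft targets from the large model
    q = softmax(student_logits, T)  # student predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.2]  # hypothetical teacher logits (e.g., from GPT-4)
aligned = [3.8, 1.1, 0.3]  # student close to the teacher
wrong   = [0.2, 1.0, 4.0]  # student that disagrees

print(distill_loss(teacher, aligned))  # small
print(distill_loss(teacher, wrong))    # much larger
```

In training, this term is minimized alongside the ordinary cross-entropy on ground-truth labels, so the small model inherits the teacher's "dark knowledge" about class similarities rather than just hard labels.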


3. Rational positioning of small models: what to do and what not to do


Despite their limitations, small models still have irreplaceable value:


First choice for lightweight scenarios

In resource-constrained settings such as smartphones and IoT devices, small models (such as Apple's OpenELM) can handle tasks like real-time voice translation and smart-home control thanks to their low latency (<500 ms) and low power consumption (<1 W).

Enterprise-level private deployment

Fields such as finance and medicine have strict data privacy requirements. Small models deployed locally avoid the risks of cloud transmission, and the annual operation and maintenance cost of a single machine can be kept within 50,000 yuan.

A supplement to the large model ecosystem

Stanford HAI research shows that using small models as preprocessing modules for large models (e.g., for document summarization and intent classification) can reduce overall inference cost by 47%.
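The preprocessing pattern above amounts to a router: a cheap local model classifies the request and only escalates the complex ones to the expensive cloud model. A minimal sketch, where the keyword-based classifier and the two handler functions are placeholders for real models:

```python
def classify_intent(query: str) -> str:
    """Placeholder for a small-model intent classifier."""
    hard = ("compare", "analyze", "trade-off", "prove")
    return "complex" if any(w in query.lower() for w in hard) else "simple"

def small_model(query: str) -> str:
    """Placeholder for a local 7B model call."""
    return f"[7B local] {query}"

def large_model(query: str) -> str:
    """Placeholder for a cloud LLM call."""
    return f"[cloud LLM] {query}"

def route(query: str) -> str:
    """Small model handles simple intents; complex ones escalate to the large model."""
    if classify_intent(query) == "simple":
        return small_model(query)
    return large_model(query)

print(route("Summarize this paragraph"))
print(route("Analyze the trade-off between latency and accuracy"))
```

The cost saving comes from the fact that most traffic is simple and never reaches the large model at all.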


Conclusion: Say goodbye to "omnipotence" and return to instrumental rationality


The hands-on testing of DeepSeek-R1 7B reminds us that small models are by no means "scaled-down large models"; their value lies in cost-effectiveness within specific scenarios. The industry needs to abandon "parameter superstition" and instead build a hybrid ecosystem in which large models handle complex cognition and small models focus on vertical tasks. As a Meta engineer put it: "The future AI battlefield will not be a showdown of model size, but a contest of system-level efficiency."