Chain-of-thought reasoning in large models: how to think about DeepSeek's large model

Written by
Iris Vance
Updated on: July 16th, 2025
Recommendation

Explore the deep reasoning capabilities of DeepSeek's large model and gain insight into how its chain-of-thought reasoning helps solve complex problems.

Core content:
1. An analysis of the chain-of-thought reasoning behind DeepSeek's large model
2. How chain-of-thought reasoning improves the accuracy and interpretability of large models
3. The advantages and limitations of DeepSeek in specific fields


Abstract: DeepSeek has drawn a great deal of attention recently, and its deep reasoning ability is worth a closer look. That ability is tied to its chain of thought. A large model's chain of thought is the process of breaking a complex problem down into a series of ordered steps and reaching the answer through step-by-step reasoning. It is like working through a problem step by step: the problem is split into smaller parts so that the model can process the information more systematically.


Chain-of-thought reasoning improves the capabilities of large models in several ways. First, in terms of accuracy, it helps the model break a complex problem, such as a mathematical reasoning or logical judgment task, into parts and work through them step by step, reducing the error rate. Second, in terms of interpretability, it makes the model's reasoning process transparent, so it is easier to understand why the model reached a particular conclusion, which strengthens trust in its decisions.


The DeepSeek large model has its own application scenarios and limitations. In terms of applicability, it excels at logical reasoning tasks such as mathematical derivation and code generation; with its strong deep reasoning capabilities, it can solve complex problems efficiently. It also performs well in natural language processing scenarios that require deep logical analysis, such as complex text understanding and question answering in specialized domains. However, DeepSeek is not the best choice for divergent tasks such as creative writing. Compared with other models, DeepSeek focuses more on deep reasoning: some general models have an advantage in the diversity and creativity of text generation, while DeepSeek stands out on tasks with high logical density.


In general, the DeepSeek model has shown considerable strength in specific areas thanks to the deep reasoning ability its chain of thought provides, but it also has clear limitations. Understanding these characteristics helps us use it more effectively in different scenarios.



  • What is a large model's chain of thought?
  • How chain-of-thought reasoning improves the capabilities of large models
  • What is the DeepSeek large model suitable for, what is it not suitable for, and how does it differ from other models?



01

What is a large model's chain of thought?


Before discussing the chain of thought of large models, let's look at a simple everyday example. Suppose you go to a restaurant for dinner and see the following on the menu: "Each apple pie requires 3 apples to make. Today's stock of apples is enough to make 5 apple pies, and the kitchen has just purchased 10 more apples. How many apples does the restaurant have now?" Faced with this question, our thinking process might go like this: first calculate how many apples the current stock of 5 pies represents, that is, 3×5 = 15; then add the newly purchased 10 apples, 15+10 = 25, so the restaurant now has 25 apples.

This way of thinking, which breaks a complex problem down into multiple simple steps, analyzes them one by one, and then draws a conclusion, is the core of chain-of-thought reasoning in large models. Simply put, the Chain of Thought (CoT) of a large model is a method that lets a large language model decompose a complex problem into multiple sub-problems and solve them in a logical order, forming a complete reasoning chain. It breaks the traditional model's simple mapping from input directly to output, introduces intermediate reasoning steps, and makes the model's decision-making process more transparent and explainable.
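To make the contrast concrete, here is a minimal Python sketch of a direct prompt versus a chain-of-thought prompt for the restaurant example above. The prompt strings and the expected output are illustrative assumptions; no particular model API is called.

```python
# A minimal sketch contrasting a direct prompt with a chain-of-thought (CoT)
# prompt for the restaurant example above. Illustrative only; no specific
# model API is called.

question = (
    "Each apple pie needs 3 apples. Today's stock of apples can make 5 "
    "apple pies, and the kitchen has just purchased 10 more apples. "
    "How many apples does the restaurant have now?"
)

# Traditional prompting: the model maps the input straight to an answer.
direct_prompt = question

# Chain-of-thought prompting: the model is asked to expose its intermediate
# reasoning steps before giving the answer.
cot_prompt = question + "\nLet's think step by step."

# The kind of output CoT prompting aims to elicit (hypothetical):
expected_cot_output = (
    "Step 1: 5 pies x 3 apples each = 15 apples in stock.\n"
    "Step 2: 15 + 10 newly purchased apples = 25 apples.\n"
    "Answer: 25"
)

print(cot_prompt)
print(expected_cot_output)
```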

For example, when we ask a large model a complex question about the causes and effects of a historical event, chain-of-thought reasoning will guide the model to first lay out the background of the event, then analyze the direct and indirect causes and how these factors interact, and finally draw a conclusion, rather than giving a sweeping answer straight away.

How chain-of-thought reasoning came about

The birth of chain-of-thought reasoning for large models has its own technological background. In 2017, the Transformer model appeared and completely changed the landscape of natural language processing and, indeed, of machine learning as a whole. It moved away from the established designs of recurrent neural networks (RNNs) and convolutional neural networks (CNNs) and introduced the self-attention mechanism, which lets the model process sequence data in parallel, greatly improving computational efficiency while also capturing long-distance dependencies better. Building on the Transformer architecture, researchers began to explore the potential of pre-trained models: pre-training on large-scale unlabeled data and then fine-tuning on specific tasks. This approach achieved remarkable results across many natural language processing tasks, as the success of BERT and the GPT series shows.

However, as the scale of pre-trained models kept growing, fine-tuning faced more and more challenges. On one hand, the cost of fine-tuning rose sharply, demanding large amounts of computing resources and time, a heavy burden for many research institutions and companies. On the other hand, even with heavy investment in fine-tuning, performance on some complex tasks remained unsatisfactory: generalization was insufficient, and the models struggled with the diverse problems of the real world. To address this, researchers turned to prompt engineering, which guides the model toward the expected output by carefully designing the input prompt. It can improve performance on specific tasks to a certain extent without large-scale adjustment of the model's parameters.

Although prompt engineering alleviated the fine-tuning dilemma to some degree, traditional prompting methods still fell short on complex reasoning problems. In tasks such as arithmetic reasoning, commonsense reasoning, and symbolic reasoning, the model often gave wrong answers because it could not grasp the complex logical structure of the problem. To break through this bottleneck, chain-of-thought prompting emerged. It borrows the way humans tackle complex problems, decomposing a problem into multiple simple sub-problems and reaching the final answer through step-by-step reasoning, and thus provides large models with a new way of handling complex tasks.




02

How chain-of-thought reasoning improves the capabilities of large models


1. Improving Reasoning Accuracy

Chain-of-thought reasoning plays a key role in improving the accuracy of large model reasoning. Take mathematical reasoning as an example: faced with a complex word problem, a traditional large model may give a wrong answer outright because it cannot sort out the quantitative relationships and logical steps in the problem. With chain-of-thought reasoning, the model can break the problem into sub-steps and work through the calculation step by step.
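The following sketch mirrors that decomposition in ordinary Python, computing each sub-step explicitly instead of jumping to the final number. The ticket problem and its prices are invented purely for illustration.

```python
# A small sketch of "decompose, then compute each sub-step explicitly",
# mirroring how chain-of-thought reasoning handles a multi-step word problem.
# The ticket problem and its numbers are invented for illustration.

def group_ticket_cost(adults: int, children: int,
                      adult_price: float, child_discount: float) -> float:
    """Total cost for a group, computed as explicit sub-steps."""
    # Step 1: cost of the adult tickets.
    adult_cost = adults * adult_price
    # Step 2: a child ticket costs the adult price times the discount factor.
    child_price = adult_price * child_discount
    # Step 3: cost of the children's tickets.
    child_cost = children * child_price
    # Step 4: combine the partial results.
    return adult_cost + child_cost

# 2 adults at 50 yuan plus 3 children at half price: 100 + 75 = 175 yuan.
print(group_ticket_cost(2, 3, 50.0, 0.5))
```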

Chain-of-thought reasoning is also effective on commonsense reasoning tasks. For example, given a question like "If it is raining outside today, what should I bring when I go out?", chain-of-thought reasoning guides the model to first consider what characterizes a rainy day, namely that rain gets you wet; then consider ways of dealing with getting wet, such as using rain gear; and finally conclude that you should take an umbrella, a raincoat, or other rain gear. Through such chained reasoning, the model understands the question more fully and accurately and gives an answer that accords with common sense, avoiding illogical responses such as suggesting sunglasses.

2. Enhancing the Interpretability of the Reasoning Process

Chain-of-thought reasoning not only improves the accuracy of a large model's reasoning but also greatly enhances the interpretability of the reasoning process. When the model produces a series of reasoning steps, we can see clearly how it starts from the question and gradually derives the conclusion. This is like opening up the "black box" of the model's decision-making, letting us see its thinking at a glance.

Take a question about historical events. If we ask "Why did the Industrial Revolution first occur in Britain?", a large model using chain-of-thought reasoning might answer along these lines. First, from the economic angle: by the 18th century Britain already held large overseas colonies, which supplied raw materials and markets for its goods, accumulated substantial capital, and provided the economic basis for the Industrial Revolution. Second, from the technical angle: Britain's long tradition of handicraft workshops had trained a large number of skilled workers whose accumulated production experience laid the technical groundwork for inventing and improving machines. Third, from the political angle: Britain completed its bourgeois revolution relatively early and established a constitutional monarchy; this stable political environment safeguarded the Industrial Revolution and favored the implementation of policies and the allocation of resources. Finally, from the cultural angle: British society valued science and innovation and encouraged the pursuit of new technologies and inventions, and this atmosphere also promoted the rise of the Industrial Revolution.

Through such detailed reasoning steps, we can clearly understand the factors and logic that the model is based on to answer questions. This not only helps us judge the rationality of the answer, but also allows us to learn the methods and ideas of analyzing problems from the model's reasoning process. If the answer given by the model is biased, we can also accurately find out the problem based on its reasoning steps, whether it is a misunderstanding of a certain premise or a loophole in the reasoning logic, so as to make targeted improvements and optimizations.

Types of chain-of-thought prompting

1. Few-Shot CoT

Few-shot chain-of-thought prompting provides a small number of examples containing chain-of-thought reasoning in the prompt, guiding the large language model to produce similar reasoning steps when it handles new problems and thereby strengthening its reasoning ability. In this process the examples act as a "template" for solving problems, showing the model how to break a complex problem down step by step and solve it.

Few-shot chain-of-thought prompting has interesting applications in image recognition. Suppose we need a model to determine the category of an image's content. If we ask the model to recognize it directly, limited sample data may make it hard for the model to grasp the differences between categories. With few-shot chain-of-thought prompting, we can instead provide examples such as: "Example 1: There is an animal in the picture. It has four legs and black and white stripes, and it is a mammal. The answer is zebra. Example 2: There is an object in the picture. It has four wheels, drives on the road, and carries people inside. The answer is car." From these examples the model learns the thinking process behind recognition: first observe the features of the object in the picture, then judge the category based on those features. When it encounters a new picture, the model follows the same approach, first analyzing features such as color, shape, and function and then gradually inferring the category, which improves recognition accuracy.
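A hedged sketch of how such a few-shot CoT prompt might be assembled is shown below. The example wording and the new query are assumptions for illustration; no specific model API is used.

```python
# A hedged sketch of assembling a few-shot CoT prompt, reusing the two
# recognition examples from the text. The wording and the new query are
# illustrative; no particular model API is assumed.

EXAMPLES = [
    ("There is an animal in the picture. It has four legs and black and "
     "white stripes, and it is a mammal.", "zebra"),
    ("There is an object in the picture. It has four wheels, drives on "
     "the road, and carries people inside.", "car"),
]

def build_few_shot_cot_prompt(new_description: str) -> str:
    """Show worked examples with reasoning, then append the new query."""
    parts = []
    for i, (description, answer) in enumerate(EXAMPLES, start=1):
        parts.append(
            f"Example {i}: {description}\n"
            "Reasoning: observe the object's features, then match them to a category.\n"
            f"Answer: {answer}"
        )
    parts.append(
        f"New case: {new_description}\n"
        "Reasoning:"  # the model is expected to continue from here
    )
    return "\n\n".join(parts)

print(build_few_shot_cot_prompt(
    "There is an animal in the picture. It has a long neck, long legs, "
    "and brown patches on a tan body."
))
```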

2. Zero-Shot CoT

Zero-shot chain-of-thought prompting is a more concise and efficient approach. It requires no worked examples: simply appending a trigger phrase to the question, such as "Let's think step by step" or "Please analyze the reasoning process in detail", is enough to guide the model to decompose the task and reason step by step. This method makes full use of the knowledge and language understanding the large language model has already acquired and activates its internal reasoning mechanism.

Zero-shot chain-of-thought prompting plays an important role in intelligent customer service. Suppose a customer asks a complex question such as: "I want to book a round-trip ticket from Shanghai to Beijing next month, departing in the morning and returning in the evening, economy class, under 2,000 yuan, and I'd like to earn airline miles. Which flights can I choose?" Without zero-shot chain-of-thought prompting, the customer service model may fail to capture all the customer's requirements and give an inaccurate or incomplete answer. With it, prompted by "Let's think about this problem step by step", the model breaks the question into sub-problems: first determine that the departure city and destination are Shanghai and Beijing; then pin down the time range (next month, departing in the morning and returning in the evening); then filter for economy class flights priced under 2,000 yuan; finally, from those flights, find the options that earn airline miles. Reasoning step by step in this way, the model understands the customer's needs more completely and accurately and provides flight information that meets the requirements, greatly improving the quality and efficiency of customer service.
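A minimal sketch of zero-shot CoT prompting for this scenario follows. The trigger phrase is the one mentioned above; the helper function and the expected decomposition are assumptions for illustration.

```python
# A minimal sketch of zero-shot CoT prompting: no worked examples, just a
# trigger phrase appended to the question. The helper function and the
# expected decomposition are illustrative assumptions.

ZERO_SHOT_TRIGGER = "Let's think about this problem step by step."

def zero_shot_cot(question: str) -> str:
    """Append the CoT trigger phrase to an arbitrary question."""
    return f"{question}\n{ZERO_SHOT_TRIGGER}"

customer_query = (
    "I want to book a round-trip ticket from Shanghai to Beijing next month, "
    "departing in the morning and returning in the evening, economy class, "
    "under 2,000 yuan, and I'd like to earn airline miles. "
    "Which flights can I choose?"
)

print(zero_shot_cot(customer_query))

# Decomposition the prompt aims to elicit from the model:
# 1. Route: Shanghai <-> Beijing, round trip.
# 2. Dates: next month; depart in the morning, return in the evening.
# 3. Filter: economy class, total price under 2,000 yuan.
# 4. Filter: among those, flights that earn airline miles.
```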



03

What is DeepSeek suitable for, and what is it not suitable for?

With this understanding of the chain of thought in hand, let's look at the DeepSeek-R1 model, a reasoning model that performs chain-of-thought reasoning automatically.

Reasoning large models

A reasoning large model is a model that strengthens reasoning, logical analysis, and decision-making on top of a traditional large language model. It usually relies on additional techniques such as reinforcement learning, neuro-symbolic reasoning, and meta-learning to enhance its reasoning and problem-solving abilities. For example, DeepSeek-R1 and OpenAI's o3 perform strongly in logical reasoning, mathematical reasoning, and real-time problem solving.

Non-reasoning large models

These models suit most tasks, focusing on language generation, context understanding, and natural language processing, but their deep reasoning ability is weaker. Trained on large amounts of text data, they master the rules of language and can generate appropriate content. Examples include GPT-3 and GPT-4 (OpenAI) and BERT (Google), which are mainly used for language generation, language understanding, text classification, translation, and similar tasks.

Comparison between reasoning models and general models

  • Strengths: reasoning models excel at mathematical derivation, logical analysis, code generation, and complex problem solving; general models excel at text generation, creative writing, multi-turn conversation, and open-ended question answering.
  • Weaknesses: reasoning models are weak at divergent tasks (such as writing poetry); general models perform poorly on tasks that demand strict logical chains (such as mathematical proofs).
  • In essence: reasoning models specialize in tasks with high logical density, while general models excel at tasks that call for high diversity.
  • Which is stronger: a reasoning model is not stronger across the board; it clearly outperforms a general model only in the areas it was trained for. A general model is more flexible in everyday scenarios, but on specialized tasks it relies on carefully designed prompts to compensate.
As for DeepSeek-R1, which is a reasoning model, its prompts can be much simpler than those needed for general models.
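As an illustration of that point (a sketch of my own, not an example from the article), here is a rough contrast between a bare prompt for a reasoning model and a more heavily scaffolded prompt for a general model. The task and prompt wording are assumptions.

```python
# A rough, invented contrast (not from the article) showing why prompts for
# a reasoning model can be shorter: the model supplies its own chain of
# thought, so the prompt does not need to spell the steps out.

task = "Prove that the sum of two even integers is even."

# A reasoning model such as DeepSeek-R1 can usually be given the bare task.
reasoning_model_prompt = task

# A general model often needs the steps and the format spelled out.
general_model_prompt = (
    "You are a careful mathematician. " + task + "\n"
    "First state the definition of an even integer, then write the two "
    "numbers as 2m and 2n, then add them and factor out 2, and finally "
    "state the conclusion in one sentence."
)

print("Reasoning-model prompt:\n" + reasoning_model_prompt)
print("\nGeneral-model prompt:\n" + general_model_prompt)
```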
This also helps explain why people say DeepSeek feels stronger than other models: we usually ask a question precisely because we do not know the answer. If you already know the chain of thought behind a question, you most likely already know how to solve it, just as with a math problem. When you do not know the answer and ask a general model, it may fail to answer, but DeepSeek often can, because it can break the problem down and reason through it by itself.
In addition, it is worth spelling out what large models are not good at. General large models lack practical, hands-on problem-solving experience (although a knowledge base can partly compensate for this); they are weak at updating their own knowledge and learning continuously; and they are not suited to large-scale structured data analysis or massive data processing, which is what big data systems do best. The same applies to end-to-end complex task execution: although DeepSeek is now good at certain kinds of reasoning, such as mathematical derivation, logical analysis, code generation, and complex problem decomposition, a real piece of engineering work still has to be decomposed and executed step by step.

As a key technology in the development of large language models, chain-of-thought reasoning provides a new approach for models to handle complex tasks. By decomposing complex problems into sub-problems and reasoning step by step in a logical order, it not only significantly improves the model's reasoning accuracy but also enhances the interpretability of the reasoning process, allowing us to better understand how the model reaches its decisions.