Do Transformers secretly take shortcuts when reasoning? Revealing the implicit reasoning limitations of large models

Written by
Clara Bennett
Updated on: July 11, 2025
Recommendation

In-depth exploration of the implicit reasoning mechanism of the Transformer model, revealing its limitations in multi-step reasoning tasks.

Core content:
1. Analysis of the reasoning ability of the Transformer model in natural language processing tasks
2. Comparison of efficiency and accuracy of implicit reasoning and explicit reasoning
3. How the paper reveals that the Transformer model's implicit reasoning relies on patterns in the data


Today we are going to analyze a new paper from a team at Fudan University titled "Implicit Reasoning in Transformers is Reasoning through Shortcuts". This paper delves into the internal mechanisms of the Transformer model when performing multi-step mathematical reasoning, revealing that what we usually think of as reasoning ability is likely just a shortcut taken by the model when learning data patterns.

1. The “implicit” and “explicit” sides of the Transformer’s reasoning ability

In recent years, with the rise of the Transformer architecture, large language models (LLMs) have demonstrated amazing capabilities in various natural language processing tasks, especially in complex reasoning. The introduction of Chain-of-Thought (CoT) greatly improves the performance of LLMs on multi-step reasoning tasks by allowing the model to explicitly generate intermediate reasoning steps, giving rise to a number of powerful reasoning models.

However, unlike this explicit reasoning, implicit reasoning asks whether the model can complete the reasoning process in its internal hidden states and produce the final answer without generating any intermediate steps. The advantage of implicit reasoning is efficiency, because far fewer tokens need to be generated. For example, if a math problem that requires several steps of calculation is given to the model and the model outputs the answer directly, that is implicit reasoning.
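
As a concrete illustration of the two styles, the same toy problem can be posed in an explicit, chain-of-thought form or in an implicit, answer-only form. The wording below is a hypothetical example, not taken from the paper:

    # Explicit reasoning: the model is asked to write out the intermediate steps.
    explicit_cot_prompt = (
        "m = 16 - 5, z = m - 3, b = z + 22. What is b? "
        "Think step by step and show each intermediate value before the final answer."
    )

    # Implicit reasoning: the model must carry the intermediate values (m = 11,
    # z = 8) in its hidden states and emit only the final number.
    implicit_prompt = (
        "m = 16 - 5, z = m - 3, b = z + 22. What is b? "
        "Answer with the final number only."
    )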

So the question is: if implicit reasoning is so efficient, why is it often inferior to explicit reasoning on tasks that require advanced reasoning capabilities? This is exactly the core question this paper tries to answer. The researchers ran detailed analysis experiments on a GPT-2 model trained from scratch on a purpose-built mathematical reasoning dataset, revealing the underlying mechanism of implicit reasoning in the Transformer.

2. The “Shortcut Dependence” Nature of Implicit Reasoning

The main contribution of this paper is that, through an in-depth analysis of the model's learning behavior under different data patterns, it shows that the Transformer tends to perform implicit reasoning by learning "shortcuts" in the data, rather than acquiring a true, general step-by-step reasoning capability. This core finding can be summarized in the following key points:

  1. "Seemingly" step-by-step reasoning in a fixed pattern : When the model is trained on fixed-pattern data, for example, all premises are arranged in the order of the actual steps of the calculation, the researchers found that the Transformer model is indeed able to perform step-by-step internal reasoning and achieve high accuracy in both in-domain and out-of-domain (longer reasoning steps) tests. Technical means such as activation patching also reveal that there is an information flow inside the model that gradually transmits intermediate calculation results. This seems to indicate that the model has implicit step-by-step reasoning capabilities.


  2. "Shortcuts" and overfitting in unfixed patterns : However, when the training data presents an unfixed pattern, for example, the order of the premise is disrupted, the situation is quite different. In this case, the implicit reasoning ability learned by the model tends to overfit the specific data pattern and is difficult to generalize to new patterns . The paper focuses on a phenomenon called " Variable as Subtrahend Plight". For example:

    m = 16 - 5
    z = m - 11 (or z = 11 - m)
    b = z + 22
    b = ?

    In the training data, if the previously computed variable m only ever appears as the minuend (the first operand, as in z = m - 11), the model can learn a direct "chain calculation" shortcut: it simply combines the literal numbers in order and ignores the actual meaning and dependencies of the variables. However, when the variable m appears as the subtrahend (as in z = 11 - m), this shortcut fails and model performance drops sharply. A minimal sketch contrasting the two cases appears after this list.



  3. "Common problems" of large language models : Even more surprisingly, the researchers found that even the most advanced LLMs, such as GPT-4o, Claude 3.5 Sonnet, Llama 3, and Qwen2.5, also exhibited similar generalization problems when faced with "Variable as Subtrahend Plight". The accuracy of these models also drops significantly when the number of variables used as subtrahends in the problem increases . This shows that even LLMs trained with massive amounts of diverse data may rely on certain patterns and "shortcuts" in the data when performing multi-step implicit reasoning, rather than performing real and robust reasoning.


In summary, the core idea of this paper is that much of the implicit reasoning ability we observe in Transformer models comes from the models learning to "take shortcuts" by exploiting patterns and regularities in the training data. When the task's pattern changes, or when deeper understanding and genuine step-by-step reasoning is required, these "shortcuts" fail and model performance degrades.

3. How to reveal the “shortcuts” within the model?

To test their hypothesis, the researchers designed a series of sophisticated experiments and used a variety of model analysis techniques:

  1. Synthetic mathematical reasoning dataset: The researchers constructed a synthetic dataset of multi-step chained modular addition and subtraction. Modular arithmetic is used to avoid large-number calculations and numbers being split across multiple tokens, so the study can focus on the reasoning itself. The dataset contains calculation templates of different lengths (2 to 5 steps), instantiated with random variable names. To prevent the model from succeeding by memorizing intermediate results seen during training, the test set was generated with strict filtering. A toy data-generation sketch appears after this list.



  2. Model selection and training: The study used a standard 12-layer GPT-2 model, replacing the original positional encoding with rotary position embeddings (RoPE) to improve the model's generalization to longer sequences. The model was trained from scratch on the synthetic dataset.


  3. Key analysis tool, activation patching: Activation patching is a technique for identifying which modules in a model matter for a given behavior. The basic idea is that, while the model processes one input, a module's activation is replaced with the activation produced by a related input at the same place, and the change in the model's output is observed. If the replacement causes a significant change in the output, that module is crucial for producing the correct answer. Using activation patching, the researchers tracked the flow of information across the model's layers and token positions, and analyzed which activations carry intermediate calculation results. They also used sliding-window patching to capture computation patterns that span several adjacent layers and tokens. A schematic patching loop appears after this list.



  4. Information flow tracking experiment: By changing an operand or operator in the input and using activation patching to observe the effect on the final output, the researchers were able to trace how information is transmitted inside the model. The results show that key information propagates gradually along a diagonal path across layers and token positions, which provides evidence that the model reasons step by step.



  5. Intermediate result tracking experiment: By changing the input operands while controlling whether the intermediate results stay the same, the researchers further analyzed which regions of the model store information about intermediate calculation results. The experiments show that during step-by-step reasoning, intermediate results are stored and then used in subsequent steps.


  6. Attention window size limitation experiment: To verify whether the model relies on the result of the previous step for the next reasoning step, the researchers modified the Transformer's attention mechanism so that each token can only attend to a fixed-size window of preceding tokens. The results show that when the attention window is too small to cover the previous step's result, the model's reasoning ability drops significantly, further supporting the hypothesis that the model reasons step by step. A minimal mask construction for this restriction appears after this list.



  7. Mechanism analysis of the "Variable as Subtrahend Plight": To understand more deeply why the model struggles with the "Variable as Subtrahend Plight", the researchers ran activation-patching analyses over different combinations of operators and variable positions. They found that when a variable is used as a subtrahend, the model must attend to the variable's value in order to perform the subsequent calculation, in stark contrast to the "shortcut" behavior of directly "chaining" the numbers when no variable appears as a subtrahend.



  8. Evaluation of large language models : To verify whether the findings on small models are applicable to large language models, the researchers conducted zero-shot evaluations on multiple SoTA LLMs to test their performance on problems with different "Variable as Subtrahend" ratios. The experimental results show that this problem is also common in LLMs, which confirms the findings on small models.
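
To make some of these ingredients concrete, the sketches below give minimal Python illustrations of the synthetic data (item 1), activation patching (items 3-5), and the restricted attention window (item 6). Everything here is a hypothetical sketch: the modulus, prompt format, variable-name pool, and the use of the off-the-shelf Hugging Face GPT-2 checkpoint are assumptions made for illustration, not the authors' code or configuration.

A toy generator for fixed-pattern chains of modular addition and subtraction, where the previously computed variable is always the left operand (so it never appears as a subtrahend):

    import random
    import string

    MOD = 23  # a small modulus (assumed) keeps every number a single token

    def make_example(num_steps, rng=random):
        names = rng.sample(string.ascii_lowercase, num_steps)
        lines, value = [], 0
        for i, name in enumerate(names):
            op = rng.choice(["+", "-"])
            if i == 0:
                a, b = rng.randrange(MOD), rng.randrange(MOD)
                value = (a - b) % MOD if op == "-" else (a + b) % MOD
                lines.append(f"{name}={a}{op}{b}")
            else:
                c = rng.randrange(MOD)
                # Fixed pattern: the previous variable is always the left operand.
                value = (value - c) % MOD if op == "-" else (value + c) % MOD
                lines.append(f"{name}={names[i - 1]}{op}{c}")
        return ",".join(lines) + f",{names[-1]}=", value

    print(make_example(3))  # e.g. ('q=15-4,f=q+9,k=f-17,k=', 3)

A schematic activation-patching loop, written against the public "gpt2" checkpoint purely for convenience (the paper trains its own 12-layer GPT-2 with RoPE from scratch). The clean and corrupted prompts differ in a single operand; the clean hidden state at one (layer, position) is patched into the corrupted run and the change in the output logits is observed:

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tok = GPT2Tokenizer.from_pretrained("gpt2")

    clean = tok("m=16-5,z=m-3,b=z+22,b=", return_tensors="pt")
    corrupt = tok("m=16-7,z=m-3,b=z+22,b=", return_tensors="pt")  # one operand changed
    assert clean.input_ids.shape == corrupt.input_ids.shape  # positions must align

    # 1. Run the clean prompt and cache each block's output hidden states.
    clean_cache, hooks = {}, []
    def make_saver(layer):
        def save(module, inputs, output):
            clean_cache[layer] = output[0].detach()
        return save
    for idx, block in enumerate(model.transformer.h):
        hooks.append(block.register_forward_hook(make_saver(idx)))
    with torch.no_grad():
        model(**clean)
    for h in hooks:
        h.remove()

    # 2. Re-run the corrupted prompt, patching one (layer, position) at a time
    #    with the clean activation, and check how far the output recovers.
    def patched_logits(layer, pos):
        def patch(module, inputs, output):
            hidden = output[0].clone()
            hidden[:, pos] = clean_cache[layer][:, pos]
            return (hidden,) + output[1:]
        handle = model.transformer.h[layer].register_forward_hook(patch)
        with torch.no_grad():
            logits = model(**corrupt).logits[0, -1]
        handle.remove()
        return logits

And a minimal version of the attention-window restriction: on top of the usual causal mask, each position may attend only to the previous few tokens (wiring the mask into the model's attention is left out here):

    import torch

    def windowed_causal_mask(seq_len, window):
        i = torch.arange(seq_len).unsqueeze(1)  # query positions
        j = torch.arange(seq_len).unsqueeze(0)  # key positions
        return (j <= i) & (j > i - window)      # True where attention is allowed

    print(windowed_causal_mask(6, 3).int())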



4. Applications and Implications

The findings of this study have important implications for our understanding of the Transformer model, especially the reasoning ability of large language models:

  • Re-examining "reasoning" ability: The study shows that the high accuracy we observe on certain tasks may not reflect true general reasoning ability; rather, the model may have learned specific patterns and "shortcuts" in the data. This reminds us that when evaluating a model's reasoning ability, we need to pay more attention to how it generalizes to different patterns and more challenging tasks.


  • Understanding the limitations of implicit reasoning : The paper clearly reveals the inherent limitations of implicit reasoning when dealing with tasks that require complex variable tracking and non-fixed patterns. This helps us better understand why explicit reasoning (such as CoT) can achieve better results in certain complex reasoning scenarios.


  • Guiding future research directions : This work provides new ideas for improving the reasoning ability of LLMs in the future. Future research can focus on exploring the following directions:

    • How can we design more effective training strategies that force models to learn true step-by-step reasoning rather than relying on “shortcuts”?  For example, through more sophisticated data augmentation, curriculum learning, or the introduction of stronger supervisory signals?

    • Can a model architecture be designed that is better suited to complex reasoning?  The current Transformer architecture may have inherent weaknesses on tasks whose patterns are not fixed and that require long-range dependencies.

    • How can we incorporate the advantages of explicit reasoning into implicit reasoning to improve efficiency and robustness?  Is it possible to develop a hybrid reasoning model that dynamically selects the reasoning method based on the complexity of the task?


  • Focus on challenges such as "Variable as Subtrahend Plight" : Phenomena such as "Variable as Subtrahend Plight" discovered in the study provide us with a specific entry point to deeply study the internal mechanisms and potential defects of the model when processing different types of logical and arithmetic operations.


In summary, this paper reveals the "shortcut" nature of implicit reasoning in Transformer models through rigorous experiments and in-depth analysis. Although current LLMs can complete some reasoning tasks efficiently, their capabilities are largely bounded by the patterns in their training data. To achieve truly general and robust reasoning, we still need a deeper understanding of the models' internal workings and new training methods and architectures. Hopefully this research inspires further exploration of the reasoning mechanisms of LLMs and pushes AI technology to a higher level.