Digital Reveal! The AI Brain's "Think Less, Get Smarter": An Efficiency Revolution That Overturns Conventional Wisdom

Written by
Silas Grey
Updated on: June 10, 2025
Recommendation

The way AI thinks is undergoing a revolution. Short thinking chains are not only more accurate but also significantly reduce costs.

Core content:
1. The reasoning mechanism of large language models (LLMs) and the computing challenges they face
2. The accuracy and efficiency advantages of short thinking chains in complex reasoning tasks
3. The technical logic behind AI "overfitting" and "information wandering"

 

Prologue: The “brain” of AI: a hidden “thinking trap”?

Who is the "thinker" of the AI era? A perspective on the reasoning mechanism of large models

At present, large language models (LLMs) are changing the world at an unprecedented speed. It is as if they have a "brain": they can handle complex language tasks and even perform advanced reasoning. Supporting these powerful capabilities is the carefully constructed "chain of thought" (CoT) inside them, a technique that lets the model generate a series of intermediate reasoning steps before reaching the final answer, aiming to imitate the human process of logical thinking. In the past, we generally believed that the longer AI "thinks" and the more detailed its reasoning steps, the smarter it is and the more accurate its results. This belief has pushed reasoning chains ever longer, and it has forced the industry to face a cruel reality while pursuing stronger AI: the computing power consumed by LLM reasoning is becoming an "invisible killer" that hinders large-scale application and innovation. According to industry analysis, a typical AI query (generating a response of several hundred words) costs between 0.03 cents and 3.6 cents, while GPT-4 generating a 500-word response costs about 8.4 cents [1]. For high-frequency call scenarios, this is a considerable expense.
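To see roughly where figures like these come from, here is a back-of-envelope sketch. The per-token prices and the tokens-per-word ratio below are illustrative assumptions (roughly GPT-4's widely reported 2023-era list prices), not figures taken from the article or from reference [1].

```python
# Rough per-query cost estimate; the prices and ratios below are illustrative assumptions.
PRICE_IN_PER_1K = 0.03    # assumed USD per 1K input tokens
PRICE_OUT_PER_1K = 0.06   # assumed USD per 1K output tokens

def query_cost_usd(input_tokens: int, output_words: int, tokens_per_word: float = 1.3) -> float:
    """Estimate the cost of one query: the prompt plus the generated response."""
    output_tokens = output_words * tokens_per_word
    return (input_tokens / 1000) * PRICE_IN_PER_1K + (output_tokens / 1000) * PRICE_OUT_PER_1K

# A 500-word response with a modest prompt lands in the "several cents" range.
print(f"{query_cost_usd(input_tokens=800, output_words=500) * 100:.1f} cents")
```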

Shocking! Does AI get it wrong by "thinking too much"? A counterintuitive mystery emerges

However, a recent study is overturning this entrenched understanding of how AI "thinks". The preprint paper "Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning" [2], published on May 23, 2025 by researchers from Meta and the Hebrew University, points out that large language models do not get smarter the more they "think". On the contrary, in many complex reasoning tasks, shorter and more refined thinking chains actually deliver higher accuracy and significant efficiency gains!

The numbers reported by the study are even more striking: the shortest reasoning chains are up to 34.5% more accurate than the longest ones. Our long-standing "worship of long chains" may therefore be a huge misunderstanding. What is more exciting is that this discovery is not only a theoretical breakthrough; it also brings a real leap in efficiency: by optimizing the reasoning process, LLM wall time (the real time actually consumed) can be cut by up to 33%, while thinking-token consumption can be reduced by 40%. This "short thinking" efficiency revolution is opening a new door for the LLM industry at an unprecedented speed.

Chapter 1: Decoding AI’s “subtraction philosophy”: the secret of efficient intelligence

Why does "deep deliberation" turn into "sinking into the quagmire"? AI's "overfitting" and "information wandering"

Why is this so? Does AI really make mistakes by "thinking too much"? This may sound counterintuitive, but in the world of AI, there is a profound technical logic behind it.

First, an overly long chain of thought may cause the model to fall into a dilemma similar to human "analysis paralysis". When an LLM is forced to generate lengthy, complex intermediate reasoning steps, it can drift into "information wandering" or "path dependence", like a person lost in a maze. Each additional "thinking" step may introduce new errors or noise, and these tiny deviations keep accumulating along a long chain of thought, eventually producing an "error avalanche". As the study "When More is Less: Understanding Chain-of-Thought Length in LLMs" [3] points out, the sensitivity of longer reasoning processes to noise grows exponentially, and a single error can mislead the entire chain of thought.

That study also shows that longer thinking chains are not always better: there is an optimal length, beyond which performance declines. The phenomenon is similar to the propagation of rounding errors in numerical computation, where the slight deviation introduced at each step is progressively magnified over a long chain. Long reasoning processes are more sensitive to noise, and the accumulated uncertainty and potential errors grow exponentially, eventually outweighing the benefit of the additional reasoning steps.
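A toy calculation makes this trade-off concrete. The numbers below are assumptions chosen purely for illustration, not parameters from [3]: each extra step both raises the chance that the chain is long enough to cover the problem and raises the chance that some step has derailed it, so accuracy peaks at a moderate length and then falls.

```python
import math

EPS = 0.05   # assumed probability that any single reasoning step derails the chain
TAU = 6.0    # assumed scale for how many steps a problem typically needs

def chain_accuracy(n_steps: int) -> float:
    """Toy model (assumptions, not the paper's analysis): accuracy is the chance the
    chain is long enough to cover the problem, times the chance no step has derailed it."""
    coverage = 1 - math.exp(-n_steps / TAU)   # benefit of extra steps, saturating
    survival = (1 - EPS) ** n_steps           # compounding cost of extra steps
    return coverage * survival

best = max(range(1, 61), key=chain_accuracy)
print(best, round(chain_accuracy(best), 3))   # accuracy peaks at a moderate length, then declines
```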

Secondly, long chains of thought may also cause "overfitting" or "information overload" in the model's attention mechanism. When the sequence length exceeds the typical lengths seen during training, the attention mechanism in the Transformer architecture may struggle to process this out-of-distribution data effectively, diluting or distorting key information. The model may also capture spurious correlations in the training data (so-called "shortcut learning") rather than truly understanding the logic of the problem. For example, recent research from the Massachusetts Institute of Technology (MIT) found that large language models seem proficient at New York City navigation tasks, yet "collapse spectacularly" when faced with a simple detour, exposing that they are merely performing complex pattern matching rather than genuinely understanding urban geography or routing principles. When a model pursues "exhaustiveness" rather than "precision", it can "go off track" by biting off more than it can chew and miss the concise path to the truth.

Meta's new "thinking method": how does short-m@k achieve "fast, accurate, and ruthless"?

Building on a deep insight into the phenomena above, the research team from Meta and the Hebrew University proposed an innovative reasoning method called short-m@k, which perfectly embodies AI's "subtraction philosophy".

The core idea of short-m@k is to generate k independent thinking chains in parallel, take the first m chains that finish, and obtain the final answer by majority voting among them. The method cleverly exploits parallel computing and avoids waiting for all thinking chains to complete. It works like an efficient decision-making team: multiple "thinkers" work in parallel, and as soon as the fastest few reach a consensus, a decision is made immediately, rather than waiting for the "slow thinkers" or the "chronically indecisive" to produce redundant or even wrong answers.
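To make the procedure concrete, here is a minimal Python sketch of the idea as described above. It is not the authors' implementation; generate_chain is a hypothetical callable that runs one chain of thought and returns its final answer, and a real serving stack would abort in-flight generation rather than merely skipping unfinished chains.

```python
import concurrent.futures
from collections import Counter

def short_m_at_k(generate_chain, k: int = 8, m: int = 3):
    """Sketch of short-m@k: launch k reasoning chains in parallel, keep only the
    first m that finish, and majority-vote over their final answers."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(generate_chain) for _ in range(k)]
        answers = []
        for fut in concurrent.futures.as_completed(futures):
            answers.append(fut.result())
            if len(answers) == m:          # the earliest m chains have finished
                break
        for fut in futures:                # try to cancel chains that never started;
            fut.cancel()                   # already-running ones cannot be stopped here
    # majority vote over the answers of the m fastest chains
    return Counter(answers).most_common(1)[0][0]
```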

To understand short-m@k more intuitively, refer to Figure 1 of the original paper: a visual comparison between majority voting and the proposed short-m@k method (shown below). The figure clearly shows that traditional majority voting (majority@k) must wait for all k thinking processes to complete, whereas short-m@k terminates computation once the earliest m thinking processes finish, greatly saving time and computing resources.

Diagram: comparison of majority voting vs. the short-m@k method

This "fast, accurate and ruthless" strategy short-m@khas brought about a real improvement in efficiency. Experimental data shows that short-1@k(i.e., only taking the first completed thinking chain) can even match or exceed the standard majority voting method under a low computing budget, and the thinking token consumption can be reduced by up to 40%. And short-3@k(i.e. , taking the first three completed thinking chains for voting) continues to surpass majority voting under all computing budgets, while also achieving a 33% reduction in reasoning time. This is not just a simple speed-up, but also a significant reduction in AI's "carbon footprint" and operating costs while improving performance.

The study further found that the concept of "short thinking" applies not only to inference but also to model training. By fine-tuning on shorter reasoning chains, a model not only learns a more efficient way of reasoning and generates shorter thinking chains in the future, but also improves performance while further reducing training costs. For example, fine-tuning the Qwen-2.5-32B model on the S1-short dataset improved performance by 2.8% over S1-random while reducing token consumption by 5.8%. This shows that the "subtraction philosophy" is becoming a new direction for the evolution of AI.
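One way to picture this kind of data selection is sketched below. It is only an illustration in the spirit of "train on shorter chains": the Chain type and the sample_chains helper are made-up names, not the construction actually used to build S1-short.

```python
from dataclasses import dataclass

@dataclass
class Chain:
    text: str        # full chain of thought as generated
    answer: str      # final answer extracted from the chain
    num_tokens: int  # length of the chain in tokens

def build_short_sft_set(problems, sample_chains):
    """Keep, for each problem, the shortest *correct* sampled chain as the fine-tuning
    target. `problems` is an iterable of (question, gold_answer) pairs; `sample_chains`
    is a hypothetical function that returns a list of Chain objects for a question."""
    sft_examples = []
    for question, gold in problems:
        chains = sample_chains(question)                  # e.g. several sampled chains
        correct = [c for c in chains if c.answer == gold]
        if not correct:
            continue                                      # no usable chain for this problem
        shortest = min(correct, key=lambda c: c.num_tokens)
        sft_examples.append({"prompt": question, "target": shortest.text})
    return sft_examples
```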

Chapter 2: Unlocking the new paradigm of "saving money" and "extreme speed": how will this dividend be realized?

Good news for AI’s “big cost spenders”: Will the trillion-level computing power market be rewritten?

For a long time, the inference cost of LLMs has been a "Sword of Damocles" hanging over enterprises. According to an in-depth analysis by AI expert Rahul Rai [4], inference accounts for an overwhelming share of the total operating cost of widely deployed LLMs, about 90%, while training accounts for only 10%. In other words, although training a model requires a huge upfront investment, the real financial burden, the continuous "money burning", lies in actually using the model.

Fortunately, "short thinking chain" technology brings real benefits to enterprises. By improving reasoning efficiency by 33% and cutting token consumption by 40%, it is expected to fundamentally change the economics of large models and unlock huge market potential. Although estimates of market size vary across research institutions, the general forecasts point to an exciting future: the global AI inference market is expected to grow from US$106.15 billion in 2025 to US$254.98 billion in 2030, a compound annual growth rate (CAGR) of 19.2% [5]. Within it, the dedicated LLM market is growing explosively and is expected to expand from US$6.4 billion in 2024 to US$36.1 billion in 2030, a CAGR of 33.2% [6].

In this contest over a trillion-level computing power dividend, whoever first grasps the key to efficiency will gain the upper hand in market competition and reap huge cost advantages and commercial returns.

Who will be the “frontrunner”? Real-time AI application scenarios explode

The low latency and high accuracy brought by the "short thinking chain" are undoubtedly a relief for AI application scenarios that have extremely high real-time requirements. In these scenarios, AI's "slow thinking" or "redundant thinking" may have disastrous consequences.

First, in the field of autonomous driving, the speed of AI decision-making is a matter of life and death. Autonomous driving systems require millisecond-level environmental perception and decision-making; if the AI "overthinks" when identifying road conditions or avoiding obstacles, even a delay of tens of milliseconds can lead to a serious accident. Yet current large-model reasoning is costly and noticeably slow, making it a huge challenge to deploy real-time decision-making models on the vehicle itself. Studies have shown that although GPT-4 performs well on driving theory tests (accuracy above 86%), it costs almost 50 times as much to use as GPT-3.5, while GPT-3.5 failed to reach the passing standard on the same test [7]. This leaves autonomous driving systems caught between performance and cost. The "short thinking chain" promises to give autonomous driving AI "fast, accurate and ruthless" decision-making capability, a key booster for its large-scale, safe deployment.

Second, in financial services, time is money. Scenarios such as high-frequency trading, risk assessment, and fraud detection require AI to complete market analysis or anomaly identification within milliseconds. The high cost and latency of LLM reasoning are major bottlenecks limiting its deep application in finance. "Short thinking chains" can help financial AI systems achieve millisecond-level fraud detection and risk analysis and seize fleeting business opportunities [8]. For example, Visa's VisaNet network can process more than 65,000 transaction messages per second [9], which demands near-instantaneous fraud detection.

Furthermore, in healthcare, AI's response speed is directly tied to life. In emergency diagnosis, assisted surgery, or patient monitoring, both the accuracy and the speed of AI's judgment are crucial. Real-time medical image analysis, for example, requires AI to provide diagnostic suggestions during the scan itself. Existing LLMs can improve accuracy, but often at high cost due to the hardware optimization required [10]. The popularization of "short thinking chains" is expected to let medical AI deliver sub-second or even faster responses while maintaining high accuracy, becoming a "god assist" for medical staff [11].

In addition, in areas such as smart retail and customer service, AI efficiency directly affects user experience and customer satisfaction. Amazon's dynamic pricing engine, for example, adjusts prices more than 2.5 million times a day [12], which depends on AI's real-time analysis of supply and demand. "Short thinking chains" can ensure that AI chatbots respond instantly and that personalized recommendation systems capture user needs more keenly, significantly improving user experience while reducing operating costs [13]. At the same time, as demand for AI assistants inside enterprises grows, deploying tens of thousands of AI Copilots brings huge inference-cost and latency challenges; this technology can significantly lower the deployment threshold and make AI capabilities accessible to every employee.

These application scenarios not only place extremely high demands on technical performance, but more importantly, they are redefining users' expectations of AI systems—from "usability" to "instant response", from "accuracy" to "real-time intelligence". The breakthrough of the "short thinking chain" has undoubtedly laid a solid technical foundation for the outbreak and popularization of these emerging applications.

AI’s Path to Carbon Neutrality: A New Chapter of Social Responsibility with Efficiency Improvement

The efficiency improvement brought by the "short thinking chain" technology has not only significant economic benefits, but also important environmental value. Large language models have long been regarded as "energy-consuming monsters" due to their huge computing requirements, and their carbon footprint has received increasing attention.

Studies have shown that the largest models (such as codellama-70b and llama3-70b) consume roughly 100 times more energy per token than the smallest ones (codellama-7b and llama3-8b) [14]. This striking gap shows that optimizing model efficiency is critical to reducing AI's energy consumption.

The "Short Thinking Chain" reduces the computing power requirements from the source by reducing unnecessary token consumption and reasoning time, thereby directly reducing the energy consumption of AI. This means that while maintaining or even improving AI performance, we can significantly reduce its carbon emissions and help the AI ​​industry move towards a more sustainable "carbon-neutral" path.

Currently, there are frameworks such as the SPROUT framework [15] jointly developed by MIT Lincoln Laboratory and Northeastern University, which has achieved a carbon footprint reduction of more than 40% by guiding the generation process. In addition, the GREEN-CODE framework [16] specifically proposes an energy-efficiency-aware solution for LLM-based code generation tasks. Further studies have shown that combined optimization strategies such as quantization technology, model pruning, and efficient GPU utilization can achieve a reduction of up to 70% in inference costs and carbon emissions [17].

This is not only a boon for the economy, but also an important step for the AI industry to shoulder its social responsibility and move toward green development. As AI is increasingly integrated into all aspects of society, every improvement in efficiency contributes to building a cleaner and more sustainable intelligent future.

Chapter 3: Ordinary people and AI: How do we coexist with “abnormally smart” AI?

When AI becomes "more like human intuition": disruptive changes in interactive experience

When AI learns the "subtraction philosophy" of "less is more", it will no longer be the "slow" AI assistant, but an intelligent partner that can "understand" your intentions in seconds. AI with faster response and more accurate judgment will make our smart devices and AI assistants more "intimate" and "considerate" in daily interactions.

Imagine: no more long waits for a chatbot to reply; a smart home that understands your complex instructions instantly; an AI recommendation system that captures your preferences keenly and offers accurate suggestions before you even realize them. This extreme low latency and high accuracy will greatly improve the user experience and let AI blend seamlessly into our lives, as if it had "intuition", making interaction as natural and smooth as talking with a telepathic partner.

Should we be wary of AI's "superficial efficiency"? Beware the potential risks of "cutting the Gordian knot"

However, every technological leap is accompanied by deep thinking and potential risks. When AI makes faster and less “thoughtful” decisions, even if it is more accurate empirically, we must remain highly vigilant. We can’t help but ask:

  1. Is AI's "intuition" reliable? Could this efficient "fast thinking" be merely "superficial efficiency"? In its pursuit of speed, might AI ignore key long-tail information, leading to unexpected errors or biases in rare or special cases? One of the most famous examples of "shortcut learning" is the green-grass trap in cow recognition: a deep neural network trained on images of cows in typical settings (usually on green grass) may learn not the features of the cow itself but simply an association between green backgrounds and cows. As a result, image-recognition software is likely to fail on a cow against a blue background [18], and may even mistake a cat in front of green wallpaper for a cow. Another striking example comes from the MIT study [19] mentioned earlier: large language models seemed good at giving directions in New York City, but when researchers made simple changes to the map (such as adding a detour), the models "collapsed spectacularly", exposing that they had not really understood the city's geography or routing principles and were only performing complex surface-level pattern matching.
  2. Will the "black box" nature of decision-making be exacerbated? When AI's thinking path is extremely compressed, will the explainability of its decision-making drop further? If AI makes "fast, accurate and ruthless" decisions in important areas (such as medical diagnosis or financial credit) and we cannot understand its reasoning, how can humans hold it accountable or correct it? Research stresses that the "black box" nature of LLMs makes their decision-making hard to understand, yet such understanding is crucial to public acceptance and trust [20]. This opacity becomes even more dangerous in fast decision-making scenarios, because even AI researchers struggle to understand how LLM decisions are made.
  3. An "atrophy effect" on human cognition? As we grow accustomed to AI's "instant understanding" and "quick thinking", will our own capacity for deep thought, critical analysis, and patience with complex problems gradually wither? Research from the University of Southern California warns [21] that AI may erode human experience in many ways; one particularly worrying threat is the weakening of our ability to make considered decisions. When AI deprives people of the practice of making thoughtful, defensible decisions, a decline in human thinking ability can arrive unexpectedly fast. This echoes the conclusion of a study published in a Nature-portfolio journal [22]: as the use of and dependence on AI grows, it constrains the brain's own thinking, pushing us to think like algorithms without understanding their principles.

These issues remind us that AI's "subtraction philosophy" is not just a technical optimization, but also a profound social experiment. It challenges our definition of intelligence, efficiency, decision-making and even human beings themselves. While embracing the huge opportunities it brings, we need to be alert to its potential "cutting the Gordian knot" risks and actively explore how to balance the efficiency and reliability of AI, and how to ensure that the advancement of AI can truly enhance human welfare rather than lead to the "degeneration" of our cognitive abilities.

Conclusion: A new era of efficiency-driven AI evolution, and a profound dialogue about intelligence

From "thinking more" to "thinking less", from "brute force" to "refined", the reasoning paradigm of large language models is undergoing a profound transformation. This is not only a huge leap in technical efficiency, but it has also brought tangible "computing power dividends" to the global AI industry, lowered the threshold for AI popularization, and spawned countless innovative application scenarios.

Looking deeper, this victory of "short thinking" is a sign of AI's maturing: it is beginning to learn how to understand and solve problems more efficiently and more fundamentally. This offers a new perspective on the direction of AGI's evolution. Perhaps true wisdom lies not in infinite complexity, but in grasping the core precisely.

"Emergency Convergence Point" has always believed that the power of science and technology can promote social progress. Standing at the threshold of this new era of efficiency-driven AI evolution, we see a more inclusive and dynamic intelligent future.  This profound dialogue on "efficient intelligence" has just begun, and we look forward to thinking and exploring with all readers to meet the opportunities and challenges brought by AI.

References
[1] AI pricing: the cost of GPT-4 generating a 500-word response is about 8.4 cents: https://www.getmonetizely.com/blogs/ai-pricing-how-much-does-ai-cost-in-2025
[2] "Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning": https://arxiv.org/abs/2505.17813
[3] "When More is Less: Understanding Chain-of-Thought Length in LLMs": https://arxiv.org/html/2502.07266v1
[4] In-depth analysis by AI expert Rahul Rai: https://www.youtube.com/watch?v=dfCxbLAMz44
[5] AI inference market forecast: US$106.15 billion in 2025 to US$254.98 billion in 2030, a CAGR of 19.2%: https://www.marketsandmarkets.com/Market-Reports/ai-inference-market-189921964.html
[6] Large Language Model (LLM) market worth US$36.1 billion by 2030, growing at a CAGR of 33.2%: https://www.globenewswire.com/news-release/2024/04/09/2860128/0/en/Large-Language-Model-LLM-Market-worth-36-1-billion-by-2030-growing-at-a-CAGR-of-33-2-Report-by-MarketsandMarkets.html
[7] GPT-4 vs. GPT-3.5 on driving theory tests: https://dl.acm.org/doi/pdf/10.1145/3691555.3696825
[8] Reducing LLM inference costs: https://www.rohan-paul.com/p/reducing-llm-inference-costs-while
[9] Visa's VisaNet network can process over 65,000 transaction messages per second: https://gcore.com/blog/real-time-ai-processing
[10] LLM accuracy gains often come with high hardware-optimization costs: https://www.nature.com/articles/s41598-025-00724-w
[11] Real-world LLM applications for medical staff: https://www.pluralsight.com/resources/blog/ai-and-data/llms-real-world-applications
[12] Amazon's dynamic pricing engine adjusts prices more than 2.5 million times a day: https://gcore.com/blog/real-time-ai-processing
[13] Real-time AI processing and reduced operating costs: https://gcore.com/blog/real-time-ai-processing
[14] The largest models (codellama-70b, llama3-70b) consume about 100 times more energy per token than the smallest (codellama-7b, llama3-8b): https://arxiv.org/html/2407.16893v1
[15] SPROUT framework: https://aclanthology.org/2024.emnlp-main.1215.pdf
[16] GREEN-CODE framework: https://arxiv.org/html/2501.11006v1
[17] Quantization, pruning, and efficient GPU utilization can cut inference costs and carbon emissions by up to 70%: https://www.dtclai.com/blogs/news/reduce-ai-inference-costs-sustainability-net-zero
[18] Shortcut learning and the cow-on-a-blue-background failure: https://dps.de/en/news/shortcut-learning-the-coming-disaster-for-ai/
[19] MIT study evaluating world models in AI: https://www.ibm.com/think/news/mit-study-evaluating-world-model-ai
[20] On the "black box" nature of LLM decision-making and public trust: https://arxiv.org/html/2401.12273v2
[21] USC study: the hidden risk of letting AI decide: https://dornsife.usc.edu/news/stories/the-hidden-risk-of-letting-ai-decide/
[22] Nature-portfolio journal study on AI dependence and human thinking: https://www.nature.com/articles/s41599-023-01787-8