AI is not a panacea: sober reflections on the technological wave

Written by
Iris Vance
Updated on: June 18, 2025
Recommendation

A sober reflection on the AI technology boom, revealing the limitations and future challenges of large models.

Core content:
1. Reflections on the blockchain and metaverse booms and their real-world dilemmas
2. The rise of AI large models and their impact on enterprise operations
3. The core constraints on AI large models and their future prospects

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)
Before we delve into the issues of AI itself, let's review the technological waves of recent years. Blockchain and the metaverse were once hailed as revolutionary concepts that drew nationwide attention and seemed poised to upend traditional industries. In the end, however, the adoption of both was limited by key technical bottlenecks, and neither truly replaced the systems it set out to displace.

Take blockchain as an example. It has created real value in programmable, self-contained transaction loops (Bitcoin, DeFi, stablecoins). But in scenarios that cannot form a closed loop on their own, such as supply chain traceability, anti-counterfeiting, NFTs, and digital collectibles, blockchain is often just a substitute for an "immutable database" rather than an indispensable core technology. Performance bottlenecks, high costs, and the untrustworthiness of off-chain data have reduced it to "form over substance" in many practical settings.

The metaverse is in a similar situation: its vision is an immersive virtual interactive experience, but the underlying technology struggles to support large-scale applications. Network latency and bandwidth limits have seriously hindered its development, keeping the ideal of a seamless experience out of reach. Even with the improvements the 5G era brings, there is still a long way to go before the metaverse's requirements for millisecond-level latency and high synchronization can be met.

The experience of blockchain and the metaverse is a reminder: for a new technology to displace the existing foundation, it must first clear its own hard technical thresholds; otherwise the craze will eventually give way to rationality.

The large-model craze: more turbulent than the previous waves, but worth a more sober look

From ChatGPT's breakout popularity at the end of 2022 to NVIDIA's market value surpassing $3 trillion, the AI large-model craze has lasted two years, far outstripping the blockchain and metaverse booms. It is changing not only how people interact with computers but also enterprise software architecture, organizational operations, and strategic planning.

This raises the question:

Is AI a new engine for reconstructing the digital foundation of enterprises, or just a reinforcement patch?

In the debate over whether AI can replace BI, ERP, databases, core business systems, or even the entire system architecture, we should step back from technology worship and return to system rationality: today's large models still face the following fundamental constraints, which make it impossible for them to replace the traditional enterprise software foundation in the short term.


First, AI large models are "probabilistic models", not "deterministic systems"

The current mainstream large models (such as the GPT series) are essentially probabilistic inference systems trained on a lossy compression of massive data. They do not perform strict deductive reasoning; they make predictions based on similarity and statistical distribution. Because of this, even when the input is exactly the same, the output can fluctuate, making a stable, consistent response hard to achieve.
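To make the contrast concrete, here is a minimal toy sketch in Python (not tied to any particular model or API) showing why sampling from a probability distribution at a non-zero temperature can yield different outputs for identical input, while a rule-based lookup always returns the same answer. The logits and lookup table are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng()

def sample_next_token(logits, temperature=0.8):
    """Toy decoder step: sample the next token from a softmax distribution.
    At temperature > 0, identical logits (identical input) can yield
    different tokens on different calls; only greedy argmax is deterministic."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Identical "input" (the same logits), five repeated calls:
logits = [2.0, 1.8, 0.5]   # hypothetical scores for three candidate tokens
print([sample_next_token(logits) for _ in range(5)])   # e.g. [0, 1, 0, 0, 1]

# A deterministic system, by contrast, is a fixed mapping from input to output.
ORDER_STATUS = {"order-123": "SHIPPED"}   # hypothetical lookup table
print(ORDER_STATUS["order-123"])          # always "SHIPPED"
```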

The core problem is that a model's parameter capacity is always finite, while training data is often measured in trillions of tokens. To fit as much knowledge as possible into limited parameters, the model must learn in a compressed form. Many details are blurred or simply discarded during compression, so the model can only reconstruct knowledge approximately rather than recall and reproduce it exactly, and 100% accuracy cannot be guaranteed. This also explains why large models so often produce "hallucinations" or factual errors when handling specific facts.

In other words, a large model does not "remember everything"; it builds a kind of fuzzy knowledge representation space. That makes its reliability questionable in business scenarios with extremely high precision requirements, and it is hard for it to replace traditional deterministic systems built on rules or computational logic.

Second, the "universality" of large models comes at the cost of high energy consumption

Large models are called "general-purpose models" because they can handle all kinds of tasks across domains, from writing poetry to financial modeling. Behind this "omnipotence", however, lies an extremely high computing cost. Performing a simple addition with a model of hundreds of billions of parameters consumes far more compute and energy than a traditional algorithm or a calculator: using a cannon to kill a mosquito, or asking Einstein to haul bricks. It is not impossible, just extremely inefficient compared with an ordinary workhorse.
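A back-of-the-envelope sketch makes the mismatch concrete. It assumes the common rule of thumb of roughly 2 FLOPs per parameter per generated token for a dense decoder; the exact figure varies by architecture and serving stack, so treat the numbers as order-of-magnitude only.

```python
# Rough cost comparison: one integer addition on a CPU versus "asking" a dense
# 100B-parameter model to spell out the sum.
# Assumption: ~2 FLOPs per parameter per generated token (rule of thumb).

params = 100e9                       # hundreds of billions of parameters
flops_per_token = 2 * params         # ~2e11 FLOPs per output token
answer_tokens = 5                    # tokens needed to write out the result

llm_flops = flops_per_token * answer_tokens   # ~1e12 FLOPs for one addition
cpu_flops = 1                                 # a single ADD instruction

print(f"LLM estimate: {llm_flops:.1e} FLOPs")
print(f"Roughly {llm_flops / cpu_flops:.0e}x the work of a single CPU add")
```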

The essence of this versatility is the stacking of brute-force compute rather than the optimization of efficiency. In practical applications, large models are far less efficient than dedicated systems; when the task is well defined and the demand stable, lightweight specialized models or algorithms usually offer better cost-effectiveness.

The cost of this "heavy versatility" is becoming more and more apparent in real-world applications. When task objectives are clear and processes are stable, companies tend to prefer rule engines, lightweight models, and automation scripts over large models: for specific tasks these systems have lower latency, higher certainty, and a more controllable cost structure.

Large models are better suited to exploratory, creative, and highly context-dependent tasks; they are not suited to long-running structured, standardized, large-scale repetitive workflows.

Third, performance degradation under distributed large model architecture

Today's large models rely heavily on the memory bandwidth and compute of a single GPU during training and inference, while system memory, disk I/O, and network communication resources often cannot keep up. Once you try to split model parameters across multiple nodes through tensor parallelism or expert parallelism, cross-node communication becomes the bottleneck.

For example, the memory bandwidth of a single NVIDIA H100 is as high as 3.35 TB/s, while the network bandwidth of mainstream data centers is usually only 100-400 Gbps (about 12-50 GB/s). This huge bandwidth gap means that when nodes communicate frequently, communication latency and throughput imbalance are rapidly amplified, producing the perverse effect that "scaling out makes it slower".
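A quick calculation with the figures above shows the scale of the gap (8 bits per byte, protocol overhead ignored, so the real gap is wider):

```python
# Gap between on-card memory bandwidth and cross-node network bandwidth,
# using the figures cited above (overheads ignored, so this is optimistic).

hbm_gb_per_s = 3350            # H100 HBM: ~3.35 TB/s
net_low_gb_per_s = 100 / 8     # 100 Gbps ≈ 12.5 GB/s
net_high_gb_per_s = 400 / 8    # 400 Gbps ≈ 50 GB/s

print(f"HBM is ~{hbm_gb_per_s / net_high_gb_per_s:.0f}x to "
      f"~{hbm_gb_per_s / net_low_gb_per_s:.0f}x faster than the network")
# ~67x to ~268x: every byte that has to cross the network is roughly two
# orders of magnitude more expensive to move than a byte read from local HBM.
```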

In a multi-GPU inference test of DeepSeek-R1, after switching from a single GPU to an 8-GPU cluster, latency soared from 120ms to 980ms while overall throughput increased by less than 30%. This shows that large-model services cannot scale linearly the way traditional microservice components do; multi-node deployment can even incur significant performance loss.

Fourth, highly coupled computation leads to slow cold starts and difficult state migration

The inference process of a large generative model is highly coupled: a single call requires loading all the weights, maintaining a complete context cache, and generating output token by token. This mechanism has two serious consequences:

  • The cold start cost is extremely high: loading a 70B parameter model often takes several minutes, which is much longer than the initialization time of traditional services.

  • Difficulty in context state migration: the KV cache and execution-path state built up during inference are hard to synchronize quickly across nodes. Once interrupted, recovery is extremely costly, and "second-level switching" is all but impossible.


By contrast, a traditional microservice architecture can spin up new service instances in seconds via Kubernetes, and database services such as MySQL also complete a cold start within seconds, with native horizontal scaling and elastic operations capabilities.
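A rough estimate, assuming FP16 weights (2 bytes per parameter) and illustrative storage bandwidths, shows why the gap between the two cold-start profiles is structural rather than a tuning issue:

```python
# Rough cold-start estimate for a 70B-parameter model: time just to read the
# weights from storage, before any deserialization, sharding, or warm-up.
# Bandwidth figures are illustrative assumptions, not benchmarks.

params = 70e9
weight_bytes = params * 2                      # FP16: ~140 GB of weights

for source, gb_per_s in [("local NVMe (~3 GB/s)", 3),
                         ("network storage (~1 GB/s)", 1)]:
    seconds = weight_bytes / (gb_per_s * 1e9)
    print(f"{source}: ~{seconds:.0f} s (~{seconds / 60:.1f} min) to read weights")

# A stateless microservice container, by contrast, typically starts in seconds.
```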

Fifth, system consistency, stability and error amplification

Key business systems such as banking and securities have extremely high requirements for system continuity and transaction stability: they must have complete transaction isolation, strong consistency guarantees, and fine-grained permission control. Large models, however, have none of these core distributed-system capabilities:

  • They do not support transaction isolation and have no transaction commit/rollback capabilities;

  • They cannot finely control or track users, tenants, and request contexts;

  • They lack native observability and auditing mechanisms.


This means that once a large model is embedded directly in a core trading flow, any delay, hallucination, abnormal response, or model error can trigger an avalanche-like chain reaction, seriously affecting the safety of funds and the customer experience.
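The implication for system design can be sketched as a guardrail pattern: model output never reaches the transactional core unchecked. The sketch below is illustrative only; `call_llm` is a hypothetical placeholder, not a real API.

```python
import json
from concurrent.futures import ThreadPoolExecutor

REQUIRED_FIELDS = {"account_id", "amount", "currency"}

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a model call; swap in a real client."""
    raise NotImplementedError

def guarded_extract(prompt: str, timeout_s: float = 2.0) -> dict:
    """Never let raw model output touch the transactional core: enforce a
    timeout, validate the structure, and fall back to a rule-based or
    manual path on any failure."""
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            raw = pool.submit(call_llm, prompt).result(timeout=timeout_s)
        data = json.loads(raw)
        if not REQUIRED_FIELDS.issubset(data):
            raise ValueError("missing required fields")
        return {"status": "ok", "payload": data}
    except Exception:
        # A model failure (timeout, malformed output, refusal) must never
        # crash or corrupt the core flow; degrade to the deterministic path.
        return {"status": "fallback"}
```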

What is even more challenging is that if the large model is treated as a low-level business platform component, every model version upgrade changes its reasoning behavior. That has three consequences:

  • Every business link that relies on model output must go through regression re-validation;

  • The consistency and stability of system results are difficult to guarantee;

  • The release process costs far more than for traditional microservice components, which can easily slow the pace of the business.


In other words, using a large model as the "core component" of an enterprise's digital foundation is, at this stage, neither safe nor economical. It introduces the risk of uncontrollable logic drift and greatly increases operational complexity and architectural uncertainty, becoming a burden the enterprise digital system cannot bear.

Sixth, structural delays caused by token-by-token generation

Mainstream large models generate autoregressively: each output token depends on the tokens before it, so inference time accumulates linearly. The longer the output, the higher the latency.

  • GPT-3.5 (output 128 tokens): average latency 300ms~800ms

  • GPT-4 (streaming mode): 1.5~3s

  • Claude 3 Opus: about 1~2s

  • DeepSeek-R1 (single card): 120ms; multi-card inference increases to 980ms


The latency gap between AI large models and traditional high-concurrency systems (typical response latencies):

  • Core bank transfer service: 5~20 ms
  • E-commerce order flow: 10~30 ms
  • Real-time payment system (WeChat/Alipay): <50 ms
  • Microservice API call (RPC): 1~10 ms
  • Large model inference (GPT-4): 1500~3000 ms


This means that if a large model is placed directly in main-process operations such as order placement, accounting, or deductions, then even if every other module responds at the millisecond level, the model's latency stretches the entire flow and drags down both user experience and system design. If a large model is used inside a closed business loop, a process that used to complete in a few seconds may now take several minutes.
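A simple sum using the representative latencies above (illustrative figures, not measurements of any specific system) shows how a single model call dominates an otherwise millisecond-level flow:

```python
# How one model call dominates an otherwise millisecond-level flow.
# The step latencies are illustrative, taken from the ranges listed above.

flow_ms = {
    "risk check (RPC)": 10,
    "inventory deduction": 20,
    "payment": 50,
}
baseline_ms = sum(flow_ms.values())        # 80 ms end to end

llm_step_ms = 2000                         # one GPT-4-class call (~1.5-3 s)
with_llm_ms = baseline_ms + llm_step_ms

print(f"without model: {baseline_ms} ms; with one model call: {with_llm_ms} ms")
print(f"slowdown: ~{with_llm_ms / baseline_ms:.0f}x")   # ~26x
```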

Seventh, limitations of mainstream large model contexts and the challenges of complex enterprise-level tasks

Current large models such as GPT-4 Turbo, Claude 3, Command R+, and DeepSeek already support context windows from 64K to more than 200K tokens and can in theory process hundreds of pages of documents. In real scenarios, however, a single complex web page or ERP form can consume thousands or tens of thousands of tokens, and a genuinely complete business task, context included, often far exceeds what the model can hold. In scenarios like ERP configuration, contract approval, and process orchestration, which involve multiple layers of nested logic and dynamic status updates, the task is hard to finish in a single conversation, and information truncation, partial understanding, and incoherent output can all occur.
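A rough token budget shows how quickly such a task collides with even a 128K window. The per-form figure and step count below are assumptions for illustration; real consumption depends heavily on how pages and forms are serialized.

```python
# Rough token budget for a multi-step ERP task against a 128K context window.
# All figures below are illustrative assumptions.

context_window = 128_000
tokens_per_form = 6_000      # one complex form or page, serialized for the model
steps = 15                   # screens / documents touched in one business task
history_overhead = 1.3       # instructions, prior turns, tool outputs

needed = int(tokens_per_form * steps * history_overhead)
print(f"needed ≈ {needed:,} tokens vs. a {context_window:,}-token window")
# needed ≈ 117,000 tokens: a single end-to-end task already approaches the
# limit, and anything larger forces truncation or lossy summarization.
```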

A key feature of enterprise-level business systems is high coupling. An order page, for example, typically links to multiple modules such as inventory, logistics, and payment, and a change to any field or page can ripple through the whole system. For the model to account for these global logical associations, all relevant context has to be packed into a single round of inference, which not only drives up token consumption but also raises the complexity and error rate of the reasoning.

Complex situations also include workflows whose steps depend tightly on one another. In complex approval or automatic form-generation tasks, for example, the correctness of the field configuration in step one directly determines the process logic in step two, and each step must be verified before the next round of generation and planning can proceed. This requires the model to perform staged verification and maintain state awareness during execution, and to validate against a simulation environment when necessary. Mainstream large models today do not have this capability for multi-layer task decomposition, dynamic feedback, and path backtracking.
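The staged-verification pattern described here can be sketched as a generate-verify-retry loop. `generate_step` and `verify_in_sandbox` are hypothetical callables standing in for a model call and a validation harness; the sketch only shows the control flow.

```python
def run_with_staged_verification(task_steps, generate_step, verify_in_sandbox,
                                 max_retries=2):
    """Sketch of staged verification: generate one step at a time, check it
    against a simulation/sandbox before moving on, and retry or escalate on
    failure. `generate_step(step, state)` and `verify_in_sandbox(output, state)`
    are hypothetical callables, not a real framework API."""
    state = {}                                    # accumulated, verified context
    for step in task_steps:
        for _attempt in range(max_retries + 1):
            output = generate_step(step, state)
            ok, feedback = verify_in_sandbox(output, state)
            if ok:
                state[step] = output              # only verified output feeds later steps
                break
            state[f"{step}:feedback"] = feedback  # surface the error on the next try
        else:
            return {"status": "needs_human_review", "failed_step": step, "state": state}
    return {"status": "done", "state": state}
```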

In addition, complex tasks often require cross-round interaction and repeated confirmation. Generating a complete approval process, for example, usually spans several stages: field design, process configuration, and permission setting. This not only exceeds the context window but also requires the model to maintain state memory and response consistency, which most current large models still lack.

To address this problem, the industry is trying three types of technical paths:

  1. Multi-agent collaboration : Frameworks such as AutoGPT and OpenDevin split complex goals into subtasks, execute them collaboratively through multiple sub-agents, and build task memory mechanisms to support cross-round calls. Manus attempts to concentrate capabilities in a single agent, but still encounters bottlenecks when the amount of information is large.

  2. RAG mechanism: dynamically retrieve from external knowledge bases to supplement the model's context (see the sketch after this list). RAG, however, can only pull in a limited number of fragments; if retrieval is inaccurate or the assembled context is poorly chosen, missing information and logical discontinuity remain.

  3. Memory modules and long-context enhancement : Mechanisms such as Claude's Memory and OpenMemory MCP attempt to persist historical interactions, enabling the model to have cross-task memory and context perception capabilities. At the same time, some experimental studies are also exploring methods such as token compression and topic extraction to improve effective information density.
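As a reference for item 2, here is a minimal sketch of the RAG control flow. `retrieve` and `generate` are hypothetical callables (a vector-store search and a model call); no specific library or API is implied.

```python
def answer_with_rag(question, retrieve, generate, top_k=4, max_context_chars=8_000):
    """Minimal RAG loop: retrieve a few relevant fragments, pack them into the
    prompt under a hard budget, and have the model answer only from that
    supplied context. `retrieve(query, k)` and `generate(prompt)` are
    hypothetical placeholders for a vector-store search and a model call."""
    fragments = retrieve(question, k=top_k)
    context = "\n---\n".join(fragments)[:max_context_chars]   # hard context budget
    prompt = (
        "Answer using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)

# The failure modes described above live exactly here: if `retrieve` misses the
# right fragment, or the packed context drops a dependency, the answer degrades.
```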


Despite the continuous evolution of the technology, achieving stable "multi-round dialogue + task planning + context retention" in ToB scenarios will, in the short term, still require engineering measures such as task decomposition, staged verification, and human-machine collaborative design. The most feasible strategy today is to combine RAG's context supplementation, a multi-agent division of labor, and limited memory-state retention to build a "semi-automatic AI collaborator" that remains controllable for the business. A truly one-step, fully automatic intelligent system still awaits a double breakthrough in context understanding and system architecture.

Summary and reflection: AI does not replace everything; it needs to be properly embedded

It is clear from the above limitations that large models are not suited to the stringent demands of enterprise-level systems for high compliance, high certainty, high performance, high complexity, high coupling, high availability, high concurrency, and low latency. If a large enterprise's ERP, e-commerce platform, or core banking system relied entirely on a large model as its base, deployment and operations costs would be extremely high, and it would be difficult to guarantee the distributed capabilities needed for business continuity and system stability.

The large model is not the "terminator of enterprise software" but the "cognitive enhancement layer of enterprise systems."

It lacks the performance, consistency, and controllability guarantees of traditional systems, but it has shown strong enabling potential in knowledge generalization, language understanding, and generative expression. The optimal path is therefore never to "replace everything with AI", but rather:

Let AI become the "smart interface" of the company's existing systems, embedded between business processes and technical platforms, to achieve an intelligent leap in human-machine collaboration.

If your business scenario does not demand extreme precision, coupling, or performance guarantees, for example an internal tool of a dozen pages or some low-frequency, lightweight, non-critical tasks, then the efficiency gains from AI are enormous and using it is genuinely enjoyable. A system that used to take several person-months can now have its pages generated, scripts written, and logic assembled automatically by AI, going live in an initial form within half an hour.

In this AI-driven "demand-side transformation", the real opportunities may lie not in big platforms or big features but in those "small, scattered, weakly connected" scenarios. The real innovation of the technology lies not in "tearing down and rebuilding" but in "integration and evolution."