DeepSeek is the savior

DeepSeek technology leads the energy revolution in the AI field, helping companies reduce consumption and increase efficiency.
Core content:
1. Analysis of the reasons behind the surge in carbon emissions from technology giants
2. How DeepSeek technology reduces AI computing power energy consumption
3. Outlook on the impact of DeepSeek on the future development of the AI industry
Not only Google: Microsoft, too, revealed last May that its carbon dioxide emissions had risen by nearly 30%. The core reason for tech giants' growing carbon emissions comes down to one thing: the energy consumption of AI models, hardware, and data centers is climbing sharply. If an all-out AI arms race arrives, humanity could stumble into an energy crisis it never foresaw.
But things have changed dramatically recently.
Recently, Microsoft canceled plans to build two data center projects, in Kenosha, Wisconsin and Atlanta, Georgia, an adjustment involving hundreds of megawatts of power capacity. In addition, according to TD Cowen's latest research report, Microsoft has terminated lease agreements with multiple private data center operators and suspended some international capital expenditure plans.
This series of measures not only reflects the structural changes taking place in the field of AI infrastructure construction, but is also interpreted by the market as an important signal of a cooling of the AI investment boom. TD Cowen analysts pointed out that the core of Microsoft's strategic adjustment this time is to cope with the new normal of "oversupply" in the industry.
The recent broad pullback in AI concept stocks also reflects, to some extent, a shift in how capital values large models. The technological shock delivered by the "catfish" DeepSeek has opened a new path for the future evolution of large models.
With fewer chips and lower training costs, DeepSeek has reset AI companies' expectations for computing power while sharply cutting the energy that AI compute may consume in the future. Half-jokingly: if humanity avoids an energy crisis driven by the growth of AI compute, DeepSeek may well deserve first credit.
How does DeepSeek reduce energy consumption?
Li Bojie, founder of Lingtan Intelligence, believes that DeepSeek reduces training costs in four main ways:
The first is pipeline-parallel optimization (DualPipe): by interleaving the execution of forward and backward propagation (e.g., 1F1B and its extensions), computation fully overlaps with data communication, maximizing GPU utilization and shortening the idle "bubble" time spent waiting during training.
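To see why shrinking the bubble matters, the idle share of a plain 1F1B schedule can be computed from the standard formula (p − 1)/(m + p − 1), where p is the number of pipeline stages and m the number of micro-batches. This is a generic pipeline-parallelism sketch for intuition, not DeepSeek's actual DualPipe implementation:

```python
def bubble_fraction(stages: int, microbatches: int) -> float:
    """Idle 'bubble' share of a plain 1F1B pipeline schedule:
    (stages - 1) warm-up/drain slots out of (microbatches + stages - 1) total."""
    return (stages - 1) / (microbatches + stages - 1)

# More micro-batches amortize the fixed warm-up/drain cost; DualPipe goes
# further by overlapping communication with computation inside each slot.
for m in (4, 16, 64):
    print(f"{m:3d} micro-batches, 8 stages -> bubble {bubble_fraction(8, m):.1%}")
```

With 8 stages, going from 4 to 64 micro-batches cuts the bubble from roughly 64% to under 10%, which is exactly the waiting time DualPipe's overlap strategy attacks.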
The second is the redundant-expert load balancer (EPLB): under a Mixture-of-Experts (MoE) architecture, the workload across experts can become severely unbalanced. EPLB replicates busy experts so their load is shared across copies, preventing some GPUs from sitting idle for long stretches and making more efficient use of hardware resources.
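A toy version of the replication idea, with made-up token counts (EPLB's real placement logic is considerably more sophisticated): greedily hand each spare replica slot to whichever expert currently has the highest load per copy.

```python
def balance_experts(load: dict[str, int], extra_replicas: int) -> dict[str, float]:
    """Greedily assign each extra replica to the expert with the highest
    per-copy load, then report the tokens each copy ends up handling."""
    copies = {e: 1 for e in load}
    for _ in range(extra_replicas):
        busiest = max(load, key=lambda e: load[e] / copies[e])
        copies[busiest] += 1
    return {e: load[e] / copies[e] for e in load}

# One hot expert dominates; two extra replicas cut its per-GPU load 3x,
# so the GPUs hosting it no longer bottleneck the whole MoE layer.
loads = {"expert_0": 900, "expert_1": 100, "expert_2": 100}
print(balance_experts(loads, extra_replicas=2))
```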
The third is FP8 mixed-precision training: compared with traditional mixed precision using FP16/FP32, DeepSeek-V3 "extensively uses 8-bit floating-point numbers for training." This greatly reduces memory and compute consumption, cutting the hardware required for training and, indirectly, energy use and electricity costs.
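The memory side of the saving is simple arithmetic. A back-of-the-envelope sketch, using DeepSeek-V3's published total of 671B parameters; note that real FP8 training also keeps higher-precision master weights and scaling factors, so the actual saving is smaller than the raw halving suggests:

```python
def weight_gigabytes(n_params: float, bits: int) -> float:
    """Storage for the raw weights alone at a given bit width."""
    return n_params * bits / 8 / 1e9

N = 671e9  # DeepSeek-V3 total parameter count (published figure)
for bits in (32, 16, 8):
    print(f"FP{bits:2d}: {weight_gigabytes(N, bits):6.0f} GB")
```

Each halving of bit width halves weight storage, and FP8 matrix multiplies also run roughly twice as fast as FP16 on hardware that supports them, which is where the energy saving comes from.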
The fourth is multi-token prediction (MTP): by predicting multiple tokens at a time, the model extracts more signal from the same data, improving training and inference efficiency while helping shorten overall training time and reduce compute consumption.
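At inference time, the benefit is fewer sequential forward passes. A minimal best-case sketch, assuming every extra predicted token is accepted (real speculative acceptance rates are lower):

```python
import math

def decode_steps(total_tokens: int, tokens_per_step: int) -> int:
    """Sequential decoding steps needed when each forward pass emits
    `tokens_per_step` tokens and all speculative tokens are accepted."""
    return math.ceil(total_tokens / tokens_per_step)

# Best case: predicting 2 tokens per step halves the number of passes.
print(decode_steps(1000, 1), decode_steps(1000, 2))
```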
Specifically in terms of GPU energy consumption, according to Li Bojie's calculations, DeepSeek pre-training consumed approximately 2.66 million GPU hours, and the reinforcement learning (RL) stage consumed approximately 0.5 million GPU hours, totaling approximately 3.16 million GPU hours.
Assuming H800 GPUs, data-center-grade cards of this class typically draw between 500W and 700W. Taking 600W as the average, each GPU-hour consumes 0.6 kWh. Total energy ≈ GPU-hours × average power draw, which works out to roughly 1.9 GWh of electricity.
For comparison, take GPT-4 MoE. According to Jensen Huang's GTC 2024 keynote, GPT-4 MoE was trained on 8,000 H100 GPUs for 90 days, a total of 17.28 million GPU-hours. With the H100's design power (TDP) of 500-750W and the same 600W average, total energy comes to about 10.4 GWh, roughly five times DeepSeek's.
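Both estimates reduce to the same formula, energy ≈ GPU-hours × average power. Reproducing the article's arithmetic:

```python
def training_energy_gwh(gpu_hours: float, avg_watts: float = 600) -> float:
    """Energy = GPU-hours x average power draw, converted from W*h to GWh."""
    return gpu_hours * avg_watts / 1e9

deepseek = training_energy_gwh(3.16e6)   # pre-training + RL, per Li Bojie
gpt4_moe = training_energy_gwh(17.28e6)  # 8,000 H100s x 90 days x 24 h
print(f"DeepSeek ~{deepseek:.1f} GWh, GPT-4 MoE ~{gpt4_moe:.1f} GWh, "
      f"ratio ~{gpt4_moe / deepseek:.1f}x")
```

The exact figures are 1.896 GWh versus 10.368 GWh, a ratio of about 5.5; the gap of roughly 8.5 GWh is the "saved" electricity cited in the household comparison below.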
The Huxiu ESG team once pointed out in "The AI Revolution Is an Energy Disaster" that an average household uses about 1,000 kWh of electricity a year, meaning the electricity DeepSeek saves is enough to power nearly 10,000 households for a year.
Li Bojie pointed out that large AI models are so energy-hungry because today's mainstream LLMs use deep neural networks built on the Transformer architecture. This architecture processes data through a self-attention mechanism, weighing different parts of a sequence, or a sentence's entire context, to generate predictions.
"The advanced LLMs on the market usually contain trillions of parameters. The more parameters there are, the more complex the model is, and the more computational effort is required during training."
By optimizing the training model (introducing Multi-head Latent Attention, MLA) and launching a new reinforcement learning algorithm, GRPO, DeepSeek has significantly cut computing costs while improving training efficiency. Ultimately, its model training cost is only about 1/10 of OpenAI's, and its usage cost only about 1/30.
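GRPO's core trick is replacing the learned value network of classic PPO-style RL with group-relative baselines: sample a group of responses per prompt and normalize each response's reward against the group's own mean and standard deviation. A minimal sketch of just that advantage computation (the reward numbers are made up for illustration):

```python
import statistics

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Group-relative advantages: z-score each sampled response's reward
    against the mean/std of its own group. No critic network is needed,
    which is a large part of GRPO's compute saving."""
    mu = statistics.mean(group_rewards)
    sigma = statistics.pstdev(group_rewards)
    if sigma == 0:  # identical rewards carry no learning signal
        return [0.0] * len(group_rewards)
    return [(r - mu) / sigma for r in group_rewards]

# Better-than-average answers get positive advantage, worse get negative.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```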
According to the International Energy Agency, in 2022 the 2,700 data centers in the United States consumed more than 4% of the country's total electricity, and global data center power consumption is expected to double by 2026. Judging from the "cost-reduction frenzy" DeepSeek has triggered, however, data centers' power draw may not grow as dramatically as projected.
Huge potential for reducing indirect energy consumption
Beyond the direct cut in training energy brought by technological innovation, the carbon-reduction potential DeepSeek unlocks indirectly is just as large.
From the perspective of market competition, facing the "cost reduction" pressure of DeepSeek's innovation, many AI companies are accelerating the retirement of inefficient model architectures; the most typical is Meta.
Meta CEO Mark Zuckerberg called 2025 "the decisive year for AI" and expected Meta AI to become a leading assistant serving more than one billion people. But facing the new challenges brought by DeepSeek, Meta has fallen into panic mode.
The Information reported earlier this year that Meta's AI leadership, including Matthew Oldham, director of AI infrastructure, has worried that DeepSeek's emergence means Meta is falling behind in the AI race. They are particularly afraid that Llama, the next-generation flagship model Meta plans to release this quarter, will not perform as well as DeepSeek. According to two Meta employees cited by The Information, Meta has set up multiple "war rooms," or dedicated research groups, to analyze DeepSeek and use its insights to improve Llama.
As for Musk's Grok 3: after stacking 200,000 H100 GPUs, its benchmark scores do come in higher than OpenAI's and DeepSeek's, but that much computing power brings far greater energy consumption. By the estimate of finance influencer Dr. Taosha, a single training run of Grok 3 produces carbon emissions equivalent to the annual emissions of 46,000 cars; per unit of performance, DeepSeek R1's energy consumption is 67% lower than Grok 3's.
Rich as Musk is, he cannot keep "wasting" his way forward on brute force alone. Microsoft's support for DeepSeek and Meta's study of DeepSeek's algorithms are enough to show that DeepSeek is upending the AI giants' pile-up-compute-and-chips model and pushing the industry toward refined operations.
Beyond market competition, from the perspective of upstream and downstream supply chains, DeepSeek has also taught the energy industry a lesson. After DeepSeek went viral, on the day before New Year's Eve, shares of energy supplier Constellation Energy fell 21% and power company Vistra fell 28%.
"DeepSeek has reset the competitive landscape between China and the United States in artificial intelligence. More importantly, it has fundamentally upended the energy sector." Writing in Forbes, Wesley Alexander Hill, assistant director of the Energy, Growth and Security Project at the International Tax and Investment Center, argued that the assumption underpinning many countries' energy policies, that AI will inevitably drive ever-growing demand, no longer holds.
Finally, DeepSeek can also shine in empowering traditional energy companies. For chemical companies, for example, analyzing production data (such as reaction parameters and equipment status) in real time allows process conditions to be adjusted dynamically. One industry-focused account noted that in a methanol distillation unit, optimizing process parameters through a model cut steam consumption by 15%, raised product yield by 8%, and lifted overall equipment effectiveness (OEE) by 12%.
Sinopec also recently reported that its DeepSeek-R1 model has completed verification testing on both imported and domestic GPU platforms, with inference efficiency nearly doubled. Going forward, DeepSeek can be applied in Sinopec's core areas such as seismic data processing, reservoir development optimization, and chemical product R&D.
In summary, DeepSeek has created clear energy-saving cases in training and enterprise-level applications through technological disruption, cost reconstruction, and open source ecology. There is still huge room for energy conservation and carbon reduction in the future.
Wider social benefits
By this analysis, DeepSeek already performs well on the "E (environment)" dimension of ESG, but it also has excellent applications under "S (society)" and "G (corporate governance)".
At the "G (corporate governance)" level, China Business News recently noted that with tech giants such as WeChat and Baidu connecting to DeepSeek, and DeepSeek-R1 models built on a fully domestic technology stack landing in government systems across the country, demand for computing power has surged. As DeepSeek expands into more fields, society's demand for computing power will keep growing.
At the "S (social)" level, according to The Paper, some towns in Xingye County, Yulin City, Guangxi Province, have used DeepSeek for poverty prevention monitoring. "Through DeepSeek, the data of all the poor households in the town are dynamically analyzed to accurately identify families at risk of returning to poverty, and automatically generate assistance suggestions. The analysis efficiency is 50% higher than that of traditional methods."
From this perspective, DeepSeek's potential to boost every dimension of ESG clearly still has considerable room to grow; it is a "hidden master" in plain sight.
Today's artificial intelligence may be only one facet of DeepSeek. The instrumental rationality of its efficiency-first approach, and its underlying logic of building environmental friendliness, social equity, and corporate governance into technological evolution, promise humanity more surprises in advancing sustainable development.