How did the H20 141GB version come about?

Written by
Clara Bennett
Updated on: July 1, 2025

The rise of the NVIDIA H20 141GB graphics card and the innovation of HBM3e technology.

Core content:
1. The reasons behind the surge in market demand for H20 141GB graphics cards
2. Analysis of H20 graphics card performance characteristics and market competitiveness
3. HBM3e technology development and its impact on video memory capacity improvement


Recently, news about the NVIDIA H20 141GB has been flooding WeChat Moments. Market demand for H20 141GB complete machines and modules has skyrocketed, with many large manufacturers stockpiling cards. This is largely due to the emergence of DeepSeek: people are pleasantly surprised to find that they can now deploy capable large models at a much lower cost than before.
The H20 is a "special edition" product launched by NVIDIA for the Chinese market. It is based on the NVIDIA H100 chip, but to comply with US export control policy and remain sellable in China, its performance has been "castrated" by roughly 80%. From the parameter comparison with the H200 we introduced earlier [ What is the difference between H200 and H100 ], the FP16 compute of the H20 is less than 1/10 that of the H200, yet its video memory is very large: the earlier H20 came with 96GB of memory and up to 4TB/s of memory bandwidth. In other words, the H20 is a card with an extremely lopsided balance between compute and memory. Before DeepSeek took off, the market was not optimistic about it; combined with fierce competition from domestic AI chip suppliers, the H20's competitiveness was greatly diminished and doubts were widespread.
However, because the DeepSeek models take a low-cost, high-performance technical route, even the "castrated" H20 has now become a hot seller in the domestic market. The H20 141GB version puts 141GB of HBM3e video memory on a single card; an 8-card system therefore offers 1128GB of total memory, with 4.8TB/s of memory bandwidth per card. Such a hardware configuration can natively and smoothly run the full DeepSeek-R1 model (FP8 precision) and easily handle high-load computing tasks.
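To see why an 8-card node is enough, a rough memory estimate helps. The sketch below assumes DeepSeek-R1 has roughly 671B total parameters, a figure not given in this article, and ignores KV-cache and activation sizing details:

```python
# Back-of-the-envelope check: does DeepSeek-R1 at FP8 fit in an 8-card H20 141GB node?
# ASSUMPTION (not from the article): DeepSeek-R1 has roughly 671B total parameters.

PARAMS_BILLION = 671           # assumed total parameter count of DeepSeek-R1
BYTES_PER_PARAM_FP8 = 1        # FP8 stores one byte per weight
GPUS = 8
HBM_PER_GPU_GB = 141           # H20 141GB version

weights_gb = PARAMS_BILLION * BYTES_PER_PARAM_FP8   # ~671 GB of weights
total_hbm_gb = GPUS * HBM_PER_GPU_GB                # 1128 GB across the node
headroom_gb = total_hbm_gb - weights_gb             # left for KV cache, activations, runtime

print(f"weights ~{weights_gb} GB, node HBM {total_hbm_gb} GB, headroom ~{headroom_gb} GB")
```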
From the parameters, the only difference between the H20 141GB version and the H20 96GB version is that the 141GB version uses the same HBM3e as the H200. It is reported that the H20 96GB version has been discontinued, so it is reasonable to infer that the H20 141GB version will continue to dominate the market this year.
So the question is, what exactly is HBM3e?

HBM3e

The first HBM product was launched in 2014. HBM1 was the first memory to use 3D stacking, with a bandwidth of 128GB/s and a capacity of 1GB per stack. It uses vertical stacking and TSV (Through-Silicon Via) technology to connect multiple DRAM dies tightly together and interconnect them efficiently with a GPU or CPU, forming a large-capacity, high-bandwidth DRAM array.
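As a quick sanity check on those HBM1 numbers, per-stack bandwidth is simply the interface width times the per-pin data rate. The sketch below uses the widely published HBM1 figures (1024-bit interface at 1 Gbps per pin) plus an HBM3-class rate for comparison; neither figure comes from this article:

```python
# How HBM reaches its headline bandwidth: a very wide interface at a modest per-pin rate.
# The 1024-bit / 1 Gbps HBM1 figures and the 6.4 Gbps HBM3-class rate are widely
# published specs, not taken from this article.

def hbm_stack_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8   # divide by 8 to convert bits to bytes

print(hbm_stack_bandwidth_gb_s(1024, 1.0))   # HBM1: 128.0 GB/s per stack
print(hbm_stack_bandwidth_gb_s(1024, 6.4))   # HBM3-class: 819.2 GB/s per stack
```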

Since then, HBM technology has gone through multiple generations, with significant improvements in capacity, bandwidth, and data transfer rate. SK Hynix has successively developed HBM2E, HBM3, and HBM3E; Micron launched HBM2 and HBM2E, then in 2023 skipped HBM3 and went straight to HBM3E; Samsung launched HBM2E and HBM3, with its first HBM3E product planned for 2024. HBM3E further improves bandwidth, latency, energy efficiency, and capacity over HBM3.

Why the video memory capacity reaches 141GB

HBM3e uses more advanced 3D packaging and vertical stacking processes. By increasing the number of stacked DRAM die layers, it achieves higher storage density in the same physical footprint: the stack height grew from 8 layers in HBM2 to 12 layers in HBM3. HBM3e may further optimize the stacking on this basis and increase the capacity of each die, together yielding a significant increase in total capacity.
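A rough sketch of the capacity arithmetic: one stack's capacity is just the number of stacked dies times the capacity of each die. The layer counts and die densities below are illustrative configurations, not figures from this article:

```python
# Stack capacity = number of stacked DRAM dies x capacity per die.
# The layer counts and die densities below are illustrative, not from this article.

def stack_capacity_gb(layers: int, die_capacity_gb: int) -> int:
    """Total capacity of one HBM stack in GB."""
    return layers * die_capacity_gb

print(stack_capacity_gb(8, 2))    # 16 GB: 8-high stack of 16Gb (2 GB) dies, an HBM2e-era configuration
print(stack_capacity_gb(8, 3))    # 24 GB: 8-high stack of 24Gb (3 GB) dies, an HBM3e configuration
print(stack_capacity_gb(12, 3))   # 36 GB: 12-high stack, the kind of step-up taller stacking enables
```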

Taking the H200 as an example, its 141GB of HBM3e memory is likely achieved by combining several high-capacity HBM3e stacks. It is speculated that the H200 uses six 24GB HBM3e stacks, for a physical capacity of 144GB, with NVIDIA reserving part of that capacity for production (yield) reasons and exposing 141GB of usable video memory to the user. In addition, the memory architecture inside the GPU has been optimized, including the data-channel design and the memory controller, so that such large memory can be managed and utilized efficiently.
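Putting the article's numbers together, a minimal sketch of that speculation (assuming six 24GB HBM3e stacks) looks like this:

```python
# The article's speculation in numbers: six 24GB HBM3e stacks on the package.

STACKS = 6
GB_PER_STACK = 24               # assumed per-stack capacity
TOTAL_BW_TB_S = 4.8             # total memory bandwidth quoted in the article

physical_gb = STACKS * GB_PER_STACK          # 144 GB physically present
usable_gb = 141                              # capacity exposed to the user
reserved_gb = physical_gb - usable_gb        # 3 GB held back as margin
bw_per_stack_tb_s = TOTAL_BW_TB_S / STACKS   # ~0.8 TB/s contributed by each stack

print(physical_gb, reserved_gb, round(bw_per_stack_tb_s, 2))   # 144 3 0.8
```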