NVIDIA's mainstream GPU servers and the models available for sale in China

Gain an in-depth understanding of NVIDIA GPU servers and the models and core architectures available in China.
Core content:
1. Four major categories of NVIDIA GPU servers and their application scenarios
2. Core technologies and performance characteristics of each series of GPU servers
3. Analysis of models and customized solutions available in the Chinese market
NVIDIA GPU server classification and core architecture
NVIDIA's GPU servers can be divided into four categories by application scenario and technical architecture, all designed around the goal of maximizing computing power density.
1. DGX series: the benchmark for computing clusters
The DGX series is NVIDIA's own fully integrated high-performance server line, designed for large-scale AI training and supercomputing. Representative models include:
• DGX Station A100/H100: a single machine with 4-8 GPUs interconnected via NVLink, suited to small and medium-scale model training.
• DGX A100/H100: integrates 8 A100 or H100 GPUs with 640GB of total GPU memory (H100 version), supports multi-node cluster expansion, and is often used to train trillion-parameter models such as GPT-4.
• DGX GB200 NVL72 (latest model): based on the Blackwell architecture, a single rack integrates 72 GB200 GPUs with 13.5TB of total GPU memory, optimized for next-generation large language models; it cannot enter the Chinese market directly due to US export control restrictions.
2. HGX modular servers: a flexible solution for OEMs
HGX is a modular design standard that NVIDIA provides to partners (such as Inspur and Huawei), allowing manufacturers to customize hardware configurations as needed. For example:
• HGX H100/A800: uses the Hopper or Ampere architecture, connects GPUs via PCIe or SXM, and is compatible with a variety of CPU and storage options.
• HGX H20: China-specific version with GPU memory increased to 96GB but restricted compute throughput; overall performance sits between the A800 and H800.
3. OVX servers: dedicated to graphics and inference
Built for scenarios such as the metaverse and real-time rendering, OVX servers are equipped with L40S GPUs (Ada Lovelace architecture) with 48GB of GDDR6 memory and 864GB/s of bandwidth, and excel at generative AI inference and 3D modeling.
4. MGX platform: a modular future
MGX supports hybrid deployment of CPUs (such as Grace), GPUs (such as the H800), and DPUs, making it suitable for enterprise private clouds and edge computing; Chinese users can have it customized with compliant models.
Key GPU parameters and domestic alternatives
Core parameter comparison
| Model | Architecture | GPU memory | NVLink bandwidth | Positioning |
|---|---|---|---|---|
| A800 | Ampere | 80GB | 400GB/s (A100: 600GB/s) | Medium-scale training |
| H800 | Hopper | 80GB | 600GB/s (H100: 900GB/s) | Large-scale training |
| H20 | Hopper | 96GB | 900GB/s | Inference (FP32 ≈ 40% of H100) |
Technical compromises of the China-specific versions
To comply with US export controls, NVIDIA introduced "performance-downgraded" variants for the Chinese market:
• Bandwidth limits: the A800's NVLink bandwidth is cut from the A100's 600GB/s to 400GB/s, and the H800's interconnect bandwidth is cut from the H100's 900GB/s to 600GB/s.
• Compute adjustments: the H20's FP32 throughput is only about 40% of the H100's, but its GPU memory is increased to 96GB, making it better suited to inference workloads.
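These compute cuts translate directly into cluster sizing: matching a given aggregate throughput requires proportionally more downgraded cards. A back-of-the-envelope sketch (the ~40% figure comes from the text; real scaling is worse, since interconnect overhead grows with GPU count):

```python
import math

def gpus_to_match(baseline_ratio: float, downgraded_ratio: float, n_baseline: int) -> int:
    """How many downgraded GPUs are needed to match the aggregate
    throughput of n_baseline full-spec GPUs. Ignores interconnect
    and scaling overhead, so this is a lower bound."""
    return math.ceil(n_baseline * baseline_ratio / downgraded_ratio)

# Normalize H100 FP32 throughput to 1.0; per the text, H20 is ~0.4.
print(gpus_to_match(1.0, 0.4, 8))  # an 8x H100 node needs at least 20 H20s
```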
Available Server Models in China and Procurement Recommendations
Legal Purchase List
• A800/H800 servers: offered by OEMs such as Inspur and via Alibaba Cloud; single-card compute is roughly 70%-80% of the international versions, sufficient for medium-scale AI training.
• H20 servers: optimized for large-model inference, with a clear advantage in GPU memory capacity, but more GPUs must be run in parallel to compensate for the compute shortfall.
• OVX servers (L40S): support generative AI and real-time rendering, suited to metaverse content production and video processing.
• MGX customized servers: can be fitted with compliant GPUs (such as the H800) to build private clouds, suited to data-sensitive industries such as finance and healthcare.
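The H20's memory advantage matters because inference serving is often memory-bound: the model weights must fit across the GPUs before any compute happens. A rough sizing sketch; the 175B-parameter FP16 model below is an illustrative assumption, not a figure from the text:

```python
import math

def min_gpus_for_weights(params_billion: float, bytes_per_param: float, gpu_mem_gb: float) -> int:
    """Minimum number of GPUs whose combined memory holds the model weights.
    Ignores KV cache and activations, which need further headroom."""
    weights_gb = params_billion * bytes_per_param  # 1B params * 1 byte = 1GB
    return math.ceil(weights_gb / gpu_mem_gb)

# Illustrative: a 175B-parameter model in FP16 (2 bytes/param) = 350GB of weights.
print(min_gpus_for_weights(175, 2, 96))  # 96GB H20-class card: 4 GPUs
print(min_gpus_for_weights(175, 2, 80))  # 80GB-class card: 5 GPUs
```

The gap widens further once KV-cache memory for long-context serving is added, which is why a memory-heavy, compute-light part fits inference better than training.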
Thinking about alternative paths
• Short-term: purchase A800/H800 clusters and recover efficiency with distributed training frameworks (such as Horovod).
• Long-term: push domestic GPUs (such as Ascend and Moore Threads) toward compatibility with the NVIDIA software ecosystem to reduce technological dependence.
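The core operation a data-parallel framework like Horovod performs after each backward pass is an allreduce that averages gradients across workers, so every worker applies the same update. A dependency-free sketch of just that averaging step, simulated in plain Python (Horovod itself implements this as ring-allreduce over NCCL/MPI, not like this):

```python
def allreduce_mean(worker_grads):
    """Average per-parameter gradients across workers.
    worker_grads: one gradient list per worker, all the same length."""
    n_workers = len(worker_grads)
    return [sum(per_param) / n_workers for per_param in zip(*worker_grads)]

# Two simulated workers, each holding gradients for a 3-parameter model;
# after the allreduce, every worker sees the same averaged gradient.
print(allreduce_mean([[1.0, 2.0, 3.0],
                      [3.0, 4.0, 5.0]]))  # [2.0, 3.0, 4.0]
```

Because the allreduce traffic scales with model size, this is also where the A800/H800 interconnect cuts bite: the same framework runs, but each synchronization step takes longer.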
The path to independent computing power
Although NVIDIA has maintained its presence in the Chinese market through these special models, their performance limits are pushing the domestic supply chain to innovate faster. Research institutions and enterprises must balance international procurement with domestic substitution, and pursue breakthroughs in areas such as model compression and mixed-precision training, in order to seize the initiative in the computing-power race.