Explained in one article: why do we need DPUs in the AI era?

The computing power revolution of the AI era: how does the DPU break through the bottlenecks of traditional architectures and unlock the potential of intelligent computing centers?
Core content:
1. The demand for DPUs in building intelligent computing centers in the AI era
2. The DPU's network processing capabilities and its security and storage functions
3. The computing power release, performance gains, and security isolation brought by DPU architectural innovation
The architectural dilemma in the AI era
As the demand for computing power for large-model training grows exponentially, the limitations of the traditional von Neumann architecture are becoming increasingly apparent. When models such as ChatGPT distribute workloads across thousands of GPUs, bursts of gradient traffic cause network congestion, forming a third bottleneck alongside the "computing power wall" and the "memory wall". Data shows that under the traditional architecture, roughly 10% of a server CPU's computing power is consumed by infrastructure tasks such as network protocol processing, storage management, and security verification, wasting resources.
For example, if you buy a 100-core CPU but can only use 90 cores, where do the other 10 cores go? They run data center software: security, storage, management, and so on. The cost of those 10 cores is wasted; it is like spending 100 yuan but getting only 90 yuan's worth of computing. What is needed is a component that specializes in this "dirty work" - the DPU.
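To make the arithmetic concrete, here is a minimal Python sketch of the example above. The 10% infrastructure share and the per-core cost are the illustrative figures from the text, not measurements.

```python
# Illustrative arithmetic only: core counts and costs are the example figures
# from the text, not measurements of any real server.

def wasted_compute(total_cores: int, infra_fraction: float, cost_per_core: float):
    """Estimate how much purchased CPU capacity is consumed by infrastructure tasks."""
    infra_cores = total_cores * infra_fraction   # cores busy with networking/storage/security
    app_cores = total_cores - infra_cores        # cores left for actual applications
    wasted_cost = infra_cores * cost_per_core    # spend that buys no application computing
    return app_cores, infra_cores, wasted_cost

app, infra, cost = wasted_compute(total_cores=100, infra_fraction=0.10, cost_per_core=1.0)
print(f"Usable for applications: {app:.0f} cores; "
      f"consumed by infrastructure: {infra:.0f} cores; wasted spend: {cost:.0f} yuan")
# -> Usable for applications: 90 cores; consumed by infrastructure: 10 cores; wasted spend: 10 yuan
```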
A DPU, or data processing unit, has powerful network processing capabilities as well as security, storage, and network offload functions. It frees up CPU computing power by taking over data processing tasks the CPU is not good at, such as network protocol processing, data encryption and decryption, and data compression. It can also manage, scale, and schedule these resources independently; in short, it handles the tasks that "the CPU does poorly and the GPU cannot do".
DPU Architecture Innovation
By building "data-plane intelligence", the DPU upgrades the network from a simple transmission pipe into a programmable computing node. Its core value lies in:
1. Computing power release: offloading tasks such as network virtualization (VXLAN), storage acceleration (NVMe-oF), and security encryption (IPsec), letting the CPU/GPU focus on core computation
2. Performance leap: in a 100 Gbps network environment, a DPU can process traffic at line rate and cut latency by more than 90% (a back-of-the-envelope sketch follows this list)
3. Security isolation: building a zero-trust architecture through hardware-level isolation, preventing attacks from spreading laterally
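The following sketch shows why line-rate processing at 100 Gbps is hard to do on general-purpose cores. The minimum frame size, Ethernet wire overhead, and 3 GHz clock are illustrative assumptions, not vendor data.

```python
# Back-of-the-envelope packet budget at 100 Gbps. Frame size, wire overhead,
# and the 3 GHz clock are assumptions for illustration only.

LINK_BPS = 100e9          # 100 Gbps link
FRAME_BYTES = 64          # minimum Ethernet frame
WIRE_OVERHEAD = 20        # preamble (8 B) + inter-frame gap (12 B)
CORE_HZ = 3e9             # one 3 GHz CPU core

bits_on_wire = (FRAME_BYTES + WIRE_OVERHEAD) * 8
packets_per_sec = LINK_BPS / bits_on_wire        # ~148.8 million packets/s
cycles_per_packet = CORE_HZ / packets_per_sec    # ~20 cycles per packet

print(f"{packets_per_sec/1e6:.1f} Mpps at line rate, "
      f"~{cycles_per_packet:.0f} CPU cycles per packet on one core")
```

Roughly 20 cycles per packet is less than the cost of a single trip to main memory, which is why small-packet line-rate processing at 100 Gbps is handled by dedicated hardware pipelines on the DPU rather than by general-purpose CPU cores.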
The development of DPU can be divided into three stages:
Phase 1: Smart NIC Era (2010-2019)
Mellanox's ConnectX series pioneered network acceleration with simple protocol offloading. At the time, to reduce the extra CPU consumption in data centers, the Israeli company Mellanox proposed the concept of the SmartNIC (intelligent network card).
In a virtualized environment, the CPU carries a very heavy workload. It not only runs OVS (Open vSwitch) tasks, but also has to handle storage management, online and offline encryption and decryption of data packets, deep packet inspection, firewall management and control, complex routing, and other demanding operations. This high-intensity work severely limits the CPU and prevents it from reaching its optimal state.
The advent of the SmartNIC provided an effective new way to address this extra CPU resource consumption. With a SmartNIC, OVS operations can be offloaded from the CPU, and the card can independently take on key functions such as storage acceleration, data encryption, deep packet inspection, and complex routing. Moving these functions off the host frees up the large number of CPU cycles that were previously spent on them and returns those cycles to the host's applications.
This not only resolves contention for CPU resources among different workloads and greatly improves their performance, but also ensures that the server CPU can devote its full processing power to applications, or provide more resources for virtual machine (VM) services, creating greater value for the enterprise.
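A rough way to see the effect of offloading OVS and similar infrastructure work is to estimate how many cores software packet processing consumes at a given traffic rate. The per-packet cycle cost and traffic figures below are assumed ballpark numbers for illustration, not measurements of any specific OVS version or host.

```python
# Rough estimate of CPU cores consumed by software packet switching, and the
# cores returned to applications when a SmartNIC/DPU offloads that work.
# cycles_per_packet is an assumed ballpark for a software vSwitch datapath.

def cores_for_software_switching(packet_rate_pps: float,
                                 cycles_per_packet: float = 1_000,
                                 core_hz: float = 3e9) -> float:
    """Cores needed to switch packet_rate_pps entirely in software."""
    return packet_rate_pps * cycles_per_packet / core_hz

host_cores = 64
traffic_pps = 30e6   # assumed: 30 million packets/s across the host's VMs

busy = cores_for_software_switching(traffic_pps)
print(f"Software switching consumes ~{busy:.0f} of {host_cores} cores; "
      f"offloading to a SmartNIC/DPU returns them to VMs and applications")
# With the assumptions above: ~10 of 64 cores.
```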
Phase 2: The Year of the DPU (2020)
After NVIDIA acquired Mellanox, it released BlueField-2, defining the "third computing power unit" for the first time.
Seeing the huge commercial value of smart network cards, NVIDIA announced in March 2019 that it would spend $6.9 billion to acquire the Israeli chip company Mellanox, and launched its first DPU products in 2020. Since then, the concept of the DPU has officially entered the public eye.
That year, NVIDIA officially launched two DPU products: BlueField-2 DPU and BlueField-2X DPU.
Phase 3: Architecture Innovation (2021 to Present)
In April 2021, NVIDIA released its next-generation data processor, the BlueField-3 DPU. Billed as the world's first DPU designed specifically for AI and accelerated computing, it has attracted wide attention since launch.
It is also the industry's first DPU to integrate 400G Ethernet and NDR InfiniBand, giving it excellent network performance. BlueField-3 provides software-defined, hardware-accelerated data center infrastructure for workloads with the most demanding performance requirements. From AI to hybrid cloud environments, high-performance computing, and 5G wireless networks, the BlueField-3 DPU has demonstrated strong adaptability and excellent performance, redefining what is possible for data processing and computing in these fields.
NVIDIA did not stop with the BlueField-3 DPU, however. With the continued emergence and widespread adoption of large models, improving the distributed computing performance and efficiency of GPU clusters, enhancing their scale-out capability, and achieving performance isolation between workloads on generative AI clouds have become key challenges for the entire industry. To meet these challenges, NVIDIA launched the BlueField-3 SuperNIC at the end of 2023.
The BlueField-3 SuperNIC is designed specifically for the needs of generative AI clouds. It is derived from the BlueField DPU and shares the same architecture, but differs in function and application scenario. The BlueField DPU focuses on offloading infrastructure operations and accelerating and optimizing north-south traffic, whereas the BlueField-3 SuperNIC draws on InfiniBand techniques such as dynamic routing, congestion control, and performance isolation while remaining fully compatible with standard Ethernet in the cloud, so it can better meet the strict performance, scalability, and multi-tenancy requirements of generative AI clouds and effectively optimize east-west traffic.
In summary, NVIDIA's BlueField-3 networking platform currently comprises two distinct products. The BlueField-3 DPU processes software-defined networking, storage, and network security at line rate, while the BlueField-3 SuperNIC is designed to power hyperscale AI clouds. The two complement each other and together form NVIDIA's technical foundation in the data center.
DPU development trends in China's domestic market
In the domestic Chinese market, DPU research and development is also booming. Zhongke Yushu, for example, launched its first domestically produced DPU chip, the K2, in 2022, marking a major breakthrough for DPU technology in China. Its full-featured third-generation DPU chip, the K2Pro, released in June of the following year, has shown strong performance and broad application prospects. The K2Pro is already used in key fields such as ultra-low-latency networking, cloud and data centers, financial computing, big data processing, and high-performance computing, providing strong support for the development of domestic data processing and computing technology.
At present, China's DPU industry is still in its infancy. On the demand side, China has a world-leading Internet industry, an enormous user base, and a rich and diverse online ecosystem, and the explosive growth of data has greatly stimulated demand for computing power. At the same time, attention to network security in China is rising day by day, and the DPU has distinct advantages here: from fine-grained data security protection to all-round protection of data centers, it can provide seamless coverage. These conditions give the DPU industry in China good development potential and form an important precondition for its growth.