DeepSeek Open Source Week in one article: did CEO Liang Wenfeng write the code himself?

Written by
Audrey Miles
Updated on: July 14, 2025
Recommendation

Explore DeepSeek Open Source Week and uncover the secrets of AI acceleration!

Core content:
1. FlashMLA: Multi-head Latent Attention (MLA) decoding kernel optimized for Hopper GPUs
2. DeepEP: Communication library for Mixture-of-Experts (MoE) models
3. DeepGEMM: FP8 matrix multiplication (GEMM) library

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)


Monday - FlashMLA

FlashMLA is a decoding kernel for Multi-head Latent Attention (MLA), optimized for Hopper GPUs and for serving sequences of varying length. When a model generates text, every new token has to attend to everything that came before it. FlashMLA speeds up that step, so long inputs do not become a bottleneck and the model can respond more quickly.

If AI is a reading expert, then FlashMLA is a super magnifying glass! It lets AI find key information faster when processing long texts, which is especially valuable for chatbots and machine translation.
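
To make the "one new token looks back at a long text" step concrete, here is a minimal sketch of a single decoding step over a batch of sequences with different lengths. It is plain softmax attention in pure Python, not the MLA kernel itself and not FlashMLA's API; the function names and the padded-cache layout are assumptions made for illustration.

```python
import math

def decode_attention(query, keys, values):
    """Attention output for one new token against a cache of past tokens."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Numerically stable softmax over the cached positions.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def batched_decode(queries, key_cache, value_cache, seq_lens):
    """Run one decoding step for sequences of different lengths.

    The caches are padded to a common length; seq_lens says how many
    entries of each row are real, so padding is never read.
    """
    outputs = []
    for q, keys, values, n in zip(queries, key_cache, value_cache, seq_lens):
        outputs.append(decode_attention(q, keys[:n], values[:n]))
    return outputs

queries = [[1.0, 0.0], [0.0, 1.0]]
key_cache = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [9.0, 9.0]]]
value_cache = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
seq_lens = [2, 1]  # the second sequence has one real entry plus padding
print(batched_decode(queries, key_cache, value_cache, seq_lens))
# [[2.0, 3.0], [5.0, 6.0]]
```

The work per step grows with the cached length, which is why a kernel tuned for this step matters most for long texts.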

Tuesday - DeepEP

DeepEP is a communication library designed specifically for Mixture-of-Experts (MoE) models. It moves data efficiently between the experts of a model, which are often placed on different GPUs, so that they can exchange information quickly and the training and inference of AI models are not held up by communication. It supports low-precision operations and enables efficient communication between GPUs, keeping the whole system running smoothly.

An AI model is like a basketball team: each player has a different job, but if the players can’t pass the ball smoothly, the team loses the game. DeepEP solves this problem by letting the AI “teammates” pass information faster and work together more smoothly.
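
The "passing" in an MoE model has two halves: sending each token to the GPU that hosts its chosen expert, and bringing the results back in the original order. The sketch below shows that bookkeeping in pure Python. It assumes top-1 routing and a fixed number of experts per rank, and the names `dispatch` and `combine` are illustrative only; this is not DeepEP's API, and no real network transfer takes place.

```python
from collections import defaultdict

def dispatch(token_ids, expert_choice, experts_per_rank):
    """Group tokens by the rank (GPU) that hosts their chosen expert."""
    outbox = defaultdict(list)
    for token, expert in zip(token_ids, expert_choice):
        rank = expert // experts_per_rank
        outbox[rank].append((token, expert))
    return dict(outbox)

def combine(results, num_tokens):
    """Put per-rank expert outputs back into the original token order."""
    ordered = [None] * num_tokens
    for rank_results in results.values():
        for token, value in rank_results:
            ordered[token] = value
    return ordered

# Four tokens, four experts, two experts per rank (hypothetical sizes).
outbox = dispatch([0, 1, 2, 3], [0, 3, 1, 2], experts_per_rank=2)
print(outbox)  # {0: [(0, 0), (2, 1)], 1: [(1, 3), (3, 2)]}

# Stand-in for the expert computation that would run on each rank.
results = {
    rank: [(token, f"e{expert}(t{token})") for token, expert in items]
    for rank, items in outbox.items()
}
print(combine(results, 4))  # ['e0(t0)', 'e3(t1)', 'e1(t2)', 'e2(t3)']
```

In a real system both steps are all-to-all exchanges between GPUs, which is the traffic a library like DeepEP is built to speed up.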

Wednesday - DeepGEMM

DeepGEMM is a library for accelerating large-scale numerical computation, specifically general matrix multiplication (GEMM). It works in low-precision FP8 arithmetic with fine-grained scaling, keeps the computation simple, and avoids redundant steps, which makes it efficient on large amounts of data. Because results still have to be accurate, partial sums are accumulated at higher precision. DeepSeek reports performance that matches or exceeds expert-tuned libraries across a range of matrix shapes.

Mathematical calculation is at the core of what AI does. DeepGEMM works like a built-in "supercomputer" that lets AI perform complex calculations faster and with less energy.
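
The idea of low-precision multiplication with fine-grained scaling can be sketched without a GPU. In the example below, each small block of numbers gets its own scale factor, the block is rounded to a coarse grid, and the partial sums are rescaled and accumulated in ordinary floating point. Integer rounding stands in for FP8 here, and the block size, the 127-level grid, and all function names are assumptions for illustration; this is not DeepGEMM's code or API.

```python
def quantize_block(block, levels=127):
    """Scale a block so its largest magnitude maps to `levels`, then round."""
    peak = max(abs(x) for x in block) or 1.0  # avoid a zero scale
    scale = peak / levels
    return [round(x / scale) for x in block], scale

def scaled_dot(a, b, block_size=4):
    """Dot product using per-block quantization and float accumulation."""
    total = 0.0
    for start in range(0, len(a), block_size):
        qa, sa = quantize_block(a[start:start + block_size])
        qb, sb = quantize_block(b[start:start + block_size])
        partial = sum(x * y for x, y in zip(qa, qb))  # coarse-grid products
        total += partial * sa * sb                    # rescale, accumulate in float
    return total

def scaled_matmul(A, B, block_size=4):
    """Matrix product built from block-scaled dot products."""
    columns = list(zip(*B))
    return [[scaled_dot(row, col, block_size) for col in columns] for row in A]

A = [[1.0, 2.0, 3.0, 4.0, 100.0, 200.0, 300.0, 400.0]]
B = [[1.0] for _ in range(8)]
exact = sum(A[0])  # 1010.0
approx = scaled_matmul(A, B)[0][0]
print(exact, approx)  # approx stays within about 1% of exact
```

Giving each block its own scale is what keeps the small values in the first half of the row from being drowned out by the large values in the second half.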

Thursday - Optimizing parallel strategies

DualPipe: a bidirectional pipeline parallelism algorithm that lets computation and data transmission proceed at the same time, reducing the time each spends waiting for the other. Like a highway with two lanes, computing tasks and data transmission run side by side, which greatly improves the efficiency of AI model training.

EPLB: the expert-parallel load balancer used for V3/R1. It keeps the load balanced across computing nodes (GPUs) during the training of the AI model. This prevents a few overloaded GPUs from slowing down overall progress, so every GPU works efficiently and the performance of the whole system is more stable.

Imagine driving while chatting with your friends: both happen at the same time. Many AI systems, however, have to wait until a calculation is finished before they send its results on. The optimization strategy developed by DeepSeek lets AI transmit data while it calculates, improving efficiency and reducing wasted resources.
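
A simple way to picture load balancing is a greedy rule: take the experts from heaviest to lightest and always hand the next one to the GPU with the least work so far. The sketch below implements that rule in pure Python. It is a generic heuristic chosen for illustration, not the algorithm EPLB actually uses, and the expert names and load numbers are made up.

```python
import heapq

def balance_experts(expert_loads, num_gpus):
    """Assign experts to GPUs, heaviest first, always onto the lightest GPU."""
    heap = [(0, gpu) for gpu in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        total, gpu = heapq.heappop(heap)          # least-loaded GPU so far
        placement[gpu].append(expert)
        heapq.heappush(heap, (total + load, gpu))
    totals = {gpu: sum(expert_loads[e] for e in experts)
              for gpu, experts in placement.items()}
    return placement, totals

loads = {"e0": 90, "e1": 60, "e2": 40, "e3": 30, "e4": 20}  # hypothetical loads
placement, totals = balance_experts(loads, num_gpus=2)
print(placement)  # {0: ['e0', 'e3'], 1: ['e1', 'e2', 'e4']}
print(totals)     # {0: 120, 1: 120}
```

Without balancing, a naive split such as the first three experts on one GPU and the last two on the other would leave loads of 190 and 50, and the whole step would run at the pace of the busier GPU.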
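
The gain from transmitting while calculating can be worked out with a small timeline model. The sketch below compares two schedules for a series of chunks: one that finishes sending each chunk before computing the next, and one that sends a chunk while the next is being computed. It only illustrates the overlap idea; it is not DualPipe's bidirectional schedule, and the timings are invented.

```python
def sequential_time(compute, transfer):
    """Each chunk is computed, then sent, before the next chunk starts."""
    return sum(c + t for c, t in zip(compute, transfer))

def overlapped_time(compute, transfer):
    """Sending chunk i runs while chunk i+1 is being computed."""
    compute_done = 0.0
    link_free = 0.0
    for c, t in zip(compute, transfer):
        compute_done += c                     # compute never waits for the link
        start = max(compute_done, link_free)  # send once data is ready and link is idle
        link_free = start + t
    return link_free

compute = [4, 4, 4]   # time units per chunk (hypothetical)
transfer = [3, 3, 3]
print(sequential_time(compute, transfer))  # 21
print(overlapped_time(compute, transfer))  # 15.0
```

In the overlapped schedule only the last transfer is left exposed, so the total approaches the compute time alone as the number of chunks grows.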

Friday - 3FS, accelerating all of DeepSeek's data access

3FS (Fire-Flyer File System) is a high-performance distributed file system that helps AI models read and store large amounts of data quickly. It uses modern SSD storage and RDMA networks to make data access faster and more reliable, so that storage does not become a bottleneck when processing large-scale data.

DeepSeek also released Smallpond, a lightweight data processing framework built on DuckDB and 3FS and designed for high-performance data processing.

AI needs to store and read huge amounts of data, such as videos, pictures, and text. 3FS is like an "ultra-high-speed cloud disk" that lets AI access data faster!

DeepSeek open source repositories: https://github.com/deepseek-ai

The DualPipe README lists DeepSeek CEO Liang Wenfeng among the developers of DualPipe.
