545% Profit Margin: DeepSeek's Secret Weapon for Striking Back at OpenAI!

Written by

Silas Grey

Updated on: July 14, 2025

Preface

Open Source Week ran for five consecutive days. Just when everyone thought DeepSeek was wrapping up, it dropped an easter egg on Zhihu: "Overview of the DeepSeek-V3/R1 Inference System".

Original text: https://zhuanlan.zhihu.com/p/27181462601

A quick note: DeepSeek has only just joined Zhihu. This is its first and only article so far, and the account has already gained 22,000 followers.

After reading it, I have only one word to say: awesome!

The article lays its cards on the table: the theoretical cost-profit margin is as high as 545%, the service is reportedly dozens of times cheaper than OpenAI's, and the theoretical daily profit comes to about 3.46 million RMB.

Below are the main points of the article.
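The arithmetic behind the 545% figure can be reproduced in a few lines. The sketch below uses the inputs reported in DeepSeek's original Zhihu article (an average of 226.75 occupied nodes, an assumed rental price of $2 per H800 GPU-hour, and a theoretical daily revenue of $562,027); these figures are not restated in this summary. The exchange rate is purely an illustrative assumption.

```python
# Inputs as reported in DeepSeek's original Zhihu article (24-hour statistics).
AVG_NODES = 226.75                  # average number of nodes occupied over the day
GPUS_PER_NODE = 8                   # H800 GPUs per node
GPU_HOUR_COST_USD = 2.0             # rental price per GPU-hour assumed by DeepSeek
THEORETICAL_REVENUE_USD = 562_027   # daily revenue if every token were billed at R1 prices

# Assumption for illustration only: the exchange rate is not given in the source.
RMB_PER_USD = 7.28

# Daily cost = GPUs in use * 24 hours * price per GPU-hour.
daily_cost = AVG_NODES * GPUS_PER_NODE * 24 * GPU_HOUR_COST_USD
daily_profit = THEORETICAL_REVENUE_USD - daily_cost
margin = daily_profit / daily_cost  # profit relative to cost, not to revenue

print(f"daily cost:   ${daily_cost:,.0f}")
print(f"daily profit: ${daily_profit:,.0f} "
      f"(about {daily_profit * RMB_PER_USD / 1e6:.2f} million RMB)")
print(f"cost-profit margin: {margin:.0%}")
```

Note that the margin is profit divided by cost, which is why it can exceed 100%. The revenue is theoretical: it assumes every token is billed at full price, which is not the case in practice.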

The day-six easter egg: how does the inference system squeeze every GPU?

The goal of the inference system DeepSeek described this time is simple and blunt: higher throughput, lower latency, and lower cost.

Expert Parallelism (EP): In a conventional setup, a single GPU handles all of a model's work. DeepSeek's Expert Parallelism (EP) spreads the model's experts across many GPUs on multiple nodes so that they compute in parallel. At the daytime peak, up to 278 nodes (8 H800 GPUs per node) are devoted to inference; when load drops at night, some nodes are switched over to research and training, maximizing hardware utilization.
Computation-communication overlap: The biggest cost of cross-node collaboration is communication latency. DeepSeek's answer is to let computation and data transfer proceed at the same time:

Prefill phase: A batch is split into two micro-batches that execute alternately, so that one computes while the other communicates;
Decode phase: Execution is split into a 5-stage pipeline, and the communication time is "hidden" behind computation. In short, the GPU is kept busy as much of the time as possible.
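To see why alternating two micro-batches helps, here is a toy timing model. It is not DeepSeek's scheduler: it simply assumes one compute unit and one network link, each handling one task at a time, and that every layer needs a compute step followed by a communication step. The function names and the unit costs are hypothetical.

```python
def serial_time(layers: int, compute: float, comm: float) -> float:
    """Two micro-batches, no overlap: every compute and transfer runs back to back."""
    return 2 * layers * (compute + comm)


def overlapped_time(layers: int, compute: float, comm: float) -> float:
    """Two micro-batches alternate: while one transfers, the other computes."""
    gpu_free = 0.0                   # time at which the compute unit is next idle
    net_free = 0.0                   # time at which the network link is next idle
    ready = {"A": 0.0, "B": 0.0}     # when each micro-batch may start its next layer
    for _ in range(layers):
        for mb in ("A", "B"):
            start = max(gpu_free, ready[mb])   # wait for the GPU and for own data
            gpu_free = start + compute
            send = max(net_free, gpu_free)     # transfer starts once compute is done
            net_free = send + comm
            ready[mb] = net_free               # next layer needs the transferred data
    return max(ready.values())


if __name__ == "__main__":
    # Hypothetical costs: 4 layers, compute and communication each take 1 time unit.
    print("serial:    ", serial_time(4, 1.0, 1.0))
    print("overlapped:", overlapped_time(4, 1.0, 1.0))
```

When compute and communication take about the same time, almost all of the transfer cost disappears behind computation in this model; if one side is much slower than the other, the slower resource becomes the bottleneck and the benefit shrinks.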

Load balancing: What a system fears most is some GPUs being overloaded while others sit idle. DeepSeek has designed three load balancers:

Prefill phase: requests are assigned according to their length, so that long inputs do not pile up on a few GPUs and cause a traffic jam;
Decode phase: KVCache memory usage is balanced across GPUs, so that no memory-heavy GPU becomes a straggler;
Expert load: heavily loaded experts are automatically replicated, and the copies are placed on less busy GPUs.
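The expert-load idea can be illustrated with a small greedy sketch. This is not DeepSeek's actual balancer, only a minimal illustration under stated assumptions: each expert has a known load, a replica takes an equal share of its expert's load, and replicas go to whichever GPU is currently least loaded. All function names and numbers are hypothetical.

```python
import heapq


def replicate_experts(loads: list[float], extra_slots: int) -> list[int]:
    """Give each spare slot to the expert with the highest load per replica."""
    replicas = [1] * len(loads)
    for _ in range(extra_slots):
        hottest = max(range(len(loads)), key=lambda e: loads[e] / replicas[e])
        replicas[hottest] += 1
    return replicas


def place_replicas(loads: list[float], replicas: list[int], num_gpus: int) -> list[float]:
    """Place replicas heaviest first on the least-loaded GPU; return per-GPU load.

    Simplification: nothing stops two replicas of one expert from sharing a GPU,
    which a real placement policy would want to avoid.
    """
    items = sorted(
        (loads[e] / replicas[e] for e in range(len(loads)) for _ in range(replicas[e])),
        reverse=True,
    )
    heap = [(0.0, gpu) for gpu in range(num_gpus)]   # (current load, gpu id)
    for share in items:
        total, gpu = heapq.heappop(heap)             # least-loaded GPU so far
        heapq.heappush(heap, (total + share, gpu))
    return [total for total, _ in sorted(heap, key=lambda entry: entry[1])]


if __name__ == "__main__":
    expert_loads = [90.0, 30.0, 20.0, 20.0]          # hypothetical tokens routed per expert
    no_copies = place_replicas(expert_loads, [1, 1, 1, 1], num_gpus=3)
    copies = replicate_experts(expert_loads, extra_slots=2)
    balanced = place_replicas(expert_loads, copies, num_gpus=3)
    print("without replication:", no_copies)
    print("replica counts:     ", copies)
    print("with replication:   ", balanced)
```

In this example the hot expert would otherwise dominate one GPU; splitting it into several replicas lowers the busiest GPU's load, and in a synchronized system the busiest GPU is what sets the pace.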

The resulting average throughput of each H800 node (8 GPUs) is:

For prefill, the input throughput is about 73.7k tokens/s (including cache hits);
For decode, the output throughput is about 14.8k tokens/s.

Open vs. Closed

DeepSeek is open-sourcing its technology at full speed, while OpenAI has given us GPT-4.5.

As others have put it: OpenAI used to be in charge of technology and DeepSeek in charge of emotional intelligence, but now it's the other way around.

Actually, no: GPT-4.5 probably can't match the emotional intelligence of the guy on DeepSeek's forum.

For convenience, here is a roundup of recent moves from the open and closed camps (based on an answer by "Lü Ahua" on Zhihu).

To grab headlines, OpenAI held 12 consecutive launch events for new products; DeepSeek released open source frameworks every day for a week to help the ecosystem grow.
To reduce losses, OpenAI launched a subscription plan at US$200 per month; DeepSeek has kept its service free even while demand exceeds supply.
OpenAI launched GPT-4.5, which brought limited performance improvements but a 30-fold increase in the API unit price; DeepSeek introduced off-peak discounts at night.
OpenAI was expected to lose $5 billion in 2024; DeepSeek has publicly stated that its theoretical profit margin is high.

Summary

DeepSeek’s open source technology is impressive, but what is even more impressive is the proof that, beyond the “idea, slide deck, storytelling” playbook, focusing on technical research and development can also lead to success.

Although I am not the one who succeeded, the light of my fellow travelers has illuminated the way forward.

Keep it up everyone!