Why is there only one DeepSeek in China?

Written by
Clara Bennett
Updated on: July 17, 2025
Recommendation

In-depth analysis of the unique phenomenon in China's large model industry, revealing the secret of DeepSeek's success.

Core content:
1. The polarization phenomenon in China's large model industry
2. The comparison between DeepSeek's technological breakthroughs and traditional large model companies
3. How to replicate DeepSeek's success path



After the DeepSeek storm, what will change in China's large model startup circle?

In recent days, the author has spoken with many people in the industry and found that China's large model circle is currently split between two extremes: extreme enthusiasm on one side and extreme frustration on the other.

The former is represented by the computing power vendors and model service providers actively embracing the DeepSeek ecosystem, along with the open source beneficiaries who previously could not join the large model "arms race"; the latter consists mainly of China's other large model startups (commonly known as the "Six Little Tigers") and the VCs that invested in them over the past two years. The result is a two-faced landscape.

It is understood that some VC teams that invested in first-tier-valuation large model companies over the past two years have begun bracing for, or are already undergoing, internal "whippings". The lines of questioning run roughly as follows:

  • "Why can DeepSeek train such a powerful model at such a low cost, while the big model companies we invested in have raised billions of dollars but cannot do it?"

  • "The reason why DeepSeek can stand out this time is that its technology is innovative and powerful enough. Why should we invest in XXX when it doesn't even have a basic large-scale model technology team?"

  • "XXX also has a very strong team of talented people, and has experience and aspirations in training large-scale models. Why didn't it become DeepSeek? What supports such a high valuation?"

  • "After DeepSeek, who will still invest in the Six Little Tigers? Which of them can hope to go public? If none, should we push for buybacks or exits next?"

" Why didn't we become DeepSeek ?" and " Why is there only one DeepSeek in China ?" are questions that almost all big model practitioners and VCs have been asking since the Spring Festival. These two questions can almost cover all the anxiety about big model innovation in China. Only by seriously discussing these two questions can we answer another more important question: How to become DeepSeek?

In an article published during the Spring Festival, "The DeepSeek Phenomenon is just the beginning of the rise of Chinese AI", we tried to convey a message to the industry by comparing AI innovation in China and the United States: Chinese AI should carry national pride. In this article, we draw on the past four years of China's large model development to explore further:

  • Does China lack technological idealists like DeepSeek?

  • If China does not lack such technical teams, have they been fully discovered and given the corresponding social and systemic support? If not, why not?

As an industry outlet that has covered large models since GPT-3 broke out in 2020, we do not presume to answer so macro and profound a question; this article merely presents, from a third-party perspective, some facts and opinions that may bear on it.


1

Systematic Misalignment

Before 2023, there were only four large model companies in China: Zhipu, Mianbi, Shenyan, and Lingxin (later acquired by Zhipu), all from Tsinghua University. After 2023, the number of large model startups grew to more than a dozen. Technically, the direct trigger was the open-sourcing of Llama, but the more fundamental reason was a belief everyone held at the time:

Although the technical threshold of large models is high, they are not impossible to imitate; building on existing open source models lowers the difficulty further. The argument that "technology cannot constitute a commercial barrier" was everywhere.

Under the "rule" of this collective consensus, we review the dynamics of several forces in China's large-scale model entrepreneurship after the explosion of ChatGPT in 2023, and it is not difficult to understand the current deformed phenomenon in the middle of China's large-scale model entrepreneurship:

First, as the market's awe of technological innovation weakened after ChatGPT took off in 2023, among China's first batch of large model pathfinders only Zhipu became a darling of capital, breaking the 20-billion-yuan valuation mark and entering the first echelon. (Dark Side of the Moon was founded after 2023, so it is not counted here.)

The other two startups out of Tsinghua's Natural Language Processing Lab (THUNLP) faced many challenges and never enjoyed the capital market popularity of the new forces that came later.

Mianbi Intelligence in particular (Shenyan chose to focus on products) was the first company in China to propose a "civilian version of the large model", the company whose technical vision and innovation direction most resembled DeepSeek's, and it was founded even earlier than DeepSeek. Yet by the end of 2024, after closing a RMB 300 million round, its valuation was still under RMB 3.5 billion, far from the first echelon's RMB 20 billion threshold.

Based on exchanges between Leifeng.com's AI Technology Review and more than 50 large model investors over the past two years, there are several main reasons why Zhipu and Mianbi, both out of Tsinghua with the same technological first-mover advantage and equally excellent young technical talent, ended up so far apart:

First, the Tsinghua academic camp pursuing base models bet on only one company, because investors "had reservations about professors starting businesses". Second, Zhipu's vision was easier to grasp: when it raised outside money early on, it said it was "benchmarking OpenAI", and VCs understood immediately. Mianbi, by contrast, emphasized optimizing the efficiency of underlying model training from the start, so in 2023, when hot money was most plentiful, it was pigeonholed as an "AI Infra" company alongside the likes of Luchen and Silicon Base.

Having raised little in 2023, Mianbi could not afford to train large base models. Training a large base model, as DeepSeek did with V3, is the most intuitive way to demonstrate the value of efficient training; in 2024 Mianbi could only pursue small edge-side models, whose power to validate "efficient training" is far weaker than a DeepSeek V3.

When Mianbi raised funds in 2022 and 2023 under the banner of "efficient training", nearly every VC it pitched turned it down.

Second, against this backdrop of scant reverence for technology, China's AI-focused VCs did not actually settle down to study AGI after the large model wave arrived in 2023. Instead, to get a seat at the table quickly, they poured money into "serial entrepreneurs with winning track records", even when those teams had no prior experience in large model R&D.

The most typical examples are Wang Huiwen's Light Years Away and Wang Xiaochuan's Baichuan Intelligence.

Among large model companies currently valued above 20 billion yuan, only Zhipu's Tang Jie, Dark Side of the Moon's Yang Zhilin, and a few others began exploring large model technology back in 2020, before large models were popular. Most of the teams at Baichuan Intelligence, MiniMax, and Jieyuexingchen started after 2023.

For example, MiniMax founder Yan Junjie is a computer vision expert, while large models initially tackled language intelligence (multimodality is another chapter). But MiniMax first won capital's favor through its product Glow rather than underlying large model technology, so it is a somewhat different case; besides, people close to Yan Junjie all say he is "very technically minded."

DeepSeek's R&D team also started learning big model technology from scratch, studying papers and working hard on experiments. Therefore, there is no indication that a team that has never trained a big model before cannot make up for its technical shortcomings through hard study after 2023. However, judging from the industry development in the past two years, Baichuan Intelligence has obviously fallen behind in the base model and can only turn to big models in the medical industry.

Since Baichuan no longer trains large base models, its R&D costs are lower than its peers' and its cash flow is ample. But that benefits only Baichuan; it contributes nothing to the large model industry as a whole.

With resources finite, when a team without technical capability occupies a large share of capital while technically capable teams scrape by on very little, that systematic mismatch of money and talent is destined to produce only regrets, not a future.

If AGI technology truly had no room left to grow and every company's technical barriers had been leveled, then the Internet-era strategy of fighting over resources and capital might still win the last piece of the pie. But entrepreneurs who hold technology in awe keep a clear head: they can still see the shortcomings of existing large models' underlying algorithms and architectures in training and inference, and they know AGI still has many concrete, hard problems left to solve.

In other words, continuous innovation in underlying technology remains the moat of large model companies, and the pure resource-competition methodology of the Internet era does not apply to China's large model development today. But most Chinese technology VCs are unlikely to accept this, given that large model investing in 2023 and 2024 even featured "Club Deal" games...

In the past two years of large model development, a VC unwilling to learn the technology may be more destructive than an R&D engineer unwilling to learn it.

The bubble will eventually end. When the tide recedes, it will become clear who is swimming naked.


2

An AGI Corps Is a Rarity

Another effect of the market's lack of respect for technology: to cater to the market (and, of course, to break through the encirclement of the large companies), China's large model startups have spent the past two years shifting their focus from long-term AGI to short-term monetization and product polishing.

This strategic shift also stems from the industry misjudgment described above, that large models hold no further room for innovation. Entrepreneurs committed to AGI must balance business and technology, while teams skeptical of AGI, or simply swayed by market noise, either abandon pre-training for C-end applications or merely fine-tune industry models on top of open source bases.

It took two and a half years to get from GPT-3 to ChatGPT, yet the market has broadly imposed a "rule" that a domestic large model should go from foundation to commercialization in just two. Some large model companies do manage to walk the "L2" and "L4" tracks at the same time, but no company matches DeepSeek's purity in committing talent and research resources to AGI.

When the financing war kicked off in the first half of 2023, one industry analysis held that after the "baptism" of the previous generation of AI companies, Chinese VCs' patience for large model companies to commercialize had shrunk from 5-8 years to under 3. This may be the shared dilemma of China's large model companies.

As is well known, DeepSeek focuses on AGI research, funded by the original reserves of Liang Wenfeng and Huanfang Quantitative, and has never raised outside money. "We have our own money, so we don't need to listen to the outside world; we can do whatever we want." This is what many large model companies envy most about DeepSeek.

Recently Zhu Xiaohu, who once derided AGI, changed his tune precisely because DeepSeek was willing to invest in it; one could say DeepSeek used sheer technical strength to change a VC's mind. The crueler reality, though, is that many teams with strong innovative ability may fall on the eve of the new era simply because they cannot raise money.

"Commercial thinking" is not only reflected in the shadows of some technology VCs, but also in the selection of R&D talents.

According to headhunters, the company spending the most on talent in China in 2024 was unquestionably ByteDance. A divide between large companies and startup teams has already formed, and large model talent flowing from startups to large companies became a common choice over the past year. For example, according to AI Technology Review, many outstanding NLP, multimodal, and reinforcement learning researchers whom DeepSeek courted for AGI work chose ByteDance instead.

According to a headhunter who served DeepSeek in its early days, DeepSeek also hoped to poach top talent from overseas teams such as Google, Meta, and OpenAI, but progress was slow, so it settled for second best and trained its own.

Beyond money, investing in AGI requires people, a group of absolute technical idealists, and an excellent organizational culture. DeepSeek's success may not be replicable, but from V2 and V3 to R1 and R1-Zero, its technical results reflect its advantages in funding, talent and ideals, and organizational culture.

Before DeepSeek, "Bei Jiukun and Nan Huanfang" were already well-known in the field of financial quantification, and the high requirements of the quantitative industry for technical talents are also well known. Basically, the team size is small, but the ability is super strong, based on the top 2 universities and gold medalists in informatics competitions. According to AI Technology Review, the team size of DeepSeek in the first half of 2024 was only more than 40 people, and most of them were technical experts from the original top 2 of Huanfang.

Continuing Huanfang's original style, DeepSeek has always set a very high recruiting bar. For example, it began searching for multimodal and reinforcement learning experts in mid-2024, yet after more than half a year of recruiting the positions remained vacant; DeepSeek would rather leave a seat empty than hire the wrong person. After R1 went viral, résumé submissions surged, but according to people familiar with the matter, "not many are suitable."

DeepSeek's internal organization is also very flat. According to AI Technology Review, there is only one boss across both Beijing and Hangzhou: founder Liang Wenfeng. "Below Liang Wenfeng, essentially everyone is a rank-and-file employee."

Liang Wenfeng's personal style is also distinctive: a strong belief in technology, intense curiosity about and appetite for learning AGI, and great diligence. An insider close to him described him as "speaking very, very slowly, thinking for a long time before each sentence, and expressing himself very concisely. Concise, but usually right to the point."

DeepSeek's team culture closely resembles that of companies like Yushu and Momenta: the people at the top are technology enthusiasts with a natural awe of and curiosity about technology, while the management style is plainly centralized and the culture flat. When technological exploration hits a wall, resources can be coordinated from the top down, and directives and feedback move quickly through the organization.

Yushu and DeepSeek likewise each have their own recruiting standards, quite unlike the formulaic interview routines common in the market. Interested readers can look into them.

DeepSeek's Liang Wenfeng began exploring how to train stronger models at lower cost very early, when the industry largely did not understand why. Likewise, Yushu's Wang Xingxing started building quadruped robot dogs before people understood robot dogs, and Momenta's Cao Xudong pursued L2 and L4 simultaneously, walking on two legs, while the autonomous driving industry was still fixated on L4 alone.

An entrepreneurial team that dares to run against the mainstream needs a strong rebellious streak. In AI Technology Review's exchanges with investors, such "rebellion" is often chalked up to youth, but in my view the confidence to rebel ultimately comes from a team's understanding of, judgment about, and technical confidence in the social problem it wants to solve: a firm belief that its direction is the future and will create enormous value.


3

Innovative Taste

After V2 set off a price war, Liang Wenfeng commented on the achievement in an interview with Undercurrent: "Among the many innovations that happen every day in the United States, this is a very ordinary one."

After V3 and R1, Liang Wenfeng has made no public statements, but for DeepSeek and Liang Wenfeng, until AGI is fully realized, perhaps V3 and R1 are also just "very ordinary ones." This is not to deny their breakthroughs and merits; rather, teams with high aspirations tend to call a 100-point achievement 80 points and keep chasing the remaining marks.

After R1 was released, a senior reinforcement learning scholar in the industry told AI Technology Review: "Now that a pure RL algorithm has replaced the RL+SFT paradigm, I think the realization of AGI is at least three years away."
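To make the contrast concrete: under SFT the model imitates curated answers, whereas in the pure RL recipe the scholar refers to, the model samples its own attempts and is scored by simple programmatic rules (DeepSeek's R1 report describes rule-based accuracy and format rewards, optimized with GRPO). The sketch below is a minimal, hypothetical illustration of that idea; the template, reward values, and function names are assumptions for this example, not DeepSeek's actual code.

```python
import re

# Hypothetical rule-based reward in the spirit of pure-RL post-training:
# simple accuracy/format checks instead of a learned reward model.
# Template, values, and names are illustrative only.
THINK_FORMAT = re.compile(
    r"^<think>.*</think>\s*<answer>(.*)</answer>\s*$", re.DOTALL
)

def reward(rollout: str, reference_answer: str) -> float:
    """Score one sampled completion: a format gate plus exact-match accuracy."""
    m = THINK_FORMAT.match(rollout.strip())
    if m is None:
        return 0.0                      # malformed output earns nothing
    fmt_bonus = 0.1                     # small bonus for following the template
    answer = m.group(1).strip()
    accuracy = 1.0 if answer == reference_answer.strip() else 0.0
    return fmt_bonus + accuracy

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize rewards within a group of samples
    drawn from the same prompt, so no separate value network is needed."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
    return [(r - mean) / std for r in rewards]

# Toy usage: four rollouts for one math prompt whose reference answer is "42".
rollouts = [
    "<think>6 times 7.</think> <answer>42</answer>",
    "<think>6 times 7.</think> <answer>41</answer>",
    "the answer is 42",                 # breaks the format, scores 0
    "<think>42.</think> <answer>42</answer>",
]
scores = [reward(r, "42") for r in rollouts]
print(scores)                           # [1.1, 0.1, 0.0, 1.1]
print(group_advantages(scores))         # relative advantages within the group
```

The point of the contrast is that no human-written answer is ever shown to the model; the learning signal comes entirely from whether its own samples pass the rules.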

Sam Altman has said AI will surpass humans in 2025, and Musk has said AGI will arrive by 2026 at the latest. Among the many predictions of an "AGI moment", it is hard to judge exactly when it will come, but the broad trend is palpable.

The trend being clear, DeepSeek's success has made everyone recognize at least two facts: first, AGI technology has not yet hit its ceiling; second, Chinese technical teams are capable of world-leading innovation in AGI. Rather than basking in DeepSeek's victory, the more important question is how to advance China's AGI from here.

Over the past half month, the DeepSeek storm has changed how large companies, startups, computing power vendors, and investors perceive AGI development. Some elephant-in-the-room problems once ignored are being taken seriously again, and some old views have been overturned. The one consistent change is the realization that, at this stage, achieving AGI still requires idealism.

Rather than guessing what OpenAI or DeepSeek will do next, it is more important to reason about which technical problems AGI must solve. In other words, innovation matters more than imitation.

In fact, according to AI Technology Review's interviews over the past year, DeepSeek is far from alone: many AI researchers in China keep innovating and proposing new solutions to unsolved problems. A few examples:

Professor Ma Yi, Dean of the Institute of Computing and Data Science at the University of Hong Kong, has spent the past two years emphasizing that today's large models, trained with massive compute, have knowledge but not intelligence. In contrast to deep learning's black-box nature, his team has been committed to researching explainable, controllable AI algorithms and frameworks (white-box theory). (For more, see AI Technology Review's earlier report: "Ma Yi, HKU: The 'knowledge' of large models is not equivalent to 'intelligence' now".)

At CNCC 2024, Zhipu's Tang Jie spoke about the next stage of multimodal technology. The Zhipu team began exploring multimodal large models in 2021 and, by its own account, kept hitting a similar problem early on: when text, image, speech, and video data are fed into training simultaneously, data from one modality seems to weaken the model's knowledge or intelligence in another. Multimodality is clearly the trend, but there is still ample room for research into optimizing cross-modal data alignment, collecting high-quality data, and strengthening the common sense and reasoning of multimodal models. (For more, see Leifeng.com's earlier report: "A Brief History of Wudaokou's Large Model".)
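One common mitigation for this kind of cross-modal interference, offered here purely as an illustration and not as Zhipu's method, is to control the modality mix of each training batch instead of feeding all data in uniformly. In the hypothetical sketch below, every dataset name and weight is an invented placeholder.

```python
import random

# Hypothetical modality-mixing sampler: batches follow tunable per-modality
# weights so that adding image/audio/video data does not drown out text.
datasets = {
    "text":  ["t1", "t2", "t3"],   # stand-ins for tokenized documents
    "image": ["i1", "i2"],         # stand-ins for image-text pairs
    "audio": ["a1", "a2"],
    "video": ["v1"],
}

# Down-weighting non-text modalities is one knob for preserving language skill.
mix_weights = {"text": 0.6, "image": 0.2, "audio": 0.1, "video": 0.1}

def sample_batch(batch_size: int) -> list[tuple[str, str]]:
    """Draw a batch whose expected modality proportions follow mix_weights."""
    modalities = random.choices(
        population=list(mix_weights),
        weights=list(mix_weights.values()),
        k=batch_size,
    )
    return [(m, random.choice(datasets[m])) for m in modalities]

print(sample_batch(8))  # mostly text, with a sprinkling of other modalities
```

Such ratio tuning is only the crudest lever; the harder problems Tang Jie points to, cross-modal alignment and data quality, have no such simple knob.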

In conversations with several founding members of the Mianbi (Wall-Facing) team in March 2024, they argued that today's mainstream large model architecture cannot handle several key problems well, which keeps it far from AGI: experiential learning and spatial memory, for example. People grow more proficient by practicing the same thing repeatedly, and can quickly familiarize themselves with a new environment or transfer their understanding of one problem to it. Such abilities are hard to express with the current Transformer. (For more, see AI Technology Review's earlier report: "The Underestimated Wall-Facing: Creating a Scaling Law Curve that is not inferior to OpenAI".)

As embodied intelligence develops, AGI will naturally divide into cloud AGI and edge AGI. Edge AGI refers to models that can natively perceive their environment, perform high-level reasoning, and make complex multi-step decisions on the basis of that reasoning. The currently popular work on embodied "brains" is developing along this trend, and many problems in this direction remain unsolved; solving them takes not only resources but strong technical ability and technical vision.

After o1's release, much large model research turned toward reasoning. Meanwhile, rumor has it that Google's Gemini team has recently completed a new generation of base models and opened it to a small number of users for testing.

Although Google's stock slumped in 2023 after OpenAI stole its thunder, a look at Google's large model work from June 2020 through 2022 shows that its approach was to build the system bottom-up, from underlying computing power and architecture to upper-level algorithms. That may be an important reason Gemini was later able to surge.

The same is true of DeepSeek. According to its technical disclosures, its route to researching large models was likewise to build upward from the underlying ten-thousand-GPU "Wanka" cluster and the HAI framework into an interlocking technical system.

Only by staying wary of authority, always working backward from the essence of the problem, and innovating with conviction can one lead the trend. Quick money may flow to the lucky in the short term, but in the long run resources should flow to the teams best at putting them to use.

I hope that by 2025, there will no longer be only one DeepSeek in China.