Recommendation
An in-depth analysis of new development trends in large models, to help grasp the future direction of AI technology.
Core content:
1. The current state and outlook of the computing-power investment model behind large models
2. The development of multimodal and slow-thinking technologies and their impact on the industry
3. The challenges facing domestic high-end chips and the expansion of large model applications
Yang Fangxian
Founder of 53AI / Tencent Cloud Most Valuable Expert (TVP)
The continuous improvement of reasoning-model capabilities has pushed large models past the turning point from "usable" to "easy to use". Reasoning enhancement and application expansion have kicked off a new race in the second half of the large model era. The potential of personal intelligent agents has begun to emerge, industry applications are gradually deepening, and open source is increasingly becoming a core component of large model competitiveness. Large computing power, multimodality, strong reasoning, broad open source, accurate data, intelligent agents and deep applications have become the important trends of current development.
1. The "brute force works wonders" model of computing-power investment has not yet peaked
DeepSeek's low-cost training run of US$5.57 million shocked the world, but it did not overturn the underlying logic that large models require large computing power. In fact, the cost of a single training run of this model is roughly one-eighth that of comparable foreign models, not yet a difference of an order of magnitude. Its significance lies in engineering innovation: reproducing the performance of existing models in a more resource-efficient way. The earlier online hype about a huge gap between Chinese and US training costs in fact compared US total expenditure, at the level of hundreds of billions of dollars, on data center construction, chip purchases, network build-out and scientist salaries against DeepSeek's cost for a single training run, a comparison wrapped in a great deal of exaggeration and emotion.

Using greater computing power to explore the upper limit of large model capabilities remains the industry consensus. GPT-5 and Llama 4-level large models are expected to be launched abroad in the first half of this year, and the construction of large computing clusters in the United States is in full swing. Musk's xAI has built the world's largest cluster of 200,000 H100 GPUs and trained the Grok 3 model on it. Google is expected to invest US$75 billion this year, a year-on-year increase of 43%, most of it on computing centers; Meta expects to invest US$60-65 billion, up 53%-66%; Amazon expects to invest US$100 billion, up more than 20%. In addition, Japan's SoftBank Group, OpenAI and Oracle have jointly launched the Stargate Project, which will invest US$500 billion over the next four years to build super-large computing infrastructure in the United States. All of this will drive further breakthroughs in large model pre-training; coupled with increasingly popular post-training enhancements such as reinforcement learning, the leap in large model capabilities may accelerate further. Many industry leaders predict that AGI may be realized within the next two to three years.

The supply of high-end chips remains a bottleneck for China's next-generation large models, and the risk of an insufficient supply of training chips may arise again. Although the number and capabilities of China's high-end AI chip companies have grown since last year, with Huawei, Suiyuan Technology, Moore Threads, Haiguang, Biren and others having designed domestic chips that match the single-card performance of NVIDIA's A100, restrictions such as TSMC's suspension of 7nm capacity supply and the HBM ban mean that the manufacturing of domestic high-end chips still faces challenges.

2. Slow thinking and multimodality become standard; many fields will usher in their "AlphaGo moment"
The post-training process, including reinforcement learning, draws out the potential a model accumulates during pre-training, and the model's "slow thinking" brings a marked improvement in reasoning ability. Spurred by the DeepSeek effect, large model companies at home and abroad are accelerating the launch of next-generation models: OpenAI's foundation model GPT-4.5 and reasoning model o3; Anthropic's Claude 3.7, a hybrid reasoning model that integrates deep thinking and fast output; Google's successively released Gemini 2.0 and the more powerful reasoning model Gemini 2.5 Pro; and xAI's Grok 3. In China, Tencent Hunyuan has released T1, a strong reasoning model that responds in seconds and combines fast and slow thinking, applying a hybrid Mamba architecture losslessly to a super-large reasoning model for the first time and significantly reducing training and inference costs. DeepSeek released an updated model, DeepSeek-V3-0324, which scored higher than GPT-4.5 on mathematics and code evaluation sets.

Multimodality is the original form of the human world. Large models must develop toward multimodality, expanding from text, images, video and 3D to modalities such as sound, light, electricity, and even molecules and atoms, in order to understand and generate the real world. Native multimodality is the direction of the future. The recently released Google Gemini 2.0 Flash can edit images with a single sentence, comparable to professional editing in Photoshop; GPT-4o's latest stylized text-to-image capability has gone viral online. Tencent's newly open-sourced Hunyuan 3D model supports both text-to-3D and image-to-3D, can swap skins and animations with one click, and can generate 3D game videos with one click.

With the leap in model capabilities, it is foreseeable that more fields will usher in their "AlphaGo moment", the point at which a large model's capability in a field exceeds that of more than 90% of practitioners, or even the best humans. OpenAI's o1 scored close to full marks on the American Invitational Mathematics Examination and surpassed doctoral-level accuracy on benchmark tests of physics, biology and chemistry problems. Anthropic CEO Dario Amodei recently predicted that AI will be able to write 90% of code within the next 3-6 months.

3. Model open source and open protocols become new components of competitiveness
The industry's long-running debate between open source and closed source has now tilted toward open source. DeepSeek's popularity is partly due to open source: the MIT License it adopts supports full open source, imposes no restrictions on commercial use and requires no application, allowing developers around the world to use and evaluate it, and its global influence formed rapidly through word of mouth. OpenAI, which had been determined to follow the closed-source route, has also been forced to consider open source. Sam Altman recently stated publicly that the closed-source strategy chosen earlier may have been on the wrong side of history; he is also publicly soliciting open-source proposals on social platforms and plans to release open-source models for edge devices as well as an o3-mini-level open-source model in the future.

Meta abroad, and Tencent, Alibaba, Zhipu and other companies in China, adopted open-source strategies very early. For example, Tencent's Hunyuan text-to-image model is the industry's first open-source Chinese-native text-to-image model with a DiT architecture, and its text-to-video model is currently the largest open-source video model and is fully open, including model weights, inference code and model algorithms. Communities such as Hugging Face have become important platforms for large model developers worldwide; Hugging Face now brings together 1.52 million open-source models and 337,000 open datasets.

Equally important are the open protocols of large models, which can be compared to HTTP in the early days of the Internet. HTTP allowed all kinds of web pages to be displayed in a unified format in the browser, making it easier for users to obtain information. Open data communication protocols for large models can make it easier for models to call various tools and complete tasks autonomously. For example, the recently popular MCP (Model Context Protocol), released by Anthropic last November, has become a bridge between large models and a wide range of tools.
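To make the "bridge between models and tools" idea concrete, the sketch below shows, in very simplified form, the kind of JSON-RPC 2.0 messages that a protocol like MCP builds on: the client lists the tools a server exposes, asks the server to call one, and receives structured content back. The tool name, its arguments and the exact response fields here are hypothetical placeholders rather than a verbatim copy of the MCP specification.

```python
import json

# Simplified, illustrative JSON-RPC 2.0 messages in the spirit of MCP.
# "search_documents" and its arguments are hypothetical, not a real MCP server's tool.

# 1. The model-side client asks the tool server which tools it exposes.
list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# 2. Once the model decides to use a tool, the client sends a call request.
call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_documents",  # hypothetical tool name
        "arguments": {"query": "Stargate Project investment"},
    },
}

# 3. The server replies with structured content the model can read as extra context.
example_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [
            {"type": "text", "text": "Top matching passage ..."},
        ]
    },
}

for message in (list_tools_request, call_tool_request, example_response):
    print(json.dumps(message, indent=2, ensure_ascii=False))
```

The point of such a protocol is that any tool wrapped behind the same message shapes becomes callable by any model-side client, much as any web server behind HTTP becomes reachable by any browser.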
4. In the "post-truth" era, building credible large models is urgent

For the first time, technology's impact on knowledge and information has extended from the links of dissemination and interaction to the link of production. The accuracy and professionalism of the knowledge a large model outputs, that is, the model's "credibility", is becoming a core competitive indicator for artificial intelligence.

While large models deliver vast amounts of information, noise problems such as hallucinations in their content also trouble users. A study by the Columbia Journalism Review found that generative AI models used for news search in the United States have serious accuracy problems: researchers tested eight AI search tools with real-time search functions and found that more than 60% of news source queries were answered incorrectly.

The hallucination problem of large models is inherent in the underlying technical path of artificial intelligence. It is the other side of the same coin as the models' creative capability and is difficult to solve completely by technology alone. Introducing high-quality content data such as authoritative books, magazines, news and papers, and creating a new, "credible" mechanism of knowledge consensus and supply, are the key to large models generating greater value in production and everyday applications in the future.

OpenAI signed a five-year contract with News Corp last year, obtaining authorization to access historical content from the group's media, including The Wall Street Journal, Barron's, The Times, and The Daily Telegraph, in order to enhance the credibility of the answers its large models provide.

Tencent Hunyuan is cooperating with established publishing institutions such as Encyclopedia Publishing House, People's Medical Publishing House, Shanghai Cihai Publishing House, and Chemical Industry Publishing House, supporting them in launching book-based intelligent agents and exploring a trusted cooperation model built on search-enhancement technology. For example, in the application square of the Yuanbao app, the People's Medical Publishing House agent can give users authoritative answers in specific medical fields such as cardiovascular and cerebrovascular disease, while citing the original text of the related books, and can link out to an e-book reading platform or jump to a page for purchasing the physical book. This not only carries traditional knowledge-traceability mechanisms such as footnotes, endnotes and literature indexes over to large models, ensuring consensus on and accuracy of the output knowledge, but also creates a sustainable win-win model for publishers and large model platforms.

In the future, whoever can access more trusted data sources and build a trusted evaluation and consensus mechanism will gain the leading advantage in the era of human-machine content co-creation.
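The search-enhancement approach described above can be pictured, in very simplified form, as retrieval over a licensed corpus followed by an answer that is constrained to cite the retrieved passages. The sketch below is a minimal illustration of that idea under stated assumptions; the corpus, the toy scoring function and the citation format are hypothetical placeholders, not Tencent's or any publisher's actual implementation.

```python
# Minimal sketch of search-enhanced answering with source citations.
# Corpus contents, scoring and URLs are illustrative placeholders only.

from dataclasses import dataclass


@dataclass
class Passage:
    text: str          # the licensed book passage
    source: str        # citation string, e.g. book title + chapter
    purchase_url: str  # hypothetical link back to the e-book / physical book page


CORPUS = [
    Passage("Hypertension is a major risk factor for cardiovascular disease ...",
            "Internal Medicine, 9th ed., ch. 3", "https://example.com/book/123"),
    Passage("Ischemic stroke management begins with rapid imaging ...",
            "Neurology Handbook, ch. 7", "https://example.com/book/456"),
]


def retrieve(query: str, corpus: list[Passage], top_k: int = 1) -> list[Passage]:
    """Toy keyword-overlap retrieval standing in for a real search-enhancement backend."""
    terms = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(terms & set(p.text.lower().split())),
                    reverse=True)
    return scored[:top_k]


def answer_with_citations(query: str) -> str:
    hits = retrieve(query, CORPUS)
    # In a real system the retrieved passages would be placed into the model prompt
    # and the model instructed to answer only from them; here we only show the
    # traceability step: every answer carries its sources and a purchase link.
    citations = "\n".join(f"[{i + 1}] {p.source} ({p.purchase_url})"
                          for i, p in enumerate(hits))
    return f"Answer grounded in the retrieved passages.\nSources:\n{citations}"


print(answer_with_citations("risk factors for cardiovascular disease"))
```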
5. Under the "intelligence + Internet" logic, personal applications are expected to kick off a Matthew effect
The successive release of foundation models such as GPT-4.5, DeepSeek V3 and Tencent Turbo S, and of reasoning models such as OpenAI o3, DeepSeek R1 and Tencent T1, indicates that base models have evolved to a usable stage and pushed personal applications to a new starting point.

In the past, personal applications were not rich enough because base model capabilities were limited: results in complex problem analysis and in multimodal generation and understanding were unsatisfying, and users were not sufficiently impressed. Moreover, the data from personal applications mostly reflects usage preferences and cannot feed back into improving the intelligence of the base model. As a result, spending money to buy traffic did not build a moat for consumer applications; the cost for users to switch apps was low and stickiness was insufficient.

Now that base model capabilities are relatively mature, the platform effect that underpinned the success of the mobile Internet is expected to play a role again. More users of an AI application accumulate more high-quality shared knowledge, more user feedback and more social interaction, so the application keeps improving and attracts still more users, forming a virtuous cycle. Taking Tencent Yuanbao as an example, after adopting a dual-engine strategy driven by DeepSeek plus Hunyuan, its user numbers rose sharply, with DAU (daily active users) increasing more than 20-fold from February to March this year.

The leading advantage of Chinese applications is expected to grow further. Productivity tools that improve efficiency are becoming increasingly powerful, and the experience of companionship and entertainment applications that pass the time keeps being optimized. According to the report on the world's top 50 generative AI applications released by the investment firm a16z in March, 11 applications from Chinese companies made the list, up from only 3 last August, a significant increase. New AI search, text-to-image/video tools and role-playing applications are the three hottest categories.

At the same time, personal application innovation still faces the "bitter lesson" (The Bitter Lesson): people repeatedly try to improve performance through engineering, but are ultimately overtaken by the simple approach of scaling up computing power. The continuous improvement of large model capabilities will "eat up" many functions of application innovation, and workflow applications in particular are easily replaced by new model capabilities. Deepening an application's moat requires more first-principles thinking: embedding the product at key nodes of the user's decision chain to add value, increasing users' emotional identification, and enhancing irreplaceability through ecosystem synergy. One might say technology iteration is the spear, scenario penetration the shield, and ecosystem synergy the soil. Sometimes personal applications have to run faster, ahead of improvements in model capability, and sometimes they have to think more slowly about the model's evolution path, building a "dynamic capability portfolio" of technology, scenarios and ecosystem synergy.

6. The endpoint of personal AI applications is the super intelligent assistant
Upgrades in base model capability unlock new depths of application capability. The first wave of large models, represented by ChatGPT, excelled at dialogue and gave rise to applications such as the new AI search engine Perplexity. The second wave, represented by Claude 3.5 Sonnet, excelled at programming, driving the popularity of Cursor, valued at tens of billions of dollars, and the programming star Devin. The third wave, represented by OpenAI o1, excels at deep reasoning, making agent applications possible. In particular, with continued breakthroughs in multimodality and reinforcement learning, model performance has improved greatly while costs keep falling. It is foreseeable that agent applications will accelerate their penetration into more vertical fields, ushering in a new era of human-machine collaboration.

A new era of intelligent agents is arriving. The recent popularity of Manus in China has raised the industry's expectations for AI agents. At the same time, OpenAI's computer-using agent Operator and its deep research agent Deep Research have begun commercial trials, moving from the laboratory to the mass market. According to foreign media reports, OpenAI plans to sell low-end agents to "high-income knowledge workers" at US$2,000 per month, mid-range agents for software development at US$10,000 per month, and high-end, doctoral-level research agents at US$20,000 per month. Gartner forecasts that by 2028, 33% of enterprise software applications will include agentic AI, up from less than 1% in 2024, and that at least 15% of day-to-day work decisions will be made autonomously by AI agents. The market for AI agents is expected to grow significantly, from US$5.1 billion in 2024 to US$47.1 billion in 2030.

The deepening of agent applications will drive token consumption up a hundredfold or more, bringing an even greater explosion in demand for inference computing power that will exceed the demand for training computing power. To improve energy efficiency and reduce costs, large cloud and model providers such as Google, Amazon, Meta and OpenAI are accelerating the deployment of custom ASICs, which are gradually becoming an important technology route alongside NVIDIA GPUs. Morgan Stanley predicts that the AI ASIC market will grow from US$12 billion in 2024 to US$30 billion in 2027, a compound growth rate of 34%. At the same time, widespread use of agents will require models to handle much larger contexts, which in turn poses greater challenges to the improvement of base model capabilities.

7. Intelligence as a Service is the ultimate destination for industry adoption
Through the cloud, intelligence can be turned into a service that thousands of industries call on demand, eventually forming a new model of Intelligence as a Service. In the past, we measured economic development and digitalization by electricity consumption and cloud usage; in the future, we may need to measure the level of intelligence by "token usage".

The popularity of large models such as DeepSeek has brought a comprehensive upgrade in model performance and stimulated a new wave of adoption across Chinese industries. At present, however, there is still a gap between Chinese and American companies in the application of generative AI. Most applications at Chinese companies are at the experimental stage and still far from large-scale use, while applications at American companies are broader and deeper. In 2024, the proportion of American companies that had fully implemented generative AI reached 24%, higher than China's 19%. The US government and American companies generally use public clouds to deploy AI and support rapid iteration, with more than 70% of organizations using cloud AI. Driven by this, the cloud computing revenue of large American companies grew rapidly in the latest quarter: Microsoft reached US$40.9 billion, up 21% year on year; Amazon US$28.786 billion, up 19%; and Google US$11.96 billion, up 30%.

High cost-effectiveness is pushing industry applications deeper. In the more than two years since ChatGPT was released, the performance of large models has continued to improve while the cost of inference has dropped sharply. For example, the API price of GPT-4o is US$20 per million output tokens, two-thirds lower than at its release. In China, DeepSeek V3 is priced at 8 yuan per million tokens, and Tencent's Hunyuan TurboS multimodal model is as low as 2 yuan per million tokens. Greatly improved model capability combined with such pricing provides the cost-effectiveness needed for large-scale deployment across industries.

In the past two months, the implementation of industry large models has achieved notable results, landing in more than 30 industries including government affairs, finance, healthcare, education, media, culture and tourism; while greatly improving efficiency, original processes are also being restructured. Organizations including Shenzhen Bao'an government affairs, Shenzhen Medical Insurance, the Shanghai Xuhui Urban Transportation Center, Shenzhen University, Ruijin Hospital, Shanghai Pharmaceuticals, Chongqing Rural Commercial Bank, and Honor are actively deploying and exploring large model applications. Taking the Shenzhen Bao'an government affairs application as an example, it covers 31 business scenarios such as livelihood requests, enterprise services, government office work and social governance, draws on nearly 30,000 items of government service knowledge across 14 fields and 20 industries in the district, integrates more than 60 model capabilities, and can quickly deploy new intelligent applications according to business needs.
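As a quick back-of-the-envelope illustration of the per-million-token prices quoted above, the snippet below converts a workload into a monthly bill. Only the unit prices (8 yuan and 2 yuan per million tokens) come from this section; the request volume and tokens per request are made-up assumptions.

```python
# Rough cost arithmetic for per-million-token pricing.
# Unit prices are the ones quoted above; the workload numbers are hypothetical.

PRICE_PER_MILLION_TOKENS = {
    "DeepSeek V3": 8.0,      # yuan per million tokens
    "Hunyuan TurboS": 2.0,   # yuan per million tokens
}

# Hypothetical workload: 50,000 requests per day, about 2,000 tokens each.
requests_per_day = 50_000
tokens_per_request = 2_000
monthly_tokens = requests_per_day * tokens_per_request * 30  # 3 billion tokens

for model, price in PRICE_PER_MILLION_TOKENS.items():
    monthly_cost = monthly_tokens / 1_000_000 * price
    print(f"{model}: about {monthly_cost:,.0f} yuan per month for {monthly_tokens:,} tokens")
```

Under these assumptions the monthly bill lands in the tens of thousands of yuan, which is the kind of order-of-magnitude drop that makes large-scale industry deployment economically viable.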
In industry applications, high-quality data is the moat for improving efficiency. Industry large models need high-quality data from within the industry and the enterprise more than ever, because industry applications demand more accurate, more professional knowledge and have zero tolerance for hallucinations. Investment in data governance yields twice the result for half the effort, yet it usually requires substantial investment, is seen as hard, tiring work, and is the part of industry implementation most easily overlooked.

In the future, large models will not only develop in depth within individual industries, but will also achieve a three-dimensional evolution of deep application through cross-domain collaboration, inclusion of small and medium-sized enterprises, and the restructuring of social systems: from "scenario adaptation" to "value creation", as large models upgrade from efficiency tools to engines of business growth; from "information islands" to "ecosystem integration", as cross-domain data collaboration pushes application boundaries outward; and from "enterprise-level applications" to "restructuring of social systems", as technology penetration enters deep water and triggers all-round changes in corporate and social organization, employment and distribution structures, and social ethical norms.