At the Alibaba Cloud 2025 AI Potential Conference, Wang Junhua, Vice President of Alibaba Cloud Intelligence Group and Head of the Alibaba Cloud Intelligent Computing Platform Division, delivered a keynote titled "Paradigm Evolution: Challenges and Responses in the Era of MoE & Reasoning Models" and announced a series of major capability upgrades to the big data and AI platform.
Wang Junhua noted that as Generative AI evolves into today's Agentic AI, large models can handle an increasing share of logical reasoning and planning tasks. Going forward, AI needs to connect with business data platforms and break down the boundary with the physical world so that it can truly serve everyone. To this end, Alibaba Cloud's big data and AI platform continues to innovate, adapting to the computing paradigm shifts brought by trends such as the MoE architecture, reasoning models, and Agentic RAG, and has upgraded a number of big data and AI products to help enterprise customers efficiently build AI models and put AI applications into production.
01
As the Mixture-of-Experts (MoE) architecture returns to the spotlight, the paradigm and challenges of model training have shifted accordingly. Alibaba Cloud's AI platform PAI is equipped with the self-developed large-scale MoE mixed-precision training engine PAI-FlashMoE and the high-performance reinforcement learning framework PAI-ChatLearn, which support rapid configuration of training tasks such as SFT, PPO, and GRPO. Model FLOPs utilization (MFU) for MoE training at the 10,000-GPU scale reaches 35%-40%, helping users run reinforcement learning and SFT fine-tuning flexibly while keeping the training process efficient and stable.
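As a rough, hypothetical illustration of what that MFU figure measures (this is not PAI's internal accounting; the token throughput, activated parameter count, and per-accelerator peak below are made-up assumptions), MFU compares the FLOPs a training job actually sustains against the aggregate peak of the hardware, using the common approximation of 6 FLOPs per activated parameter per token for a forward-plus-backward pass:

```python
def training_mfu(tokens_per_sec: float,
                 active_params: float,
                 num_gpus: int,
                 peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization: achieved training FLOPs / peak hardware FLOPs.

    Uses the common ~6 * N FLOPs-per-token approximation for forward + backward,
    counting only the parameters an MoE model actually activates per token.
    """
    achieved = tokens_per_sec * 6 * active_params
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak

# Hypothetical numbers: an MoE model activating 22B parameters per token,
# trained on 10,000 accelerators rated at roughly 1 PFLOPS (BF16, dense) each.
mfu = training_mfu(tokens_per_sec=2.8e7,
                   active_params=22e9,
                   num_gpus=10_000,
                   peak_flops_per_gpu=989e12)
print(f"MFU = {mfu:.1%}")   # ~37%, in the range quoted above
```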
In the model inference stage, the model serving platform PAI-EAS adopts a load-aware prefill-decode (PD) disaggregation architecture combined with the MoE distributed inference scheduling engine Llumnix, which significantly improves inference speed and resource utilization, cutting time-to-first-token by 92% and raising end-to-end service throughput by 91%. PAI-EAS has also launched a high-performance KV Cache service that, built on the 3FS storage system, can raise the KV Cache hit rate for tens of millions of active users by more than 10x and greatly improve throughput.
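A minimal toy sketch of why a shared KV cache drives down time-to-first-token (an in-process illustration only, not the PAI-EAS service or its 3FS-backed interface): if the key/value tensors for a common prompt prefix are already cached, the engine only has to prefill the uncached suffix.

```python
import hashlib

class PrefixKVCache:
    """Toy prefix cache: maps a hash of a token prefix to its (mock) KV tensors."""
    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(bytes(str(tokens), "utf-8")).hexdigest()

    def lookup(self, tokens):
        """Return (length of longest cached prefix, cached KV blob or None)."""
        for n in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:n]))
            if hit is not None:
                return n, hit
        return 0, None

    def insert(self, tokens, kv_blob):
        self._store[self._key(tokens)] = kv_blob


def prefill(cache: PrefixKVCache, tokens):
    """Compute KV only for the uncached suffix; reuse the cached prefix."""
    hit_len, kv = cache.lookup(tokens)
    new_tokens = tokens[hit_len:]
    # Stand-in for the real attention prefill over new_tokens.
    kv = (kv or b"") + b"KV:" + bytes(str(new_tokens), "utf-8")
    cache.insert(tokens, kv)
    return hit_len, len(new_tokens)

cache = PrefixKVCache()
system_prompt = list(range(500))                 # shared prompt tokens
print(prefill(cache, system_prompt))             # cold start: (0, 500) tokens prefilled
print(prefill(cache, system_prompt + [903]))     # warm: (500, 1), only the new token is prefilled
```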
At the conference, PAI also released a new model weight service that significantly shortens model loading time during cold starts and scale-out. PAI-Blade LLM introduced mixed-precision quantization, which selects the most accurate quantization strategy layer by layer during calibration and dynamically chooses the optimal compute mode at inference time, striking the best balance between accuracy and speed.
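The layer-by-layer selection can be sketched generically as follows (this is not PAI-Blade LLM's actual algorithm; the candidate bit widths, the uniform fake-quantization, and the error budget are illustrative assumptions): for each layer, try candidate precisions on calibration data and keep the cheapest one whose output error stays within a tolerance.

```python
import numpy as np

def quantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric fake-quantization to the given bit width."""
    scale = np.abs(weights).max() / (2 ** (bits - 1) - 1)
    return np.round(weights / scale) * scale

def pick_layer_precisions(layers, calib_inputs, candidates=(4, 8, 16), tol=2e-2):
    """For each layer, choose the lowest bit width whose calibration error
    (relative L2 distance of layer outputs) stays under `tol`."""
    chosen = []
    for w, x in zip(layers, calib_inputs):
        ref = x @ w                              # full-precision layer output
        best = max(candidates)                   # fall back to the widest format
        for bits in sorted(candidates):          # try the cheapest formats first
            err = np.linalg.norm(x @ quantize(w, bits) - ref) / np.linalg.norm(ref)
            if err < tol:
                best = bits
                break
        chosen.append(best)
    return chosen

rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 64)) for _ in range(4)]
calib = [rng.standard_normal((8, 64)) for _ in range(4)]
print(pick_layer_precisions(layers, calib))      # e.g. [8, 8, 8, 8]
```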
02
Beyond the core capabilities of the PAI platform, Alibaba Cloud is also accelerating AI productivity through a multi-product matrix. For example, OpenSearch launched Agentic AI search, built on multiple agents for autonomous planning, search, clarification, and summarization. It can connect to multiple knowledge base sources and systems to perform deep search over complex content, raising search recall by 13% and cutting the hallucination rate by 42%.
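One way such a multi-agent search loop could be wired together is sketched below (the agent roles follow the description above, but the function names, control flow, and thresholds are illustrative assumptions, not OpenSearch's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class SearchState:
    query: str
    sub_queries: list = field(default_factory=list)
    documents: list = field(default_factory=list)
    clarification: str | None = None

def plan(state, llm):
    """Planning agent: break the question into retrievable sub-queries."""
    state.sub_queries = llm(f"Decompose into search queries: {state.query}").splitlines()
    return state

def search(state, retriever):
    """Search agent: query the connected knowledge bases for each sub-query."""
    for q in state.sub_queries:
        state.documents.extend(retriever(q))
    return state

def clarify(state, llm):
    """Clarification agent: ask a follow-up question if evidence is too thin."""
    if len(state.documents) < 3:
        state.clarification = llm(f"Ask one clarifying question about: {state.query}")
    return state

def summarize(state, llm):
    """Summarization agent: answer strictly from retrieved documents to limit hallucination."""
    context = "\n".join(state.documents)
    return llm(f"Answer using ONLY this context:\n{context}\n\nQuestion: {state.query}")

def agentic_search(query, llm, retriever):
    state = clarify(search(plan(SearchState(query=query), llm), retriever), llm)
    if state.clarification:
        return state.clarification       # hand the question back to the user
    return summarize(state, llm)

# Toy stubs so the flow can be exercised end to end.
fake_llm = lambda prompt: f"[model output for: {prompt[:40]}...]"
fake_retriever = lambda q: [f"doc about {q}"]
print(agentic_search("compare our Q3 and Q4 churn drivers", fake_llm, fake_retriever))
```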
03
Based on the MCP protocol, Alibaba Cloud announced MCP Servers for the big data development and governance platform DataWorks and the real-time data warehouse Hologres, and launched the DataWorks Agent service, moving big data computing, development, and governance work from Copilot-style assistance into the AI Agent era.
The MCP Server for the real-time data warehouse Hologres is currently the only Alibaba Cloud product included in the official MCP repository. It lets large models query Hologres metadata (schemas, tables, etc.), execute SQL, and view query logs, and it works with the many platforms that support MCP to help large models with data preprocessing, visual interpretation, and scientific reasoning.
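For readers wondering what driving such an MCP server looks like from the client side, here is a sketch using the open-source MCP Python SDK; the server launch command, environment variable, and tool name below are placeholders rather than confirmed interfaces of the Hologres MCP Server.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder launch command, credential variable, and tool name -- consult the
# server's documentation for the real values.
SERVER = StdioServerParameters(
    command="python",
    args=["-m", "hologres_mcp_server"],          # hypothetical entry point
    env={"HOLOGRES_CONNECTION_STRING": "..."},   # hypothetical credential variable
)

async def main():
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])   # discover metadata/SQL/log tools

            # Hypothetical tool name and arguments for running a query.
            result = await session.call_tool("execute_sql",
                                             arguments={"sql": "SELECT 1"})
            print(result.content)

asyncio.run(main())
```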
With the DataWorks Agent, users can automate parts of their data development and data governance work on DataWorks through natural-language interaction, such as data integration, data development, and task operations and maintenance.
In addition, Alibaba Cloud Elasticsearch and the vector retrieval service Milvus have also been adapted to the open-source community's MCP Servers.
04
Wang Junhua believes that big data platforms are evolving from one-stop to intelligent. At the conference, the MaxCompute for AI capabilities were significantly upgraded: large-model data preprocessing can now be done through MaxFrame, covering text, multimodal, and other data types and greatly improving efficiency in Data for AI scenarios. MaxFrame also officially launched the AI Function feature; through its simple, easy-to-use programming interface, users can call large models for offline processing of massive table data, greatly simplifying the data processing workflow and improving the quality of the results.
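To make the idea of calling a large model over rows of a table concrete, here is a minimal pandas-style sketch (illustrative only; MaxFrame's actual AI Function interface may differ, and `call_llm` is a stand-in for a real model endpoint):

```python
import pandas as pd

def call_llm(prompt: str) -> str:
    """Stand-in for an offline, batched call to a large model endpoint."""
    return f"<summary of: {prompt[:30]}...>"

# A small table of raw documents; in MaxFrame this would be a distributed
# DataFrame backed by a MaxCompute table rather than an in-memory one.
df = pd.DataFrame({"doc_id": [1, 2],
                   "body": ["first raw document text ...",
                            "second raw document text ..."]})

# Row-wise preprocessing with a model: summarize, then keep the cleaned columns.
df["summary"] = df["body"].map(lambda text: call_llm(f"Summarize: {text}"))
print(df[["doc_id", "summary"]])
```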
The DataWorks platform adopts a Data+AI dual-engine approach, providing SQL generation, testing, and optimization features to help enterprises run data analysis and make decisions more efficiently; together these significantly improve the efficiency of data preprocessing and of extracting value from enterprise data. In addition, DataWorks and Hologres fully embrace MCP, marking a major shift from AI-assisted work to AI Agents that plan and execute tasks autonomously. This will greatly accelerate the adoption of AI across industries and give enterprises more intelligent, automated, and trustworthy solutions.