Huawei Ascend DeepSeek All-in-One Machine: An In-Depth Breakdown

Written by
Audrey Miles
Updated on: June 27, 2025
Recommendation

The deep integration of Huawei Ascend AI chip and DeepSeek large model brings a new breakthrough in domestic AI computing power.

Core content:
1. Technical details of Ascend DeepSeek all-in-one machine, including chip process, computing power and energy efficiency optimization
2. Modular and distributed design of system architecture, software and hardware coordinated optimization
3. Product forms and application scenarios, including model support and performance of the training-and-inference all-in-one machine



The Ascend DeepSeek all-in-one machine is an AI solution built on Huawei's self-developed Ascend AI chips (such as the Ascend 910B/910C) and deeply integrated with DeepSeek large models, aiming to provide a high-performance, low-cost, domestically produced AI computing platform. This article analyzes the all-in-one machine in detail across the dimensions of technology, product, architecture, specifications and performance, price, application scenarios, customization, and the industrial ecosystem.

1. Technical details of Ascend DeepSeek all-in-one machine

The core competitiveness of Ascend DeepSeek all-in-one machine comes from the deep collaboration between hardware and software.

Ascend 910B/910C chip technology:

Technology and computing power:

The 910B uses a 7nm process, delivering 280 TFLOPS of FP16 compute and 140 TOPS of INT8 compute. The 910C moves to SMIC's N+2 process, raising FP16 to roughly 320 TFLOPS, close to 60%-70% of the performance of an NVIDIA H100.

Energy efficiency optimization:

Through dynamic voltage and frequency scaling (DVFS) and hand-written CUNN kernels, power consumption is brought down to roughly 250W (910C), significantly more energy-efficient than the H100 (700W).
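To make the DVFS idea concrete, here is a minimal sketch of a frequency-scaling policy in Python. The frequency steps, the utilization thresholds, and the cubic power model are illustrative assumptions, not Ascend's actual power-management firmware.

```python
# Minimal DVFS sketch: scale clock frequency with utilization so that
# light loads run at lower voltage/frequency and draw less power.
# All numbers below are illustrative, not actual Ascend 910C parameters.

FREQ_STEPS_MHZ = [800, 1000, 1200, 1500]   # assumed available frequency steps
BASE_POWER_W = 60                          # assumed static (leakage) power
DYNAMIC_COEFF = 5.6e-8                     # assumed dynamic-power coefficient

def pick_frequency(utilization: float) -> int:
    """Choose the lowest frequency step that still covers the current load."""
    index = min(int(utilization * len(FREQ_STEPS_MHZ)), len(FREQ_STEPS_MHZ) - 1)
    return FREQ_STEPS_MHZ[index]

def estimate_power(freq_mhz: int) -> float:
    """Toy model: dynamic power grows roughly with f^3 (f * V^2, with V ~ f)."""
    return BASE_POWER_W + DYNAMIC_COEFF * freq_mhz ** 3

if __name__ == "__main__":
    for util in (0.2, 0.5, 0.95):
        f = pick_frequency(util)
        print(f"utilization={util:.0%} -> {f} MHz, ~{estimate_power(f):.0f} W")
```

Under these assumed parameters a near-fully loaded card lands around 250 W, while lightly loaded cards drop well below that, which is the basic trade the DVFS policy is making.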

Heterogeneous computing support:

Integrates AI Core (based on Da Vinci architecture), AI CPU and DVPP modules, and supports multi-tasking parallelism.

DeepSeek model optimization:

MoE Architecture:

DeepSeek uses a sparse mixture-of-experts architecture, activating only a small number of parameters (about 4%) per token, which doubles the inference efficiency.
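The sparsity claim is easy to see with a toy router: each token is dispatched to only the top-k of many experts, so only a small slice of the expert parameters is touched per token. The expert count, hidden size, and k below are illustrative choices, not DeepSeek's real configuration.

```python
import numpy as np

# Toy MoE router: route each token to its top-k experts out of n_experts.
# Sizes are illustrative; DeepSeek-V3's real expert layout is much larger.
n_experts, k, d_model = 64, 2, 128
rng = np.random.default_rng(0)
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(token: np.ndarray) -> np.ndarray:
    logits = token @ router_w
    top = np.argsort(logits)[-k:]                             # indices of top-k experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()   # softmax over chosen experts
    # Only k of the n_experts weight matrices are used for this token.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(f"activated experts per token: {k}/{n_experts} = {k / n_experts:.1%} of expert parameters")
```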

DualPipe algorithm:

By overlapping computation and communication, cross-node communication overhead is reduced to nearly zero; training the 671B-parameter model required only 2,048 H800 GPUs and took about two months.
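The core idea behind DualPipe, hiding cross-node transfers behind the next micro-batch's computation, can be sketched with ordinary Python threads. This is a conceptual stand-in for the scheduling pattern, not DeepSeek's actual pipeline implementation.

```python
import threading, time

def compute(micro_batch: int) -> None:
    time.sleep(0.05)            # stand-in for forward/backward compute
    print(f"computed micro-batch {micro_batch}")

def communicate(micro_batch: int) -> None:
    time.sleep(0.05)            # stand-in for a cross-node activation/gradient transfer
    print(f"sent activations of micro-batch {micro_batch}")

start = time.time()
comm_thread = None
for mb in range(4):
    compute(mb)                 # compute of micro-batch mb overlaps the previous transfer
    if comm_thread is not None:
        comm_thread.join()      # previous transfer has had a full compute step to finish
    comm_thread = threading.Thread(target=communicate, args=(mb,))
    comm_thread.start()         # this transfer now overlaps the next micro-batch's compute
comm_thread.join()
print(f"overlapped total: {time.time() - start:.2f}s (vs ~0.40s if fully serialized)")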

Software stack adaptation:

MindSpore and CANN are deeply optimized to support seamless conversion from CUDA to CUNN, reducing developer migration costs by 80%.

Ascend 910C introduces handwritten CUNN kernels (similar to CUDA's PTX instructions), optimizes matrix multiplication for the Transformer model, and reduces inference latency from 10ms to 6ms.

DeepSeek improves the accuracy of complex tasks (such as mathematical reasoning) through the multi-head latent attention (MLA) mechanism, with an inference throughput of 500 tokens per second.

2. Ascend DeepSeek All-in-One Machine System Architecture

The Ascend DeepSeek all-in-one machine adopts a modular, distributed design:

Hardware layer:

Core: Ascend 910B/910C + Kunpeng 920 CPU.

Storage: NVMe SSD (single-unit capacity up to 16TB).

Network: RoCE v2 (200Gbps bandwidth), supports ultra-large-scale clusters. The RoCE network uses the non-uniform Bruck algorithm, which improves cluster communication efficiency by 50% and reduces the proportion of network costs to less than 20%.

Software layer:

The MindSpore framework provides model training and fine-tuning tools.

The CANN software stack optimizes operator scheduling and improves inference efficiency by 30%. CANN supports ACL interfaces, allowing developers to customize high-performance operators to meet specific industry needs.

Distributed computing:

Supports multi-card parallel operation (8/16/32 cards) and achieves efficient communication through the HCCL library.
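Multi-card training ultimately rests on gradient all-reduce, which HCCL provides on Ascend hardware. The toy sketch below only emulates the end result of an all-reduce across 8 simulated cards; it does not use the real HCCL API.

```python
import numpy as np

def allreduce_sum(per_card_grads: list[np.ndarray]) -> list[np.ndarray]:
    """Toy all-reduce: every simulated card ends up with the summed gradient.

    Real collectives (e.g. HCCL's ring-style algorithms) compute the same
    result while moving only a fraction of the data per card per step;
    this sketch only emulates the final state.
    """
    total = np.sum(per_card_grads, axis=0)
    return [total.copy() for _ in per_card_grads]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cards = 8                                   # matches an 8-card node
    grads = [rng.standard_normal(4) for _ in range(cards)]
    synced = allreduce_sum(grads)
    assert all(np.allclose(g, synced[0]) for g in synced)
    print("all 8 simulated cards now hold identical summed gradients:", synced[0])
```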

3. Product form of Ascend DeepSeek all-in-one machine

Ascend DeepSeek all-in-one machines are divided into two product lines:

Training-and-Inference Machine (FusionCube A3000 DS edition):

Supports training and inference of DeepSeek-V3 (671B parameters) and the full range of R1 models.

FusionCube supports modular expansion, which can be expanded from 8 cards in a single machine to 1024 cards in a cluster. The training efficiency increases linearly with the scale.

Targeted at customers who need customized models, such as financial risk control and pharmaceutical R&D.

Inference All-in-One Machine (Atlas Series):

Built-in DeepSeek-R1 models of different scales (32B, 70B, 671B).

The Atlas 300I Pro inference card consumes only 150W of power per card and supports real-time analysis of 80 channels of 1080p video.

Focused on efficient inference, it is suitable for both edge and cloud deployment.

4. Specifications, performance and configuration of Ascend DeepSeek all-in-one machine

Specifications:

Single card: 24GB LPDDR4X memory, bandwidth 204.8 GB/s.

Single card FP16 computing power comparison: 910C (320 TFLOPS) vs H100 (1410 TFLOPS), but the energy efficiency ratio is 1.8:1.

Cluster: 8 cards (entry-level), 32 cards (high-end).

Cluster scalability: With 32 cards, the computing power reaches 8960 TOPS (INT8) and the power consumption is only 8kW.
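These cluster figures, and the 1,120 / 143,000 TOPS numbers quoted later in the customization section, follow from simple per-card arithmetic. The per-card values below are inferred from this article's own numbers, not official specifications.

```python
# Back-of-the-envelope cluster arithmetic using per-card figures inferred
# from this article (not official specs): ~140 TOPS INT8 per 910B card,
# ~280 TOPS INT8 and ~250 W per 910C card.
INT8_TOPS = {"910B": 140, "910C": 280}
POWER_W_910C = 250

print("8-card 910B node:", 8 * INT8_TOPS["910B"], "TOPS INT8")            # 1120 TOPS
print("32-card 910C cluster:", 32 * INT8_TOPS["910C"], "TOPS INT8",
      "at ~", 32 * POWER_W_910C / 1000, "kW")                             # 8960 TOPS, ~8 kW
print("1024-card 910B cluster:", 1024 * INT8_TOPS["910B"], "TOPS INT8")   # ~143,000 TOPS
```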

Performance:

Inference: the 671B model achieves 500 tokens per second with 6ms latency.

Training: pre-training on 14.8 trillion tokens, with efficiency close to 90% of the H100.

Configuration:

Supports domestic CPUs such as Kunpeng and Haiguang, with strong compatibility.

5. Price of Ascend DeepSeek All-in-One

Inference Machine:

32B version: 300,000-500,000 yuan.

671B version: 3-5 million yuan.

Training-and-inference machine:

The starting price is 2 million yuan, and the high-end price exceeds 10 million yuan.

Cost-effectiveness: Compared with NVIDIA H100 solution (about 20 million yuan), the cost is reduced by 60%-70%.

API Pricing:

V3 input is priced at 1 yuan per million tokens and R1 output at 16 yuan per million tokens, far below OpenAI's comparable pricing (around 60).

A free version is provided in the initial promotion to attract small and medium-sized enterprises to try it out.
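To get a feel for what this pricing means at the application level, here is a small cost estimate using the per-million-token prices quoted above; the traffic and token counts are made-up workload assumptions.

```python
# Rough monthly API cost for a hypothetical chatbot workload, using the
# per-million-token prices quoted above (V3 input, R1 output, in yuan).
V3_INPUT_PRICE = 1.0      # yuan per million input tokens
R1_OUTPUT_PRICE = 16.0    # yuan per million output tokens

requests_per_day = 100_000               # assumed traffic
input_tokens, output_tokens = 400, 300   # assumed tokens per request

monthly_in = requests_per_day * 30 * input_tokens / 1e6      # million tokens / month
monthly_out = requests_per_day * 30 * output_tokens / 1e6
cost = monthly_in * V3_INPUT_PRICE + monthly_out * R1_OUTPUT_PRICE
print(f"~{monthly_in:.0f}M input + {monthly_out:.0f}M output tokens "
      f"-> about {cost:,.0f} yuan/month")
```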

6. Application scenarios of Ascend DeepSeek all-in-one machine

With its powerful computing power and flexible deployment capabilities, Ascend DeepSeek has penetrated into multiple industries, covering diverse needs from government to enterprise, from cloud to edge. The following is a detailed breakdown of four core scenarios, including application cases, technical details, and market prospects.

Government Affairs: Policy Analysis and Intelligent Question Answering

Ascend DeepSeek all-in-one machines are used in the government sector to process massive amounts of policy texts, public consultations, and data analysis, helping the government improve decision-making efficiency and service quality. For example, the intelligent question-and-answer system can answer citizens' questions in real time, and the policy analysis module can mine key points in regulations from multiple dimensions.

Examples:

The "Government Integrated Machine" jointly launched by Tuowei Information and Huawei has been deployed in many cities in Hunan. The system integrates the computing power of Ascend 910B and DeepSeek 70B model, supports real-time update and intelligent retrieval of provincial policy databases, and covers more than 50 million policy data.

Technical details:

Multimodal data processing: the all-in-one machine can parse text (such as policy PDFs) and images (such as handwritten application forms) simultaneously; joint OCR + DeepSeek reasoning raises accuracy from 85% to 98%.

High-concurrency reasoning: A single machine supports 100,000 question-and-answer requests per second, with a response time as low as 300ms.
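A minimal sketch of the OCR-plus-DeepSeek joint-reasoning pipeline described above. The run_ocr and deepseek_answer functions are hypothetical stand-ins for the locally deployed OCR engine and DeepSeek model, not real APIs.

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str          # extracted policy text or recognized form content
    source: str        # e.g. "policy_pdf" or "handwritten_form"

def run_ocr(image_bytes: bytes) -> str:
    # Stand-in for a locally deployed OCR model; a real deployment would
    # run this on the same Ascend card before handing text to DeepSeek.
    return "<recognized text from scanned form>"

def deepseek_answer(question: str, context: list[Document]) -> str:
    # Stand-in for a call into the locally deployed DeepSeek model.
    prompt = question + "\n\n" + "\n".join(f"[{d.source}] {d.text}" for d in context)
    return f"<model answer based on {len(context)} documents, prompt length {len(prompt)}>"

def answer_citizen_question(question: str, scans: list[bytes],
                            policy_texts: list[str]) -> str:
    docs = [Document(run_ocr(img), "handwritten_form") for img in scans]
    docs += [Document(t, "policy_pdf") for t in policy_texts]
    # Joint reasoning: the model sees OCR output and policy text together,
    # so it can cross-check a handwritten claim against the written rules.
    return deepseek_answer(question, docs)

print(answer_citizen_question("How do I claim reimbursement under the new policy?",
                              scans=[b"fake-image-bytes"],
                              policy_texts=["Reimbursement rules: ..."]))
```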

Benefits:

On a certain city’s government hotline, the system’s accuracy in identifying complex questions (such as “how to claim reimbursement under the new medical insurance policy”) increased by 15%, and the workload of manual customer service was reduced by 40%.

The predictive analysis function can deduce policy effects based on historical data, such as the impact of a certain tax adjustment on the income of small and medium-sized enterprises, with an error of only ±3%.

Outlook:

It is estimated that the national government AI market size will reach 80 billion yuan in 2025, and Ascend DeepSeek all-in-one machine is expected to occupy 20% of the market share.

Finance: Transaction Optimization and Risk Assessment

In the financial industry, Ascend DeepSeek is used to optimize high-frequency trading algorithms, real-time risk assessment, and intelligent customer service, providing low-latency, high-precision AI support. It can quickly process market data and generate decision-making recommendations, becoming the "computing brain" of financial institutions.

Examples:

iSoftStone's "full-stack financial solution" based on Ascend has served many leading securities firms and banks. For example, its transaction optimization module helped a securities firm improve the execution efficiency of intraday trading strategies by 25%.

Technical details:

Real-time reasoning: The DeepSeek 32B model reduces transaction latency from 50ms to 20ms on the Ascend 910C, and supports analysis of 100,000 transactions per second on a single machine.

Risk modeling: Through the multi-attention mechanism, historical data and real-time market conditions are analyzed to predict the default rate with an accuracy of 92%.
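As a simplified stand-in for the risk model described above: the deployed system reportedly uses an attention-based model on Ascend, while the sketch below trains a plain logistic regression on synthetic transaction features purely to illustrate the train-then-score workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic transaction features: amount, velocity, counterparty risk, hour-of-day.
X = rng.standard_normal((5000, 4))
# Synthetic labels: defaults correlate with the first two features (for illustration only).
y = (X[:, 0] + 0.8 * X[:, 1] + 0.5 * rng.standard_normal(5000) > 1.5).astype(int)

model = LogisticRegression().fit(X[:4000], y[:4000])
scores = model.predict_proba(X[4000:])[:, 1]          # default probability per transaction
flagged = (scores > 0.5).sum()
print(f"flagged {flagged} of 1000 held-out transactions; "
      f"held-out accuracy {model.score(X[4000:], y[4000:]):.2%}")
```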

Benefits:

In a certain bank's risk control scenario, the system's response time for identifying fraudulent transactions was shortened to 5ms, with annualized cost savings exceeding 120 million yuan.

The transaction optimization module can dynamically adjust parameters, helping brokers earn an additional 0.5%-1% in profit every day, equivalent to an annualized profit increase of several hundred million yuan.

Outlook:

The demand for financial AI computing power is expected to grow by 50% by 2025. The cost-effectiveness of Ascend DeepSeek all-in-one machine may help it seize 30% of the market from Nvidia.

Medical: Disease Diagnosis and Drug Screening

The Ascend DeepSeek all-in-one machine supports precise diagnosis and drug development in the medical field, processing medical images, genetic data, and literature analysis to help doctors and researchers accelerate decisions. It is particularly suitable for scenarios that demand high computing power and on-premises deployment.

Examples:

The "Medical Training and Pushing Integrated Machine" developed by Hengwei Technology and Infervision Medical has been deployed in hundreds of hospitals. The system is based on the Ascend 910B and DeepSeek 70B models and supports lung nodule detection and drug target screening.

Technical details:

Image analysis: It takes only 2 seconds to process a CT image, and the sensitivity of detecting lung nodules reaches 97%, which is 5 percentage points higher than the traditional algorithm.

Drug screening: DeepSeek uses molecular dynamics simulation to increase the efficiency of screening candidate drugs by 3 times and can analyze more than 100,000 compounds per week.

Benefits:

In a certain tertiary hospital, the system assisted in the diagnosis of early cases of lung cancer, reducing the misdiagnosis rate from 12% to 4%, saving hundreds of patients each year.

In drug research and development, the Ascend all-in-one machine shortened the target screening cycle of a certain anti-cancer drug from 6 months to 2 months, reducing the research and development cost by about 30%.

Outlook:

The medical AI market is expected to reach 150 billion yuan in 2027, and Ascend DeepSeek all-in-one machine may become a pioneer in domestic substitution.

Edge Computing: Video Analysis and Smart Manufacturing

The Ascend DeepSeek all-in-one machine shines in edge computing, supporting real-time video analysis, industrial quality inspection, and predictive equipment maintenance. Its compact design and high energy efficiency make it suitable for factories, urban monitoring, and similar scenarios.

Examples:

A smart manufacturing company uses the Atlas 300I Pro inference card (with integrated DeepSeek 32B) to implement production line defect detection and equipment failure warning, and the shipment qualification rate has increased to 99.8%.

Technical details:

Video analysis: A single card supports 80 channels of 1080p video real-time decoding and target detection, with power consumption of only 150W.

Predictive maintenance: Through time series analysis, the accuracy of equipment failure prediction reaches 95%, and the inference delay is as low as 10ms.
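The predictive-maintenance pattern, watching a sensor series and alarming when recent readings drift far from their baseline, can be sketched with a rolling z-score. The window size, threshold, and simulated fault are illustrative, not the deployed system's parameters.

```python
import numpy as np

def failure_alerts(vibration: np.ndarray, window: int = 50, z_threshold: float = 4.0):
    """Flag time steps where vibration jumps far from its recent baseline."""
    alerts = []
    for t in range(window, len(vibration)):
        recent = vibration[t - window:t]
        z = (vibration[t] - recent.mean()) / (recent.std() + 1e-9)
        if z > z_threshold:
            alerts.append(t)
    return alerts

rng = np.random.default_rng(7)
signal = rng.normal(1.0, 0.05, 1000)   # healthy vibration baseline
signal[800:] += 0.5                    # simulated bearing fault: step change in vibration
alerts = failure_alerts(signal)
print("first alert at step:", alerts[0] if alerts else "none")
```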

Benefits:

In a certain city security project, the system identified suspicious behavior 40% faster, the false-alarm rate dropped to 2%, and the equivalent of more than 500 police deployments was saved per year.

In industrial scenarios, quality inspection efficiency has increased from 500 pieces per hour manually to 50,000 pieces per hour, and labor costs have been reduced by 70%.

Outlook:

The edge AI market is expected to exceed 50 billion yuan in 2025, and Ascend all-in-one machines are expected to occupy 30% of the industrial and security fields.

The Ascend DeepSeek all-in-one machine has demonstrated outstanding technical strength and application value in government affairs, finance, healthcare, and edge computing. Whether improving government service efficiency (the Tuowei Information case), optimizing financial trading returns (the iSoftStone case), accelerating medical diagnosis and R&D (the Hengwei Technology case), or advancing edge intelligence (the Atlas deployments), it meets diverse needs through high performance, low cost, and localization advantages. The successful implementation of these scenarios not only validates the technical maturity of Ascend DeepSeek, but also lays a foundation for the related industry ecosystem and A-share investment opportunities.

7. Customization of Ascend DeepSeek All-in-One

The customization capability of the Ascend DeepSeek all-in-one machine is a highlight. Whether it is the flexible adjustment of hardware configuration or the model optimization at the software level, it can accurately adapt to the needs of different industries and enterprises. This high degree of flexibility not only lowers the threshold for use, but also greatly improves deployment efficiency and cost-effectiveness. The following is an in-depth analysis from three aspects: hardware, software, and cases.

Hardware customization: flexible configuration to meet diverse needs

The hardware design of the Ascend DeepSeek all-in-one machine follows a modular concept: users can freely adjust the number of cards, storage capacity, and network bandwidth according to computing power requirements and budget. This building-block approach lets it serve both small businesses and ultra-large-scale intelligent computing centers.

Details:

Card number adjustment: from 8 cards in a single machine (entry-level, suitable for small and medium-sized enterprises) to 1024 cards in a cluster (high-end intelligent computing centers, such as national supercomputing projects), linear expansion is supported. The 8-card configuration provides 1120 TOPS (INT8) computing power, while the 1024-card configuration provides up to 143,000 TOPS.

Storage capacity: Starting from 1TB NVMe SSD, it can be expanded to 100TB to meet the needs from edge reasoning to big data training. For example, the financial industry can choose 10TB storage to support historical transaction analysis, while scientific research institutions can choose 100TB to process genomic data.

Network optimization: Supports RoCE network upgrade from 100GbE to 400GbE, and increases bandwidth from 200Gbps to 800Gbps, ensuring bottleneck-free multi-card cluster communication.

Benefits:

In a certain industrial quality inspection scenario, the company chose a 16-card + 20TB storage configuration to process image data of 50,000 products per second, which is 40% cheaper than NVIDIA's similar solution.

For ultra-large-scale deployments (such as a provincial intelligent computing center), a 1024-card cluster combined with a 400GbE network can improve the communication efficiency of a 671B parameter model by 60%, reducing the training time from 3 months to 50 days.

Outlook:

Hardware customization allows customers to pay on demand, avoid wasting resources, and reduce the total cost of ownership (TCO) by about 30%-50%.

Software customization: model distillation and industry fine-tuning

Ascend DeepSeek all-in-one machine provides deep customization at the software level, including lightweight model distillation and industry-specific fine-tuning versions. This capability allows companies to quickly build dedicated AI tools based on existing frameworks instead of training large models from scratch.

Details:

Model distillation: Through the MindSpore framework, the DeepSeek 671B model is distilled into a 32B or 70B lightweight version, retaining 90% of the performance while significantly reducing computing power requirements. The distilled 32B model only requires 4 Ascend 910B cards to run, which is suitable for edge devices or customers with limited budgets.

Industry fine-tuning: Provides fine-tuning tool chains (such as MindSpore AutoTune) to support enterprises to upload their own data sets (such as financial transaction records and medical images) and quickly generate customized models. The fine-tuning process is fully automated, and the cycle is shortened from the traditional 3-6 months to 1 month.
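Distillation boils down to training the small model against the large model's softened output distribution in addition to hard labels. The numpy sketch below shows the standard combined loss; the temperature, mixing weight, and vocabulary size are illustrative, and this is generic knowledge distillation rather than DeepSeek's exact recipe.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = logits / temperature
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.7):
    """alpha weights the soft (teacher-matching) term against the hard-label term."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student_t = np.log(softmax(student_logits, temperature) + 1e-12)
    soft = -(p_teacher * log_p_student_t).sum()                   # cross-entropy vs teacher
    hard = -np.log(softmax(student_logits)[hard_label] + 1e-12)   # usual CE vs hard label
    return alpha * (temperature ** 2) * soft + (1 - alpha) * hard

rng = np.random.default_rng(3)
vocab = 32                                   # toy vocabulary size
teacher = rng.standard_normal(vocab) * 3.0   # stand-in for large-model logits
student = rng.standard_normal(vocab)         # stand-in for small-model logits
print(f"distillation loss: {distillation_loss(student, teacher, hard_label=5):.3f}")
```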

Benefits:

The distilled version of the 32B model reduces the inference cost by 50%, and the operating cost per million tokens is reduced from 16 yuan to 8 yuan, making it suitable for small and medium-sized enterprises to deploy intelligent customer service.

In the case of a logistics company, 100,000 pieces of transportation data were used to fine-tune the 70B model, optimizing route planning efficiency by 20% and saving more than 50 million yuan in fuel costs each year.

Software customization reduces the threshold for AI implementation by 80%, allowing companies to achieve private deployment without a professional AI team.

Customization case: China Telecom's "Xiran Intelligent Computing Machine"

China Telecom's customized "Xiran Intelligent Computing All-in-One Machine" based on Ascend DeepSeek All-in-One Machine is a typical success case. This product is optimized for 5G edge computing scenarios, integrating Ascend computing power and DeepSeek models, and supports low-latency reasoning and real-time data processing.

Details:

Hardware configuration: It uses 8-card Ascend 910C + 5TB storage, and the power consumption of a single machine is controlled within 2kW, which is suitable for edge computer rooms.

Software adaptation: Equipped with the distilled version of DeepSeek 32B model, it is fine-tuned for 5G network optimization and user behavior analysis, and supports 100,000 network request processing per second.

Application scenario: Deployed at the edge nodes of 5G base stations, it can analyze user traffic patterns in real time, dynamically adjust bandwidth allocation, and improve network utilization by 15%.

Benefits:

In a pilot project in a certain city, the "Xiran Intelligent Computing All-in-One Machine" reduced the video stream analysis delay from 200ms to 50ms, supported real-time target detection in 4K monitoring, and reduced the false alarm rate to 1%.

The fine-tuned model can also predict network congestion and adjust resources an hour in advance, cutting user complaints by 300,000 per year.

Outlook:

China Telecom plans to deploy 5,000 "Xiran Intelligent Computing All-in-One Machines" across the country by 2025, with an estimated additional revenue of more than 2 billion yuan, and Ascend ecosystem partners (such as Tuowei Information) will get a share of it.

8. Upstream and downstream industry ecosystem of Ascend DeepSeek all-in-one machine

The success of the Ascend DeepSeek all-in-one machine is inseparable from the huge industrial ecosystem support behind it. From upstream chip manufacturing and storage supply, to midstream hardware integration, and then to downstream cloud services and software optimization, Huawei has built a domestic AI computing power ecosystem covering the entire industry chain through collaboration with many partners. The following is a detailed analysis from the upstream, midstream, and downstream levels.

Upstream: Core hardware supply chain

The upstream industry provides key components such as chips and storage for the Ascend DeepSeek all-in-one machine, and is the cornerstone of the entire ecosystem. Driven by policies and domestic substitution, upstream companies are accelerating technological breakthroughs and capacity expansion.

Chip manufacturing: Semiconductor Manufacturing International Corporation (SMIC)

SMIC is the main foundry for Ascend 910B and 910C chips, using 7nm and N+2 processes to provide Huawei with high-performance AI chips.

The Ascend 910C is expected to enter mass production in Q1 2025, with an annual shipment target of 1 million units. Compared with the 910B, the 910C's yield has risen from 20% to 40%, with a plan to reach 60% by the end of 2025, approaching international levels (for example, TSMC's roughly 65% yield on 5nm).

SMIC's new 12-inch wafer fab in Pudong, Shanghai has begun operation, with a production line dedicated to Ascend-series chips capable of about 20,000 wafers per month (each wafer yielding roughly 500 dies). Restricted by US sanctions, its equipment relies on the second-hand market, but through process optimization (such as multiple-exposure techniques), performance has approached 70% of NVIDIA's A100.

Other players:

Hua Hong Semiconductor: provides some auxiliary chips (such as power-management ICs) for Ascend, with plans to expand capacity by 20% in 2025.

Shanghai Microelectronics: supplies lithography-machine spare parts and supports equipment maintenance for SMIC.

Storage: Yangtze Memory Technologies (YMTC)

Yangtze Memory provides high-performance NVMe SSDs to meet the needs of all-in-one machines for large-capacity, low-latency storage.

Its latest 128-layer 3D NAND flash memory chip has been put into mass production, with a single-disk capacity of 16TB and read and write speeds of 3.5GB/s and 3GB/s respectively, comparable to Samsung's enterprise-level SSDs.

Yangtze Memory has customized a low-power SSD solution for the Ascend all-in-one machine, which reduces power consumption by 15% compared to competing products. In a test conducted by a financial customer, the transaction data processing speed increased by 20%. In 2025, its Wuhan factory plans to add 100,000 pieces of production capacity per month to prioritize the needs of the Ascend ecosystem.

Other players:

GigaDevice: provides NOR Flash and DRAM for edge inference devices.

Zidong Microelectronics: is developing domestic HBM3 memory and plans to pair it with the Ascend 910C in 2026 to improve cluster training efficiency.

Network equipment: Huawei self-developed + partners

Huawei's self-developed RoCE switches (such as the CloudEngine series) provide 200Gbps-800Gbps high-bandwidth networks.

Combined with the 400G optical module of Shengxun Technology, the data throughput reaches 500TB per second, meeting the needs of ultra-large-scale clusters.

In a test at a certain intelligent computing center, Shengxun's optical module reduced network latency from 10μs to 5μs and increased communication efficiency by 50%.

Midstream: Hardware integration and system optimization

Midstream companies are responsible for integrating Ascend chips and storage into all-in-one products, providing diversified hardware solutions covering servers, edge devices and intelligent computing centers.

Integrator: PowerLeader

The company launched its own-brand Ascend training-and-inference machine, aimed mainly at the small and medium-sized enterprise market. Its PR210A model is equipped with 8 Ascend 910B cards, supports DeepSeek 70B model training, and is priced at about 1.5 million yuan per machine.

In 2024, PowerLeader (Baode) delivered 50 all-in-one machines to a manufacturing customer; after the quality inspection process was optimized, the product defect rate dropped from 5% to 1%, saving more than 30 million yuan in annual costs.

As a core distributor in the Ascend ecosystem, Digital China launched the "Shenzhou Kuntai" series of servers. Its R620 model integrates 16 Ascend 910C cards with 4480 TOPS of computing power, targeting financial and government scenarios.

Digital China plans to ship 100,000 units by 2025, covering more than 200 cities across the country, and has signed a contract with a provincial government to deploy 500 units to support smart city projects.

Other players:

Huakun Zhenyu: focuses on edge-computing all-in-one machines, with shipments reaching 20,000 units in 2024.

Sugon Information: launched the "Silicon Cube" all-in-one supercomputer, deeply integrated with Ascend.

Server manufacturing: Inspur

The "Hairuo All-in-One" was launched, supporting the full range of DeepSeek models. Its NF5280M6 model is equipped with 32 Ascend 910C cards, with a computing power of 8960 TOPS and a power consumption of only 8kW.

Inspur delivered 100 Haier all-in-one machines to a scientific research institution, which increased the speed of training climate models by 30% and reduced energy consumption by 25%.

Downstream: Cloud services and software ecosystem

Downstream companies use cloud services and software optimization to transform the computing power of the Ascend DeepSeek all-in-one machine into practical applications to serve enterprises and developers.

Cloud Services:

JD Cloud integrates Ascend computing power to provide cloud-based AI services.

A 5,000-card Ascend 910B cluster has been deployed, serving over 100,000 corporate customers, covering e-commerce, logistics and other scenarios.

JD Cloud used Ascend clusters to optimize recommendation algorithms during the 2024 "618" event, increasing order conversion rates by 18% and reducing inference costs by 40%. It plans to expand capacity to 10,000 cards in 2025.

Tencent Cloud has integrated Ascend computing power into its cloud platform to support gaming and AI inference.

Deploy 3,000 Ascend 910C cards to support 500,000 inference requests per second.

Tencent Cloud optimized the NPC behavior model for a game company, increasing player retention rate by 10% and cloud service revenue by 500 million yuan.

Other players:

China Telecom: launched "Xiran Intelligent Computing Cloud", aiming to cover 5,000 5G base stations nationwide by 2025.

Alibaba Cloud: plans to integrate Ascend 910C in Q2 2025 to provide hybrid-cloud solutions.

Software Ecosystem: Luchen Technology

Luchen Technology optimizes the DeepSeek inference engine to improve model efficiency. Its self-developed "Xuanwu" engine increases 671B-model inference speed by 20%, to 600 tokens per second.

Luchen optimized the question-and-answer system for an educational platform, reducing the response time from 1 second to 0.5 seconds and increasing user satisfaction by 30%.

Other players:

QingMao: developed a compiler specifically for Ascend, cutting developer migration costs by 50%.

Kunlunxin: provides Ascend + DeepSeek scheduling software, increasing cluster utilization by 25%.

The industrial ecosystem of the Ascend DeepSeek all-in-one machine covers upstream chips (SMIC, Yangtze Memory), midstream integration (PowerLeader, Digital China, Inspur), and downstream cloud services (JD Cloud, Tencent Cloud) and software optimization (Luchen Technology), forming a collaborative, efficient domestic AI computing network. SMIC's million-unit capacity target, JD Cloud's 10,000-card cluster plan, and Luchen's inference-engine optimization all demonstrate the ecosystem's vitality and potential. This system not only underpins the broad adoption of Ascend all-in-one machines, but also injects strong momentum into the autonomous development of China's AI industry.

Disclaimer

This article is based on public information, technical white papers, and industry research. It is for reference only and does not constitute investment advice. Readers should think independently, make prudent decisions, and bear their own investment risks. The publisher is not responsible for any losses arising from the use of this article.