Recommendation
Alibaba has open-sourced the QwQ-32B reasoning model, which challenges models with hundreds of billions of parameters using only 32.5B parameters, achieving a double breakthrough in intelligence and cost.
Core content:
1. The parameter efficiency revolution of the QwQ-32B model: 32.5B parameters vs. 671B parameters, reducing costs while maintaining performance
2. Core architecture innovation and performance evaluation: competing with top closed-source models in specific fields, leading in mathematics, programming, and general capabilities
3. Hardware adaptation and storage optimization: FP8 compatible architecture, dynamic memory management, more suitable for edge computing deployment
Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)
Alibaba's latest open-source QwQ-32B reasoning model challenges the intelligence boundary of models with hundreds of billions of parameters with a compact architecture of only 32.5B parameters. The model is deeply optimized on the Qwen2.5 architecture and adopts a three-stage training paradigm of pre-training, supervised fine-tuning, and reinforcement learning, achieving a major breakthrough in parameter efficiency. Although its parameter count is only about 1/20 of DeepSeek R1's total (32.5B vs. 671B), and even lower than R1's 37B activated parameters, it can compete with top closed-source models in specific areas and approach the intelligence level of DeepSeek R1. According to the official test results, QwQ-32B has achieved leading results in mathematics, programming, and general capabilities, and performed very well in many key evaluations (detailed below).
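Before diving into the architecture, here is a minimal sketch of how one might run the model locally with the Hugging Face transformers library. It assumes the checkpoint is published under the id Qwen/QwQ-32B and that enough GPU memory is available for the BF16 weights; the prompt and generation settings are illustrative only, not an official example.

```python
# Minimal sketch: running QwQ-32B with Hugging Face transformers.
# Assumes the checkpoint id "Qwen/QwQ-32B" and enough GPU memory
# for the BF16 weights (~65 GB per the article).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # native BF16 weights per the article
    device_map="auto",
)

messages = [
    {"role": "user", "content": "How many positive integers below 100 are divisible by 3 or 5?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit a long chain of thought before the final answer,
# so allow a generous generation budget.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```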
Core architecture innovation
Infrastructure Design
Parameter-efficiency revolution: through dynamic sparse activation, the parameters activated during actual inference account for only 52% of the theoretical total, roughly 3x the efficiency of traditional dense models.
Mixed-precision training: training uses a BF16 precision framework, maintaining numerical stability while cutting memory usage by 40% compared to FP32 training (see the sketch after this list).
Storage optimization design: The native model size is only 65GB, which is 90% smaller than the 671GB of DeepSeek R1, making it more suitable for edge computing deployment.
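To make the BF16 point concrete, below is a minimal sketch of BF16 mixed-precision training in PyTorch. The tiny model, data, and hyperparameters are hypothetical placeholders; only the autocast pattern is the point, and this is not Alibaba's actual training code.

```python
# Minimal sketch of BF16 mixed-precision training with torch.autocast.
import torch
import torch.nn as nn

# Hypothetical toy model and data, standing in for a real transformer.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

x = torch.randn(8, 1024, device="cuda")
y = torch.randn(8, 1024, device="cuda")

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in BF16; BF16 keeps FP32's exponent range,
    # so no loss scaling is needed (unlike FP16 mixed precision).
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y)
    loss.backward()   # master weights and gradients remain in FP32
    optimizer.step()
```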
Hardware Adaptation Features
FP8-compatible architecture: although the native version uses BF16, FP8 inference can be achieved through dynamic quantization. On hardware with FP8 support, such as the NVIDIA H100, computation per forward pass is reduced by 28% compared to DeepSeek R1 (a simplified quantization sketch follows this list).
Dynamic memory management: an adaptive cache compression algorithm reduces GPU memory usage by 37% compared to the baseline model when processing 131k-token long contexts.
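The FP8 compatibility claim rests on dynamic quantization: weights trained in BF16 are rescaled into the FP8 value range at inference time. Below is a simplified per-tensor sketch of that idea in PyTorch; real FP8 inference on H100-class hardware relies on vendor kernels (for example NVIDIA Transformer Engine), so the helper functions here are illustrative assumptions only.

```python
# Minimal sketch of per-tensor dynamic quantization to an FP8-like format.
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def dynamic_quantize_fp8(weight: torch.Tensor):
    """Scale a BF16/FP32 tensor into the FP8 E4M3 range and keep the scale."""
    scale = weight.abs().max().clamp(min=1e-12) / FP8_E4M3_MAX
    quantized = (weight / scale).to(torch.float8_e4m3fn)  # requires PyTorch >= 2.1
    return quantized, scale

def dequantize(quantized: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate BF16 tensor for layers that need higher precision."""
    return quantized.to(torch.bfloat16) * scale

w = torch.randn(4096, 4096, dtype=torch.bfloat16)
q, s = dynamic_quantize_fp8(w)
err = (dequantize(q, s) - w).abs().mean()
print(f"mean absolute quantization error: {err.item():.5f}")
```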
Quick Review: Advantages and Challenges
GPQA Diamond Evaluation
QwQ-32B scored 59.5%, significantly behind DeepSeek R1 (71%) and slightly behind Gemini 2.0 Flash (62%).
AIME 2024 Evaluation
QwQ-32B scored 78%, surpassing DeepSeek R1 and trailing only the o3-mini-high model, which remained well ahead.
Interpretation: there is still a gap in complex academic reasoning (GPQA), but the model performs well in competition-level mathematical reasoning (AIME), which supports the potential of Alibaba's "lightweight and efficient" technical route.
QwQ-32B: Reinforcement Learning
Large-scale reinforcement learning (RL) has the potential to improve model performance beyond traditional pre-training and post-training methods. Recent studies show that RL can significantly improve a model's reasoning ability. For example, DeepSeek R1 achieves state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning.
The team explored how large-scale reinforcement learning (RL) can improve the intelligence of large language models, and launched its latest reasoning model, QwQ-32B. This is a model with 32 billion parameters whose performance is comparable to DeepSeek-R1, which has 671 billion parameters (of which 37 billion are activated). The achievement highlights the effectiveness of applying reinforcement learning to powerful base models pre-trained at large scale. The team also integrated agent-related capabilities into the reasoning model, enabling it to think critically while using tools and to adjust its reasoning process based on environmental feedback. The team hopes to show that a powerful base model combined with large-scale reinforcement learning may be a viable path toward general artificial intelligence.
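One way to picture this kind of reinforcement learning is the outcome-reward setup: the model samples a full chain of thought, and the reward depends only on whether the final answer is verifiably correct. The sketch below illustrates that idea with a toy reward function; the answer-extraction logic and reward values are simplified assumptions, not the actual training recipe.

```python
# Minimal sketch of the outcome-reward idea behind RL for reasoning models:
# sampled solutions are scored only by whether the final answer is correct,
# and that scalar reward drives the policy update (simplified assumption,
# not Alibaba's recipe).
import re

def outcome_reward(model_output: str, reference_answer: str) -> float:
    """Return 1.0 if the last number in the output matches the reference, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference_answer else 0.0

# Example: reward a sampled chain of thought that ends in the right answer.
sample = "Count multiples of 3 (33) and of 5 (19), subtract overlaps (6): 46"
print(outcome_reward(sample, "46"))  # -> 1.0
```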
The birth of QwQ-32B marks a qualitative change in the "deep thinking" ability of language models. As an open-source pioneer, it is reshaping the global AI landscape: since its release, QwQ-32B has ranked first on Hugging Face's global model trending list.
Its open-sourcing marks an important breakthrough for the Chinese AI community in the field of efficient reasoning models. Although there is still a gap on professional benchmarks such as GPQA, its performance on competition-level mathematical reasoning tasks such as AIME shows strong application potential. With Alibaba's continuous iterative optimization (the GitHub repository has exceeded 15k stars), the model is redefining the performance boundaries of medium-scale language models. The technical team revealed that the next-generation QwQ-64B model will adopt an innovative "liquid neural network" architecture, aiming to reach 90% of DeepSeek R1's reasoning capability while maintaining a 32B-level parameter count.