Qwen3 series models released: deep thinking, fast response

Explore the Qwen3 series models and experience their powerful multilingual capabilities and hybrid thinking modes.
Core content:
1. Overview of the Qwen3 series model architectures, covering both dense and mixture-of-experts (MoE) models
2. Hybrid thinking modes: flexible control over how much the model thinks
3. The flagship model Qwen3-235B-A22B's outstanding performance across multiple benchmarks
Overview
1. Two architectures: dense models (0.6B/1.7B/4B/8B/14B/32B) and mixture-of-experts (MoE) models (30B-A3B/235B-A22B)
2. Hybrid thinking modes: a thinking mode and a non-thinking mode that can be switched on or off, letting users control how deeply the model reasons for a given task (see the sketch after this list)
3. Multilingual: 119 languages and dialects
4. Enhanced agent capabilities: the agent and coding abilities of the Qwen3 models are optimized, with strengthened support for MCP
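Qwen3's release notes describe /think and /no_think soft switches that can be appended to a prompt (or system message) to toggle the thinking block from turn to turn. A minimal sketch with Ollama; the prompts are just illustrations, and exact switch behavior may vary with the model build:

$ ollama run qwen3:8b "Summarize what MoE is in one sentence. /no_think"   # suppress the thinking block
$ ollama run qwen3:8b "Is 9.11 larger than 9.9? Explain. /think"           # force step-by-step thinking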
Architecture
Dense Model
MoE Model
Benchmarks
According to the official benchmarks, the flagship model Qwen3-235B-A22B delivers very competitive results against top models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro on coding, math, and general-capability benchmarks. In addition, the small MoE model Qwen3-30B-A3B activates only about 10% as many parameters as QwQ-32B yet outperforms it, and even a small model like Qwen3-4B can match the performance of Qwen2.5-72B-Instruct.
Training
Qwen3 was pre-trained on about 36 trillion tokens, in three stages:
1. The model was pre-trained on more than 30 trillion tokens with a context length of 4K tokens, giving it basic language skills and general knowledge
2. The dataset was then improved by increasing the proportion of knowledge-intensive data (such as STEM, programming, and reasoning tasks), and the model was pre-trained on an additional 5 trillion tokens
3. The context length was extended to 32K tokens using high-quality long-context data
Post-training is divided into four steps:
1. Long chain-of-thought cold start: the model was fine-tuned on diverse long chain-of-thought data covering tasks and fields such as mathematics, coding, logical reasoning, and STEM problems, to equip it with basic reasoning capabilities
2. Reasoning-based reinforcement learning: large-scale reinforcement learning with rule-based rewards to enhance the model's exploration and exploitation capabilities
3. Thinking-mode fusion: integrating non-thinking capabilities into the thinking model
4. General reinforcement learning: reinforcement learning applied to more than 20 general-domain tasks to further strengthen the model's general capabilities and correct undesirable behaviors
Using Qwen3
Upgrading Ollama
The qwen3 model requires Ollama v0.6.6 or higher. First, upgrade Ollama on Linux to v0.6.6:
wget https://github.com/ollama/ollama/releases/download/v0.6.6/ollama-linux-amd64.tgz   # download the v0.6.6 release tarball
sudo systemctl stop ollama                     # stop the running service
sudo rm -rf /usr/lib/ollama                    # remove the old installation
sudo tar -C /usr -xzf ollama-linux-amd64.tgz   # unpack the new version into /usr
sudo systemctl start ollama                    # restart the service
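After the service restarts, confirm the installed version; it should now report 0.6.6:

$ ollama -v
ollama version is 0.6.6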
After the upgrade, download the qwen3:8b model (about 5.2 GB):
$ ollama pull qwen3:8b
pulling manifest
pulling a3de86cd1c13: 100% ▕████████████████▏ 5.2 GB
pulling eb4402837c78: 100%
pulling d18a5cc71b84: 100%
pulling cff3f395ef37: 100%
pulling 05a61d37b084: 100%
verifying sha256 digest
writing manifest
success
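Before wiring up a UI, you can also sanity-check the model against Ollama's local HTTP API. A minimal sketch (the prompt is just an example; Ollama listens on port 11434 by default):

$ curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Classify the sentiment of: the movie was wonderful. /no_think",
  "stream": false
}'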
After configuring the model in LobeChat, use qwen3:8b and see the actual effect:
In the content-classification tests, the two models trade places: in one round DeepSeek-R1:14B and qwen3:8b are on par, while in another DeepSeek-R1:14B comes out ahead.
In general, each model has its own strengths; choose one based on your actual results.