Recommendation
Explore the compatibility challenges between Qwen3 and Ollama, and understand the support status of recent versions along with deployment suggestions.
Core content:
1. Why Gemma3 requires a newer version of Ollama
2. Recommended operating environments and how different Linux distributions support new models
3. How the Qwen3 models behave under different Ollama versions, with deployment suggestions
Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)
After a new model is released, Ollama must be updated to support it. For example, Gemma3 is supported only from v0.6.0 onward; earlier versions cannot run it. At the same time, newer Ollama releases require a newer glibc, so they cannot run on older distributions such as CentOS 7.

The recommended operating environment is a recent Linux distribution such as Ubuntu 22.04 or Ubuntu 24.04. These ship with a newer glibc and have better support for new models and NVIDIA drivers.

Yesterday I tested running qwen3:32b on Ollama v0.6.0. Although the model file downloaded successfully, the model could not run. Running qwen3:32b produced the error:

Error: unable to load model
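Before upgrading Ollama on an older distribution, it is worth confirming the system's glibc version first. A minimal sketch using Python's standard library (the `MIN_GLIBC` threshold below is a placeholder for illustration; the actual minimum for a given Ollama release should be taken from its release notes):

```python
# Check the local glibc version before installing a newer Ollama build.
# For context: CentOS 7 ships glibc 2.17, Ubuntu 22.04 ships 2.35.
import platform

MIN_GLIBC = (2, 27)  # hypothetical threshold -- check Ollama's release notes


def glibc_ok(required=MIN_GLIBC):
    """Return True/False if glibc is detected, None otherwise (e.g. musl)."""
    libc, version = platform.libc_ver()
    if libc != "glibc" or not version:
        return None
    major_minor = tuple(int(p) for p in version.split(".")[:2])
    return major_minor >= required


print(platform.libc_ver(), glibc_ok())
```

On CentOS 7 this check would fail for any threshold above (2, 17), which is consistent with newer Ollama builds refusing to start there.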
The latest release of Ollama is v0.6.6, and the current release candidate is v0.6.7-rc0. When a new version is released, the newly supported models are announced: for example, v0.6.0 announced support for Gemma3, and v0.6.6 announced support for DeepCoder. So far, the v0.6.7 release candidate has not announced support for Qwen3.

Through testing, I found that qwen3:32b can run under v0.6.7, but with stability issues: the Qwen3 model is automatically unloaded at the end of each session and reloaded at the start of the next one. On the application side this makes responses slow, with each question stalling for a while. I recommend waiting until a new Ollama version officially announces Qwen3 support before deploying; the current stability is not good enough to use Qwen3 normally.

The file sizes of the various Qwen3 models are:

For local deployment, consider qwen3:30b-a3b and qwen3:32b. They are similar in size but different in architecture: qwen3:32b is a dense model, while qwen3:30b-a3b is a mixture-of-experts (MoE) model. The most powerful Qwen3 model is qwen3:235b-a22b at 142 GB, which, like qwen3:30b-a3b, is a MoE model. With four V100 or 3060 cards, consider deploying qwen3:32b or qwen3:30b-a3b; with eight A800 cards, consider qwen3:235b-a22b.

Here is the A800 machine that I just installed yesterday:
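The session-end unloading described above can sometimes be mitigated with Ollama's documented `keep_alive` parameter, which controls how long a model stays in memory after a request (`-1` means keep it loaded indefinitely). A minimal sketch of building such a request for the `/api/generate` endpoint; the endpoint and parameter come from Ollama's API documentation, but whether this actually works around the v0.6.7-rc instability with Qwen3 is an assumption to verify, not a confirmed fix:

```python
# Build a request for Ollama's /api/generate endpoint that asks the
# server to keep the model loaded after responding (keep_alive=-1).
import json
import urllib.request


def build_generate_request(model, prompt, host="http://localhost:11434"):
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": -1,  # -1 = keep the model in memory indefinitely
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


req = build_generate_request("qwen3:32b", "hello")
print(req.full_url, json.loads(req.data)["keep_alive"])
# Actually sending it requires a running Ollama server:
#   resp = urllib.request.urlopen(req)
```

The same `keep_alive` value can also be set globally via the `OLLAMA_KEEP_ALIVE` environment variable on the server side.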