Ollama v0.6.6 is released! Reasoning capability doubled, download speed up 50%. How does it compare with vLLM and LMDeploy?

Ollama v0.6.6 is here: reasoning capability doubled, download speed up 50%, and a new option for AI developers!
Core content:
1. Introducing two new models, Granite 3.3 and DeepCoder, to enhance reasoning and code generation capabilities
2. Significantly faster downloads, memory leak fixes, and more stable operation
3. API and compatibility improvements, with across-the-board gains in ease of use, inference speed, and memory usage
Ollama v0.6.6 major update: stronger reasoning, faster download, more stable memory
Attention, AI developers! Ollama v0.6.6 is officially released with a host of major optimizations, including new model support, faster downloads, and memory leak fixes, making local large-model inference more efficient and stable!
Core update highlights
1. Two new models are launched
• Granite 3.3 (2B & 8B): 128K ultra-long context , optimized instruction following and logical reasoning capabilities, suitable for complex task processing. • DeepCoder (14B & 1.5B): A completely open source code model with performance comparable to O3-mini. Developers can deploy high-quality code generation AI at low cost!
2. Download speed greatly improved
• Experimental new downloader: start the server with `OLLAMA_EXPERIMENT=client2 ollama serve` to enable it for faster, more reliable downloads.
• Safetensors import optimization: `ollama create` is significantly faster when importing models.
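To watch download progress from code, here is a minimal sketch using the `ollama` Python client's streaming pull; it assumes the client is v0.4+ and that the server was started with `OLLAMA_EXPERIMENT=client2 ollama serve` so the new downloader is in play:

```python
import ollama

# Stream pull progress: each update carries a status line plus byte counters
# (completed/total) for the layer currently downloading.
for update in ollama.pull("granite3.3:8b", stream=True):
    if update.completed and update.total:
        print(f"{update.status}: {update.completed / update.total:.1%}")
    else:
        print(update.status)
```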
3. Critical bug fixes
• Gemma 3 / Mistral Small 3.1 memory leaks fixed, so long-running sessions are more stable.
• OOM (out-of-memory) mitigation: more memory is reserved at startup to avoid crashes.
• Safetensors import data corruption fixed, ensuring model integrity.
4. API and compatibility improvements
• Tool function parameters can now declare an array of types (e.g. `"type": ["string", "number"]` in JSON Schema, i.e. a string | number union), making the API more flexible. A sketch of this follows below.
• The OpenAI-Beta CORS header is now allowed, easing front-end integration with OpenAI-compatible clients.
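A minimal sketch of the new union-typed tool parameters via the `ollama` Python client; the tool name `lookup_item` and its schema are hypothetical, invented purely for illustration:

```python
import ollama

# Hypothetical tool whose "query" parameter accepts a string OR a number,
# expressed as a JSON Schema type array (newly supported in v0.6.6).
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_item",  # hypothetical function name
        "description": "Look up an item by name or numeric ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": ["string", "number"],  # string | number union
                    "description": "Item name or numeric ID.",
                },
            },
            "required": ["query"],
        },
    },
}]

response = ollama.chat(
    model="granite3.3:8b",
    messages=[{"role": "user", "content": "Find item 42."}],
    tools=tools,
)
print(response.message.tool_calls)
```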
Ollama vs. vLLM vs. LMDeploy: who is the king of local deployment?
| Comparison dimension | Ollama v0.6.6 | vLLM | LMDeploy |
| --- | --- | --- | --- |
| Ease of use | | | |
| Inference speed | | | |
| Memory optimization | | | |
| Model support | | | |
| Applicable scenarios | Personal development / lightweight applications | High-concurrency production environments | Enterprise real-time conversation / edge computing |