Ollama v0.6.6 is released: doubled reasoning ability and 50% faster downloads. How does it compare with vLLM and LMDeploy?

Written by
Iris Vance
Updated on: June 30, 2025

Ollama v0.6.6 is here: doubled reasoning capability and 50% faster downloads make it a new option for AI developers!

Core content:
1. Introducing two new models, Granite 3.3 and DeepCoder, to enhance reasoning and code generation capabilities
2. Download speed is significantly improved, memory leaks are fixed, and the operation is more stable
3. API and compatibility improvements, with across-the-board gains in ease of use, inference speed, and memory usage

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)


Ollama v0.6.6 major update: stronger reasoning, faster download, more stable memory

Attention AI developers! Ollama v0.6.6 is officially released, bringing major optimizations: new model support, faster downloads, memory-leak fixes, and more, making local large-model inference more efficient and stable!

Core update highlights

1. Two new models are launched

  • Granite 3.3 (2B & 8B): 128K ultra-long context, with improved instruction following and logical reasoning, suited to complex tasks.
  • DeepCoder (14B & 1.5B): a fully open-source code model with performance comparable to o3-mini, letting developers deploy high-quality code-generation AI at low cost!
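Assuming the two models are published in the Ollama library under tags matching their names (the exact tags below are an assumption, not confirmed by the release notes), trying them locally is a one-liner each:

```shell
# Pull and run the new models locally (model tags are assumed, not verified)
ollama run granite3.3:8b     # Granite 3.3, 8B variant with 128K context
ollama run granite3.3:2b     # smaller 2B variant
ollama run deepcoder:14b     # DeepCoder code model
ollama run deepcoder:1.5b    # lightweight 1.5B variant
```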

2. Download speed is greatly improved

  • Experimental new downloader: enable it by starting the server with OLLAMA_EXPERIMENT=client2 ollama serve for faster, more stable downloads!
  • Safetensors import optimization: ollama create is significantly faster when importing models.
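The experimental downloader is opt-in per server session via an environment variable. A minimal sketch, using the OLLAMA_EXPERIMENT=client2 flag named in the release notes above (the model tag in the pull command is illustrative):

```shell
# Start the Ollama server with the experimental downloader enabled
OLLAMA_EXPERIMENT=client2 ollama serve

# In another terminal, subsequent pulls use the new download path
ollama pull granite3.3
```

Because the flag is experimental, it only applies to the serve process it is set on; omit it to fall back to the default downloader.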

3. Critical bug fixes

  • Fixed a memory leak affecting Gemma 3 and Mistral Small 3.1, making long-running sessions more stable.
  • Mitigated out-of-memory (OOM) crashes by reserving more memory at startup.
  • Fixed data corruption during Safetensors import, ensuring model integrity.

4. API and compatibility improvements

  • Tool function parameters can now declare an array of types (such as string | number[]), making the API more flexible.
  • Added support for the OpenAI-Beta CORS header, easing front-end integration.
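As a sketch of what an array of parameter types could look like in a tool definition sent to Ollama's /api/chat endpoint (the get_weather tool, its parameter names, and the model tag are all hypothetical, chosen only to illustrate the type-array feature):

```shell
# Illustrative request: the "type" of one tool parameter is an array of types
curl http://localhost:11434/api/chat -d '{
  "model": "granite3.3",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" },
          "unit_or_code": { "type": ["string", "number"] }
        },
        "required": ["location"]
      }
    }
  }]
}'
```

Accepting an array in the "type" field follows JSON Schema convention, so a parameter can validate as either a string or a number without needing two separate tool definitions.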

Ollama vs. vLLM vs. LMDeploy: who is the king of local deployment?

| Comparison dimension | Ollama v0.6.6 | vLLM | LMDeploy |
|---|---|---|---|
| Ease of use | ⭐⭐⭐⭐⭐ (one-click install, suited to individual developers) | ⭐⭐⭐ (requires Docker/complex configuration) | ⭐⭐⭐⭐ (01.AI-optimized, suited to enterprises) |
| Inference speed | ⭐⭐⭐ (suited to small and medium models) | ⭐⭐⭐⭐⭐ (PagedAttention optimization, high throughput) | ⭐⭐⭐⭐ (TurboMind engine, low latency) |
| Memory optimization | ⭐⭐⭐ (automatic CPU/GPU switching) | ⭐⭐⭐⭐⭐ (continuous batching, high memory utilization) | ⭐⭐⭐⭐ (W4A16 quantization saves VRAM) |
| Model support | ⭐⭐⭐⭐ (GGUF quantization, rich community library) | ⭐⭐⭐ (model formats need manual conversion) | ⭐⭐⭐ (mainly the InternLM ecosystem) |
| Applicable scenarios | Personal development / lightweight applications | High-concurrency production environments | Enterprise real-time conversation / edge computing |