Qwen3 small model actual test: from 4B to 30B, which one can use MCP to communicate smoothly with Obsidian?

The interaction effect between Qwen3 series small models and Obsidian-MCP is measured to reveal the performance differences of models of different scales.
Core content:
1. The test results of the interaction between Qwen3 series models (4B/8B/14B) and Obsidian-MCP
2. The performance of each model in terms of tool calling, content deviation, contextual restrictions, etc.
3. The performance improvement trend of Qwen3 small models and the hardware threshold for smooth interaction
Version 4B
Loss of instruction comprehension due to quantization,Version 8B
Although the tool can be called, there are content deviations.14B+
We can have a normal conversation. The availability of local mini-models is gradually increasing, but I am still a 16G graphics card away from smooth interaction?Qwen3 small model actual test: from 4B to 30B, which one can use MCP to communicate smoothly with Obsidian?
I heard it was released last night qwen3
The model's Agent and code capabilities have been optimized to strengthen support for MCP.
Qwen3: Think deeply and move quickly
https://qwenlm.github.io/zh/blog/qwen3/
This sentence in the introduction
The small MoE model Qwen3-30B-A3B has 10% of the number of activated parameters of QwQ-32B, and performs better.
`A small model like Qwen3-4B can also match the performance of Qwen2.5-72B-Instruct`.
I was very excited, so I went home after get off work. nas server
use Ollama
Pull model deployed, use cherry studio
, enable obsidian-mcp
, started testing, but the test results slapped me in the face.
Test content:
1. Check my obsidian knowledge base for changes in the last day, and the model will answer randomly
The model cannot hit the tool.
1. Use obsidian's mcp obsidian_get_recent_changes tool to query the changes in my knowledge base in the last day
I prompted the name of the tool, but the model still gave a random answer.
qwen3 model
Model evaluation item description
ArenaHard | ||
AIME'24/'25 | ||
LiveCodeBench | ||
CodeForces (Elo Rating) | ||
GPQA | ||
LiveBench | ||
BFCL | ||
MultiIF (8 Languages) |
Obsidian-MCP
Obsidian-MCP is commonly used for the following tasks:
• Semantic retrieval and summarization of log/note content (embedding + question-answering) • Self-dialogue (multi-turn historical context) • Context-based "thinking enhancement" such as task suggestions and card associations • Memory callback of private knowledge base (streamable / SSE mode long connection) • Local embedding + lightweight reasoning, no reliance on public network LLM
These tasks mainly require:
• Ability to follow instructions • Context-aware (little context) • Moderate reasoning skills • Fast response, small model, easy to deploy
Obsidian API Tools List
JSON search to obtain the content of periodic notes. Get the list of recent periodic notes. Get the most recently modified files.
Test whether Qwen3-4B's capabilities match the above requirements
qwen3:4b, the words are spoken very quickly, and the level of the answer is also high, but the text is not relevant to the topic, and it doesn't even recognize that the tool needs to be called.So I looked at the hugging_face tokenizer_config.json model configuration, and indeed there is
tool_call
Why is this layer not working? Is it thisq4 quantification
Causes severe IQ loss?I want to try 8B again, but the local video memory is not enough, so I changed it to
openrouter
Service tests 8b, 14b, 30b.Test whether Qwen3-8B's capabilities match the above requirements
Use cherryStudio to test qwen3:8b. It is possible to call the tool, but the answer is hallucinatory and the name of the returned note is changed.
Qwen3-4B-Local Model + Obsidian-MCP's `Local Q& A`.md
The answer became
01Project / Blog/draft/Qwen3-4B-Local Model + Obsidian-MCP's `Local Issues`.md
At this time, the notes are used
git sync
The advantage comes out. When you use mcp locally to organize your notes, if an error occurs, you can roll back to the last submitted version at any time!
This 8B can basically only be used for chatting, and in my scenario it is just for show but not for use
Test whether Qwen3-14B's capabilities match the above requirements
Using openrouter qwen3:14b
Model testing
It looks good and can return results normally.
But when I want to test the content in depth, it reports insufficient tokens. According to official data,
qwen3:14b
The maximum token of the model is128K
, 150,000 words, I think this is enough to analyze a note. But when I tested it, I asked it to read the note content and summarize it, but it prompted that the token exceeded 40k. I don’t know why?
It is clear from this error message that the
current context limit of the model is : 40960 tokens ➤ Exceeded.
I think it is a limitation of openRouter's own deployment. qwen3-demo
:
https://huggingface.co/spaces/Qwen/Qwen3-Demo
After testing, the same text can be summarized normally, and 128k tokens are enough. It seems that 8B, 14B, and 32B can still be used locally.
in conclusion
The knowledge base interaction test using Qwen3 and Obsidian-MCP concluded that:
Version 4B : Quantization compression leads to aphasia
• The tool call capability is completely lost, facing clear obsidian_get_recent_changes
Indifferent to instructions• The token capacity is 32K, so long sessions may be difficult to process completely
Version 8B : Seemingly useful but actually dangerous
• Although the tool call can be recognized, the returned file path has a high error rate; • Appears when the content is summarized Hallucination Rewrite
, the note name will be modified;• If the MCP API is accidentally deleted and there is no git backup, it will be more dangerous
Version 14B+ : Really Fragrance Warning
• 128K token capacity perfectly adapts to the knowledge base scenario, and accurately calls the Obsidian API during testing • However, local deployment requires 16G video memory, which is prohibitive for most NAS users
Before my 16G graphics card arrives, I have to pay attention to privacy protection. I first use the cloud-based large model + MCP to read the non-sensitive data directory as the context for questions and answers.
After all, to be a technology master, you must understand Finding the optimal solution within realistic constraints
.