Qwen3 small models, hands-on: from 4B to 30B, which one can talk to Obsidian smoothly over MCP?

Written by
Clara Bennett
Updated on: June 29, 2025

A hands-on test of how the Qwen3 series of small models interacts with Obsidian-MCP, revealing the performance differences between model sizes.

Core content:
1. Test results of the interaction between Qwen3 series models (4B/8B/14B) and Obsidian-MCP
2. How each model performs on tool calling, content drift, and context limits
3. The improvement trend of Qwen3 small models and the hardware threshold for smooth interaction

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

 

This article tests how the Qwen3 series of local models (4B/8B/14B) interacts with an Obsidian-MCP knowledge base, and finds that the small models suffer from failed tool calls, hallucinated responses, and context limits. The 4B version loses instruction comprehension due to quantization; the 8B version can call tools but drifts in content; 14B and up can converse normally. Local small models are steadily becoming more usable, but am I still one 16GB graphics card away from smooth interaction?


I heard that Qwen3 was released last night, with optimized Agent and coding capabilities and strengthened support for MCP.

Qwen3: Think deeply and move quickly
https://qwenlm.github.io/zh/blog/qwen3/

These claims in the introduction caught my eye:

`The small MoE model Qwen3-30B-A3B has only 10% of the activated parameters of QwQ-32B, yet performs better. A small model like Qwen3-4B can also match the performance of Qwen2.5-72B-Instruct.`

I was very excited, so after work I went home, pulled the model with Ollama on my NAS server, enabled obsidian-mcp in Cherry Studio, and started testing. The results slapped me in the face.

Test content:

  1. "Check my Obsidian knowledge base for changes in the last day": the model answered at random and never hit the tool.

  2. "Use Obsidian's MCP tool obsidian_get_recent_changes to query changes in my knowledge base over the last day": even with the tool named explicitly in the prompt, the model still answered at random.
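Before blaming the model alone, it helps to see what a tools-enabled request looks like on the wire. The sketch below builds the kind of payload an MCP client such as Cherry Studio effectively sends to Ollama's /api/chat endpoint; the tool schema is my hand-written mirror of obsidian_get_recent_changes, using the OpenAI-style function-calling format that Ollama accepts. Nothing is sent over the network here.

```python
import json

# Build (but do not send) a tools-enabled chat request for Ollama's /api/chat.
# The tool definition below is an illustrative mirror of the obsidian-mcp
# get_recent_changes tool, not its authoritative schema.
payload = {
    "model": "qwen3:4b",
    "messages": [
        {"role": "user",
         "content": "Check my Obsidian knowledge base for changes in the last day"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "obsidian_get_recent_changes",
            "description": "Get recently modified files in the vault",
            "parameters": {
                "type": "object",
                "properties": {
                    "limit": {"type": "integer", "description": "Max files to return"},
                    "days": {"type": "integer", "description": "Look-back window in days"},
                },
            },
        },
    }],
    "stream": False,
}

# Inspect the schema the model is supposed to see.
print(payload["tools"][0]["function"]["name"])
```

If the model still ignores a schema like this, the failure is on the model side (comprehension or quantization), not in the client's wiring.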

qwen3 model

Model evaluation benchmarks

| Benchmark | Description | Key takeaway |
| --- | --- | --- |
| ArenaHard | Human comparative evaluation of overall conversational ability, focusing on hard scenarios | A high score means natural, logical dialogue |
| AIME'24/'25 | Math competition problems testing mathematical reasoning, sequences, geometry, etc. | GPT-4o scores low because "thinking mode" is off in this benchmark; Qwen3's result is more representative |
| LiveCodeBench | Code generation tasks, verified by real-time code execution | Qwen3-4B performs close to GPT-4o, indicating strong coding ability for a small model |
| CodeForces (Elo Rating) | Elo-style ranking on competitive programming problems; higher is stronger | Qwen3-4B > GPT-4o, meaning it beats GPT-4o on solving speed plus accuracy |
| GPQA | High-quality question-answering set (academic-style QA) testing multi-hop reasoning | The Qwen series keeps its lead, balancing knowledge and reasoning |
| LiveBench | Real-time dialogue tasks, including multi-turn context and factual requirements | GPT-4o's low score (52.2) suggests it is not the best at every task |
| BFCL | Instruction-following and dialogue-coherence test; Qwen is assessed in FC format | GPT-4o performs best; Qwen3-4B is slightly weaker but close |
| MultiIF (8 Languages) | Multilingual instruction-following evaluation | Qwen3-4B generalizes better across languages than GPT-4o (especially in non-English scenarios) |

Obsidian-MCP

Obsidian-MCP is commonly used for the following tasks:

  • Semantic retrieval and summarization of log/note content (embedding + question answering)
  • Self-dialogue (multi-turn historical context)
  • Context-based "thinking enhancement" such as task suggestions and card associations
  • Memory recall from a private knowledge base (streamable / SSE long connections)
  • Local embedding + lightweight reasoning, with no reliance on public-network LLMs

 

These tasks mainly require:

  • Instruction-following ability
  • Context awareness (only a small amount of context is needed)
  • Moderate reasoning ability
  • Fast responses from a small, easy-to-deploy model

Obsidian API Tools List

 


| Tool | Functional description | Parameters |
| --- | --- | --- |
| list_files_in_vault | Get the list of files in the vault | none |
| list_files_in_dir | Get the file list of a specified directory | dirpath |
| get_file_contents | Get the contents of a single file | filepath |
| get_batch_file_contents | Get the contents of multiple files in batch | filepaths |
| search | Perform a simple search | query, context_length |
| search_json | Perform a complex (JSON-format) search | query |
| append_content | Append content to a file | filepath, content |
| patch_content | Modify a specified content block of a file | filepath, operation, target_type, target, content |
| delete_file | Delete a file/directory | filepath |
| get_periodic_note | Get the content of a periodic note | period |
| get_recent_periodic_notes | Get the list of recent periodic notes | period, limit, include_content |
| get_recent_changes | Get recently modified files | limit, days |
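The tool table above can also serve as a client-side safety net: a tiny validator that catches calls with missing arguments before they touch the vault. This is an illustrative sketch; the required/optional split below is my assumption, not the authoritative obsidian-mcp schema.

```python
# Required-parameter map transcribed from the tool table above.
# Which parameters are truly required vs. optional is an assumption here.
REQUIRED_PARAMS = {
    "list_files_in_vault": [],
    "list_files_in_dir": ["dirpath"],
    "get_file_contents": ["filepath"],
    "get_batch_file_contents": ["filepaths"],
    "search": ["query"],        # context_length treated as optional (assumption)
    "search_json": ["query"],
    "append_content": ["filepath", "content"],
    "patch_content": ["filepath", "operation", "target_type", "target", "content"],
    "delete_file": ["filepath"],
    "get_periodic_note": ["period"],
    "get_recent_periodic_notes": ["period"],
    "get_recent_changes": [],   # limit/days treated as optional (assumption)
}

def validate_call(tool: str, args: dict) -> list[str]:
    """Return the required parameters missing from a proposed tool call."""
    if tool not in REQUIRED_PARAMS:
        raise KeyError(f"unknown tool: {tool}")
    return [p for p in REQUIRED_PARAMS[tool] if p not in args]

# Example: a hallucinated call that forgot the filepath.
print(validate_call("get_file_contents", {}))  # -> ['filepath']
```

Rejecting malformed calls before execution matters most for destructive tools like delete_file and patch_content.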

Test whether Qwen3-4B's capabilities match the above requirements

qwen3:4b talks fast and its answers are well structured, but they are off-topic, and it does not even recognize that a tool needs to be called.
So I looked at the model's tokenizer_config.json on Hugging Face, and tool_call handling is indeed defined there. Why doesn't this layer work? Is Q4 quantization causing severe capability loss?
I thought the small GPU in my NAS would finally earn its keep, but it seems I have to wait a little longer.
I wanted to try 8B next, but local VRAM was insufficient, so I switched to the OpenRouter service to test 8B, 14B, and 30B.
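One way to check the tool_call layer without loading the model at all is to inspect the chat template shipped in tokenizer_config.json. The snippet below uses a made-up stand-in template for illustration; to check a real model, download the actual file from its Hugging Face repo.

```python
import json

# A made-up stand-in for a model's tokenizer_config.json. Real templates are
# much longer; the point is only that tool-aware templates reference `tools`.
tokenizer_config = json.loads("""
{
  "chat_template": "{% if tools %}<|tool_list|>{{ tools }}{% endif %}{% for m in messages %}{{ m.content }}{% endfor %}"
}
""")

def template_supports_tools(cfg: dict) -> bool:
    # Heuristic: templates that render tool schemas mention a `tools` variable.
    template = cfg.get("chat_template", "")
    return "tools" in template

print(template_supports_tools(tokenizer_config))  # -> True
```

If the template does route tools through (as it did for qwen3:4b), a failed call points at the weights themselves, e.g. quantization damage, rather than at the prompt plumbing.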

Test whether Qwen3-8B's capabilities match the above requirements

Testing qwen3:8b with Cherry Studio: it can call the tool, but the answer hallucinates, and the returned note name is altered.

Qwen3-4B-Local Model + Obsidian-MCP's `Local Q&A`.md

became

01Project/Blog/draft/Qwen3-4B-Local Model + Obsidian-MCP's `Local Issues`.md

This is where keeping the notes under git sync pays off. When you organize notes locally via MCP and something goes wrong, you can roll back to the last committed version at any time!
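That rollback workflow can be sketched with plain git commands driven from Python. A throwaway temp directory stands in for the real Obsidian vault here, and the file names are illustrative.

```python
import pathlib
import subprocess
import tempfile

# A temp directory stands in for the Obsidian vault.
vault = pathlib.Path(tempfile.mkdtemp())

def git(*args: str) -> None:
    # Run git inside the vault with a throwaway identity for the commit.
    subprocess.run(
        ["git", "-C", str(vault),
         "-c", "user.email=vault@example.com", "-c", "user.name=vault-bot",
         *args],
        check=True, capture_output=True,
    )

note = vault / "note.md"
note.write_text("original note content\n")

git("init", "-q")
git("add", "note.md")
git("commit", "-q", "-m", "baseline before MCP session")  # the safety net

note.write_text("hallucinated rewrite from the model\n")  # simulated bad MCP edit
git("checkout", "--", "note.md")                          # roll the file back

print(note.read_text().strip())
```

Committing once before each MCP editing session is enough; anything the model mangles afterwards is one `git checkout` away from recovery.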


This 8B model is basically only good for chatting; in my scenario it looks useful but is not.

Test whether Qwen3-14B's capabilities match the above requirements

Testing with the qwen3:14b model via OpenRouter.


It looks good and returns results normally.
But when I tried to test it in more depth, it reported insufficient tokens. According to official data, the qwen3:14b model has a maximum context of 128K tokens, roughly 150,000 characters, which I figured was plenty to analyze a note.
Yet when I asked it to read a note and summarize it, it reported that the tokens exceeded 40K. Why?

The error message makes it clear: the model's current context limit is 40960 tokens, and it was exceeded.
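A rough client-side length estimate lets you hit a 40K deployment limit knowingly instead of mid-request. The characters-per-token ratios below are ballpark assumptions for mixed English/CJK text, not the model's real tokenizer.

```python
# Estimate token counts before sending a note, so oversized requests can be
# chunked up front. Ratios are rough assumptions, not Qwen3's real tokenizer.
CONTEXT_LIMIT = 40_960  # the limit reported by this OpenRouter deployment

def rough_token_estimate(text: str) -> int:
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    # ~1.5 chars/token for CJK, ~4 chars/token for English (assumed averages)
    return int(cjk / 1.5 + other / 4)

note = "word " * 40_000  # a long note: 200,000 characters
estimate = rough_token_estimate(note)
print(estimate, estimate > CONTEXT_LIMIT)  # -> 50000 True
```

When the estimate exceeds the limit, split the note and summarize it chunk by chunk rather than letting the request fail.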

I suspected it was a limit of OpenRouter's own deployment, so I tried the official Qwen3 demo:

https://huggingface.co/spaces/Qwen/Qwen3-Demo

There, the same text was summarized normally, and 128K tokens were enough. It seems that 8B, 14B, and 32B can still be used locally.

In conclusion

The knowledge base interaction test of Qwen3 with Obsidian-MCP leads to the following conclusions:

Version 4B: quantization compression leads to aphasia

  • Tool-calling ability is completely lost; it is indifferent even to an explicit obsidian_get_recent_changes instruction
  • Token capacity is 32K, so long sessions may be hard to process completely

Version 8B: seemingly useful but actually risky

  • It can recognize tool calls, but the returned file paths have a high error rate
  • Summaries hallucinate and rewrite content, even changing note names
  • An accidental delete through the MCP API with no git backup would be even more dangerous

Version 14B+: genuinely delivers

  • Its 128K token capacity fits the knowledge base scenario perfectly, and it called the Obsidian API accurately during testing
  • However, local deployment requires 16GB of VRAM, which is prohibitive for most NAS users

Until my 16GB graphics card arrives, I have to mind privacy protection: for now I use a cloud-based large model + MCP, reading only a non-sensitive data directory as context for question answering.
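The "read only the non-sensitive directory" idea can be sketched as a simple path filter applied to the vault listing before it is handed to a cloud model. The directory names below are examples, not any Obsidian or obsidian-mcp convention.

```python
import pathlib

# Directories treated as private; anything under them never leaves the NAS.
# These names are illustrative examples.
SENSITIVE_DIRS = {"journal", "finance", "private"}

def shareable_files(paths: list[str]) -> list[str]:
    """Keep only vault paths that do not pass through a sensitive directory."""
    keep = []
    for p in paths:
        parts = pathlib.PurePosixPath(p).parts
        if not any(part in SENSITIVE_DIRS for part in parts):
            keep.append(p)
    return keep

vault_listing = [
    "blog/qwen3-test.md",
    "journal/2025-06-28.md",
    "projects/nas-setup.md",
    "finance/taxes.md",
]
print(shareable_files(vault_listing))  # -> ['blog/qwen3-test.md', 'projects/nas-setup.md']
```

Applying the filter to the output of list_files_in_vault before building context keeps private notes out of cloud prompts entirely.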

After all, to be a tech expert, you must know how to find the optimal solution within realistic constraints.