Practice has shown that the Qwen3 small models are real productivity tools for the enterprise. What can they do?

Written by
Audrey Miles
Updated on: June 22, 2025

Alibaba has released the Qwen3 series of models, and the productivity and deployment advantages of its small models should not be underestimated.

Core content:
1. The parameter scale and performance advantages of the Qwen3 series of models
2. Introduction to the local deployment and usage of the Qwen3 small model
3. Application cases of the Qwen3 small model in document summarization and text processing


At the end of April, Alibaba released the Qwen3 series of models, which sparked heated discussion across the global internet and once again demonstrated Alibaba's influence in the open-source model community.

This time, Qwen3 shipped eight open-source models of different parameter sizes: six dense models and two MoE models.

0.6B, 1.7B, 4B, 8B, 14B, 32B, 30B-A3B, 235B-A22B

Among them, the small models are particularly eye-catching: the 0.6B Qwen3 is roughly one-thousandth the size of DeepSeek-R1 (671B parameters).

Such a small size fills many people with expectations, but also with doubts.

In any case, the model is tiny and the cost of deploying and testing it is very low, so let's get our hands dirty and give it a try.

In this article:

  • Let’s first talk about the advantages of small models in general

  • Talk about deployment and usage

  • Test typical usage scenarios




Why is the Qwen3 small model productive?


Let's first look at the outstanding advantages of the Qwen3 small models, as summarized by users.

  • Very efficient:

    • The 0.6B version reaches about 170 tokens/s on a MacBook with an M4 chip,

    • and the 8B version reaches about 33 tokens/s.

  • Reasoning ability: although small, these models can still "think carefully".

  • Mode switching: you can freely switch between thinking and non-thinking modes depending on the task (a small sketch follows below).

  • Multilingual support: 119 languages and dialects.

  • Open source, free for both commercial and personal use.

A model this small yet this well-rounded can play a big role in the right scenarios.
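On mode switching: Qwen3 honors the soft switches /think and /no_think appended to a user message, toggling thinking per turn. Below is a minimal sketch using the ollama Python library (local deployment with Ollama is covered in the next section); the question is just an illustrative placeholder.

# A minimal sketch, assuming Ollama is running locally and qwen3:0.6b is pulled.
from ollama import chat

question = "Which is larger, 9.9 or 9.11?"

# Thinking mode: the model reasons step by step before answering.
slow = chat(model='qwen3:0.6b',
            messages=[{'role': 'user', 'content': question + ' /think'}])

# Non-thinking mode: skips the reasoning phase for faster answers.
fast = chat(model='qwen3:0.6b',
            messages=[{'role': 'user', 'content': question + ' /no_think'}])

print(slow.message.content)
print(fast.message.content)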



Local deployment and use


I recommend using Ollama for local deployment on a PC or Mac.

First install Ollama: go to the official website

https://ollama.com/

and download the version suitable for your computer. After installation, pull the corresponding Qwen3 model:

ollama run qwen3        # pulls the default tag, which is the 8B model
ollama run qwen3:0.6b   # pulls the 0.6B model
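Once the pull completes, you can confirm what is installed and what is loaded with Ollama's built-in commands:

ollama list   # shows downloaded models and their sizes on disk
ollama ps     # shows models currently loaded in memory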

After running the above command, there are several ways to use the model for Q&A:

  • Enter questions directly on the command line. The session keeps its memory, so you can ask follow-up questions; this is the most direct way to test.

  • Use a client, such as configuring the local model in Cherry Studio. This is the recommended way to test.

  • Use HTTP requests to hold conversations. This way you manage the session history yourself.

  • Use a framework such as LangChain, which suits agent development.

The details are beyond the scope of this article, but here is a minimal sketch of the HTTP approach.
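Ollama serves a REST API on port 11434 by default. In this sketch the caller keeps the message history and resends it on every turn, which is what "manage the session yourself" means:

import requests

# Ollama's default local endpoint; adjust host/port if you changed them.
URL = 'http://localhost:11434/api/chat'

history = []  # we keep the session state ourselves

def ask(question):
    history.append({'role': 'user', 'content': question})
    resp = requests.post(URL, json={
        'model': 'qwen3:0.6b',
        'messages': history,   # resend the whole history each turn
        'stream': False,
    })
    answer = resp.json()['message']['content']
    history.append({'role': 'assistant', 'content': answer})
    return answer

print(ask('Who are you?'))
print(ask('Summarize your last answer in one sentence.'))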

With deployment done, let's take a closer look at the scenarios where small models shine. The tests below mainly use the 0.6B and 8B versions of Qwen3.



Summarizing document contents


The Qwen3 series small models are excellent at summarizing documents.

Many people worry that a small LLM cannot summarize documents accurately, or that it will hallucinate badly.

In response, a project on GitHub tested and compared hundreds of open-source and proprietary models on this task.

I pulled out the comparison involving the Qwen small models, shown in the figure below: the 8B model's accuracy is actually better than DeepSeek-R1's.

Hands-on test: MCP + text summarization


For simple text summarization, I believe models of both sizes handle it fine. But since we are summarizing documents, we will very likely need to summarize web pages too. After comparison, we found that (a summarization sketch follows the list):

  • The 8B model calls the MCP tool more accurately than the 0.6B model.

  • The 8B model produces deeper summaries than the 0.6B model (in other words, it understands the article's meaning better).
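As a baseline without MCP, here is a minimal sketch of plain web-page summarization: fetch the page yourself and let the model summarize it. The URL is a placeholder, and the crude truncation is just to keep the input within context.

import requests
from ollama import chat

# Placeholder URL; substitute any article you want to test.
page = requests.get('https://example.com/article.html', timeout=30).text

response = chat(
    model='qwen3:8b',
    messages=[{
        'role': 'user',
        'content': 'Summarize the key points of this page in 5 bullet points:\n\n' + page[:8000],
    }],
)
print(response.message.content)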




Zero-shot or few-shot classification


A sample classification task: a GPT classification test shared by an OpenAI engineer in the OpenAI Cookbook:

https://cookbook.openai.com/examples/leveraging_model_distillation_to_fine-tune_a_model

The task in that article: infer the grape variety from a wine review.

The test results show the differences between the two models:

  • The 0.6B model runs about 6x faster than the 8B model.

  • The 8B model is far more accurate than the 0.6B model (56% vs. 10% accuracy).

  • For reference, gpt-4o and gpt-4o-mini reach 81.80% and 61% accuracy respectively.

Taking cost into account, the 8B model is quite competitive.

Of course, fine-tuning or distilling the 0.6B model may significantly improve its accuracy; that can be tested in a follow-up article.

Here is the test code; you can run it yourself.

Remember to download the data from Kaggle first: https://www.kaggle.com/datasets/zynicide/wine-reviews

from ollama import chat
import pandas as pd
import numpy as np
from tqdm import tqdm
from pydantic import BaseModel, Field

df = pd.read_csv('wine-reviews/winemag-data-130k-v2.csv')
df_country = df[df['country'] == 'Italy']

# Only take part of the data for testing: drop varieties with fewer than 5 reviews
varieties_less_than_five_list = (
    df_country["variety"]
    .value_counts()[df_country["variety"].value_counts() < 5]
    .index.tolist()
)
df_country = df_country[~df_country['variety'].isin(varieties_less_than_five_list)]
df_country_subset = df_country.sample(n=500)

# Candidate varieties the model must choose from
# (this definition was missing from the original snippet)
varieties = np.array(df_country['variety'].unique()).astype('str')

# System prompt (also missing from the original snippet)
system_prompt = "You are a sommelier. Answer with exactly one grape variety from the given list."

class WineVariety(BaseModel):
    variety: str = Field(enum=varieties.tolist())

def generate_prompt(row, varieties):
    # Format the varieties list as a comma-separated string
    variety_list = ", ".join(varieties)
    prompt = f"""
    Based on this review, guess the possible grape variety:
    This wine was made by {row['winery']} in {row['country']}, {row['province']}.
    It was produced in {row['region_1']} and described as "{row['description']}".
    It was rated by {row['taster_name']} and scored {row['points']}.

    Possible choices are: {variety_list}.
    Guess the most likely grape variety. Just answer with a variety name from the list.
    """
    return prompt

def call_model(model, prompt):
    # Constrain the output to the WineVariety JSON schema via structured outputs
    response = chat(
        model=model,
        messages=[
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': prompt},
        ],
        format=WineVariety.model_json_schema(),
    )
    wine_variety = WineVariety.model_validate_json(response.message.content)
    return wine_variety.variety

def process_example(index, row, model, df, progress_bar):
    global progress_index
    try:
        # Generate the prompt using the row
        prompt = generate_prompt(row, varieties)
        df.at[index, model + "-variety"] = call_model(model, prompt)
        # Update the progress bar
        progress_bar.update(1)
        progress_index += 1
    except Exception as e:
        print(f"Error processing model {model}: {str(e)}")

def process_dataframe(df, model):
    global progress_index
    progress_index = 1  # Reset progress index
    # Create a tqdm progress bar and process each example sequentially
    with tqdm(total=len(df), desc="Processing rows") as progress_bar:
        for index, row in df.iterrows():
            try:
                process_example(index, row, model, df, progress_bar)
            except Exception as e:
                print(f"Error processing example: {str(e)}")
    return df

def get_accuracy(model, df):
    # Fraction of predictions that exactly match the labeled variety
    # (this helper was missing from the original snippet)
    return (df[model + "-variety"] == df["variety"]).mean()

models = ['qwen3:0.6b', 'qwen3']  # 0.6B vs. the default 8B tag
for model in models:
    df_country_subset = process_dataframe(df_country_subset, model)

for model in models:
    print(f'{model} Accuracy: {get_accuracy(model, df_country_subset) * 100:.2f}%')




Translation


How to translate the incantation "急急如律令" (roughly "swiftly, as decreed by law") in Nezha 2 sparked plenty of discussion. Let's see how the two small models translate it.

Both translations turn out to be correct, but the 8B model has far more background knowledge, so its translation takes more context into account.

Minority language translation

Qwen3 supports 119 languages. I happened to run into lyrics in a minority language at work, so I used them as a test.

Aylonayin
Yuragimdagi cho'g-cho'g,
Tunlari uyqum yo'q-yo'q,
Bedorman-ey, hey bejonman-ey.
Feeling bir bo'ying yo'q-yo'q,
Qilasan doim do'q-do'q,
Sezmaysan-ey, hey bilmaysan-ey.
Yuragimda atalgan muhabbat,
Faqat senga, faqat senga.

Naqorat:
Aylonayin, ho aylonayin,
Belingga belbog' bo'lib boylanoyin.
Aylonayin, ho aylonayin,
Yurgan yo'llaringdan-ey aylonayin.

I'm so tired,
Mehringga yurak zor-zor,
Qiynaysan-ey, hech bilmaysan-ey.
Be careful,
Sensiz bu dunyo tor-tor,
Ko'rmaysan-ey, hech sezmaysan-ey.
Yuragimda atalgan muhabbat,
Faqat senga, faqat senga.

Naqorat:
Aylonayin, ho aylonayin,
Belingga belbog' bo'lib boylanoyin.
Aylonayin, ho aylonayin,

I didn't know what language these lyrics were in, so let's look at how 0.6B and 8B handle the translation:
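A minimal sketch of how such a comparison can be run, assuming the lyrics above are stored in a variable; both models get the same prompt:

from ollama import chat

lyrics = """Aylonayin
Yuragimdagi cho'g-cho'g,
..."""  # paste the full lyrics from above here

for model in ['qwen3:0.6b', 'qwen3:8b']:
    response = chat(
        model=model,
        messages=[{
            'role': 'user',
            'content': 'Identify the language of these lyrics and translate them into English:\n\n' + lyrics,
        }],
    )
    print(f'--- {model} ---')
    print(response.message.content)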

Clearly, in this slightly more specialized domain of minority languages:

  • The 8B model's result is stunning, while the 0.6B model hallucinates.

  • The 8B model translates in context; you can even see it pursuing the classic translation ideal of "faithfulness, expressiveness, and elegance".




Summary

  • qwen3-0.6b and qwen3-8b run locally with no strain and very fast, which means the cost of testing and validation is very low.

  • In non-specialized domains (where little background knowledge is required), even the 0.6B model is usable.

  • The 8B model works better in specialized niche areas and strikes a good balance between cost and effect.

  • In reasoning mode both models "think" usefully, but the 0.6B model's limited knowledge caps its reasoning quality.

My recommendation: in enterprise projects, if you have AI scenarios that demand response speed (classification, text extraction, and the like), first run a quick test with the qwen3-0.6b model.

In future articles, I will test the small models against other real-world scenarios.

#Qwen3 #SmallModels #MCP #AITranslation #AISummarization #AIHallucination #LocalLLMDeployment