Fine-Tuning Qwen3 with Unsloth: Building an AI Brain Teaser Pro

This article walks through fine-tuning the Qwen3 large language model into a brain teaser expert, using Unsloth to keep GPU memory usage low.
Core content:
1. Environment preparation: rent a cloud host, install Anaconda and CUDA
2. Install Unsloth and create a virtual environment
3. Download the Qwen3-4B large language model and test it before fine-tuning
4. Fine-tune the model and test the result
1. Environment Preparation
1) Rent an AutoDL cloud host (I chose it because it is cost-effective and well suited for beginners doing experiments). Choose a 3090 GPU (if you have a GPU machine locally, use your own instead). When creating the AutoDL instance, I selected the PyTorch image.
2) Install Anaconda (miniconda3 is installed by default on AutoDL)
Anaconda official website: https://www.anaconda.com/
Download the corresponding version according to your own system
Once the installation is complete, open a terminal (Linux/macOS) or Anaconda Prompt (Windows) and enter the following command to create a new environment:
(The following operations need to be performed on AutoDL.)
conda create -n llama_factory python=3.10
conda activate llama_factory
3) Install CUDA (already installed on AutoDL)
Reference: https://help.aliyun.com/zh/egs/user-guide/install-a-gpu-driver-on-a-gpu-accelerated-compute-optimized-linux-instance
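Whether you install CUDA yourself or rely on AutoDL's preinstalled setup, it is worth confirming that PyTorch can actually see the GPU before going further. The following quick check is my own addition, not one of the original steps:
# check_gpu.py: confirm that PyTorch sees the GPU and which CUDA build it uses
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version used by PyTorch:", torch.version.cuda)
    print("GPU:", torch.cuda.get_device_name(0))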
4) Download the dataset
The dataset is a very important part of fine-tuning: its quality directly determines how good your fine-tuning results will be. For this experiment, I used a dataset of brain teasers, available at:
https://modelscope.cn/datasets/helloworld0/Brain_teasers
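The training script later in this article reads the data as a JSON file with "instruction" and "output" fields. Assuming you save it to /models/datasets/data.json (the path used by the training script below), a small sanity check of my own like the following confirms those fields are present:
# check_dataset.py: verify the dataset file loads and has the fields the training script expects
from datasets import load_dataset
dataset = load_dataset("json", data_files="/models/datasets/data.json", split="train")
print(dataset)      # number of rows and column names
print(dataset[0])   # first record; expected keys: "instruction" and "output"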
2. Install Unsloth
1) Use conda to create a virtual environment (if you don’t have jupyter enabled, you need to do this step)
conda create -n unsloth_env python=3.10
conda activate unsloth_env
2) Install Unsloth
pip install unsloth
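To confirm the installation succeeded, a one-line import test (my own suggestion, not an original step) is enough; if it runs without errors, Unsloth and its dependencies are in place:
# check_unsloth.py: a quick import test for the Unsloth installation
from unsloth import FastLanguageModel
print("Unsloth imported successfully")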
3. Download the Qwen3 large language model
I used the Qwen3-4B version for this fine-tuning; it has relatively few parameters while still giving good results. First, use pip to install the modelscope module:
pip install modelscope
Then create the target directory and download the model:
mkdir -p /models/
modelscope download --model Qwen/Qwen3-4B --local_dir /models/Qwen3-4B
Note: the Qwen3-4B large language model will be downloaded to /models/Qwen3-4B
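If you prefer downloading from Python rather than the command line, modelscope also exposes snapshot_download. The sketch below is my own addition and assumes a reasonably recent modelscope release that supports the local_dir argument:
# download_model.py: download Qwen3-4B via modelscope's Python API (alternative to the CLI)
from modelscope import snapshot_download
model_dir = snapshot_download("Qwen/Qwen3-4B", local_dir="/models/Qwen3-4B")
print("Model downloaded to:", model_dir)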
4. Test before fine-tuning
Before fine-tuning, you can load the original model and run an inference test. Write a test script before_train.py with the following content:
from unsloth import FastLanguageModel
model_name = "/models/Qwen3-4B" # Replace with the actual model path
max_seq_length = 2048 # Maximum context length
dtype = None # Automatically select float16 or bfloat16
load_in_4bit = True # Enable 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    ["Instruction: You are a brain teaser expert, please answer my question: What is something that no one is willing to resist no matter how strong it is?"],
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Note: Loading the model for the first time takes quite a while, so please be patient. Compare its output with the answers in the dataset to see how far off the base model is.
5. Start fine-tuning
Write the fine-tuning script train.py with the following content:
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
import torch

# Load the model
model_name = "/models/Qwen3-4B"
max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

# Configure LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,
    lora_dropout=0.2,
    bias="none",
    use_gradient_checkpointing=True,
    random_state=3407,
)

# Load and preprocess the dataset
dataset = load_dataset("json", data_files="/models/datasets/data.json", split="train")

train_prompt_style = """The following is a brain teaser question. Please provide a suitable answer without providing a thinking process.
### Instructions:
You are a brain teaser expert. Please answer the following questions without providing a thinking process.
### Question:
{}
### Reply:
{}"""

def formatting_prompts_func(examples, eos_token):
    instructions = examples["instruction"]
    outputs = examples["output"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # The EOS token must be added during training
        text = train_prompt_style.format(instruction, output) + eos_token
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
    fn_kwargs={"eos_token": tokenizer.eos_token},  # tokenizer is the one loaded with the model above
)

# Configure training
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=80,
        learning_rate=5e-5,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=5,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

# Start training
trainer.train()

# Save the LoRA adapter
model.save_pretrained("qwen3_lora_finetuned")
tokenizer.save_pretrained("qwen3_lora_finetuned")

# Save the merged model
model.save_pretrained_merged("/models/Qwen3-4B-Aminglinux", tokenizer, save_method="merged_16bit")
Note: the dataset file path is /models/datasets/data.json. The script involves quite a few fine-tuning parameters; I won't go through them in detail here, but if you want a detailed parameter introduction, you can leave a message at the end of the article.
Fine-tuning will take some time, depending mainly on your hardware and the max_steps value you set in the script: the larger this value, the longer training takes. The fine-tuned (merged) model is saved to /models/Qwen3-4B-Aminglinux.
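To get a feel for how max_steps relates to the amount of data seen, note that each optimizer step consumes per_device_train_batch_size × gradient_accumulation_steps = 8 × 4 = 32 examples, so 80 steps cover roughly 2,560 training examples (with repetition if the dataset is smaller). A quick back-of-the-envelope check:
# rough estimate of how much data max_steps covers with the settings in train.py (illustrative only)
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
max_steps = 80
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 32
examples_seen = effective_batch_size * max_steps                                  # 2560
print(effective_batch_size, examples_seen)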
Write the test script after_train.py with the following content:
from unsloth import FastLanguageModel
max_seq_length = 2048
dtype = None
load_in_4bit = False  # with enough GPU memory, 4-bit quantization can be left disabled (False)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="/models/Qwen3-4B-Aminglinux",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    ["Instruction: You are a brain teaser expert, please answer my question: What is something that no one is willing to resist no matter how strong it is?"],
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Note: model_name is set to the path of the fine-tuned large language model.
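If you only saved the LoRA adapter (the qwen3_lora_finetuned directory) and skipped the merged export, Unsloth can usually load that adapter directory directly for inference, resolving the base model from the adapter's config. Treat the following as a sketch of mine to adapt, not a guaranteed recipe:
# load_adapter.py: load the saved LoRA adapter for inference instead of the merged model
# (verify this against your Unsloth version; the directory name comes from train.py above)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="qwen3_lora_finetuned",  # adapter directory saved by train.py
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)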