Fine-Tuning Qwen3 with Unsloth: Building an AI Brain Teaser Pro

This article walks through fine-tuning the Qwen3 large language model into a brain teaser expert, using Unsloth to keep GPU memory usage low.
Core content:
1. Environment preparation: rent a cloud host, install Anaconda and CUDA
2. Install Unsloth and create a virtual environment
3. Download the Qwen3-4B large language model and test it before fine-tuning
4. Fine-tune the model and test the result
1. Environment Preparation
1) Rent an AutoDL cloud host (I chose it because it is cost-effective and well suited for beginners doing experiments). Choose a 3090 GPU (if you have a GPU machine locally, use your own instead). When creating the AutoDL instance, I selected the PyTorch image.
2) Install Anaconda (miniconda3 is installed by default on AutoDL)
Anaconda official website: https://www.anaconda.com/
Download the corresponding version according to your own system
Once the installation is complete, open a terminal (Linux/macOS) or Anaconda Prompt (Windows) and enter the following command to create a new environment:
(The following operations need to be performed on AutoDL.)
conda create -n llama_factory python=3.10
conda activate llama_factory
3) Install CUDA (already installed on AutoDL)
Reference: https://help.aliyun.com/zh/egs/user-guide/install-a-gpu-driver-on-a-gpu-accelerated-compute-optimized-linux-instance
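Whether you install CUDA yourself or rely on AutoDL's preinstalled setup, it is worth confirming that PyTorch can actually see the GPU before going further. The following quick check is my own addition, not one of the original steps:
# check_gpu.py: confirm that PyTorch sees the GPU and which CUDA build it uses
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version used by PyTorch:", torch.version.cuda)
    print("GPU:", torch.cuda.get_device_name(0))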
4) Download the dataset
The dataset is a very important part of fine-tuning: its quality directly determines how good your fine-tuning results will be. For this experiment, I used a dataset of brain teasers, available at:
https://modelscope.cn/datasets/helloworld0/Brain_teasers
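The training script later in this article reads the data as a JSON file with "instruction" and "output" fields. Assuming you save it to /models/datasets/data.json (the path used by the training script below), a small sanity check of my own like the following confirms those fields are present:
# check_dataset.py: verify the dataset file loads and has the fields the training script expects
from datasets import load_dataset
dataset = load_dataset("json", data_files="/models/datasets/data.json", split="train")
print(dataset)      # number of rows and column names
print(dataset[0])   # first record; expected keys: "instruction" and "output"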
2. Install Unsloth
1) Use conda to create a virtual environment (if you don’t have jupyter enabled, you need to do this step)
conda create -n unsloth_env python=3.10
conda activate unsloth_env
2) Install Unsloth
pip install unsloth
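To confirm the installation succeeded, a one-line import test (my own suggestion, not an original step) is enough; if it runs without errors, Unsloth and its dependencies are in place:
# check_unsloth.py: a quick import test for the Unsloth installation
from unsloth import FastLanguageModel
print("Unsloth imported successfully")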
3. Download the Qwen3 large language model
I used the Qwen3-4B version for this fine-tuning; it has relatively few parameters while still giving good results. First, use pip to install the modelscope module:
pip install modelscope
Then create the target directory and download the model:
mkdir -p /models/
modelscope download --model Qwen/Qwen3-4B --local_dir /models/Qwen3-4B
Note: the Qwen3-4B large language model will be downloaded to /models/Qwen3-4B
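If you prefer downloading from Python rather than the command line, modelscope also exposes snapshot_download. The sketch below is my own addition and assumes a reasonably recent modelscope release that supports the local_dir argument:
# download_model.py: download Qwen3-4B via modelscope's Python API (alternative to the CLI)
from modelscope import snapshot_download
model_dir = snapshot_download("Qwen/Qwen3-4B", local_dir="/models/Qwen3-4B")
print("Model downloaded to:", model_dir)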
4. Test before fine-tuning
Before fine-tuning, you can load the original model and run an inference test. Write a test script before_train.py with the following content:
from unsloth import FastLanguageModel
model_name = "/models/Qwen3-4B" # Replace with the actual model path
max_seq_length = 2048 # Maximum context length
dtype = None # Automatically select float16 or bfloat16
load_in_4bit = True # Enable 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    ["Instruction: You are a brain teaser expert, please answer my question: What is something that no one is willing to resist no matter how strong it is?"],
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Note: Loading the model for the first time takes quite a while, so please be patient. Compare its output with the answers in the dataset to see how far off the base model is.
5. Start fine-tuning
Write the fine-tuning script train.py with the following content:
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
import torch

# Load the model
model_name = "/models/Qwen3-4B"
max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

# Configure LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,
    lora_dropout=0.2,
    bias="none",
    use_gradient_checkpointing=True,
    random_state=3407,
)

# Load and preprocess the dataset
dataset = load_dataset("json", data_files="/models/datasets/data.json", split="train")

train_prompt_style = """The following is a brain teaser question. Please provide a suitable answer without providing a thinking process.
### Instructions:
You are a brain teaser expert. Please answer the following questions without providing a thinking process.
### Question:
{}
### Reply:
{}"""

def formatting_prompts_func(examples, eos_token):
    instructions = examples["instruction"]
    outputs = examples["output"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # The EOS token must be added during training
        text = train_prompt_style.format(instruction, output) + eos_token
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
    fn_kwargs={"eos_token": tokenizer.eos_token},  # tokenizer is the one loaded with the model above
)

# Configure training
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=80,
        learning_rate=5e-5,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=5,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

# Start training
trainer.train()

# Save the LoRA adapter
model.save_pretrained("qwen3_lora_finetuned")
tokenizer.save_pretrained("qwen3_lora_finetuned")

# Save the merged model
model.save_pretrained_merged("/models/Qwen3-4B-Aminglinux", tokenizer, save_method="merged_16bit")
Note: the dataset file path is /models/datasets/data.json. The script involves quite a few fine-tuning parameters; I won't go through them in detail here, but if you want a detailed parameter introduction, you can leave a message at the end of the article.
Fine-tuning will take some time, depending mainly on your hardware and the max_steps value you set in the script: the larger this value, the longer training takes. The fine-tuned (merged) model is saved to /models/Qwen3-4B-Aminglinux.
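To get a feel for how max_steps relates to the amount of data seen, note that each optimizer step consumes per_device_train_batch_size × gradient_accumulation_steps = 8 × 4 = 32 examples, so 80 steps cover roughly 2,560 training examples (with repetition if the dataset is smaller). A quick back-of-the-envelope check:
# rough estimate of how much data max_steps covers with the settings in train.py (illustrative only)
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
max_steps = 80
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 32
examples_seen = effective_batch_size * max_steps                                  # 2560
print(effective_batch_size, examples_seen)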
Write the test script after_train.py with the following content:
from unsloth import FastLanguageModel
max_seq_length = 2048
dtype = None
load_in_4bit = False  # with enough GPU memory, 4-bit quantization can be left disabled (False)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="/models/Qwen3-4B-Aminglinux",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    ["Instruction: You are a brain teaser expert, please answer my question: What is something that no one is willing to resist no matter how strong it is?"],
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Note: model_name is set to the path of the fine-tuned large language model.
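If you only saved the LoRA adapter (the qwen3_lora_finetuned directory) and skipped the merged export, Unsloth can usually load that adapter directory directly for inference, resolving the base model from the adapter's config. Treat the following as a sketch of mine to adapt, not a guaranteed recipe:
# load_adapter.py: load the saved LoRA adapter for inference instead of the merged model
# (verify this against your Unsloth version; the directory name comes from train.py above)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="qwen3_lora_finetuned",  # adapter directory saved by train.py
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)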