Fine-tuning the llama3 model using LLaMA-Factory

Written by
Silas Grey
Updated on:June-27th-2025
Recommendation

A powerful large-scale language model fine-tuning tool that makes model optimization easy.

Core content:
1. Introduction and features of the LLaMA-Factory model
2. Environment configuration and model download steps
3. Practice of model fine-tuning using the LoRA adapter

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)

1. Introduction to the LLaMA-Factory model

https://github.com/hiyouga/LLaMA-Factory

LLaMA-Factory is a tool for fine-tuning large language models (LLMs). It aims to simplify the fine-tuning process of large language models, allowing users to quickly train and optimize the model to improve the performance of the model on specific tasks.

This tool supports a variety of pre-trained large language models, such as LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM and Phi.

Features of LLaMA-Factory include:

  1. Supports multiple fine-tuning methods: It integrates multiple fine-tuning methods such as continuous pre-training, supervised fine-tuning (SFT), and preference alignment (RLHF).
  2. Efficient fine-tuning technology: Compared with ChatGLM's official P-Tuning fine-tuning, LLaMA Factory's LoRA fine-tuning provides significant speedup and achieves higher performance scores on specific tasks.
  3. Ease of use: LLaMA-Factory provides a high-level abstract interface, allowing developers to use it out of the box and get started quickly.
  4. WebUI support: Drawing on the Stable Diffusion WebUI, this project provides a gradio-based web version workbench, which allows beginners to quickly get started.
  5. Model export and reasoning: Supports model export and reasoning, including dynamic merging of LoRA models for reasoning.
  6. API Server: Supports starting the API Server so that the trained model can be remotely accessed and called through the network interface.
  7. Benchmark: Provides support for mainstream benchmarks, such as mmlu, cmmlu, ceval, etc., to evaluate the generalization ability of the model.

LLaMA-Factory aims to lower the barrier to fine-tuning large language models, enabling more researchers and developers to use these powerful models to solve specific real-world problems.

1.1 Environment Configuration

Execute in the /mnt/workspace path

git clone https://github.com/hiyouga/LLaMA-Factory.git
conda create -n llama_factory python=3.10
conda activate llama_factory
cd LLaMA-Factory
pip install -e .[metrics]

1.2 Download llama3 model

Execute in the /mnt/workspace path

mkdir models
cd models

Execute in the /mnt/workspace/models path

pip install modelscope
git clone https://www.modelscope.cn/LLM-Research/Meta-Llama-3-8B-Instruct.git

2 llama3 Chinese enhanced large model (lora fine-tuning)

Using the pre-trained model + LoRA adapter approach, you can quickly and efficiently adapt the model to new tasks or domains without retraining the entire model, such as adapting the English model to Chinese conversations. This is a common and practical fine-tuning method.

2.1 Model Training

Modify the file /mnt/workspace/LLaMA-Factory/examples/train_lora/llama3_lora_sft.yaml

! Upload configuration file
llama3_lora_sft . yaml
### model
model_name_or_path :  /mnt/workspace/models/Meta - Llama - 3 - 8B - Instruct

### method
stage :  sft
do_train : true
finetuning_type :  lora
lora_target :  all

### dataset
dataset :  alpaca_gpt4_zh
template :  llama3
cutoff_len : 2048
max_samples : 1000
overwrite_cache : true
preprocessing_num_workers : 16

### output
output_dir :  /mnt/workspace/models/llama3 - lora - zh
logging_steps : 10
save_steps : 500
plot_loss : true
overwrite_output_dir : true

### train
per_device_train_batch_size : 1
gradient_accumulation_steps : 8
learning_rate : 1.0e-4
num_train_epochs : 3.0
lr_scheduler_type :  cosine
warmup_ratio : 0.1
bf16 : true
ddp_timeout : 180000000

### eval
val_size : 0.1
per_device_eval_batch_size : 1
eval_strategy :  steps
eval_steps : 500

The main parameters in the yaml file:

model

  • model_name_or_path: specifies the path or name of the pre-trained model, here we use

    The Meta-Llama 3.8B model in the /mnt/workspace/models/Meta-Llama-3-8B-Instruct path is used as the base model.

  • method

    • stage: indicates the training stage, here is sft (Supervised Fine-Tuning).
    • do_train: indicates that training is to be performed.
    • finetuning_type: fine-tuning method, here we use LoRA (Low-Rank Adaptation).
    • lora_target: The object that LoRA acts on, here is the query and value mapping matrix of the attention layer.
  • dataset

    • dataset: training dataset name, here we use alpaca_gpt4_zh dataset. (Note: you can select multiple datasets, such as alpaca_zh, alpaca_gpt4_zh, oaast_sft_zh)
    • template: The format template of the dataset, here is llama3.
    • cutoff_len: The maximum length of the input sequence, if it exceeds the limit, it will be truncated. Here it is set to 1024.
    • max_samples: The maximum number of samples taken from the dataset, here the first 1000 samples are taken.
    • preprocessing_num_workers: The number of processes for data preprocessing. Here, 16 processes are used for parallel processing.
  • output

    • output_dir: The path where training logs and models are saved.
    • logging_steps: The number of steps at which the log is recorded. Here, it is recorded every 100 steps.
    • save_steps: How often the model is saved? Here, it is saved every 500 steps.
    • plot_loss: Whether to draw a loss curve.
  • train

    • per_device_train_batch_size: The training batch size on each device, set to 1 here.
    • gradient_accumulation_steps: The number of steps for gradient accumulation. Here, the model parameters are updated once every 8 steps.
    • learning_rate: learning rate, here set to 0.0001.
    • num_train_epochs: the number of epochs to train, here we train 1 epoch. (You can also specify max_steps 3000: training 3000 steps. Note: real project training increases the total number of steps; Note: choose one between num_train_epochs and max_steps)
    • lr_scheduler_type: learning rate scheduler type, here we use cosine, which means warmup first and then cool down.
    • warmup_steps: The number of steps to warmup with the initial learning rate, here it is 10% of the total number of steps. fp16: Whether to use fp16 mixed precision training to speed up training and reduce video memory usage.
  • eval

    • val_size: The proportion of the validation set divided from the training set. Here, 10% is taken as the validation set.
    • per_device_eval_batch_size: The batch size of each device during verification, here it is 1.
    • evaluation_strategy: verification strategy, here it is verified according to the number of steps.
    • eval_steps: How many steps should be used for verification? Here, it is verified every 500 steps.

The above is the main parameter explanation of this yaml configuration file. These parameters set up a training process for fine-tuning the Meta-Llama-3-8BInstruct model on the alpaca_gpt4_zh dataset using LoRA.

2.2 Clone Dataset

Use the following command to clone the alpaca_gpt4_zh dataset in the /mnt/workspace/LLaMA-Factory/data path

git clone https://www.modelscope.cn/datasets/llamafactory/alpaca_gpt4_zh.git

Then, modify the dataset_info.json file

!Upload configuration file
data_info.json

Upload alpaca_gpt4_data_zh.json to the data directory

2.3 Training

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

1.4 Reasoning

Modify examples/inference/llama3_lora_sft.yaml

model_name_or_path: /mnt/workspace/models/Meta-Llama-3-8B-Instruct
adapter_name_or_path: /mnt/workspace/models/llama3-lora-zh
template: llama3
finetuning_type: lora
  • –model_name_or_path /mnt/workspace/models/Meta-Llama-3-8B-Instruct : This parameter specifies the name or path of a pre-trained language model. In this example, the Meta’s Llama-3-8B-Instruct model located at /mnt/workspace/models/MetaLlama-3-8B-Instruct is used. This model will be used as the base model on which fine-tuning will be performed.
  • –adapter_name_or_path /mnt/workspace/models/llama3-lora-zh : This parameter specifies the name or path of the adapter used for fine-tuning. In this example, the adapter located at /mnt/workspace/models/llama3-lora-zh is used. This adapter is fine-tuned on the base model using LoRA technology.
  • –template llama3 : This parameter specifies the name of the dialog template. The template defines the format and style of the dialog. In this example, the template named llama3 is used.
  • –finetuning_type lora : This parameter specifies the type of fine-tuning. In this example, LoRA technology is used for fine-tuning.

Execute in the /mnt/workspace/LLaMA-Factory path

llamafactory-cli chat examples/inference/llama3_lora_sft.yaml