Understanding the similarities and differences between the fine-tuning techniques LoRA and SFT in one article

Written by Silas Grey
Updated on: June 20, 2025

Gain a fresh perspective on AI fine-tuning techniques and explore the essence of LoRA and SFT.

Core content:
1. SFT: how supervised fine-tuning improves a model's domain expertise
2. LoRA: the technical principles and advantages of low-rank adaptation
3. A comparison of LoRA and SFT in terms of data dependence and computing resources

Yang Fangxian
Founder of 53A, Tencent Cloud Most Valuable Expert (TVP)
   SFT: Customize your own AI assistant

SFT stands for Supervised Fine-Tuning. To help you understand, let's start with a story.

Suppose you have a very smart friend who knows everything from astronomy to geography. This is like a large language model that has been pre-trained on a large scale and has very strong general language capabilities. However, when you ask him some very professional questions, such as how to manufacture high-precision chips, he may not be able to answer very well.

What should we do? We need to "train" him further, and that is what supervised fine-tuning does. We collect many professional questions about chip manufacturing along with accurate answers, and have this friend study these question-answer pairs. After a period of study, his ability to answer questions in the professional field of chip manufacturing will be greatly improved.

The same is true for the SFT of large language models. Although pre-trained models are powerful, they are often not accurate enough when facing tasks in specific fields. Therefore, we prepare a large amount of labeled data in this field. This data is like "learning materials" for the model, which contains a clear correspondence between input and expected output. By learning this data, the model constantly adjusts its internal parameters, just like students consolidate their knowledge by doing exercises, so that it can perform better on specific tasks.

For example, if we want a large language model to become an expert in the medical field, we can collect a large amount of medical cases, medical knowledge questions and answers, and perform SFT on the model. In this way, it can more accurately answer medical professional questions such as disease diagnosis and treatment plans.
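The labeled "learning materials" described above boil down to pairs of inputs and expected outputs. The sketch below shows one way such pairs might be prepared for SFT; the prompt template, field names, and toy medical examples are illustrative assumptions, not any specific library's format.

```python
# A minimal sketch of preparing SFT training data: each example pairs an
# input (the question) with the expected output (the labeled answer).

def build_sft_example(question: str, answer: str) -> dict:
    """Turn one labeled Q&A pair into a supervised training example."""
    prompt = f"### Question:\n{question}\n\n### Answer:\n"
    return {
        "input": prompt,   # what the model sees
        "target": answer,  # what the model is trained to produce
    }

# Toy medical-domain examples (for illustration only).
raw_data = [
    ("What is hypertension?", "Persistently elevated arterial blood pressure."),
    ("What does an ECG measure?", "The electrical activity of the heart."),
]

sft_dataset = [build_sft_example(q, a) for q, a in raw_data]
```

During SFT, the model is trained to produce each `target` given its `input`, and its parameters are updated until its answers in the domain improve.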

   LoRA: A cost-conscious model optimizer

LoRA stands for Low-Rank Adaptation. The name sounds a bit complicated, so let's understand it through an example.

Imagine that you want to build a huge castle of blocks. This castle is like a large language model, and each block is a parameter of the model. The traditional fine-tuning method is like tearing down the entire castle and rebuilding it, which not only consumes a lot of time and energy, but also requires a lot of blocks (computing resources).

LoRA, on the other hand, finds a cleverer way. It observes that in some parts of the castle, you don't need to adjust all the building blocks. You only need to add some small blocks with a specific structure (newly added low-rank matrices) next to them, and adjusting only these small blocks achieves an effect similar to rebuilding the castle.

Specifically, LoRA adds two low-rank matrices alongside certain layers of the model, leaving the pre-trained model's parameters unchanged. These two low-rank matrices act as "helpers" for the model: during the forward pass they work together with the original weight matrix. During training, only the parameters of these two low-rank matrices are updated, while the original model's large parameter set stays fixed. As a result, the number of trainable parameters is greatly reduced, training speeds up, and a lot of computing resources are saved, just like building a castle with similar functionality using fewer blocks and less time.
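The mechanism above can be sketched numerically. In the toy example below (plain numpy, not any specific library's API), the pretrained weight `W` stays frozen while two small matrices `A` and `B` are added beside it; the effective weight is `W + B @ A`, and only `A` and `B` would be trained. The dimensions and rank are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4           # rank r is much smaller than d_in, d_out

W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))             # B starts at zero, so the adapted layer
                                     # initially behaves exactly like the original

def lora_forward(x):
    """Compute y = (W + B @ A) x without ever modifying W."""
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)

# Only A and B would be trained; W stays frozen.
trainable = A.size + B.size          # 4*64 + 64*4 = 512
frozen = W.size                      # 64*64 = 4096
```

With `B` initialized to zero, `lora_forward` reproduces the frozen layer exactly at the start of training, and here only 512 of the 4,608 total parameters would ever be updated.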

   The differences between LoRA and SFT
Data dependency:
  • SFT relies heavily on a large amount of labeled data, just like students need a lot of practice questions to improve their grades. Only with enough high-quality domain-specific data can SFT enable the model to perform well in that domain.
  • LoRA is relatively less dependent on data. It mainly optimizes the model through clever parameter adjustment, so even when the dataset is not particularly large, it can still improve the model's performance on specific tasks to a certain extent.
Parameter adjustment method:
  • SFT directly updates the model's own parameters (often all of them), training on data from the specific task so the model learns task-related patterns and knowledge.
  • LoRA adapts to a specific task by adding extra low-rank matrices and training only their parameters; most of the original model's parameters are frozen.
Resource requirements:
  • Since SFT needs to process a large amount of data and make frequent adjustments to model parameters, it has high demands on computing resources, just like rebuilding a castle requires a lot of manpower and materials.
  • Because LoRA only trains a small number of low-rank matrix parameters, it greatly reduces computation and memory usage, so its demand for computing resources is relatively low, like building the castle with a simpler, more resource-efficient method.
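The resource gap in the list above can be put into rough numbers. Fine-tuning a full `d_out x d_in` weight matrix trains `d_out * d_in` parameters, while LoRA with rank `r` trains only `r * (d_in + d_out)`. The layer size below is an illustrative assumption (a 4096x4096 projection, a size typical of ~7B-parameter models).

```python
# Back-of-the-envelope parameter counts for one weight matrix.

def full_ft_params(d_in: int, d_out: int) -> int:
    """Parameters updated by full fine-tuning of one weight matrix."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters updated by a rank-r LoRA adapter on the same matrix."""
    return r * (d_in + d_out)

d = 4096
full = full_ft_params(d, d)     # 16,777,216 trainable parameters
lora = lora_params(d, d, r=8)   # 65,536 trainable parameters
reduction = full / lora         # 256x fewer trainable parameters
```

At rank 8, this single layer trains 256 times fewer parameters under LoRA than under full fine-tuning, which is where the memory and compute savings come from.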

LoRA and SFT are like two magicians in the AI world, each performing unique magic to make large language models serve us better. Whether you want to make a model more proficient in a professional field or optimize model performance with limited resources, both can play a huge role. As the technology continues to develop, LoRA and SFT will surely create even more value in post-training in the AI world of the future.