Want to "specialize" a large model for a specific task? Don't miss these 3 Addition-Based fine-tuning methods

Written by
Silas Grey
Updated on: June 27, 2025
Recommendation

Master cutting-edge large-model fine-tuning techniques to improve performance on specific tasks.

Core content:
1. The necessity and basic principles of large model fine-tuning
2. Analysis of the limitations of traditional fine-tuning methods
3. Advantages and practice of Addition-Based fine-tuning method


Existing large models can already do many complex things, such as writing articles, answering questions, and generating images. However, powerful as they are, they sometimes need further adjustment to perform better on specific tasks. Today, let's talk about one family of large-model fine-tuning methods: fine-tuning by adding additional parameters (Addition-Based fine-tuning).

1. Big Models and Fine-tuning

1. Large Model: A powerful all-rounder that needs to be customized

In simple terms, a large model is an artificial intelligence model with a massive number of parameters and powerful learning capabilities. DeepSeek and GPT-4, which everyone is familiar with, are typical representatives. These models are like an "all-round top student" that performs well in many areas. However, when we need one to be more professional in a specific field, such as medical diagnosis or legal document processing, it needs "special training", and that process of "special training" is fine-tuning.

2. Fine-tuning: Making large models “specialized”

During pre-training, a large model learns from a huge amount of data and acquires a lot of general knowledge and skills. However, different application scenarios have different requirements, just as an athlete with good overall fitness still needs event-specific training before competing in a particular event. Fine-tuning gives the large model this "special training" so that it performs better on specific tasks. For example, a model that can already generate general text can be fine-tuned to become better at generating professional text in the medical field.

2. The “troubles” of traditional fine-tuning methods

Before we talk about Addition-Based fine-tuning, let's first look at the traditional fine-tuning methods. Common ones include full fine-tuning and partial fine-tuning.

1. Full fine-tuning: "breaking bones" and high cost

Full fine-tuning adjusts all the parameters of a large model. The advantage of this method is that the adjustment is comprehensive, and in theory the model can perform very well on the specific task. However, its disadvantage is just as obvious: the number of parameters in a large model is usually enormous, often billions, tens of billions, or even hundreds of billions.

Adjusting so many parameters requires a lot of computing resources and time, which is very costly. For a rough sense of scale, a 10-billion-parameter model stored in 32-bit floats already takes about 40 GB of memory for the weights alone, and full fine-tuning with an Adam-style optimizer needs several times that again for gradients and optimizer states. Moreover, the large model has already learned a great deal of general knowledge, and fine-tuning the entire model may destroy some of its original capabilities, which feels a bit like "breaking its bones".

2. Partial fine-tuning: "cutting corners" has limitations

Partial fine-tuning adjusts only some of the parameters of a large model, for example only the parameters of the last few layers. The advantage is that it saves computing resources and time, so the cost is relatively low. However, it also has limitations: if only some parameters are adjusted, the model may not fully learn the knowledge of the specific task, and the results may not be good enough. It is like training an athlete in only some of the required skills; it may then be hard for them to perform at their best in competition.

3. Addition-Based Fine-tuning: Adding Equipment to the Model

1. What is Addition-Based Fine-tuning?

Addition-Based fine-tuning, in simple terms, adds extra parameters alongside the original parameters of the large model and then adjusts only these extra parameters, while the original parameters remain (mostly) unchanged. This is like "adding equipment" to the large model: the original model is like a warrior who already has strong basic abilities; now we give it some new equipment, and by adjusting this equipment it can perform better in a specific battle.
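To make the idea concrete, here is a minimal PyTorch sketch of the general pattern, with a tiny stand-in "backbone" and made-up sizes: freeze every original parameter and give the optimizer only the newly added ones.

```python
import torch
import torch.nn as nn

# A toy stand-in for the pretrained large model; sizes are purely illustrative.
base_model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))

# Freeze all original parameters so the pretrained knowledge stays untouched.
for p in base_model.parameters():
    p.requires_grad = False

# The newly added "equipment": a small extra module whose parameters we do train.
extra_head = nn.Linear(768, 10)

# The optimizer only ever sees the added parameters, so training stays cheap.
optimizer = torch.optim.AdamW(extra_head.parameters(), lr=1e-4)

x = torch.randn(4, 768)                 # a dummy batch of inputs
logits = extra_head(base_model(x))      # frozen backbone + trainable addition
```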

2. How to add equipment?

In actual operation, Addition-Based fine-tuning has derived a variety of specific implementation solutions, among which the most representative are Prefix Tuning, Prompt Tuning and Adapter Tuning. These three methods are like designing different styles of "equipment" for a large model. Although their appearance and functions have different focuses, the core idea is "adding parameters + targeted optimization".

1. Prefix Tuning: Give the model a good start

Principle : Imagine you ask a writer to write a novel. If you let him write freely, the topic may go off track. But if you give him a story beginning (such as "On a rainy night, Detective Xiao Ming received a mysterious phone call"), he can develop in this direction. Prefix Tuning is to add a trainable continuous prefix vector to the model input to tell the model "what task to handle next" while keeping the original model parameters frozen.

Implementation method : Insert a sequence of k virtual tokens before the model's input layer or hidden layers (in GPT-series models, for example, each token corresponds to a vector). The vectors of these tokens are not fixed; they are the "task-specific prefix" learned through training.
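A minimal sketch of this idea in PyTorch (a stand-in backbone and made-up sizes, not the exact method from the Prefix Tuning paper):

```python
import torch
import torch.nn as nn

class PrefixTunedEncoder(nn.Module):
    """Toy illustration: k trainable prefix vectors are prepended to the
    (frozen) model's input embeddings. All sizes are made up for the example."""
    def __init__(self, frozen_backbone: nn.Module, hidden_dim: int = 768, k: int = 10):
        super().__init__()
        self.backbone = frozen_backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                       # original model stays frozen
        # The only trainable parameters: k task-specific prefix vectors.
        self.prefix = nn.Parameter(torch.randn(k, hidden_dim) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_dim)
        batch = token_embeddings.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return self.backbone(torch.cat([prefix, token_embeddings], dim=1))

# Usage with a stand-in backbone (a real setup would wrap a Transformer):
backbone = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))
model = PrefixTunedEncoder(backbone, hidden_dim=768, k=10)
out = model(torch.randn(2, 16, 768))     # the prefix makes the sequence 10 tokens longer
print(out.shape)                          # torch.Size([2, 26, 768])
```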

Specific example : If you run a flower shop, you want the big model to help you write a WeChat article about flower care knowledge. The normal input may be just "how to care for roses". After Prefix Tuning, a trained prefix vector will be added in front. This vector is like telling the model: "Next, you will create a WeChat article for ordinary flower lovers. The language should be lively and easy to understand. The theme is rose care. First, introduce the habits of roses, and then talk in detail about watering, fertilizing, pruning and other maintenance points." Based on this prefix, the model can generate an article that meets the needs of the flower shop, from the suitable growth environment for roses to the watering frequency in different seasons, and it will also be accompanied by some interesting maintenance tips to attract readers to read.

The advantage is that only the prefix parameters need to be optimized (usually about 0.1%-1% of the original model parameters), and the training cost is extremely low.

2. Prompt Tuning: Make prompt words smarter

Principle : When people use DeepSeek, they usually type in prompts such as "Please help me write a job application letter". Prompt Tuning makes these prompts more "intelligent". Instead of optimizing discrete text, it converts the prompt into an optimal task instruction in a continuous vector space, which is essentially "customizing the exam question" for the model.

Implementation :

Traditional prompts are fixed texts designed manually (such as "The following is a math problem. The answer is marked with {}"). Prompt Tuning converts these prompts into continuous vectors that the model can understand (called "soft prompts") and optimizes the values of these vectors through backpropagation.
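A minimal sketch of such a soft prompt (the vocabulary size, dimensions, and initialization are assumptions for illustration, not taken from any particular paper):

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Toy 'soft prompt': n trainable vectors that replace a hand-written text
    prompt, prepended once at the embedding layer. Sizes are illustrative."""
    def __init__(self, embedding: nn.Embedding, n_prompt_tokens: int = 8):
        super().__init__()
        self.embedding = embedding
        for p in self.embedding.parameters():
            p.requires_grad = False       # the original embedding table stays frozen
        # Initialize the soft prompt from real token embeddings (a common trick).
        init_ids = torch.randint(0, embedding.num_embeddings, (n_prompt_tokens,))
        self.soft_prompt = nn.Parameter(embedding(init_ids).detach().clone())

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # input_ids: (batch, seq_len) -> embeddings with the soft prompt prepended
        tok = self.embedding(input_ids)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return torch.cat([prompt, tok], dim=1)

vocab = nn.Embedding(32000, 768)          # stand-in for the model's embedding table
wrapper = SoftPrompt(vocab, n_prompt_tokens=8)
embeds = wrapper(torch.randint(0, 32000, (2, 12)))
print(embeds.shape)                        # torch.Size([2, 20, 768])
```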

Specific example : In the education field, when the model needs to solve the math problem "calculate 25×(3+7)", Prompt Tuning will add a set of trained vectors before the input to guide the model to generate detailed problem-solving steps: "First calculate the addition in the brackets 3+7=10, then multiply the result by 25 to get 25×10=250", instead of directly giving the answer.

Compared with manually designed prompt words, this method can automatically find the "invisible instructions" that best suit a specific task. It is especially suitable for complex tasks (such as multi-step reasoning). However, attention should be paid to the length and position of the prompt words. If the prompt words are too long, they may interfere with the original logic of the model.

3. Adapter Tuning: Add “plug-ins” to the model

Principle : If the large model is compared to a multi-functional computer, Adapter Tuning is to install an "external graphics card" for specific tasks in it - inserting a small neural network (called Adapter) between certain layers of the model, and only training the parameters of these Adapters, while the original model parameters remain unchanged.

Implementation :

Taking the Transformer architecture as an example, a "bottleneck structure" is usually added after the attention mechanism or feedforward network of each layer (for example, compressing the dimension first and then restoring it). This bottleneck structure is the Adapter, which contains a small number of trainable parameters.
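A minimal sketch of such a bottleneck Adapter (the dimensions are illustrative; real implementations differ in details such as normalization and exactly where the module is inserted):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: compress the dimension, apply a nonlinearity, restore
    it, then add a residual connection. Only these few parameters are trained."""
    def __init__(self, hidden_dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck)   # compress
        self.up = nn.Linear(bottleneck, hidden_dim)     # restore
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# A typical use: insert the adapter after a (frozen) attention or FFN sub-layer.
hidden = torch.randn(2, 16, 768)            # output of a frozen Transformer block
adapter = Adapter(hidden_dim=768, bottleneck=64)
print(adapter(hidden).shape)                 # torch.Size([2, 16, 768]) – shape unchanged
```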

Specific example : Suppose you operate a food recommendation platform, and you want the big model to recommend dishes based on user taste preferences and generate menus based on the inventory of ingredients. For the task of recommending dishes, add an Adapter to the big model. This Adapter will convert the taste information entered by the user, such as "like sweet and sour taste, prefer seafood", into a feature vector that the model can understand, allowing the model to accurately recommend dishes such as sweet and sour fish and pineapple pork. When switching to the task of generating a menu, another Adapter will read the ingredient inventory data, such as "available potatoes, beef, onions", and convert it into information that the model can process, and output a menu containing dishes such as beef stew with potatoes and beef fried with onions. These two Adapters are like professional plug-ins for different tasks on a computer. They do not interfere with each other and can be switched at any time.
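Continuing the hypothetical sketch above (reusing the `Adapter` class; the task names are made up to mirror this example), switching tasks is just a matter of loading a different small module while the frozen backbone stays shared:

```python
# Two task-specific "plug-ins" sharing one frozen backbone.
adapters = {
    "recommend_dishes": Adapter(hidden_dim=768, bottleneck=64),
    "generate_menu":    Adapter(hidden_dim=768, bottleneck=64),
}

def run_task(task_name: str, hidden_states: torch.Tensor) -> torch.Tensor:
    # Pick the adapter for the current task; the backbone itself never changes.
    return adapters[task_name](hidden_states)

out = run_task("recommend_dishes", torch.randn(1, 8, 768))
```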

The advantage of this method is "plug and play". The same large model can handle multiple tasks by loading different Adapters, and the memory usage is extremely low (a single Adapter usually only occupies 0.1%-1% of the original model parameters). Google's T5 model uses Adapter to achieve efficient switching of multi-language translation tasks.

3. Differences from traditional methods

The difference between these three Addition-Based methods and traditional fine-tuning is like the different strategies of "upgrading a computer": full fine-tuning is to disassemble and reinstall the entire computer (time-consuming and costly), partial fine-tuning is to replace only the graphics card without replacing the motherboard (limited effect), and the Addition-Based method is to add dedicated peripherals to the computer (such as graphics cards, game controllers) - without destroying the original configuration, but can improve specific performance in a targeted manner.

For example, suppose we want the same model to handle a Chinese medical question-answering task:

Full fine-tuning would mean training all 100 billion parameters (for a 100-billion-parameter model), would take days, and might cause the model to forget abilities such as holding English conversations;

Adapter Tuning only requires training about 1 billion adapter parameters, can be completed within a day, and does not affect the original model's English and general question-answering capabilities;

Prefix Tuning is even more lightweight, training only about 300 million prefix parameters, which makes it suitable for quickly launching a small-scale medical consultation bot.

4. Advantages of Addition-Based Fine-tuning: Low Cost, High Gains

1. Saves resources and is highly cost-effective

Large models have a huge number of parameters, and full fine-tuning requires a lot of computing resources, which is a considerable expense for many companies and individuals. However, Addition-Based fine-tuning only requires adding a small number of additional parameters, and the computing resources and time required to adjust these parameters are much less.

2. Keeps its "basic skills" and does not lose existing abilities

During pre-training, the large model learns a lot of general knowledge and skills; these are its "basic skills". With full fine-tuning, the model may forget these "basic skills" while being adjusted for a specific task. With Addition-Based fine-tuning, however, most of the original parameters remain unchanged, so the model's original general capabilities are retained.

3. Flexible adaptation, easy handling of multiple tasks

Different tasks have different requirements. Addition-Based fine-tuning can make large models adapt to various tasks by adding different additional parameters. When you need to switch tasks, you only need to adjust the corresponding additional parameters. There is no need to make major changes to the entire model. It is very flexible.

5. Challenges of Addition-Based Fine-tuning

1. The design of additional parameters is critical

How should these additional parameters be designed so that they fit the specific task well? This is a question that still needs study. If the additional parameters are poorly designed, the model may not learn the task well and the results will suffer. For example, if the structure of the added layer is inappropriate, or the parameters are inserted in the wrong position, the model's performance may be affected.

2. Model fusion requires skill

The newly added parameters and the original parameters of the large model need to work well together for the model to perform well overall. If they do not mesh, information from the new parameters may not be transmitted effectively to the original model, or the original model's information may interfere with the new parameters. Finding a suitable way to integrate the two requires technical optimization.

3. Difficulty in effect evaluation

How to evaluate the effect of the model after Addition-Based fine-tuning? Because it involves the combined effect of new parameters and original parameters, the evaluation indicators may need to consider multiple aspects comprehensively. Moreover, different tasks have different evaluation criteria, and it is necessary to develop a suitable evaluation method to accurately judge the effect of fine-tuning.

6. Conclusion

Addition-Based fine-tuning adds extra parameters on top of the large model's original parameters and adjusts only those. It saves resources, retains the model's original capabilities, and adapts flexibly to multiple tasks, which makes it well suited to scenarios with small amounts of data, rapid deployment, or frequent task switching. Although it still faces challenges in designing the additional parameters, fusing them with the model, and evaluating the results, its prospects are very promising.

By understanding this fine-tuning method, we can better understand how large models work in different scenarios, and we can also see that artificial intelligence technology is constantly optimizing and improving. I hope everyone has a clearer understanding of the fine-tuning of large models, and I also look forward to artificial intelligence bringing us more surprises in the future.