Understanding pre-training, fine-tuning, and in-context learning in one article

Master the core skills of deep learning model training.
Core content:
1. The concepts of pre-training, fine-tuning, and in-context learning, and the differences between them
2. Application examples of these three technologies in large model training
3. The role and impact of unsupervised learning and supervised learning in model training
Let's first look at a diagram that introduces the relationship between the three. In a broad sense, training includes pre-training, fine-tuning, and in-context learning; in a narrow sense, training refers specifically to the pre-training stage. Well-known large-model companies such as Zhipu AI and Baichuan Intelligence provide models like GLM-130B and Baichuan-2-192K, all of which are pre-trained models. What people colloquially call "alchemy" (a common slang term for model training) generally refers to pre-training.
Pre-training: Building a general feature extractor
Pre-training (PT) refers to training a model on a large amount of data so that it learns a general feature representation, which gives it better generalization ability on specific downstream tasks. Pre-training usually adopts unsupervised learning methods such as autoencoders or generative adversarial networks. For large models, pre-training can significantly improve performance while reducing the training time and computing resources required for downstream tasks. During pre-training, the model is exposed to a large amount of unlabeled text data, such as books, articles, and websites; a language model like GPT-3, for example, is pre-trained on a dataset containing millions of books, articles, and web pages. The goal of pre-training is to capture the underlying patterns, structures, and semantic knowledge present in the text corpus.
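To make this concrete, below is a minimal sketch of the kind of self-supervised objective used to pre-train GPT-style language models: predicting the next token in unlabeled text. The tiny Transformer, vocabulary size, and random token ids are illustrative stand-ins, not GPT-3's actual architecture or corpus.

```python
# Minimal sketch of next-token-prediction pre-training (illustrative only).
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 1000, 64, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        # Causal mask: each position may only attend to earlier positions.
        sz = x.size(1)
        mask = torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(x), mask=mask)
        return self.head(h)

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# "Unlabeled text" already tokenized to integer ids (random here as a stand-in).
batch = torch.randint(0, vocab_size, (8, seq_len))
inputs, targets = batch[:, :-1], batch[:, 1:]   # target = the next token

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```

The key point is that the labels come from the text itself (the next token), so no human annotation is needed at this stage.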
Fine-tuning: a task-specific optimization tool
Fine-tuning (FT) means further training a pre-trained model for a specific task. Through fine-tuning, the model can achieve better performance on that task. Fine-tuning usually uses supervised learning, adjusting the model parameters with labeled data. In large models, fine-tuning makes full use of the general feature representation learned during pre-training, enabling more efficient task adaptation.
Supervised learning is a machine learning method in which the model learns from training data with known labels. In supervised learning, we have an input dataset and a corresponding set of labels or target outputs. The model learns the relationship between inputs and outputs so that it can predict the labels of new data. The method is called "supervised" because the labels in the training data guide the model during learning; it relies on labeled training data, which is usually classified or annotated manually.
Supervised learning is useful in many tasks, such as classification, regression, speech recognition, and image recognition. Common supervised learning algorithms include support vector machines, decision trees, and logistic regression. The goal is to train a model that can predict the corresponding label or output when given new, unlabeled data.
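As a toy illustration of supervised learning, the snippet below fits a logistic regression classifier on a small synthetic labeled dataset generated with scikit-learn; the dataset and hyperparameters are placeholders chosen only to keep the example self-contained.

```python
# Toy supervised learning: logistic regression on labeled data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Inputs X paired with known labels y (the "supervision").
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy on unseen data:", clf.score(X_test, y_test))
```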
SFT (Supervised Fine-Tuning) is a common application of supervised learning. It starts from a pre-trained model and fine-tunes it with labeled data to adapt it to a specific task. Through fine-tuning, the model can reuse pre-learned knowledge and quickly adapt to new tasks, improving performance and results. This method is widely used across tasks such as classification and regression, and has achieved remarkable success.
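A minimal SFT-style sketch using the Hugging Face transformers library might look like the following. The model name, the two-example "dataset", and the hyperparameters are illustrative assumptions; real fine-tuning would use far more labeled data and a proper training loop or the Trainer API.

```python
# Minimal sketch of supervised fine-tuning (SFT) for text classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)      # adds a new task-specific head

texts = ["great movie, loved it", "waste of time"]   # tiny illustrative dataset
labels = torch.tensor([1, 0])                        # human-annotated labels
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                           # a few gradient steps for illustration
    out = model(**enc, labels=labels)        # supervised cross-entropy loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The pre-trained weights provide the general language representation; only a small amount of labeled, task-specific data is needed to adapt them.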
In-context learning: capturing intrinsic relationships in the data
The pre-trained GPT-3 model turned out to have a striking ability, later named In-Context Learning (also translated as contextual or situational learning).
Put simply, a pre-trained GPT-3 model does not need to be retrained when transferred to a new task. Instead, you provide an optional task description, then a few examples (task queries paired with their answers), and finally the query you actually want answered. All of this is packaged together as the model's input, and the model then outputs the correct answer for that last query.
For example, let's say we want to use GPT-3 to do a translation task, translating English into French. The input format is as follows:
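A few-shot prompt in the style popularized by the GPT-3 paper looks something like this; the task description line and the example pairs below are illustrative:

```
Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>
```

The first line is the optional task description, the next lines are the in-context examples (query-answer pairs), and the final line is the query to be answered; the model is expected to continue the text with the translation, e.g. "fromage", without any update to its weights.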