The two core tools of the RAG retrieval system - Embedding model and Rerank model

Written by Iris Vance
Updated on: June 24, 2025
Recommendation

Explore the core tools of the RAG retrieval system and see how the Embedding model and the Rerank model are put to clever use.

Core content:
1. The role and function of the Embedding model and the Rerank model in the RAG system
2. How the two models are applied, and how they differ, in natural language processing and information retrieval
3. A comparison of technical implementation details, covering functional goals, application stages and technical characteristics

Yang Fangxian
Founder of 53A, Tencent Cloud Most Valuable Expert (TVP)

The Embedding and Rerank models are the core models in the RAG system.



In the RAG system there are two very important models: the Embedding model and the Rerank model. Both play an important role in RAG.


The role of the Embedding model is to vectorize data, mapping it into a lower-dimensional dense vector space in which the similarity between vectors can be computed with Euclidean distance, cosine similarity, and similar measures, so that similarity retrieval becomes possible.
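As a minimal illustration of that similarity step, the sketch below computes cosine similarity and Euclidean distance with NumPy; the 4-dimensional vectors are made-up placeholders for real embedding output, which typically has hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (closer to 1 = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two embedding vectors (closer to 0 = more similar)."""
    return float(np.linalg.norm(a - b))

# Toy 4-dimensional "embeddings"; a real model outputs e.g. 768-dimensional vectors.
query_vec = np.array([0.1, 0.8, 0.3, 0.0])
doc_vec = np.array([0.2, 0.7, 0.4, 0.1])

print(cosine_similarity(query_vec, doc_vec))
print(euclidean_distance(query_vec, doc_vec))
```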


The role of the Rerank model is to perform a more precise screening on top of the results of Embedding-based retrieval; if the Embedding model performs a coarse, single-dimension screening, the Rerank model screens the candidates from multiple dimensions.
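A minimal sketch of this second-stage screening, assuming the sentence-transformers library is available; the checkpoint name and the candidate texts are only examples, not part of the original article.

```python
from sentence_transformers import CrossEncoder

# Example reranking checkpoint; any cross-encoder trained for relevance scoring would do.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How to train a neural network?"
candidates = [
    "A step-by-step guide to training neural networks with backpropagation.",
    "The history of railway networks in Europe.",
    "Tips for choosing a learning rate and batch size.",
]

# The reranker scores each (query, candidate) pair jointly, rather than each text in isolation.
scores = reranker.predict([(query, doc) for doc in candidates])
for doc, score in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```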





Embedding model and Rerank model




In natural language processing and information retrieval systems, Embedding models and Rerank models are two kinds of technology with different functions that are often used in combination.

Both the Embedding and Rerank models are neural network models built with deep learning, but because their functions differ, their implementations and training methods also differ.

From a usage perspective, the Embedding model is generally used for data vectorization and fast retrieval, while the Rerank model reorders the results of that fast retrieval to improve relevance.

From the perspective of technical implementation, however, the two models use different learning methods and architectures; the reason lies in their different purposes and in the way each processes data.


The core differences between them lie in their goals, application stages and technical implementations. Here is a detailed comparison:


1. Functional goals

| Dimension | Embedding Model | Rerank Model |
| --- | --- | --- |
| Core task | Convert text into dense vectors that capture its semantic information | Re-rank candidate results to improve relevance |
| Output format | A fixed-dimensional vector (e.g. a 768-dimensional vector) | Ranking scores for the candidate list (e.g. relevance scores) |
| Focus | Global semantic representation of the text | Fine-grained matching between the query and each candidate |

Example

  • Embedding model: convert "How to train a neural network?" into a vector and use it to retrieve similar questions.

  • Rerank model: sort the 100 initially retrieved answers so that the most relevant ones end up in the top 3 (see the sketch below).
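The example above can be sketched end to end as a retrieve-then-rerank pipeline. The sketch below assumes the sentence-transformers library; the checkpoint names and the tiny corpus are placeholders standing in for a real document collection.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")               # bi-encoder (example checkpoint)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # cross-encoder (example checkpoint)

corpus = [
    "Use gradient descent and backpropagation to train a neural network.",
    "Pasta recipes for beginners.",
    "Regularization techniques such as dropout help neural networks generalize.",
]
corpus_emb = embedder.encode(corpus, convert_to_tensor=True)

query = "How to train a neural network?"
query_emb = embedder.encode(query, convert_to_tensor=True)

# Stage 1 (recall): fast vector search over the whole corpus.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
candidates = [corpus[hit["corpus_id"]] for hit in hits]

# Stage 2 (rerank): score each (query, candidate) pair and keep the best ones.
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
print(reranked[0][0])  # the most relevant document after reranking
```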


2. Application stage

| Dimension | Embedding Model | Rerank Model |
| --- | --- | --- |
| Pipeline stage | Retrieval (recall) stage: quickly generate a candidate set | Re-ranking (fine-ranking) stage: optimize the order of the candidate set |
| Data size | Massive amounts of data (e.g. millions of documents) | A small candidate set (e.g. Top 100~1000) |
| Performance requirements | Must be highly efficient (millisecond-level response) | Higher latency is acceptable (more complex computation is needed) |

Typical scenarios

  • Embedding model: used for the preliminary recall stage of a search engine (e.g. filtering the top 1000 out of 1 billion documents); see the recall sketch below.

  • Rerank model: refines the ordering of the top 100 results in a recommendation system to improve the click-through rate.
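For recall at that scale, the document embeddings are usually loaded into a vector index. The sketch below assumes the FAISS library and uses random vectors as stand-ins for real embeddings; a flat inner-product index is shown for clarity, while approximate indexes would be used for billions of documents.

```python
import numpy as np
import faiss  # a common library for large-scale vector search

dim = 768                                                    # embedding dimension
doc_vectors = np.random.rand(10_000, dim).astype("float32")  # placeholders for real document embeddings
faiss.normalize_L2(doc_vectors)                              # normalize so inner product equals cosine similarity

index = faiss.IndexFlatIP(dim)                               # exact inner-product index
index.add(doc_vectors)

query_vector = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query_vector)

scores, doc_ids = index.search(query_vector, 1000)           # recall the top 1000 candidates for reranking
print(doc_ids.shape)                                         # (1, 1000)
```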


3. Technical Implementation

| Dimension | Embedding Model | Rerank Model |
| --- | --- | --- |
| Model type | Unsupervised / self-supervised learning (e.g. BERT, Sentence-BERT) | Supervised learning (e.g. Pairwise Ranking, ListNet) |
| Input and output | A single text → a fixed-dimensional vector | A query + candidate text pair → a relevance score |
| Feature dependencies | Relies only on the semantic information of the text itself | Can integrate multiple features (semantics, click-through rate, timeliness, etc.) |
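To make the "supervised learning (Pairwise Ranking)" row concrete, below is a minimal training-step sketch assuming PyTorch; the linear scorer and random feature tensors are placeholders, since a real reranker would encode each (query, document) pair with a Transformer.

```python
import torch
import torch.nn as nn

# Placeholder scorer: a real reranker would jointly encode (query, document) with a Transformer.
scorer = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
loss_fn = nn.MarginRankingLoss(margin=1.0)
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-4)

# One pairwise training step: the relevant (positive) pair should score above the irrelevant one.
pos_features = torch.randn(8, 768)   # stand-ins for (query, relevant doc) pair features
neg_features = torch.randn(8, 768)   # stand-ins for (query, irrelevant doc) pair features

pos_scores = scorer(pos_features).squeeze(-1)
neg_scores = scorer(neg_features).squeeze(-1)
target = torch.ones_like(pos_scores)  # +1 means pos_scores should be larger than neg_scores

loss = loss_fn(pos_scores, neg_scores, target)
loss.backward()
optimizer.step()
```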


Model examples

  • Embedding model:

    • General-purpose semantic encoding: BERT, RoBERTa

    • Dedicated scenarios: DPR (Dense Passage Retrieval)

  • Rerank model:

    • Traditional methods: BM25 + feature engineering

    • Deep models: ColBERT, Cross-Encoder