The two core tools of the RAG retrieval system - Embedding model and Rerank model

Written by Iris Vance
Updated on: June 24, 2025
Recommendation

Explore the core tools of the RAG retrieval system and see how the Embedding model and the Rerank model are put to clever use.

Core content:
1. The role and function of the Embedding model and the Rerank model in the RAG system
2. How the two models are applied, and how they differ, in natural language processing and information retrieval
3. A comparison of technical implementation details, covering functional goals, application stages and technical characteristics

Yang Fangxian
Founder of 53A, Tencent Cloud Most Valuable Expert (TVP)

The Embedding and Rerank models are the core models in the RAG system.



In the RAG system there are two very important models: the Embedding model and the Rerank model. Both play an important role in RAG.


The role of the Embedding model is to vectorize data, mapping it into a lower-dimensional dense vector space in which the similarity between vectors can be computed with Euclidean distance, cosine similarity, and similar measures, so that similarity retrieval becomes possible.
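As a minimal illustration of that similarity step, the sketch below computes cosine similarity and Euclidean distance with NumPy; the 4-dimensional vectors are made-up placeholders for real embedding output, which typically has hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (closer to 1 = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two embedding vectors (closer to 0 = more similar)."""
    return float(np.linalg.norm(a - b))

# Toy 4-dimensional "embeddings"; a real model outputs e.g. 768-dimensional vectors.
query_vec = np.array([0.1, 0.8, 0.3, 0.0])
doc_vec = np.array([0.2, 0.7, 0.4, 0.1])

print(cosine_similarity(query_vec, doc_vec))
print(euclidean_distance(query_vec, doc_vec))
```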


The role of the Rerank model is to perform a more precise screening on top of the results of Embedding-based retrieval; if the Embedding model performs a coarse, single-dimension screening, the Rerank model screens the candidates from multiple dimensions.
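A minimal sketch of this second-stage screening, assuming the sentence-transformers library is available; the checkpoint name and the candidate texts are only examples, not part of the original article.

```python
from sentence_transformers import CrossEncoder

# Example reranking checkpoint; any cross-encoder trained for relevance scoring would do.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How to train a neural network?"
candidates = [
    "A step-by-step guide to training neural networks with backpropagation.",
    "The history of railway networks in Europe.",
    "Tips for choosing a learning rate and batch size.",
]

# The reranker scores each (query, candidate) pair jointly, rather than each text in isolation.
scores = reranker.predict([(query, doc) for doc in candidates])
for doc, score in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```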





Embedding model and Rerank model




In natural language processing and information retrieval systems, Embedding models and Rerank models are two kinds of technology with different functions that are often used in combination.

Both the Embedding and Rerank models are neural network models built with deep learning, but because their functions differ, their implementations and training methods also differ.

From a usage perspective, the Embedding model is generally used for data vectorization and fast retrieval, while the Rerank model reorders the results of that fast retrieval to improve relevance.

From the perspective of technical implementation, however, the two models use different learning methods and architectures; the reason lies in their different purposes and in the way each processes data.


The core differences between them lie in their goals, application stages and technical implementations. Here is a detailed comparison:


1. Functional goals

| Dimension | Embedding Model | Rerank Model |
| --- | --- | --- |
| Core task | Convert text into dense vectors that capture its semantic information | Re-rank candidate results to improve relevance |
| Output format | A fixed-dimensional vector (e.g. a 768-dimensional vector) | Ranking scores for the candidate list (e.g. relevance scores) |
| Focus | Global semantic representation of the text | Fine-grained matching between the query and each candidate |

Example

  • Embedding model: convert "How to train a neural network?" into a vector and use it to retrieve similar questions.

  • Rerank model: sort the 100 initially retrieved answers so that the most relevant ones end up in the top 3 (see the sketch below).
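The example above can be sketched end to end as a retrieve-then-rerank pipeline. The sketch below assumes the sentence-transformers library; the checkpoint names and the tiny corpus are placeholders standing in for a real document collection.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")               # bi-encoder (example checkpoint)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # cross-encoder (example checkpoint)

corpus = [
    "Use gradient descent and backpropagation to train a neural network.",
    "Pasta recipes for beginners.",
    "Regularization techniques such as dropout help neural networks generalize.",
]
corpus_emb = embedder.encode(corpus, convert_to_tensor=True)

query = "How to train a neural network?"
query_emb = embedder.encode(query, convert_to_tensor=True)

# Stage 1 (recall): fast vector search over the whole corpus.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
candidates = [corpus[hit["corpus_id"]] for hit in hits]

# Stage 2 (rerank): score each (query, candidate) pair and keep the best ones.
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
print(reranked[0][0])  # the most relevant document after reranking
```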


2. Application stage

| Dimension | Embedding Model | Rerank Model |
| --- | --- | --- |
| Pipeline stage | Retrieval (recall) stage: quickly generate a candidate set | Re-ranking (fine-ranking) stage: optimize the order of the candidate set |
| Data size | Massive amounts of data (e.g. millions of documents) | A small candidate set (e.g. Top 100~1000) |
| Performance requirements | Must be highly efficient (millisecond-level response) | Higher latency is acceptable (more complex computation is needed) |

Typical scenarios

  • Embedding model: used for the preliminary recall stage of a search engine (e.g. filtering the top 1000 out of 1 billion documents); see the recall sketch below.

  • Rerank model: refines the ordering of the top 100 results in a recommendation system to improve the click-through rate.
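For recall at that scale, the document embeddings are usually loaded into a vector index. The sketch below assumes the FAISS library and uses random vectors as stand-ins for real embeddings; a flat inner-product index is shown for clarity, while approximate indexes would be used for billions of documents.

```python
import numpy as np
import faiss  # a common library for large-scale vector search

dim = 768                                                    # embedding dimension
doc_vectors = np.random.rand(10_000, dim).astype("float32")  # placeholders for real document embeddings
faiss.normalize_L2(doc_vectors)                              # normalize so inner product equals cosine similarity

index = faiss.IndexFlatIP(dim)                               # exact inner-product index
index.add(doc_vectors)

query_vector = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query_vector)

scores, doc_ids = index.search(query_vector, 1000)           # recall the top 1000 candidates for reranking
print(doc_ids.shape)                                         # (1, 1000)
```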


3. Technical Implementation

| Dimension | Embedding Model | Rerank Model |
| --- | --- | --- |
| Model type | Unsupervised / self-supervised learning (e.g. BERT, Sentence-BERT) | Supervised learning (e.g. Pairwise Ranking, ListNet) |
| Input and output | A single text → a fixed-dimensional vector | A query + candidate text pair → a relevance score |
| Feature dependencies | Relies only on the semantic information of the text itself | Can integrate multiple features (semantics, click-through rate, timeliness, etc.) |
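To make the "supervised learning (Pairwise Ranking)" row concrete, below is a minimal training-step sketch assuming PyTorch; the linear scorer and random feature tensors are placeholders, since a real reranker would encode each (query, document) pair with a Transformer.

```python
import torch
import torch.nn as nn

# Placeholder scorer: a real reranker would jointly encode (query, document) with a Transformer.
scorer = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
loss_fn = nn.MarginRankingLoss(margin=1.0)
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-4)

# One pairwise training step: the relevant (positive) pair should score above the irrelevant one.
pos_features = torch.randn(8, 768)   # stand-ins for (query, relevant doc) pair features
neg_features = torch.randn(8, 768)   # stand-ins for (query, irrelevant doc) pair features

pos_scores = scorer(pos_features).squeeze(-1)
neg_scores = scorer(neg_features).squeeze(-1)
target = torch.ones_like(pos_scores)  # +1 means pos_scores should be larger than neg_scores

loss = loss_fn(pos_scores, neg_scores, target)
loss.backward()
optimizer.step()
```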


Model examples

  • Embedding model:

    • General-purpose semantic encoding: BERT, RoBERTa

    • Dedicated scenarios: DPR (Dense Passage Retrieval)

  • Rerank model:

    • Traditional methods: BM25 + feature engineering

    • Deep models: ColBERT, Cross-Encoder