Is an Embedding model enough, or do we also need a Rerank model?

Written by Audrey Miles
Updated on: July 8, 2025

The Rerank model is another powerful tool in information retrieval: it improves the semantic matching and accuracy of search results.

Core content:
1. Definition of the Rerank model and its role in information retrieval
2. Working principle and core function of the Rerank model
3. The main differences between the Rerank model and the Embedding model, and a comparison of their application scenarios


What is the Rerank Model?

  The Rerank model is a machine learning model used to optimize the ranking of information retrieval results. It improves the accuracy and semantic match of the final results by re-scoring the relevance of each document to the query. The key points are:

  1. Definition and positioning

  • It is a re-ranking algorithm that performs a second round of screening and sorting on candidate documents after initial retrieval (such as keyword matching or vector-similarity retrieval).

  • In the RAG (retrieval-augmented generation) pipeline, it works together with the Embedding model to form a "coarse screening + fine sorting" mechanism.

  2. Core role

    • Overcomes the limitations of initial retrieval: compensates for the shallow semantic understanding of traditional retrieval methods (such as inverted indexes or Embedding similarity calculations).

    • Improves result quality: re-scores documents along multiple dimensions (such as semantic consistency and contextual relevance) so that highly relevant content is ranked first.

  3. How it works

    • Supervised training: trained on large numbers of correct and incorrect query-document pairs, the model learns to maximize the scores of correct pairs and minimize the scores of incorrect ones.

    • Relevance scoring: given a query and a document as input, the model directly outputs a matching score for the pair, which is then used for sorting (see the sketch after this list).

  4. Typical application scenarios

    • RAG systems: optimizes the ranking of retrieved documents, improving the accuracy of answers generated by large models.

    • Search engines and recommendation systems: fine-tunes the order of results to improve user satisfaction.
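To make the relevance-scoring step concrete, here is a minimal sketch using the CrossEncoder class from the sentence-transformers library. The checkpoint name and example texts are illustrative assumptions, not a prescribed setup.

```python
# Minimal rerank-scoring sketch with a cross-encoder.
# Assumes the sentence-transformers package; the checkpoint is one public
# example, and any query-document scoring model would work the same way.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base")

query = "How do rerank models improve retrieval?"
candidates = [
    "Rerank models re-score retrieved documents against the query.",
    "The weather in Beijing is sunny today.",
    "Cross-encoders jointly encode query and document for fine-grained matching.",
]

# Score each (query, document) pair jointly; higher means more relevant.
scores = reranker.predict([(query, doc) for doc in candidates])

# Sort candidates by descending relevance score.
for doc, score in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {doc}")
```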

What is the difference between the Rerank model and the Embedding model?

The following table compares the Rerank model and the Embedding model across their core differences and typical applications (a code sketch follows the table):

| Comparison dimension | Embedding model | Rerank model |
| --- | --- | --- |
| Main objective | Maps text into vectors for large-scale, fast semantic retrieval | Refines the ranking of preliminary search results to improve the ranking accuracy of relevant documents |
| Input and output | Input: a single piece of text (query or document). Output: a fixed-length dense vector (e.g., 768 dimensions) | Input: a query + document pair. Output: a relevance score (no fixed range, e.g., 0.85) |
| Typical architecture | Bi-Encoder (e.g., two independent BERT encoding towers) | Cross-Encoder (e.g., BERT jointly encoding query and document) |
| Computation | Encodes texts independently; ranks by vector similarity (e.g., cosine distance) | Jointly encodes query and document, capturing fine-grained semantic interactions, and scores directly |
| Application stage | Front end of retrieval: quickly recalls a candidate set (e.g., Top-100) from massive data | Back end of retrieval: re-sorts the small candidate set (e.g., Top-100) and outputs the final results (e.g., Top-5) |
| Resource consumption | Document vectors can be precomputed offline; online retrieval is cheap (only the query vector is computed) | Each query-document interaction must be computed online in real time; cost grows linearly with the number of candidates |
| Optimization direction | Improves the quality of the semantic space (e.g., uniformity, generalization), but may lose fine-grained semantics | Directly optimizes relevance discrimination through supervised learning to match intent precisely |
| Typical models / tools | Open source: BGE-base-zh, text2vec. Commercial: OpenAI Embedding, Cohere Embed | Open source: bge-reranker-large, bge-reranker-base. Commercial: Cohere Rerank API |
| Applicable scenarios | Rapid candidate screening (e.g., first-round recall in search engines, cold start in recommendation systems) | High-precision ranking (e.g., RAG generation, ad ranking, answer selection in question answering) |
| Pros and cons | ✅ Efficient and scalable. ❌ Coarse-grained semantic matching | ✅ High accuracy and deep semantic understanding. ❌ Slow computation and poor scalability |

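To make the Bi-Encoder vs. Cross-Encoder rows above concrete, here is a hedged side-by-side sketch; the library and checkpoint names are illustrative assumptions.

```python
# Bi-encoder (Embedding) vs. cross-encoder (Rerank) in code.
# Assumes sentence-transformers; the two checkpoints are public examples.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "what is a rerank model"
docs = [
    "A rerank model re-scores retrieval candidates against the query.",
    "Inverted indexes map terms to the documents that contain them.",
]

# Bi-encoder: encode query and documents independently, compare by cosine similarity.
bi_encoder = SentenceTransformer("BAAI/bge-base-en-v1.5")
doc_vecs = bi_encoder.encode(docs)                 # can be precomputed offline
query_vec = bi_encoder.encode(query)
cosine_scores = util.cos_sim(query_vec, doc_vecs)  # fast but coarse-grained

# Cross-encoder: encode each (query, document) pair jointly and score directly.
cross_encoder = CrossEncoder("BAAI/bge-reranker-base")
rerank_scores = cross_encoder.predict([(query, d) for d in docs])  # slower, finer-grained

print(cosine_scores, rerank_scores)
```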
Typical collaboration in a RAG system (a code sketch follows this list):

1. The Embedding model encodes user queries and the document library into vectors to complete the initial recall.

2. The Rerank model re-ranks the recalled results to improve the accuracy of the answers generated by the LLM.

3. Together they form a complementary "coarse screening + fine sorting" mechanism that balances efficiency and precision.
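A minimal end-to-end sketch of this two-stage pipeline, assuming sentence-transformers for both stages; the tiny corpus, the checkpoint names, and the recall/final cutoffs are illustrative assumptions.

```python
# Two-stage "coarse screening + fine sorting" retrieval sketch.
# The three-document corpus is a stand-in for a real document library.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "Rerank models re-score retrieval candidates with a cross-encoder.",
    "Embedding models map text to dense vectors for fast recall.",
    "The giant panda is a bear species endemic to China.",
]

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
reranker = CrossEncoder("BAAI/bge-reranker-base")

# Stage 0 (offline): precompute document vectors once.
corpus_vecs = embedder.encode(corpus, convert_to_tensor=True)

def retrieve(query: str, recall_k: int = 100, final_k: int = 5):
    # Stage 1: fast vector recall of a broad candidate set.
    query_vec = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_vec, corpus_vecs, top_k=recall_k)[0]
    candidates = [corpus[h["corpus_id"]] for h in hits]

    # Stage 2: cross-encoder re-scoring of the small candidate set.
    scores = reranker.predict([(query, doc) for doc in candidates])
    reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return reranked[:final_k]  # pass these documents to the LLM as context

print(retrieve("how does reranking work?"))
```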

(Figure: RAG evaluation based on LlamaIndex.)
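For reference, this is roughly how a reranker is wired into the kind of LlamaIndex pipeline such an evaluation compares. A hedged sketch: the import paths follow the LlamaIndex 0.10+ layout and may differ in other versions, the ./data directory is a placeholder, and the default LLM and embedding settings require the corresponding API credentials.

```python
# Hedged sketch: adding a reranking stage to a LlamaIndex query engine.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor import SentenceTransformerRerank

# Build a vector index over local documents (embedding recall stage).
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Recall a broad candidate set, then rerank it down to the top 3 nodes.
rerank = SentenceTransformerRerank(model="BAAI/bge-reranker-base", top_n=3)
query_engine = index.as_query_engine(
    similarity_top_k=10,            # coarse screening
    node_postprocessors=[rerank],   # fine sorting
)

print(query_engine.query("What does a rerank model add on top of embeddings?"))
```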


How to choose the Rerank model?

First, you can consult the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard_legacy

If you don't want to overthink it, the BAAI BGE series (listed below) is the safe default.

For multilingual scenarios, prefer:

• BAAI/bge-reranker-v2-m3

• BAAI/bge-reranker-v2-gemma

| Model | Base model | Language | Layerwise | Feature |
| --- | --- | --- | --- | --- |
| BAAI/bge-reranker-base | xlm-roberta-base | Chinese and English | - | Lightweight reranker model; easy to deploy, with fast inference. |
| BAAI/bge-reranker-large | xlm-roberta-large | Chinese and English | - | Lightweight reranker model; easy to deploy, with fast inference. |
| BAAI/bge-reranker-v2-m3 | bge-m3 | Multilingual | - | Lightweight reranker model with strong multilingual capability; easy to deploy, with fast inference. |
| BAAI/bge-reranker-v2-gemma | gemma-2b | Multilingual | - | Suited to multilingual contexts; performs well in both English proficiency and multilingual capability. |
| BAAI/bge-reranker-v2-minicpm-layerwise | MiniCPM-2B-dpo-bf16 | Multilingual | 8-40 | Suited to multilingual contexts; performs well in both English and Chinese; output layers can be selected freely, enabling faster inference. |
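The BGE model cards document usage through the FlagEmbedding package; here is a hedged sketch along those lines. The example texts are assumptions, and the normalize flag may require a recent FlagEmbedding version.

```python
# Scoring with a BGE reranker through the FlagEmbedding package.
from FlagEmbedding import FlagReranker

# use_fp16 trades a little precision for noticeably faster inference.
reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

# Score one (query, passage) pair; raw scores are unbounded, higher = more relevant.
score = reranker.compute_score(
    ["what is a panda?", "The giant panda is a bear species endemic to China."]
)
print(score)

# Batch scoring; normalize=True maps scores into 0-1 through a sigmoid.
scores = reranker.compute_score(
    [
        ["what is a panda?", "The giant panda is a bear species endemic to China."],
        ["what is a panda?", "Paris is the capital of France."],
    ],
    normalize=True,
)
print(scores)
```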

Last words

Why the Rerank model remains hard to replace:

| Capability dimension | Value of the Rerank model | Feasibility of replacement by a large model |
| --- | --- | --- |
| Depth of semantic interaction | Cross-encoding enables fine-grained semantic matching between query and document (e.g., ambiguity resolution) | An LLM cannot directly replace this level of semantic discrimination |
| Computational efficiency | Re-sorting a Top-100 candidate set adds only milliseconds of latency | An LLM needs several times more compute to process the same volume of data |
| System architecture | As an independent module it is easy to iterate on (e.g., domain adaptation and fine-tuning) | Debugging an end-to-end solution grows exponentially more complex |

Some recommendations for high-accuracy answers:

| Scenario type | Recommended solution | Theoretical benefit |
| --- | --- | --- |
| High-precision question answering | Rerank + full-parameter LLM | Answer accuracy increases by 18-25% |
| Real-time conversation | Rerank + layer-pruned LLM | Response latency drops by 40%, with <3% accuracy loss |
| Multimodal retrieval | Multimodal Rerank + cross-modal LLM | Cross-modal alignment efficiency increases by 30% |