Maximum Marginal Relevance (MMR): Improving the Diversity and Practicality of RAG Search Results

Written by
Audrey Miles
Updated on: June 27, 2025
Recommendation

Explore new strategies to improve the diversity and practicality of search results in the RAG system.

Core content:
1. The role and advantages of the MMR algorithm in the RAG system
2. MMR principle and formula analysis, as well as algorithm flow
3. Application examples of MMR in information retrieval and question-answering systems


In a RAG system, retrieval based on vector similarity alone easily produces duplicated or one-sided information. To improve the diversity and coverage of answers, introducing the MMR (Maximum Marginal Relevance) algorithm strikes an effective balance between "relevance" and "diversity", selecting information that is both relevant and non-redundant and improving the quality and practicality of the system's output.

  • 1. What is MMR and what problems does it solve?
  • 2. Basic idea of MMR
  • 3. Principle and formula analysis of MMR
    • 3.1 Algorithm Flow
    • 3.2 Example: MMR in the summary task
  • 4. Application scenarios of MMR
    • 4.1. Information retrieval (e.g. search engine results ranking)
    • 4.2. Question Answering (selecting the most informative answer from multiple candidate answers)
  • 5. Code Testing
    • 5.1 Standard Similarity Retrieval (Top 5)
    • 5.2 MMR retrieval (lambda=0.3, k=5, fetch_k=10)
    • 5.3 MMR retrieval (lambda=0.7, k=5, fetch_k=10)


In the RAG (Retrieval-Augmented Generation) system, the retrieval stage determines the "information source" of the final generated content.

However, if we rely only on vector similarity to perform top-k retrieval, problems such as duplicated content and information concentrated on a single angle can arise, leaving the answers the model generates short on diversity and coverage. "Relevance" alone is not enough; we want search results that are "both relevant and diverse".

To solve this problem, introducing Maximum Marginal Relevance (MMR) strikes an effective balance between relevance and diversity, optimizing the quality of retrieval results and improving the richness and practicality of the final answer.

Related reading: Local knowledge base, using RAG to solve the problem of accurate information generation

1. What is MMR and what problems does it solve?

Maximum Marginal Relevance (MMR) is a ranking algorithm used mainly in tasks such as information retrieval, recommendation systems, and summary generation to select content that is both relevant and non-redundant.

In the context of RAG, traditional ranking methods often only focus on the relevance of the content (i.e., the degree of match with the query). However, if we simply rank based on relevance, the following problems may occur:

  • Duplicate content: for example, in a recommendation system, the recommended items are so similar that users see nothing fresh.
  • Incomplete information: some systems focus too heavily on a certain field or topic, so users see only one-sided information.

The core idea of MMR is to increase diversity while ensuring relevance, thereby providing more comprehensive and richer results.

Simply put, among a bunch of candidate content, give priority to those items that are relevant to the user's query and do not overlap with the content that has already been selected.

We can understand its "purpose" this way - MMR = "give you what you want + avoid what you have already seen".

Suppose you are using a news recommendation system and enter the keyword "artificial intelligence". A traditional recommendation system may recommend multiple articles on "application of artificial intelligence in the medical industry", and the content of these articles is highly similar.

After using MMR, the system may recommend:

  • An article about artificial intelligence medical applications.
  • An article about the field of artificial intelligence education.
  • An article on the ethical issues of artificial intelligence.

In this way, users can obtain information related to the topic while understanding different perspectives in the field, avoiding duplication.

2. Basic idea of MMR

There are two key words involved here: relevance and diversity. They may sound like they are "fighting", but in fact they are indispensable partners in ranking information.

Relevance is fundamental, but not enough

Relevance is easy to understand. It refers to the degree of match between a piece of content and a user's query, interests, and goals. For example, if you search for "machine learning", you certainly don't want the system to push "baking tutorials" to you. This is where relevance comes into play.

But if the system only pursues relevance, there will be a problem: the content concentrates on one point and soon becomes "repetitive". You will start to think: "Aren't these all the same?"

Diversity makes information richer

Diversity refers to the degree of difference between results. If each recommended item approaches the topic from a different angle, such as one on principles, one on applications, and one on future trends, then you will come away better informed and find the reading more rewarding.

Relevance ensures that you "see the right things", and diversity ensures that you "see different things".

It is useless to have all relevant but repetitive content; it is also useless to have all diverse but irrelevant content.

Therefore, the goal of MMR is clear:

From a pile of candidate content, pick out those items that are both "highly relevant to the query" and "not duplicated with the content already selected".

At each step, it weighs the following factors when choosing the next content:

  • How well does the content itself match user needs?
  • Is it too similar to the ones we’ve already selected?

What MMR does is find a balance between the two. In other words, it wants each pick to be content that is both good and novel, rather than simply pushing out the "most relevant" items all at once.

You can think of MMR as a curator who "understands both information and user psychology":

It says, "You may not have seen this yet, but it's very relevant to what you want, and it's different than the previous ones, so it's worth a look."

Therefore, the core goal of MMR is to select content that is both relevant and non-duplicate.

3. Principle and formula analysis of MMR

The standard formula for MMR is as follows:

$$\mathrm{MMR} = \arg\max_{D_i \in R \setminus S} \left[ \lambda \cdot \mathrm{Sim}_1(D_i, Q) - (1 - \lambda) \cdot \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \right]$$

where:

  • $D_i$: the current candidate document

  • $Q$: the user's query

  • $R$: the candidate document set; $S$: the set of documents already selected

  • $\mathrm{Sim}_1(D_i, Q)$: the relevance score between candidate document $D_i$ and the query $Q$

  • $\max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j)$: the similarity between candidate $D_i$ and the one selected document in $S$ it most resembles

  • $\lambda$: balance parameter (0 ≤ λ ≤ 1)

    • λ = 1: focus entirely on relevance and ignore diversity.
    • λ = 0: focus entirely on diversity and ignore relevance.
    • 0 < λ < 1: balance between relevance and diversity.
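
To make the formula concrete, here is a minimal sketch of the per-candidate MMR score in Python (the helper names cosine_sim and mmr_score are illustrative, not from any library; cosine similarity stands in for both Sim1 and Sim2):

import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmr_score(candidate, query, selected, lam=0.7):
    # lam * Sim1(Di, Q) - (1 - lam) * max_{Dj in S} Sim2(Di, Dj)
    relevance = cosine_sim(candidate, query)
    redundancy = max((cosine_sim(candidate, d) for d in selected), default=0.0)
    return lam * relevance - (1 - lam) * redundancy

With an empty selected set the redundancy term defaults to 0, so the score reduces to pure relevance — which is exactly the initialization step described below.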

Related reading: From a beginner to an expert in artificial intelligence: A simple understanding of cosine similarity

3.1 Algorithm Flow

  1. Initialization:
  • For every document $D_i$ in the candidate set $R$, compute its relevance score $\mathrm{Sim}_1(D_i, Q)$ with the query $Q$.
  • Select the document with the highest score.
  • Add it to the selected set $S$.
  2. Iterative selection:
    • For each document $D_i$ in the remaining candidate set $R \setminus S$, compute its MMR score using the formula above.
    • Select the document with the highest MMR score among the remaining documents and add it to $S$; repeat until the desired number of documents has been selected.
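
Putting the two steps together, a minimal greedy selection loop might look like this (a sketch, assuming documents and the query are already embedded as vectors, and reusing the illustrative mmr_score helper from above):

def mmr_select(doc_vectors, query_vector, k, lam=0.7):
    # Greedy MMR: at each step, pick the remaining document with the highest MMR score
    remaining = list(range(len(doc_vectors)))
    selected = []
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda i: mmr_score(
                doc_vectors[i], query_vector,
                [doc_vectors[j] for j in selected], lam,
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected  # indices of the chosen documents, in selection order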

3.2 Example: MMR in the summary task

Suppose we have a long article and want to select three sentences from it to form a short summary. We have five candidate sentences, numbered: S1, S2, S3, S4, S5.

Suppose we set the balance parameter λ = 0.7.

1. Initialization:

  • Compute the relevance score $\mathrm{Sim}_1$ between every candidate sentence and the user question Q.
  • Select the sentence with the highest score (assume S1 has the highest relevance score $\mathrm{Sim}_1(S_1, Q)$ with query Q).
  • Add S1 to the selected set.

Collection status update:

Selected set: {S1}

Candidate set: {S2, S3, S4, S5}

2. Iterative selection:

For each sentence $S_i$ in the remaining candidate set, compute $\mathrm{MMR}(S_i) = \lambda \cdot \mathrm{Sim}_1(S_i, Q) - (1 - \lambda) \cdot \max_{S_j \in S} \mathrm{Sim}_2(S_i, S_j)$.

First iteration, candidate set {S2, S3, S4, S5}:

  • S2: $\mathrm{Sim}_1$ = 0.9, max $\mathrm{Sim}_2$ (vs. S1) = 0.8 → MMR = 0.7 × 0.9 − 0.3 × 0.8 = 0.39

  • S3: $\mathrm{Sim}_1$ = 0.75, max $\mathrm{Sim}_2$ = 0.2 → MMR = 0.7 × 0.75 − 0.3 × 0.2 = 0.465

  • S4: $\mathrm{Sim}_1$ = 0.85, max $\mathrm{Sim}_2$ = 0.3 → MMR = 0.7 × 0.85 − 0.3 × 0.3 = 0.505

  • S5: $\mathrm{Sim}_1$ = 0.65, max $\mathrm{Sim}_2$ = 0.2 → MMR = 0.7 × 0.65 − 0.3 × 0.2 = 0.395

Select S4, the sentence with the highest MMR score (0.505), and add it to the selected set. While both S2 and S3 are "relevant", S4 has the highest MMR score, so S4 is chosen.

Collection status update:

Selected set: {S1, S4}

Candidate set: {S2, S3, S5}

Second iteration, candidate set {S2, S3, S5}:

  • S2: $\mathrm{Sim}_1$ = 0.9; $\mathrm{Sim}_2$ vs. S1 = 0.8, vs. S4 = 0.6 → max = 0.8 → MMR = 0.63 − 0.24 = 0.39

  • S3: $\mathrm{Sim}_1$ = 0.75; $\mathrm{Sim}_2$ vs. S1 = 0.2, vs. S4 = 0.4 → max = 0.4 → MMR = 0.525 − 0.12 = 0.405

  • S5: $\mathrm{Sim}_1$ = 0.65; $\mathrm{Sim}_2$ vs. S1 = 0.2, vs. S4 = 0.3 → max = 0.3 → MMR = 0.455 − 0.09 = 0.365

Among the remaining sentences, S3 has the highest MMR score (0.405), so S3 is selected and added to the selected set.

Collection status update:

Selected set: {S1, S4, S3}

Candidate set: {S2, S5}

The final three sentences selected are: S1, S4, S3.
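
The arithmetic above is easy to verify with a few lines of Python (the similarity numbers are the assumed values from this example, not scores computed from real embeddings):

lam = 0.7
sim1 = {"S2": 0.9, "S3": 0.75, "S4": 0.85, "S5": 0.65}

# Maximum similarity to the already-selected set in each round
round1_sim2 = {"S2": 0.8, "S3": 0.2, "S4": 0.3, "S5": 0.2}  # selected = {S1}
round2_sim2 = {"S2": 0.8, "S3": 0.4, "S5": 0.3}             # selected = {S1, S4}

def mmr(s, sim2):
    return lam * sim1[s] - (1 - lam) * sim2[s]

print({s: round(mmr(s, round1_sim2), 3) for s in round1_sim2})  # S4 wins: 0.505
print({s: round(mmr(s, round2_sim2), 3) for s in round2_sim2})  # S3 wins: 0.405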

4. Application scenarios of MMR

4.1. Information retrieval (e.g. search engine results ranking)

You enter a keyword into a search engine, such as "ChatGPT application case", and the backend immediately finds hundreds or thousands of related web pages.

If we only look at "relevance", the first few results may all be about "how to use ChatGPT in educational scenarios". Although they are all on target, you may find them too concentrated and repetitive.

With MMR, the system keeps relevance as the basis but makes the displayed results more "layered":

  • The first result is about education.
  • The second might be about the legal industry.
  • The third is how developers integrate ChatGPT.
  • The fourth may be about the ethical issues it brings.

4.2. Question Answering (selecting the most informative answer from multiple candidate answers)

Suppose you ask a relatively open question, such as "What changes will artificial intelligence bring in the future?"

The system may find 10 possible answers from a database or model.

This is where MMR can help: instead of simply putting the "most repeated" answers at the top, it picks out complementary information, such as:

  • One on the impact on employment.
  • One on the impact on education.
  • One on the potential of technological development...

Other uses include recommendation systems (avoid pushing similar content) and text summarization (avoid repeated sentences).

5. Code Testing

# Import the operating system module
import os

# Set the OpenAI API key
# Note: in real applications, do not hard-code API keys; prefer environment
# variables or another secure mechanism for managing secrets.
OPENAI_API_KEY = 'hk-iwtbie4a91e427'  # Example key, replace with your own valid key

# Expose the API key as an environment variable (the client reads OPENAI_API_KEY)
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# A list of sample texts that will be embedded and stored in the vector database
texts = [
    "Large Language Model (LLM) is a deep learning model based on Transformer architecture.",  # Definition of LLM
    "The core of LLM is the Transformer architecture, which is a powerful deep learning technology.",  # Similar to the previous sentence
    "Transformer-based LLM performs well in natural language processing tasks.",  # Similar to the first sentence
    "LLM learns language patterns by pre-training on massive text data.",  # LLM training method
    "Pre-training enables LLM to master rich language knowledge and world common sense.",  # Similar to the previous sentence
    "LLM demonstrates strong natural language understanding and generation capabilities.",  # LLM capabilities
    "Understanding and generating natural language is one of the core functions of LLM.",  # Similar to the previous sentence
    "LLMs like GPT-4 can perform tasks as diverse as translation, summarization, and question answering.",  # Applications of LLMs
    "LLM is widely used in text translation, content summarization and intelligent question answering.",  # Similar to the previous sentence
    "Artificial Intelligence (AI) is a broader field, of which LLM is a subset.",  # Relationship between LLM and AI
    "The goal of AI is to create machines that can think and act like humans.",  # The goal of AI
]

# Initialize the OpenAI embedding model, specifying the model name and API base URL
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", base_url="https://api.openai-hk.com/v1")

# Use the from_texts method of the Chroma class to create the vector store
vectorstore = Chroma.from_texts(
    texts=texts,                      # List of texts to be embedded and stored
    embedding=embeddings,             # Embedding model instance used to generate embeddings
    persist_directory="./chroma_db",  # Directory for persistent storage of the vector data
)

# Define a query string for searching the vector database
query = 'What are large language models and what can they do?'

print("========================== Similarity retrieval============================")
# Plain similarity retrieval via the vector store's similarity_search method
t1 = vectorstore.similarity_search(query, k=5)  # k=5 returns the 5 most similar results
print(t1)

print("========================= MMR lambda=0.3 ============================")
# MMR retrieval via the vector store's max_marginal_relevance_search method:
# fetch_k=10 candidates are fetched first, then k=5 are selected by MMR;
# lambda_mult=0.3 weights diversity more heavily than similarity
t2 = vectorstore.max_marginal_relevance_search(query, k=5, fetch_k=10, lambda_mult=0.3)
print(t2)

print("========================= MMR lambda=0.7 ============================")
# Same MMR retrieval, but lambda_mult=0.7 weights similarity more heavily
t3 = vectorstore.max_marginal_relevance_search(query, k=5, fetch_k=10, lambda_mult=0.7)
print(t3)

Operation results:

========================== Similarity retrieval=============================
[Document(id= '76a37d7d-4f9e-43ca-8ca1-396fd5a956bc' , metadata={}, page_content= 'The Large Language Model (LLM) is a deep learning model based on the Transformer architecture.' ), 
Document(id= '9f76337c-3f6c-4c14-81e5-399338e30938' , metadata={}, page_content= 'LLM learns language patterns by pre-training on massive text data.' ), 
Document(id= '23717671-2353-4daa-a30f-80ce191cfb90' , metadata={}, page_content= 'Understanding and generating natural language is one of the core functions of LLM.' ), 
Document(id= 'f5a64fe0-b616-4a02-b932-ea1d6f7a1217' , metadata={}, page_content= 'LLM demonstrates strong natural language understanding and generation capabilities.' ), 
Document(id= 'c6237dc4-4087-4eee-b838-a2392a3ef993' , metadata={}, page_content= 'Transformer-based LLM performs well in natural language processing tasks.' )]
========================= MMR lambda=0.3 =============================
[Document(id= '76a37d7d-4f9e-43ca-8ca1-396fd5a956bc' , metadata={}, page_content= 'The Large Language Model (LLM) is a deep learning model based on the Transformer architecture.' ), 
Document(id= 'f5a64fe0-b616-4a02-b932-ea1d6f7a1217' , metadata={}, page_content= 'LLM demonstrates strong natural language understanding and generation capabilities.' ), 
Document(id= '4a3a8219-8065-4d74-b7ce-187f16e87ecf' , metadata={}, page_content= 'LLMs like GPT-4 can perform multiple tasks such as translation, summarization, and question answering.' ), 
Document(id= 'fc2b9c7a-c63c-4c7b-a153-eece2d6bb02e' , metadata={}, page_content= 'Pre-training enables LLM to master rich language knowledge and world common sense.' ), 
Document(id= 'd0cbe326-c4c0-4252-9636-eef7bed06379' , metadata={}, page_content= 'Artificial Intelligence (AI) is a broader field, of which LLM is a subset.' )]
========================= MMR lambda=0.7 =============================
[Document(id= '76a37d7d-4f9e-43ca-8ca1-396fd5a956bc' , metadata={}, page_content= 'The Large Language Model (LLM) is a deep learning model based on the Transformer architecture.' ), 
Document(id= '9f76337c-3f6c-4c14-81e5-399338e30938' , metadata={}, page_content= 'LLM learns language patterns by pre-training on massive text data.' ), 
Document(id= '23717671-2353-4daa-a30f-80ce191cfb90' , metadata={}, page_content= 'Understanding and generating natural language is one of the core functions of LLM.' ), 
Document(id= 'f5a64fe0-b616-4a02-b932-ea1d6f7a1217' , metadata={}, page_content= 'LLM demonstrates strong natural language understanding and generation capabilities.' ), 
Document(id= '4a3a8219-8065-4d74-b7ce-187f16e87ecf' , metadata={}, page_content= 'LLMs like GPT-4 can perform multiple tasks such as translation, summarization, and question answering.' )]

5.1 Standard Similarity Retrieval (Top 5)

This strategy aims to find the documents that are most similar to the query.

Results Features:

  • Highly relevant: the retrieved documents relate directly to the definition, architecture, and capabilities of the large language model.
  • Potential redundancy: some documents have similar content; for example, several mention the Transformer architecture or natural language processing capabilities.

5.2 MMR retrieval (lambda=0.3, k=5, fetch_k=10)

A lower lambda value (0.3) places more emphasis on diversity.

Results Features:

  • Balance: the most relevant documents are retained, while more information from different aspects is introduced, such as specific applications, training methods, and the relationship with AI.
  • Lower redundancy: compared with pure similarity search, the results are less repetitive.

5.3 MMR retrieval (lambda=0.7, k=5, fetch_k=10)

A higher lambda value (0.7) places more emphasis on relevance.

Results Features:

  • High relevance: the results are very close to standard similarity retrieval, retaining most of the most similar documents.
  • Limited diversity: diversity is lower than with lambda=0.3, but still slightly higher than pure similarity retrieval; a document on specific LLM tasks is introduced.
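
In a full RAG pipeline, MMR is usually wired in through a retriever rather than by calling the search method directly. A minimal sketch, assuming the vectorstore built in the test code above (LangChain vector stores expose MMR through as_retriever with search_type="mmr"):

# Wrap the vector store as a retriever that applies MMR at query time
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 10, "lambda_mult": 0.3},
)
docs = retriever.invoke(query)  # returns the 5 MMR-selected documents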

The core value of MMR is to improve the diversity of results while ensuring that the returned content is both relevant and sufficiently varied. In scenarios such as recommendation systems, summary generation, and question-answering systems, MMR effectively avoids duplication and improves the user experience.