Let’s talk about RAG practice with the AI agent framework MetaGPT

Written by Jasper Cole
Updated on: June 25, 2025
Recommendation

Explore how the MetaGPT framework, developed by a Chinese team, enables efficient RAG practice.

Core content:
1. An overview of the MetaGPT framework and how it compares with international agent frameworks
2. How MetaGPT's integration of llama_index makes RAG practice convenient
3. A detailed guide to deploying the embedding model and configuring a local LLM & RAG

 

Overview

MetaGPT [1] is a full-featured, developer-friendly agent framework. Developed by a Chinese team, it competes directly with agent frameworks such as Microsoft's AutoGen.

MetaGPT integrates llama_index to implement RAG. With MetaGPT, you can quickly and easily plug in a custom LLM, which makes for a very good user experience; it is more convenient than customizing an LLM in llama_index and building RAG there yourself.

RAG Use

Taking the official sample project as an example, let's run the rag_pipeline program.

Install

I personally recommend installing from source with pip install -e .[rag]; this makes it easy to modify the source code for debugging. See the RAG module [2] documentation.
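
For reference, a minimal sketch of a source install, assuming the official GitHub repository (github.com/geekan/MetaGPT, also linked in the config comments below):

# Clone the repository and install it in editable mode with the RAG extras
git clone https://github.com/geekan/MetaGPT.git
cd MetaGPT
pip install -e ".[rag]"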

Vector Model Deployment

Install Ollama quickly via ModelScope. Downloading from the official Ollama website is very slow, but you can download and install it from ModelScope instead; see Ollama-Linux [3]. After installing, start the server with the appropriate environment variables and pull the bge-m3 embedding model:

# Start the Ollama server (listen on port 6006, use GPUs 2 and 3)
CUDA_VISIBLE_DEVICES=2,3 OLLAMA_HOST=0.0.0.0:6006 ./ollama serve

# Pull the bge-m3 embedding model (in another shell, pointing at the same server)
OLLAMA_HOST=0.0.0.0:6006 ollama pull bge-m3:567m

In this way, the vector model is deployed.
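
Optionally, you can sanity-check the endpoint before wiring it into MetaGPT. A small sketch against Ollama's /api/embeddings API, assuming the port and model name above:

# Request a test embedding; the response should contain a 1024-dimensional vector
curl http://127.0.0.1:6006/api/embeddings \
  -d '{"model": "bge-m3:567m", "prompt": "hello world"}'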

MetaGPT Local LLM & RAG Configuration

I could not find any official or community documentation on how to configure a local LLM & RAG; the examples basically all use the default openai types. After some digging of my own, I found that the following configuration is sufficient:

# Full Example: https://github.com/geekan/MetaGPT/blob/main/config/config2.example.yaml
# Reflected Code: https://github.com/geekan/MetaGPT/blob/main/metagpt/config2.py
# Config Docs: https://docs.deepwisdom.ai/main/en/guide/get_started/configuration.html
llm:
  api_type: "open_llm" # or azure / ollama / groq etc.   
  model: "glm4" # or gpt-3.5-turbo   
  base_url: "http://127.0.0.1:7860/v1" # or forward url / other llm url   
  # max_token: 6000
  #api_key: "empty"

# RAG Embedding.
# For backward compatibility, if the embedding is not set and the llm's api_type is either openai or azure, the llm's config will be used. 
embedding:
  api_type: "ollama" # openai / azure / gemini / ollama etc. Check EmbeddingType for more options.  
  base_url: "http://127.0.0.1:6006" 
  #api_key: ""
  model: "bge-m3:567m" 
  dimensions: "1024" # output dimension of embedding model  
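
For the config to take effect, it has to be where MetaGPT looks for it. A minimal sketch, assuming the default ~/.metagpt/config2.yaml location (a config/config2.yaml under the working directory should also be picked up):

# Copy the example config, then edit in the llm/embedding sections above
mkdir -p ~/.metagpt
cp config/config2.example.yaml ~/.metagpt/config2.yaml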

RAG Example Project

Next, run the official RAG example, rag_pipeline.py, with the Elasticsearch (ES) code commented out:

# Imports used by this snippet (RAGExample itself is defined earlier in rag_pipeline.py)
from llama_index.core import PromptHelper, Settings


async def main():
    """RAG pipeline.

    Note:
    1. If `use_llm_ranker` is True, then it will use LLM Reranker to get better result, but it is not always guaranteed that the output will be parseable for reranking,
       prefer `gpt-4-turbo`, otherwise might encounter `IndexError: list index out of range` or `ValueError: invalid literal for int() with base 10`.
    """

    # Work around ValueError: Calculated available context size -12792 was not non-negative.
    Settings._prompt_helper = PromptHelper(context_window=6000)

    e = RAGExample(use_llm_ranker=False)

    await e.run_pipeline()
    await e.add_docs()
    await e.add_objects()
    await e.init_objects()
    await e.init_and_query_chromadb()
    # Temporarily comment out ES
    # await e.init_and_query_es()

The official documentation also mentions that you may hit ValueError: Calculated available context size -12792 was not non-negative. I ran into this error too; it is ultimately thrown by the integrated llama_index. There are two fixes: the officially recommended one is to set max_token in the config, and the one I used is Settings._prompt_helper = PromptHelper(context_window=6000).
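
With the workaround in place, the example can be run from the repository root (assuming the script sits under examples/; the exact path may differ between versions):

python examples/rag_pipeline.py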

The final run log is as follows:

Summary

In fact, the whole process is not difficult. The main problem is that there is no reference for the configuration items; I don't know whether people really don't configure a local LLM & RAG, or whether they use offline calls instead. Either way, I prefer the remote API approach.

MetaGPT's RAG module is implemented on top of llama_index, which is integrated directly. So if you need to optimize RAG, you can modify the source code directly, which is another reason I recommend installing from source. Having read the llama_index source, I think it is well written, especially now that the RAG process is orchestrated around events.