Combining LlamaIndex and Ragflow to Build High-Performance LLM RAG Applications

Written by Jasper Cole
Updated on: June 26, 2025
Recommendation

Combine LlamaIndex's data framework and Ragflow's workflow orchestration to create efficient large language model applications.

Core content:
1. LlamaIndex's data connection and indexing features
2. Ragflow's workflow management and multi-task execution capabilities
3. How LlamaIndex and Ragflow work together to enable versatile large language model applications


Together, LlamaIndex and Ragflow form a powerful combination for building large language model applications.

LlamaIndex and Ragflow are two open source tools that bring great convenience to developers. As a data framework, LlamaIndex makes it easy to connect large language models with external data sources, whether structured data (such as SQL and NoSQL databases), unstructured data (such as documents and web pages), or private data (obtained through APIs). Ragflow, as a workflow orchestration tool, focuses on managing the execution of complex large language model pipelines, ensuring that each stage of processing runs in an orderly manner.

The two complement each other and together provide a comprehensive solution for building powerful and highly scalable large language model applications, helping developers to innovate and practice more efficiently in this field.

1 Definition

1.1 LlamaIndex

LlamaIndex lets developers connect large language models with a variety of external data sources, including structured data (SQL and non-relational databases), unstructured data (documents, web pages), and private data (APIs). With it, developers can build large language model applications that draw on broad information sources and reason over them.

LlamaIndex offers several notable features:

  • Convenient data connectors: It ships with a library of pre-built connectors for common data sources, so developers can connect most of them without writing custom code.
  • Efficient data indexing: External data can be indexed for fast search and retrieval across large data sets.
  • Smart Q&A: It can answer questions grounded in external data sources, making it easy to build Q&A applications for specific topics or documents (see the sketch below).
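
To make these features concrete, here is a minimal sketch of the connector → index → Q&A flow, assuming the default OpenAI-backed models, a placeholder API key, and a local data/ directory:

import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder key

# Data connector: load every file in the directory into Document objects
documents = SimpleDirectoryReader("data").load_data()

# Data indexing: embed the documents into a vector index
index = VectorStoreIndex.from_documents(documents)

# Q&A: ask a natural-language question against the indexed data
response = index.as_query_engine().query("What is this document about?")
print(response)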

1.2 Ragflow

As a workflow orchestration tool, Ragflow manages the execution of complex large language model pipelines. This makes it a strong foundation for building large language model applications that run multiple kinds of tasks, including:

  • Data retrieval: Ragflow can retrieve data from external data sources.
  • Data processing: Ragflow can clean, transform, and aggregate data.
  • Large language model reasoning: Ragflow can run large language model inference tasks.
  • Output generation: Ragflow can produce output in a variety of formats, such as text, tables, or charts (see the sketch after this list).
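
For illustration, here is a rough sketch of driving these tasks from Python with the ragflow-sdk client. The class and method names used here (RAGFlow, create_dataset, upload_documents, create_chat, create_session, ask) are assumptions based on the SDK's documented surface and may differ between versions; check the official reference before relying on them.

from ragflow_sdk import RAGFlow  # assumed client; verify against your SDK version

rag = RAGFlow(api_key="<RAGFLOW_API_KEY>", base_url="http://localhost:9380")

# Data retrieval and processing: create a knowledge base and upload a document
dataset = rag.create_dataset(name="llama2_kb")
with open("data/llama2.pdf", "rb") as f:
    dataset.upload_documents([{"display_name": "llama2.pdf", "blob": f.read()}])

# LLM reasoning and output generation: chat over the knowledge base
assistant = rag.create_chat("llama2_assistant", dataset_ids=[dataset.id])
session = assistant.create_session()
for chunk in session.ask("How was Llama 2 trained?", stream=True):
    # depending on the SDK version, streamed content may be cumulative
    print(chunk.content, end="", flush=True)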

1.3 LlamaIndex and Ragflow Collaboration

LlamaIndex and Ragflow can be used together to build powerful large language model applications.

LlamaIndex handles data interaction: it connects large language models with various data sources and can index and query that data, expanding the information available to the model. Ragflow focuses on workflow orchestration, managing the execution of complex large language model pipelines.

Working together, they enable versatile large language model applications that handle tasks such as question answering, text generation, and data analysis across different scenarios. A conceptual sketch of this division of labor follows.
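
In the sketch below, LlamaIndex supplies the data primitives, while a plain Python function stands in for the orchestration layer that Ragflow would provide; the function name and stage boundaries are illustrative only.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

def run_pipeline(dirname: str, query: str) -> str:
    # Stage 1: data interaction (LlamaIndex) - load and index external documents
    index = VectorStoreIndex.from_documents(
        SimpleDirectoryReader(dirname).load_data()
    )
    # Stage 2: retrieval + LLM reasoning (LlamaIndex) - answer the query over the index
    return str(index.as_query_engine(similarity_top_k=2).query(query))

# In a Ragflow deployment, each stage would be a node in the orchestrated
# workflow rather than a line in this function.
print(run_pipeline("data", "How was Llama2 trained?"))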

2 Code Implementation

Next, we implement the workflow step by step:

Step 1: Install the library, initialize the API key, and download the data

pip install -U llama-index

# Initialize the API key
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."

# Download the data (notebook shell commands)
!mkdir -p data
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

Step 2: Workflow Events

from llama_index.core.workflow import Event
from llama_index.core.schema import NodeWithScore


class RetrieverEvent(Event):
    """Result of running retrieval."""

    nodes: list[NodeWithScore]


class RerankEvent(Event):
    """Result of reranking the retrieved nodes."""

    nodes: list[NodeWithScore]
Step 3: Complete Workflow

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.core.postprocessor.llm_rerank import LLMRerank
from llama_index.core.workflow import (
    Context,
    Workflow,
    StartEvent,
    StopEvent,
    step,
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding


class RAGWorkflow(Workflow):
    @step(pass_context=True)
    async def ingest(self, ctx: Context, ev: StartEvent) -> StopEvent | None:
        """Entry point for ingesting documents, triggered by a StartEvent containing `dirname`."""
        dirname = ev.get("dirname")
        if not dirname:
            return None

        documents = SimpleDirectoryReader(dirname).load_data()
        ctx.data["index"] = VectorStoreIndex.from_documents(
            documents=documents,
            embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
        )
        return StopEvent(result=f"Indexed {len(documents)} documents.")

    @step(pass_context=True)
    async def retrieve(self, ctx: Context, ev: StartEvent) -> RetrieverEvent | None:
        """Entry point for RAG, triggered by a StartEvent containing a `query`."""
        query = ev.get("query")
        if not query:
            return None

        print(f"Query the database with: {query}")

        # Store the query in the global context
        ctx.data["query"] = query

        # Get the index from the global context
        index = ctx.data.get("index")
        if index is None:
            print("Index is empty, load some documents before querying!")
            return None

        retriever = index.as_retriever(similarity_top_k=2)
        nodes = retriever.retrieve(query)
        print(f"Retrieved {len(nodes)} nodes.")
        return RetrieverEvent(nodes=nodes)

    @step(pass_context=True)
    async def rerank(self, ctx: Context, ev: RetrieverEvent) -> RerankEvent:
        # Rerank the retrieved nodes with an LLM-based ranker
        ranker = LLMRerank(
            choice_batch_size=5, top_n=3, llm=OpenAI(model="gpt-4o-mini")
        )
        print(ctx.data.get("query"), flush=True)
        new_nodes = ranker.postprocess_nodes(
            ev.nodes, query_str=ctx.data.get("query")
        )
        print(f"Reranked nodes to {len(new_nodes)}")
        return RerankEvent(nodes=new_nodes)

    @step(pass_context=True)
    async def synthesize(self, ctx: Context, ev: RerankEvent) -> StopEvent:
        """Return a streaming response synthesized from the reranked nodes."""
        llm = OpenAI(model="gpt-4o-mini")
        summarizer = CompactAndRefine(llm=llm, streaming=True, verbose=True)
        query = ctx.data.get("query")

        response = await summarizer.asynthesize(query, nodes=ev.nodes)
        return StopEvent(result=response)

Step 4: Run the workflow

w = RAGWorkflow()

# Ingest documents
await w.run(dirname="data")

# Run the query and stream the answer
result = await w.run(query="How was Llama2 trained?")
async for chunk in result.async_response_gen():
    print(chunk, end="", flush=True)

The output is as follows:

Query the database with: How was Llama2 trained?
Retrieved 2 nodes.
Llama 2 was trained through a multi-step process that began with pretraining on publicly available online sources. This was followed by the creation of an initial version of Llama 2-Chat through supervised fine-tuning. The model was then iteratively refined using Reinforcement Learning with Human Feedback (RLHF) methodologies, which included techniques like rejection sampling and Proximal Policy Optimization (PPO).

During pretraining, the model utilized an optimized auto-regressive transformer architecture, incorporating robust data cleaning, updated data mixes, and training on a significantly larger dataset of 2 trillion tokens. The training process also involved increased context length and the use of grouped-query attention (GQA) to enhance inference scalability.

The training employed the AdamW optimizer with specific hyperparameters, a cosine learning rate schedule, and gradient clipping. The models were pretrained on Meta's Research SuperCluster and internal production clusters, utilizing NVIDIA A100 GPUs.
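
Note that top-level await only works in a notebook or async REPL. In an ordinary Python script, the same calls can be wrapped in an async entry point and driven with asyncio.run(), for example:

import asyncio

async def main():
    w = RAGWorkflow()
    # Ingest documents, then query, exactly as in the notebook version
    await w.run(dirname="data")
    result = await w.run(query="How was Llama2 trained?")
    async for chunk in result.async_response_gen():
        print(chunk, end="", flush=True)

asyncio.run(main())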

3 Conclusion

LlamaIndex and Ragflow play an important role in the development of large language model (LLM) applications. These two open source tools have unique advantages and can help build applications based on large language models.

LlamaIndex can connect to data sources and process data, and Ragflow can efficiently orchestrate workflows. The two work together to provide a comprehensive solution for building powerful and scalable large language model applications.

For developers, exploring LlamaIndex and Ragflow and building applications with them is a good way to keep pace with the field and strengthen practical skills. We hope readers will explore their potential in practice and drive further innovation in large language model applications.