LightRAG: A simple and fast retrieval-augmented generation framework for getting started quickly

Written by Clara Bennett
Updated on: June 27, 2025
Recommendation

Explore new options for efficient Q&A and content generation. The LightRAG framework allows you to easily master RAG technology.

Core content:
1. LightRAG framework's lightweight design and advantages
2. Quick deployment and installation guide
3. Practical application examples and environment configuration


LightRAG is a lightweight retrieval-augmented generation (RAG) framework that aims to reduce the deployment cost and resource consumption of traditional RAG systems by optimizing the retrieval and generation processes while maintaining efficient question-answering and content generation capabilities. It is suitable for scenarios that require fast response and low computing power support, such as intelligent customer service, knowledge question-answering, and lightweight AI assistants.

Install

Installing the LightRAG Core Module

  • Install from source (recommended)

    git clone https://github.com/HKUDS/LightRAG.git
    cd LightRAG
    pip install -e .

  • Install via PyPI

    pip install lightrag-hku

Install LightRAG Server

The LightRAG Server provides a web interface and API support for document indexing, knowledge graph exploration, and simple RAG queries. It is also compatible with the Ollama interface and can present itself as an Ollama chat model, making it easy for AI chatbots such as Open WebUI to connect to it (a minimal client sketch follows the install commands below).

  • Install via PyPI

    pip install "lightrag-hku[api]"

  • Install from source

    # Create a Python virtual environment if needed
    # Install in editable mode with API support
    pip install -e ".[api]"

For more information, please refer to the LightRAG Server documentation.
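Because the server emulates the Ollama chat interface, any Ollama-style client can talk to it. The snippet below is a minimal sketch, not part of the official documentation: it assumes the server is reachable at http://localhost:9621 and advertises an Ollama model name such as "lightrag:latest"; check your own deployment for the actual host, port, and model name.

import requests

# Minimal sketch: query a running LightRAG Server through its Ollama-compatible chat API.
# Host, port, and model name below are assumptions; adjust them to your deployment.
resp = requests.post(
    "http://localhost:9621/api/chat",        # Ollama-style chat endpoint
    json={
        "model": "lightrag:latest",          # model name the server advertises (assumed)
        "messages": [
            {"role": "user", "content": "What are the top themes in this story?"}
        ],
        "stream": False,                     # request a single JSON response
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])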

Quick Start

  • A demo video of running LightRAG locally is available in the project README.
  • All code examples are located in the examples directory.
  • When using an OpenAI model, set the environment variable: export OPENAI_API_KEY="sk-...".
  • Download the demo text of A Christmas Carol:

    curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt

Query

Use the following Python code to initialize LightRAG and execute a query:

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, gpt_4o_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag

def main():
    # Initialize the RAG instance
    rag = asyncio.run(initialize_rag())
    # Insert text
    rag.insert("Your text")

    # Select the search mode: naive, local, global, hybrid, or mix (knowledge graph + vector retrieval)
    mode = "mix"

    print(rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode=mode)
    ))

if __name__ == "__main__":
    main()

QueryParam

class QueryParam:
    mode: Literal["local", "global", "hybrid", "naive", "mix"] = "global"
    """Search mode:
    - "local": focus on contextually relevant information
    - "global": use global knowledge
    - "hybrid": combine local and global search
    - "naive": basic search (no advanced techniques)
    - "mix": fuse knowledge graph and vector retrieval (supports structured KG and
      unstructured vectors, handles image content through HTML img tags, and controls
      retrieval depth through top_k)
    """

    only_need_context: bool = False
    """If True, only the retrieved context is returned without generating an answer"""

    response_type: str = "Multiple Paragraphs"
    """Response format (e.g., "Multiple Paragraphs", "Single Paragraph", "Bullet Points")"""

    top_k: int = 60
    """Number of top items to retrieve (entities in local mode, relationships in global mode)"""

    max_token_for_text_unit: int = 4000
    """Maximum number of tokens per retrieved text chunk"""

    max_token_for_global_context: int = 4000
    """Maximum number of tokens for relationship descriptions in global retrieval"""

    max_token_for_local_context: int = 4000
    """Maximum number of tokens for entity descriptions in local retrieval"""

    ids: list[str] | None = None  # Only supported by the PG vector database
    """List of IDs used to filter the RAG results"""

    model_func: Callable[..., object] | None = None
    """Optional: override the LLM model function for this query (different models can be used for different modes)"""

    ...

The default value of top_k can be overridden via the TOP_K environment variable.
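As a minimal sketch (assuming the default is read from the environment when the library loads its configuration), the variable can also be set from Python before LightRAG is imported:

import os

# Must be set before LightRAG reads its configuration
os.environ["TOP_K"] = "100"

from lightrag import QueryParam

# QueryParam() without an explicit top_k now uses the overridden default
param = QueryParam(mode="hybrid")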

LLM and Embedding Model Injection

LightRAG requires you to inject the LLM and embedding model call functions in order to complete document indexing and query tasks.

Using OpenAI style API

import os
import numpy as np
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc
from lightrag.kg.shared_storage import initialize_pipeline_status

async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        "solar-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
        **kwargs
    )

async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embed(
        texts,
        model="solar-embedding-1-large-query",
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar"
    )

async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=llm_model_func,
        embedding_func=EmbeddingFunc(
            embedding_dim=4096,
            max_token_size=8192,
            func=embedding_func
        )
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag

Using the Hugging Face model

See lightrag_hf_demo.py:

from transformers import AutoModel, AutoTokenizer
from lightrag import LightRAG
from lightrag.llm.hf import hf_model_complete, hf_embed
from lightrag.utils import EmbeddingFunc

# Initialize LightRAG with a Hugging Face model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,  # Hugging Face text generation model
    llm_model_name='meta-llama/Llama-3.1-8B-Instruct',  # Hugging Face model name
    # Use a Hugging Face embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=5000,
        func=lambda texts: hf_embed(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
            embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
        )
    ),
)

Using the Ollama model

Overview: You first need to pull the models with Ollama (a text generation model and an embedding model such as nomic-embed-text):

from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

# Initialize LightRAG with an Ollama model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,  # Ollama text generation model
    llm_model_name='your_model_name',  # model name
    # Use an Ollama embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embed(
            texts,
            embed_model="nomic-embed-text"
        )
    ),
)

Increase the context length:

  • Modify the Modelfile (the default context is 8k; at least 32k is required):
  1. Pull the model: ollama pull qwen2
  2. Export the model file: ollama show --modelfile qwen2 > Modelfile
  3. Add the parameter: PARAMETER num_ctx 32768
  4. Create the modified model: ollama create -f Modelfile qwen2m
  • Set it via the Ollama API:

    rag = LightRAG(
        ...
        llm_model_kwargs={"options": {"num_ctx": 32768}},
        ...
    )
  • Low-memory GPUs: choose a small model and adjust the context window (e.g., gemma2:2b with a 26k context); see the sketch below.
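Putting the two settings together, here is a minimal sketch of a low-memory configuration. The gemma2:2b model and the 26000-token context are just the example values mentioned above, and embedding_dim must match whichever embedding model you actually pull.

from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

# Sketch: small Ollama model with an enlarged context window for low-memory GPUs
rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=ollama_model_complete,
    llm_model_name="gemma2:2b",                        # small model (example value)
    llm_model_kwargs={"options": {"num_ctx": 26000}},  # ~26k context, as in the bullet above
    embedding_func=EmbeddingFunc(
        embedding_dim=768,                             # matches nomic-embed-text
        max_token_size=8192,
        func=lambda texts: ollama_embed(texts, embed_model="nomic-embed-text"),
    ),
)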

Integrating LlamaIndex

LightRAG supports integration with LlamaIndex (see llm/llama_index_impl.py):

# Use LlamaIndex to access OpenAI directly
import asyncio
from lightrag import LightRAG
from lightrag.llm.llama_index_impl import llama_index_complete_if_cache, llama_index_embed
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger, EmbeddingFunc

setup_logger("lightrag", level="INFO")

# LlamaIndex embedding model used by the embedding function below
embed_model = OpenAIEmbedding()

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        llm_model_func=llama_index_complete_if_cache,  # LlamaIndex-compatible generation function
        embedding_func=EmbeddingFunc(
            embedding_dim=1536,
            max_token_size=8192,
            func=lambda texts: llama_index_embed(texts, embed_model=embed_model)
        ),
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag

def main():
    rag = asyncio.run(initialize_rag())
    with open("./book.txt", "r", encoding="utf-8") as f:
        rag.insert(f.read())
    # Execute queries in different modes...

if __name__ == "__main__":
    main()

Detailed documentation and examples :

  • LlamaIndex documentation
  • Direct OpenAI Example
  • LiteLLM Agent Example

Token usage tracking

LightRAG's TokenTracker utility monitors LLM token consumption, making it easier to control API costs and optimize performance.

from lightrag.utils import TokenTracker

# Method 1: context manager (recommended)
with TokenTracker() as tracker:
    result1 = await llm_model_func("Question 1")
    result2 = await llm_model_func("Question 2")
print("Total token consumption:", tracker.get_usage())

# Method 2: manual recording
tracker = TokenTracker()
tracker.reset()
rag.insert("Your text")
rag.query("Question 1", param=QueryParam(mode="naive"))
print("Token usage for insert and query:", tracker.get_usage())

Tips:

  • Use the context manager for long-running sessions or batch operations so usage is tracked automatically
  • Call reset() manually to collect segmented statistics (see the sketch below)
  • Check token usage regularly during development and testing
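As a small illustration of the second tip, the sketch below resets the tracker between the indexing and query phases so each phase's consumption is reported separately. It assumes a rag instance initialized as in the snippets above and follows the same manual-recording pattern shown in Method 2.

from lightrag import QueryParam
from lightrag.utils import TokenTracker

tracker = TokenTracker()

# Phase 1: indexing
tracker.reset()
rag.insert("Your text")
print("Indexing tokens:", tracker.get_usage())

# Phase 2: querying (reset so the numbers cover only this phase)
tracker.reset()
rag.query("Question 1", param=QueryParam(mode="naive"))
print("Query tokens:", tracker.get_usage())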

Conversation history support

LightRAG supports multi-turn conversations and becomes context-aware when you pass in the conversation history:

conversation_history = [
    {"role": "user", "content": "What is the protagonist's attitude towards Christmas?"},
    {"role": "assistant", "content": "At the beginning of the story, Ebenezer Scrooge has a negative attitude towards Christmas..."},
    {"role": "user", "content": "How did his attitude change?"}
]

query_param = QueryParam(
    mode="mix",  # all modes are supported
    conversation_history=conversation_history,
    history_turns=3  # consider the last 3 turns of conversation
)

response = rag.query("What is the reason for this personality change?", param=query_param)

Custom prompts

Custom system prompts are supported for fine-grained control over model behavior:

custom_prompt = """
You are an expert in environmental science. Provide a detailed and structured response, and include examples.
---Conversation History---
{history}
---Knowledge Base---
{context_data}
---Response Rules---
Target format and length: {response_type}
"""

response_custom = rag.query(
    "What are the main advantages of renewable energy?",
    param=QueryParam(mode="hybrid"),
    system_prompt=custom_prompt
)

Independent keyword extraction

The query_with_separate_keyword_extraction function separates keyword extraction from the user prompt so that retrieval focuses on the query intent:

rag.query_with_separate_keyword_extraction(
    query="Explain the law of universal gravitation",
    prompt="Provide detailed explanations for high school students studying physics",
    param=QueryParam(mode="hybrid")
)

Data Insertion

Basic Insert

# Single text insertion
rag.insert("text content")

Batch Insert

# Batch insert multiple texts
rag.insert(["text1", "text2", ...])

# Custom batch size
rag = LightRAG(addon_params={"insert_batch_size": 4})
rag.insert(["text1", "text2", ...])  # process 4 documents per batch (default 10)

Insert with ID

# Single text with an ID
rag.insert("text1", ids=["ID_FOR_TEXT1"])
# Multiple texts with a list of IDs (must match the number of texts)
rag.insert(["text1", "text2"], ids=["ID1", "ID2"])

Pipeline Insertion

# Asynchronously enqueue and process documents (suitable for incremental background processing)
await rag.apipeline_enqueue_documents(input_data)
await rag.apipeline_process_enqueue_documents()
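Because both calls are coroutines, they need to run inside an event loop. The sketch below wraps the same two calls in an asyncio entry point; the document list used as input_data is a hypothetical placeholder.

import asyncio

async def index_in_background(rag):
    # Hypothetical input: a batch of plain-text documents
    input_data = ["Document 1 text...", "Document 2 text..."]

    # Enqueue the documents for indexing
    await rag.apipeline_enqueue_documents(input_data)

    # Process everything currently in the queue
    await rag.apipeline_process_enqueue_documents()

# Usage (after rag has been initialized as in the Quick Start):
# asyncio.run(index_in_background(rag))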

Multiple file type support

import textract

file_path = "document.pdf"
text_content = textract.process(file_path).decode("utf-8")
rag.insert(text_content)

Inserting a custom knowledge graph

custom_kg = {
    "chunks": [{"content": "text chunk", "source_id": "doc-1"}],
    "entities": [{"entity_name": "Entity", "description": "Description"}],
    "relationships": [{"src_id": "A", "tgt_id": "B", "description": "Relationship"}]
}
rag.insert_custom_kg(custom_kg)

Citation functionality

# Insert documents with file paths (supports source tracing)
documents = ["Content 1", "Content 2"]
file_paths = ["path1.txt", "path2.txt"]
rag.insert(documents, file_paths=file_paths)

Storage Configuration

Using Neo4J Storage

export NEO4J_URI="neo4j://localhost:7687"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD="password"

async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=gpt_4o_mini_complete,
        graph_storage="Neo4JStorage",  # override the default graph storage (NetworkX)
    )
    await rag.initialize_storages()
    return rag

Using PostgreSQL for storage

Supports PGVector (vector storage) and Apache AGE (graph storage), suitable for production environments:

# Example: using PostgreSQL + AGE
rag = LightRAG(
    graph_storage="AGEStorage",
    vector_storage="PGVectorStorage",
    kv_storage="PGKVStorage",
    ...
)

Using Faiss Storage

# Install dependencies: pip install faiss-cpu (or faiss-gpu)
async def embedding_func(texts: list[str]) -> np.ndarray:
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('all-MiniLM-L6-v2')
    return model.encode(texts, convert_to_numpy=True)

rag = LightRAG(
    vector_storage="FaissVectorDBStorage",
    vector_db_storage_cls_kwargs={"cosine_better_than_threshold": 0.3},
    embedding_func=EmbeddingFunc(embedding_dim=384, func=embedding_func),
    ...
)

Data Deletion

# Delete by entity name
rag.delete_by_entity("entity name")
# Delete by document ID (cascades to associated entities and relationships)
rag.delete_by_doc_id("doc_id")

Knowledge Graph Editing

Supports creation and editing of entities and relationships, maintaining consistency between graph database and vector database:

Creating entities and relationships

# Create an entity
entity = rag.create_entity("Google", {"description": "Technology company", "entity_type": "company"})
# Create a relationship
relation = rag.create_relation("Google", "Gmail", {"description": "Development of email service"})

Editing entities and relationships

# Update an entity
updated_entity = rag.edit_entity("Google", {"description": "An Alphabet subsidiary"})
# Rename an entity (relationships are migrated automatically)
renamed_entity = rag.edit_entity("Gmail", {"entity_name": "Google Mail"})
# Update a relationship
updated_relation = rag.edit_relation("Google", "Google Mail", {"description": "Maintenance of email service"})

Data Export

Supports exporting knowledge graphs to CSV, Excel, Markdown and other formats:

# Export to CSV (default format)
rag.export_data("knowledge_graph.csv")
# Specify the format (Excel/Markdown/Text)
rag.export_data("output.xlsx", file_format="excel")
# Include vector data
rag.export_data("complete_data.csv", include_vector_data=True)

Entity merger

Automatically merge multiple entities and their relationships, handling conflicts and duplicates:

# Basic merge
rag.merge_entities(
    source_entities=["AI", "Artificial Intelligence", "Machine Learning"],
    target_entity="Artificial Intelligence Technology"
)
# Custom merge strategy
rag.merge_entities(
    source_entities=["John", "John Doe"],
    target_entity="John Smith",
    merge_strategy={"description": "concatenate", "entity_type": "keep_first"}
)

Cache Management

Clear the LLM response cache for different modes:

# Clear all caches
await rag.aclear_cache()
# Clear specific modes (e.g., local search)
await rag.aclear_cache(modes=["local"])
# Synchronous version
rag.clear_cache(modes=["global"])

LightRAG initialization parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| working_dir | str | Cache storage directory | lightrag_cache+timestamp |
| kv_storage | str | Storage type for documents and text chunks (supports Json/PG/Redis/Mongo) | JsonKVStorage |
| vector_storage | str | Embedding vector storage type (supports Nano/PG/Milvus/Chroma/Faiss, etc.) | NanoVectorDBStorage |
| graph_storage | str | Graph storage type (supports NetworkX/Neo4J/PGGraph/AGE) | NetworkXStorage |
| doc_status_storage | str | Storage type for document processing status | JsonDocStatusStorage |
| chunk_token_size | int | Maximum number of tokens per document chunk | 1200 |
| embedding_func | EmbeddingFunc | Embedding function | openai_embed |
| llm_model_func | callable | LLM generation function | gpt_4o_mini_complete |
For more parameters, see the documentation.
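As a minimal sketch, several of these parameters can be combined in a single constructor call; the storage backends and chunk size below are just the default or example values drawn from the table above, not a recommended production setup.

from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

# Sketch: combining several initialization parameters from the table above
rag = LightRAG(
    working_dir="./lightrag_cache",          # cache storage directory
    kv_storage="JsonKVStorage",              # document/text-chunk storage
    vector_storage="NanoVectorDBStorage",    # embedding vector storage
    graph_storage="NetworkXStorage",         # graph storage
    chunk_token_size=1200,                   # max tokens per document chunk
    embedding_func=openai_embed,
    llm_model_func=gpt_4o_mini_complete,
)
# Remember to await rag.initialize_storages() before use, as in the Quick Start example.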

Error handling

The API includes comprehensive error handling:

  • File not found (404)
  • Processing errors (500)
  • Support multiple encodings (UTF-8/GBK)

LightRAG Server

The LightRAG Server provides a web interface and API, supporting knowledge graph visualization, document index management, and other functions. For details, see the LightRAG Server documentation.

Evaluate

Dataset

The evaluation dataset can be downloaded from TommyChien/UltraDomain.

Generate a query

Use example/generate_query.py to generate high-level queries, automatically creating users, tasks, and questions from the dataset descriptions.

Batch evaluation

Use example/batch_eval.py to compare the performance of different RAG systems, evaluating them along three dimensions: comprehensiveness, diversity, and empowerment.

Reproduce the experiment

All reproduction code is located in the ./reproduce directory. The steps are:

  1. Extract unique contexts
  2. Insert them into the LightRAG system
  3. Generate queries and execute them