LightRAG: A simple and fast retrieval-augmented generation framework for getting started quickly

Written by Clara Bennett
Updated on: June 27, 2025
Recommendation

Explore new options for efficient Q&A and content generation. The LightRAG framework allows you to easily master RAG technology.

Core content:
1. LightRAG framework's lightweight design and advantages
2. Quick deployment and installation guide
3. Practical application examples and environment configuration


LightRAG is a lightweight retrieval-augmented generation (RAG) framework that aims to reduce the deployment cost and resource consumption of traditional RAG systems by optimizing the retrieval and generation processes while maintaining efficient question-answering and content generation capabilities. It is suitable for scenarios that require fast response and low computing power support, such as intelligent customer service, knowledge question-answering, and lightweight AI assistants.

Install

Installing the LightRAG Core Module

  • Install from source (recommended)

    git clone https://github.com/HKUDS/LightRAG.git
    cd LightRAG
    pip install -e .

  • Install via PyPI

    pip install lightrag-hku

Install LightRAG Server

The LightRAG Server provides a web interface and API support for document indexing, knowledge graph exploration, and simple RAG queries. It is also compatible with the Ollama interface and can present itself as an Ollama chat model, making it easy for AI chatbots such as Open WebUI to connect to it (a minimal client sketch follows the install commands below).

  • Install via PyPI

    pip install "lightrag-hku[api]"

  • Install from source

    # Create a Python virtual environment if needed
    # Install in editable mode with API support
    pip install -e ".[api]"

For more information, please refer to the LightRAG Server documentation.
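Because the server emulates the Ollama chat interface, any Ollama-style client can talk to it. The snippet below is a minimal sketch, not part of the official documentation: it assumes the server is reachable at http://localhost:9621 and advertises an Ollama model name such as "lightrag:latest"; check your own deployment for the actual host, port, and model name.

import requests

# Minimal sketch: query a running LightRAG Server through its Ollama-compatible chat API.
# Host, port, and model name below are assumptions; adjust them to your deployment.
resp = requests.post(
    "http://localhost:9621/api/chat",        # Ollama-style chat endpoint
    json={
        "model": "lightrag:latest",          # model name the server advertises (assumed)
        "messages": [
            {"role": "user", "content": "What are the top themes in this story?"}
        ],
        "stream": False,                     # request a single JSON response
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])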

Quick Start

  • A demo video of running LightRAG locally is available in the project README.
  • All code examples are located in the examples directory.
  • When using an OpenAI model, set the environment variable: export OPENAI_API_KEY="sk-...".
  • Download the demo text of A Christmas Carol:

    curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt

Query

Use the following Python code to initialize LightRAG and execute a query:

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, gpt_4o_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag

def main():
    # Initialize the RAG instance
    rag = asyncio.run(initialize_rag())
    # Insert text
    rag.insert("Your text")

    # Select the search mode: naive, local, global, hybrid, or mix (knowledge graph + vector retrieval)
    mode = "mix"

    print(rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode=mode)
    ))

if __name__ == "__main__":
    main()

QueryParam

class QueryParam:
    mode: Literal["local", "global", "hybrid", "naive", "mix"] = "global"
    """Search mode:
    - "local": focus on contextually relevant information
    - "global": use global knowledge
    - "hybrid": combine local and global search
    - "naive": basic search (no advanced techniques)
    - "mix": fuse knowledge graph and vector retrieval (supports structured KG and
      unstructured vectors, handles image content through HTML img tags, and controls
      retrieval depth through top_k)
    """

    only_need_context: bool = False
    """If True, only the retrieved context is returned without generating an answer"""

    response_type: str = "Multiple Paragraphs"
    """Response format (e.g., "Multiple Paragraphs", "Single Paragraph", "Bullet Points")"""

    top_k: int = 60
    """Number of top items to retrieve (entities in local mode, relationships in global mode)"""

    max_token_for_text_unit: int = 4000
    """Maximum number of tokens per retrieved text chunk"""

    max_token_for_global_context: int = 4000
    """Maximum number of tokens for relationship descriptions in global retrieval"""

    max_token_for_local_context: int = 4000
    """Maximum number of tokens for entity descriptions in local retrieval"""

    ids: list[str] | None = None  # Only supported by the PG vector database
    """List of IDs used to filter the RAG results"""

    model_func: Callable[..., object] | None = None
    """Optional: override the LLM model function for this query (different models can be used for different modes)"""

    ...

The default value of top_k can be overridden via the TOP_K environment variable.
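As a minimal sketch (assuming the default is read from the environment when the library loads its configuration), the variable can also be set from Python before LightRAG is imported:

import os

# Must be set before LightRAG reads its configuration
os.environ["TOP_K"] = "100"

from lightrag import QueryParam

# QueryParam() without an explicit top_k now uses the overridden default
param = QueryParam(mode="hybrid")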

LLM and Embedding Model Injection

LightRAG requires you to inject the LLM and embedding model call functions in order to complete document indexing and query tasks.

Using OpenAI style API

import os
import numpy as np
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc
from lightrag.kg.shared_storage import initialize_pipeline_status

async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        "solar-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
        **kwargs
    )

async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embed(
        texts,
        model="solar-embedding-1-large-query",
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar"
    )

async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=llm_model_func,
        embedding_func=EmbeddingFunc(
            embedding_dim=4096,
            max_token_size=8192,
            func=embedding_func
        )
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag

Using the Hugging Face model

See lightrag_hf_demo.py:

from transformers import AutoModel, AutoTokenizer
from lightrag import LightRAG
from lightrag.llm.hf import hf_model_complete, hf_embed
from lightrag.utils import EmbeddingFunc

# Initialize LightRAG with a Hugging Face model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,  # Hugging Face text generation model
    llm_model_name='meta-llama/Llama-3.1-8B-Instruct',  # Hugging Face model name
    # Use a Hugging Face embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=5000,
        func=lambda texts: hf_embed(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
            embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
        )
    ),
)

Using the Ollama model

Overview: You first need to pull the models with Ollama (a text generation model and an embedding model such as nomic-embed-text):

from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

# Initialize LightRAG with an Ollama model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,  # Ollama text generation model
    llm_model_name='your_model_name',  # model name
    # Use an Ollama embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embed(
            texts,
            embed_model="nomic-embed-text"
        )
    ),
)

Increase the context length:

  • Modify the Modelfile (the default context is 8k; at least 32k is required):
  1. Pull the model: ollama pull qwen2
  2. Export the model file: ollama show --modelfile qwen2 > Modelfile
  3. Add the parameter: PARAMETER num_ctx 32768
  4. Create the modified model: ollama create -f Modelfile qwen2m
  • Set it via the Ollama API:

    rag = LightRAG(
        ...
        llm_model_kwargs={"options": {"num_ctx": 32768}},
        ...
    )
  • Low-memory GPUs: choose a small model and adjust the context window (e.g., gemma2:2b with a 26k context); see the sketch below.
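Putting the two settings together, here is a minimal sketch of a low-memory configuration. The gemma2:2b model and the 26000-token context are just the example values mentioned above, and embedding_dim must match whichever embedding model you actually pull.

from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

# Sketch: small Ollama model with an enlarged context window for low-memory GPUs
rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=ollama_model_complete,
    llm_model_name="gemma2:2b",                        # small model (example value)
    llm_model_kwargs={"options": {"num_ctx": 26000}},  # ~26k context, as in the bullet above
    embedding_func=EmbeddingFunc(
        embedding_dim=768,                             # matches nomic-embed-text
        max_token_size=8192,
        func=lambda texts: ollama_embed(texts, embed_model="nomic-embed-text"),
    ),
)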

Integrating LlamaIndex

LightRAG supports integration with LlamaIndex (see llm/llama_index_impl.py):

# Use LlamaIndex to access OpenAI directly
import asyncio
from lightrag import LightRAG
from lightrag.llm.llama_index_impl import llama_index_complete_if_cache, llama_index_embed
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger, EmbeddingFunc

setup_logger("lightrag", level="INFO")

# LlamaIndex embedding model used by the embedding function below
embed_model = OpenAIEmbedding()

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        llm_model_func=llama_index_complete_if_cache,  # LlamaIndex-compatible generation function
        embedding_func=EmbeddingFunc(
            embedding_dim=1536,
            max_token_size=8192,
            func=lambda texts: llama_index_embed(texts, embed_model=embed_model)
        ),
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag

def main():
    rag = asyncio.run(initialize_rag())
    with open("./book.txt", "r", encoding="utf-8") as f:
        rag.insert(f.read())
    # Execute queries in different modes...

if __name__ == "__main__":
    main()

Detailed documentation and examples :

  • LlamaIndex documentation
  • Direct OpenAI Example
  • LiteLLM Agent Example

Token usage tracking

LightRAG's TokenTracker utility monitors LLM token consumption, making it easier to control API costs and optimize performance.

from lightrag.utils import TokenTracker

# Method 1: context manager (recommended)
with TokenTracker() as tracker:
    result1 = await llm_model_func("Question 1")
    result2 = await llm_model_func("Question 2")
print("Total token consumption:", tracker.get_usage())

# Method 2: manual recording
tracker = TokenTracker()
tracker.reset()
rag.insert("Your text")
rag.query("Question 1", param=QueryParam(mode="naive"))
print("Token usage for insert and query:", tracker.get_usage())

Tips:

  • Use the context manager for long-running sessions or batch operations so usage is tracked automatically
  • Call reset() manually to collect segmented statistics (see the sketch below)
  • Check token usage regularly during development and testing
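As a small illustration of the second tip, the sketch below resets the tracker between the indexing and query phases so each phase's consumption is reported separately. It assumes a rag instance initialized as in the snippets above and follows the same manual-recording pattern shown in Method 2.

from lightrag import QueryParam
from lightrag.utils import TokenTracker

tracker = TokenTracker()

# Phase 1: indexing
tracker.reset()
rag.insert("Your text")
print("Indexing tokens:", tracker.get_usage())

# Phase 2: querying (reset so the numbers cover only this phase)
tracker.reset()
rag.query("Question 1", param=QueryParam(mode="naive"))
print("Query tokens:", tracker.get_usage())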

Conversation history support

LightRAG supports multi-turn conversations and becomes context-aware when you pass in the conversation history:

conversation_history = [
    {"role": "user", "content": "What is the protagonist's attitude towards Christmas?"},
    {"role": "assistant", "content": "At the beginning of the story, Ebenezer Scrooge has a negative attitude towards Christmas..."},
    {"role": "user", "content": "How did his attitude change?"}
]

query_param = QueryParam(
    mode="mix",  # all modes are supported
    conversation_history=conversation_history,
    history_turns=3  # consider the last 3 turns of conversation
)

response = rag.query("What is the reason for this personality change?", param=query_param)

Custom prompts

Custom system prompts are supported for fine-grained control over model behavior:

custom_prompt = """
You are an expert in environmental science. Provide a detailed and structured response, and include examples.
---Conversation History---
{history}
---Knowledge Base---
{context_data}
---Response Rules---
Target format and length: {response_type}
"""

response_custom = rag.query(
    "What are the main advantages of renewable energy?",
    param=QueryParam(mode="hybrid"),
    system_prompt=custom_prompt
)

Independent keyword extraction

The query_with_separate_keyword_extraction function separates keyword extraction from the user prompt so that retrieval focuses on the query intent:

rag.query_with_separate_keyword_extraction(
    query="Explain the law of universal gravitation",
    prompt="Provide detailed explanations for high school students studying physics",
    param=QueryParam(mode="hybrid")
)

Data Insertion

Basic Insert

# Single text insertion
rag.insert("text content")

Batch Insert

# Batch insert multiple texts
rag.insert(["text1", "text2", ...])

# Custom batch size
rag = LightRAG(addon_params={"insert_batch_size": 4})
rag.insert(["text1", "text2", ...])  # process 4 documents per batch (default 10)

Insert with ID

# Single text with an ID
rag.insert("text1", ids=["ID_FOR_TEXT1"])
# Multiple texts with a list of IDs (must match the number of texts)
rag.insert(["text1", "text2"], ids=["ID1", "ID2"])

Pipeline Insertion

# Asynchronously enqueue and process documents (suitable for incremental background processing)
await rag.apipeline_enqueue_documents(input_data)
await rag.apipeline_process_enqueue_documents()
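Because both calls are coroutines, they need to run inside an event loop. The sketch below wraps the same two calls in an asyncio entry point; the document list used as input_data is a hypothetical placeholder.

import asyncio

async def index_in_background(rag):
    # Hypothetical input: a batch of plain-text documents
    input_data = ["Document 1 text...", "Document 2 text..."]

    # Enqueue the documents for indexing
    await rag.apipeline_enqueue_documents(input_data)

    # Process everything currently in the queue
    await rag.apipeline_process_enqueue_documents()

# Usage (after rag has been initialized as in the Quick Start):
# asyncio.run(index_in_background(rag))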

Multiple file type support

import textract

file_path = "document.pdf"
text_content = textract.process(file_path).decode("utf-8")
rag.insert(text_content)

Inserting a custom knowledge graph

custom_kg = {
    "chunks": [{"content": "text chunk", "source_id": "doc-1"}],
    "entities": [{"entity_name": "Entity", "description": "Description"}],
    "relationships": [{"src_id": "A", "tgt_id": "B", "description": "Relationship"}]
}
rag.insert_custom_kg(custom_kg)

Citation functionality

# Insert documents with file paths (supports source tracing)
documents = ["Content 1", "Content 2"]
file_paths = ["path1.txt", "path2.txt"]
rag.insert(documents, file_paths=file_paths)

Storage Configuration

Using Neo4J Storage

export NEO4J_URI="neo4j://localhost:7687"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD="password"

async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=gpt_4o_mini_complete,
        graph_storage="Neo4JStorage",  # override the default graph storage (NetworkX)
    )
    await rag.initialize_storages()
    return rag

Using PostgreSQL for storage

Supports PGVector (vector storage) and Apache AGE (graph storage), suitable for production environments:

# Example: using PostgreSQL + AGE
rag = LightRAG(
    graph_storage="AGEStorage",
    vector_storage="PGVectorStorage",
    kv_storage="PGKVStorage",
    ...
)

Using Faiss Storage

# Install dependencies: pip install faiss-cpu (or faiss-gpu)
async def embedding_func(texts: list[str]) -> np.ndarray:
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('all-MiniLM-L6-v2')
    return model.encode(texts, convert_to_numpy=True)

rag = LightRAG(
    vector_storage="FaissVectorDBStorage",
    vector_db_storage_cls_kwargs={"cosine_better_than_threshold": 0.3},
    embedding_func=EmbeddingFunc(embedding_dim=384, func=embedding_func),
    ...
)

Data Deletion

# Delete by entity name
rag.delete_by_entity("entity name")
# Delete by document ID (cascades to associated entities and relationships)
rag.delete_by_doc_id("doc_id")

Knowledge Graph Editing

Supports creation and editing of entities and relationships, maintaining consistency between graph database and vector database:

Creating entities and relationships

# Create an entity
entity = rag.create_entity("Google", {"description": "Technology company", "entity_type": "company"})
# Create a relationship
relation = rag.create_relation("Google", "Gmail", {"description": "Development of email service"})

Editing entities and relationships

# Update an entity
updated_entity = rag.edit_entity("Google", {"description": "An Alphabet subsidiary"})
# Rename an entity (relationships are migrated automatically)
renamed_entity = rag.edit_entity("Gmail", {"entity_name": "Google Mail"})
# Update a relationship
updated_relation = rag.edit_relation("Google", "Google Mail", {"description": "Maintenance of email service"})

Data Export

Supports exporting knowledge graphs to CSV, Excel, Markdown and other formats:

# Export to CSV (default format)
rag.export_data("knowledge_graph.csv")
# Specify the format (Excel/Markdown/Text)
rag.export_data("output.xlsx", file_format="excel")
# Include vector data
rag.export_data("complete_data.csv", include_vector_data=True)

Entity merger

Automatically merge multiple entities and their relationships, handling conflicts and duplicates:

# Basic merge
rag.merge_entities(
    source_entities=["AI", "Artificial Intelligence", "Machine Learning"],
    target_entity="Artificial Intelligence Technology"
)
# Custom merge strategy
rag.merge_entities(
    source_entities=["John", "John Doe"],
    target_entity="John Smith",
    merge_strategy={"description": "concatenate", "entity_type": "keep_first"}
)

Cache Management

Clear the LLM response cache for different modes:

# Clear all caches
await rag.aclear_cache()
# Clear specific modes (e.g., local search)
await rag.aclear_cache(modes=["local"])
# Synchronous version
rag.clear_cache(modes=["global"])

LightRAG initialization parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| working_dir | str | Cache storage directory | lightrag_cache+timestamp |
| kv_storage | str | Storage type for documents and text chunks (supports Json/PG/Redis/Mongo) | JsonKVStorage |
| vector_storage | str | Embedding vector storage type (supports Nano/PG/Milvus/Chroma/Faiss, etc.) | NanoVectorDBStorage |
| graph_storage | str | Graph storage type (supports NetworkX/Neo4J/PGGraph/AGE) | NetworkXStorage |
| doc_status_storage | str | Storage type for document processing status | JsonDocStatusStorage |
| chunk_token_size | int | Maximum number of tokens per document chunk | 1200 |
| embedding_func | EmbeddingFunc | Embedding function | openai_embed |
| llm_model_func | callable | LLM generation function | gpt_4o_mini_complete |
For more parameters, see the documentation.
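As a minimal sketch, several of these parameters can be combined in a single constructor call; the storage backends and chunk size below are just the default or example values drawn from the table above, not a recommended production setup.

from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

# Sketch: combining several initialization parameters from the table above
rag = LightRAG(
    working_dir="./lightrag_cache",          # cache storage directory
    kv_storage="JsonKVStorage",              # document/text-chunk storage
    vector_storage="NanoVectorDBStorage",    # embedding vector storage
    graph_storage="NetworkXStorage",         # graph storage
    chunk_token_size=1200,                   # max tokens per document chunk
    embedding_func=openai_embed,
    llm_model_func=gpt_4o_mini_complete,
)
# Remember to await rag.initialize_storages() before use, as in the Quick Start example.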

Error handling

The API includes comprehensive error handling:

  • File not found (404)
  • Processing errors (500)
  • Support multiple encodings (UTF-8/GBK)

LightRAG Server

The LightRAG Server provides a web interface and API, supporting knowledge graph visualization, document index management, and other functions. For details, see the LightRAG Server documentation.

Evaluate

Dataset

The evaluation dataset can be downloaded from TommyChien/UltraDomain.

Generate a query

Use example/generate_query.py to generate high-level queries, automatically creating users, tasks, and questions from the dataset descriptions.

Batch evaluation

Use example/batch_eval.py to compare the performance of different RAG systems, evaluating them along three dimensions: comprehensiveness, diversity, and empowerment.

Reproduce the experiment

All reproduction code is located in the ./reproduce directory. The steps are:

  1. Extract unique contexts
  2. Insert them into the LightRAG system
  3. Generate queries and execute them