LightRAG: A simple and fast retrieval-augmented generation framework for getting started quickly

Updated on: June 27th, 2025
Recommendation
Explore new options for efficient Q&A and content generation. The LightRAG framework allows you to easily master RAG technology.
Core content:
1. LightRAG framework's lightweight design and advantages
2. Quick deployment and installation guide
3. Practical application examples and environment configuration
Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)
Install
Installing the LightRAG Core Module
# Install from source (editable mode)
cd LightRAG
pip install -e .
# Or install from PyPI
pip install lightrag-hku
Install LightRAG Server
pip install "lightrag-hku[api]"
# Create a Python virtual environment on demand
# Install API support for editable mode
pip install -e ".[api]"
Quick Start
- A demo video of running LightRAG locally
- All code examples are located in the examples directory
- When using an OpenAI model, set the environment variable: export OPENAI_API_KEY="sk-..."
- Download the demo text of A Christmas Carol: curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt
Query
import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, gpt_4o_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag

def main():
    # Initialize the RAG instance
    rag = asyncio.run(initialize_rag())
    # Insert text
    rag.insert("Your text")
    # Select the query mode: naive (basic), local, global, hybrid, or mix (knowledge graph + vector retrieval)
    mode = "mix"
    rag.query(
        "What are the top themes in this story?",
        param=QueryParam(mode=mode),
    )

if __name__ == "__main__":
    main()
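If you downloaded the demo text of A Christmas Carol above, you can insert the whole book instead of a short string. A minimal sketch, assuming ./book.txt exists and rag was initialized as in the example above:

# Insert the downloaded demo book and query it
with open("./book.txt", "r", encoding="utf-8") as f:
    rag.insert(f.read())

print(rag.query(
    "What are the top themes in this story?",
    param=QueryParam(mode="mix"),
))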
QueryParam
class QueryParam:
    mode: Literal["local", "global", "hybrid", "naive", "mix"] = "global"
    """Search mode:
    - "local": focus on contextually relevant information
    - "global": use global knowledge
    - "hybrid": combine local and global searches
    - "naive": basic search (no advanced techniques)
    - "mix": fuse knowledge graph and vector retrieval (supports structured KG and unstructured vectors, processes image content through HTML img tags, and controls retrieval depth through top_k)
    """
    only_need_context: bool = False
    """If True, only the retrieved context is returned without generating an answer."""
    response_type: str = "Multiple Paragraphs"
    """Response format (e.g., "Multiple Paragraphs", "Single Paragraph", "Bullet Points")."""
    top_k: int = 60
    """Number of top items retrieved (entities in local mode, relationships in global mode)."""
    max_token_for_text_unit: int = 4000
    """Maximum number of tokens per retrieved text chunk."""
    max_token_for_global_context: int = 4000
    """Maximum number of tokens for relationship descriptions in global retrieval."""
    max_token_for_local_context: int = 4000
    """Maximum number of tokens for entity descriptions in local retrieval."""
    ids: list[str] | None = None  # Only supported by the PG vector database
    """List of IDs used to filter the RAG results."""
    model_func: Callable[..., object] | None = None
    """Optional: override the LLM model function for this query (different models can be used for different modes)."""
    ...
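The fields above can be combined per query. Below is a minimal sketch of per-query tuning; the parameter values are illustrative assumptions, not recommendations:

param = QueryParam(
    mode="local",               # entity-centric retrieval
    only_need_context=True,     # return the retrieved context without generating an answer
    response_type="Bullet Points",
    top_k=20,                   # retrieve fewer entities/relationships than the default 60
)
context = rag.query("Who is Scrooge's business partner?", param=param)
print(context)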
LLM and Embedding Model Injection
Using an OpenAI-style API
import os
import numpy as np
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import EmbeddingFunc

WORKING_DIR = "your/path"

async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        "solar-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
        **kwargs
    )

async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embed(
        texts,
        model="solar-embedding-1-large-query",
        api_key=os.getenv("UPSTAGE_API_KEY"),
        base_url="https://api.upstage.ai/v1/solar",
    )

async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=llm_model_func,
        embedding_func=EmbeddingFunc(
            embedding_dim=4096,
            max_token_size=8192,
            func=embedding_func,
        ),
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag
Using the Hugging Face model
# Initialize LightRAG and use a Hugging Face model
from transformers import AutoModel, AutoTokenizer
from lightrag.llm.hf import hf_model_complete, hf_embed
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,  # Hugging Face text generation model
    llm_model_name='meta-llama/Llama-3.1-8B-Instruct',  # Hugging Face model name
    # Hugging Face embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=5000,
        func=lambda texts: hf_embed(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
            embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
        ),
    ),
)
Using the Ollama model
# Initialize LightRAG and use an Ollama model
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,  # Ollama text generation model
    llm_model_name='your_model_name',  # model name
    # Ollama embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embed(
            texts,
            embed_model="nomic-embed-text",
        ),
    ),
)
- Increase the context window by modifying the Modelfile (the default context is 8k; at least 32k is required):
  1. Pull the model: ollama pull qwen2
  2. Export the model file: ollama show --modelfile qwen2 > Modelfile
  3. Add the parameter: PARAMETER num_ctx 32768
  4. Create the modified model: ollama create -f Modelfile qwen2m
- Or set the context size through Ollama API options:

rag = LightRAG(
    ...
    llm_model_kwargs={"options": {"num_ctx": 32768}},
    ...
)
Integrating LlamaIndex
# Use LlamaIndex to directly access OpenAI
import asyncio
from lightrag import LightRAG
from lightrag.llm.llama_index_impl import llama_index_complete_if_cache, llama_index_embed
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import EmbeddingFunc, setup_logger

setup_logger("lightrag", level="INFO")

# Define the LlamaIndex embedding model used by the lambda below (the model name is an illustrative choice)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

async def initialize_rag():
    rag = LightRAG(
        working_dir="your/path",
        llm_model_func=llama_index_complete_if_cache,  # LlamaIndex-compatible generation function
        embedding_func=EmbeddingFunc(
            embedding_dim=1536,
            max_token_size=8192,
            func=lambda texts: llama_index_embed(texts, embed_model=embed_model),
        ),
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()
    return rag

def main():
    rag = asyncio.run(initialize_rag())
    with open("./book.txt", "r", encoding="utf-8") as f:
        rag.insert(f.read())
    # Execute queries in different modes...

if __name__ == "__main__":
    main()
- LlamaIndex documentation
- Direct OpenAI Example
- LiteLLM Agent Example
Token usage tracking
from lightrag.utils import TokenTracker

# Method 1: Context manager (recommended)
with TokenTracker() as tracker:
    result1 = await llm_model_func("Question 1")
    result2 = await llm_model_func("Question 2")
print("Total token consumption:", tracker.get_usage())

# Method 2: Manual recording
tracker = TokenTracker()
tracker.reset()
rag.insert("Your text")
rag.query("Question 1", param=QueryParam(mode="naive"))
print("Token usage for insert and query:", tracker.get_usage())
- Use the context manager for long-running sessions or batch operations; token usage is tracked automatically.
- Call reset() manually to collect statistics for separate stages.
- Check token usage regularly during development and testing.
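For example, here is a minimal sketch of segmented statistics with reset(), reusing the tracker and rag instance from above to measure the insert and query stages separately:

tracker = TokenTracker()

tracker.reset()
rag.insert("Your text")
print("Tokens used by insert:", tracker.get_usage())

tracker.reset()
rag.query("Question 1", param=QueryParam(mode="naive"))
print("Tokens used by query:", tracker.get_usage())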
Conversation history support
conversation_history = [
    {"role": "user", "content": "What is the protagonist's attitude towards Christmas?"},
    {"role": "assistant", "content": "At the beginning of the story, Ebenezer Scrooge has a negative attitude towards Christmas..."},
    {"role": "user", "content": "How did his attitude change?"},
]

query_param = QueryParam(
    mode="mix",  # supports all modes
    conversation_history=conversation_history,
    history_turns=3,  # consider the last 3 turns of conversation
)

response = rag.query("What is the reason for this personality change?", param=query_param)
Custom prompts
custom_prompt = """
You are an expert in environmental science, provide a detailed and structured response, and include examples.
---Dialogue History---
{history}
---knowledge base---
{context_data}
---Response Rules---
Target format and length: {response_type}
"""
response_custom = rag . query (
"What are the main advantages of renewable energy?" ,
param = QueryParam ( mode = "hybrid" ) ,
system_prompt = custom_prompt
)
Independent keyword extraction
rag.query_with_separate_keyword_extraction(
    query="Explain the law of universal gravitation",
    prompt="Provide detailed explanations for high school students studying physics",
    param=QueryParam(mode="hybrid"),
)
Data Insertion
Basic Insert
# Single text insertion
rag.insert("text content")
Batch Insert
# Batch insert multiple texts
rag.insert(["text1", "text2", ...])

# Custom batch size
rag = LightRAG(addon_params={"insert_batch_size": 4})
rag.insert(["text1", "text2", ...])  # Process 4 documents per batch (default is 10)
Insert with IDs
# Single text with an ID
rag.insert("text1", ids=["ID_FOR_TEXT1"])

# Multiple texts with an ID list (the number of IDs must match the number of texts)
rag.insert(["text1", "text2"], ids=["ID1", "ID2"])
Pipeline Insertion
# Asynchronously enqueue and process documents (suitable for incremental background processing)
await rag.apipeline_enqueue_documents(input_data)
await rag.apipeline_process_enqueue_documents()
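If you are already inside an async function with an initialized rag instance, the two calls can be wrapped in a small helper, as in this minimal sketch (the helper name and document list are illustrative):

# A sketch: enqueue documents, then drain the processing queue
async def pipeline_insert(rag, docs: list[str]):
    await rag.apipeline_enqueue_documents(docs)       # queue documents for processing
    await rag.apipeline_process_enqueue_documents()   # process all queued documents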
Multiple file type support
import textract

file_path = "document.pdf"
text_content = textract.process(file_path).decode("utf-8")
rag.insert(text_content)
Inserting a custom knowledge graph
custom_kg = {
    "chunks": [{"content": "text chunk", "source_id": "doc-1"}],
    "entities": [{"entity_name": "Entity", "description": "Description"}],
    "relationships": [{"src_id": "A", "tgt_id": "B", "description": "Relationship"}],
}
rag.insert_custom_kg(custom_kg)
Citation support
# Insert documents with file paths (supports tracing answers back to the source files)
documents = ["Content 1", "Content 2"]
file_paths = ["path1.txt", "path2.txt"]
rag.insert(documents, file_paths=file_paths)
Storage Configuration
Using Neo4J Storage
export NEO4J_URI="neo4j://localhost:7687"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD="password"

async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        llm_model_func=gpt_4o_mini_complete,
        graph_storage="Neo4JStorage",  # Override the default graph storage (NetworkX)
    )
    await rag.initialize_storages()
    return rag
Using PostgreSQL for storage
# Example: Using PostgreSQL + AGE
rag = LightRAG(
    graph_storage="AGEStorage",
    vector_storage="PGVectorStorage",
    kv_storage="PGKVStorage",
    ...
)
Using Faiss Storage
# Install dependencies: pip install faiss-cpu (or faiss-gpu)
import numpy as np

async def embedding_func(texts: list[str]) -> np.ndarray:
    # For simplicity, the model is loaded on each call
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts, convert_to_numpy=True)

rag = LightRAG(
    vector_storage="FaissVectorDBStorage",
    vector_db_storage_cls_kwargs={"cosine_better_than_threshold": 0.3},
    embedding_func=EmbeddingFunc(embedding_dim=384, func=embedding_func),
    ...
)
Data Deletion
# Delete by entity name
rag.delete_by_entity("entity name")

# Delete by document ID (cascades to associated entities and relationships)
rag.delete_by_doc_id("doc_id")
Knowledge Graph Editing
Creating entities and relationships
# Create an entity
entity = rag.create_entity("Google", {"description": "Technology company", "entity_type": "company"})

# Create a relationship
relation = rag.create_relation("Google", "Gmail", {"description": "Development of email service"})
Editing entities and relationships
# Update an entity
updated_entity = rag.edit_entity("Google", {"description": "An Alphabet subsidiary"})

# Rename an entity (relationships are migrated automatically)
renamed_entity = rag.edit_entity("Gmail", {"entity_name": "Google Mail"})

# Update a relationship
updated_relation = rag.edit_relation("Google", "Google Mail", {"description": "Maintenance of email service"})
Data Export
# Export to CSV (default format)
rag.export_data("knowledge_graph.csv")

# Specify the format (Excel / Markdown / Text)
rag.export_data("output.xlsx", file_format="excel")

# Include vector data
rag.export_data("complete_data.csv", include_vector_data=True)
Entity merging
# Basic merge
rag.merge_entities(
    source_entities=["AI", "Artificial Intelligence", "Machine Learning"],
    target_entity="Artificial Intelligence Technology",
)

# Custom merge strategy
rag.merge_entities(
    source_entities=["John", "John Doe"],
    target_entity="John Smith",
    merge_strategy={"description": "concatenate", "entity_type": "keep_first"},
)
Cache Management
# Clear all caches
await rag.aclear_cache()

# Clear caches for specific modes (e.g., local search)
await rag.aclear_cache(modes=["local"])

# Synchronous version
rag.clear_cache(modes=["global"])
LightRAG initialization parameters
parameter | type | description | default value
---|---|---|---
working_dir | str | Directory where cache files are stored | lightrag_cache+timestamp
kv_storage | str | Key-value storage for documents and text chunks | JsonKVStorage
vector_storage | str | Vector storage for embeddings | NanoVectorDBStorage
graph_storage | str | Graph storage for entities and relationships | NetworkXStorage
doc_status_storage | str | Storage for document processing status | JsonDocStatusStorage
embedding_func | EmbeddingFunc | Function to generate embeddings from text | openai_embed
llm_model_func | callable | LLM generation function | gpt_4o_mini_complete
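A minimal sketch of overriding these defaults when constructing LightRAG; the working directory value is an arbitrary example:

rag = LightRAG(
    working_dir="./my_rag_cache",               # default: lightrag_cache + timestamp
    kv_storage="JsonKVStorage",                 # default key-value storage
    vector_storage="NanoVectorDBStorage",       # default vector storage
    graph_storage="NetworkXStorage",            # default graph storage
    doc_status_storage="JsonDocStatusStorage",  # default document status storage
    embedding_func=openai_embed,                # default embedding function
    llm_model_func=gpt_4o_mini_complete,        # default LLM function
)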
Error handling
LightRAG handles the following error scenarios:
- File not found (404)
- Processing errors (500)
- Multiple text encodings supported (UTF-8 / GBK)
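When inserting local files, one way to cover both encodings is to try UTF-8 first and fall back to GBK before passing the text to LightRAG. A generic Python sketch (the read_text helper is illustrative, not a LightRAG API):

def read_text(path: str) -> str:
    # Try UTF-8 first, then fall back to GBK
    for encoding in ("utf-8", "gbk"):
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {path} with UTF-8 or GBK")

rag.insert(read_text("./book.txt"))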
LightRAG Server
Evaluation
Dataset
Query generation
Batch evaluation
Reproducing the experiments
- Extract the unique contexts from the dataset
- Insert them into the LightRAG system
- Generate queries and execute them