Understanding RAG for Building Knowledge Bases and Knowledge Graphs in One Article

Written by

Clara Bennett

Updated on: June 27, 2025

RAG (Retrieval-Augmented Generation) improves the accuracy and timeliness of knowledge question answering by retrieving external information before generating a response. For knowledge bases, RAG relies on a vector database and a dynamic update mechanism to retrieve and generate efficiently; for knowledge graphs, frameworks such as GraphRAG and Graphusion extract entity relationships and fuse them into a graph.

1. RAG

What is RAG (Retrieval-Augmented Generation)? RAG is an artificial intelligence technique that combines information retrieval with text generation. It aims to mitigate the hallucination problem of large language models by introducing external knowledge bases.

The core goal of RAG is to let a large language model (LLM) answer questions without relying solely on the knowledge frozen into it during training; instead, it dynamically retrieves up-to-date or domain-specific information to help generate the answer.

RAG combines information retrieval with a generative model and works in three stages:
Retrieval: Search external knowledge sources (such as documents and databases) for information relevant to the question.
Augmentation: Pass the retrieved results to the generative model as context so that it can interpret the question correctly.
Generation: Produce a coherent, accurate answer based on the retrieved content and the model's own knowledge.
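
The three stages above can be sketched in a few lines of Python. This is a minimal illustration and not a production pipeline: retrieval is a simple word-overlap ranking rather than vector search, the generate function is a placeholder for a real LLM call, and every name in it (KNOWLEDGE_BASE, retrieve, augment, generate) is hypothetical.

```python
import re

# Hypothetical in-memory knowledge base
KNOWLEDGE_BASE = [
    "RAG combines information retrieval with text generation.",
    "FAISS is a library for similarity search over dense vectors.",
    "The Leiden algorithm partitions graph nodes into communities.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Retrieval: rank documents by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def augment(question: str, passages: list[str]) -> str:
    """Augmentation: place the retrieved passages in the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


def generate(prompt: str) -> str:
    """Generation: placeholder standing in for a real LLM call (assumption)."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"


question = "What does RAG combine?"
passages = retrieve(question, KNOWLEDGE_BASE)
prompt = augment(question, passages)
print(prompt)
print(generate(prompt))
```

In a real system, retrieve would query a vector index and generate would call the model, but the flow of data between the three stages stays the same.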

How do you use Prompt + RAG in practice? Applying prompt engineering together with RAG (retrieval-augmented generation) comes down to three areas: data preparation, retrieval optimization, and generation control.

1. Data Preparation and Vectorization

1. Document preprocessing and chunking

Document preprocessing normalizes text through data cleaning (including multimodal sources), lemmatization, and dependency parsing. The chunking stage uses recursive splitting and semantic boundary detection, optionally refined with knowledge graph associations, to build overlapping, contextually coherent units that balance retrieval efficiency against information completeness.

# Dependency installation: pip install langchain langchain-text-splitters
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Sample long text (replace with actual text)
text = """Natural language processing (NLP) is an important branch of artificial intelligence, involving tasks such as text analysis, machine translation, and sentiment analysis. Chunking technology can split long text into logically coherent semantic units for subsequent processing."""
# Initialize recursive chunker (chunk size 300 characters, overlap 50 characters to maintain context)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    # Prioritize paragraph/sentence boundaries: Chinese and English sentence
    # endings first, then fall back to spaces and single characters
    separators=["\n\n", "\n", "。", "！", "？", ". ", "! ", "? ", " ", ""],
)
# Execute chunking
chunks = text_splitter.split_text(text)
# Print chunking results
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n{'-'*50}")

2. Vectorization and Storage

Vectorization maps unstructured data (text, images, etc.) to high-dimensional semantic vectors through an embedding model. Storage relies on a vector store (such as Milvus or Elasticsearch's dense_vector field) that builds efficient indexes (for example HNSW, or the indexes provided by the FAISS library) and supports approximate nearest neighbor (ANN) search for fast similarity matching over large-scale vector data.

# Dependency installation: pip install sentence-transformers faiss-cpu langchain-community
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
# 1. Text vectorization (MiniLM-L6 pre-trained model, wrapped as a LangChain embeddings object)
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-MiniLM-L6-v2")
# 2. Vector storage in a FAISS index; from_texts embeds the chunks itself,
#    so it needs the embeddings object rather than precomputed vectors.
#    `chunks` is the list produced by the chunking example above.
vector_db = FAISS.from_texts(
    texts=chunks,
    embedding=embedding_model,
    metadatas=[{"source": "web_data"}] * len(chunks),  # Metadata can be added
)
# Save index locally
vector_db.save_local("my_vector_db")
# Example query: retrieve similar text (the query string is embedded internally)
query = "What is natural language processing?"
results = vector_db.similarity_search_with_score(query, k=3)
# Each result is a (Document, score) pair; with the default index the score
# is a distance, so lower means more similar
for doc, score in results:
    print(f"{score:.4f}  {doc.page_content}")

2. Retrieval Optimization Techniques

Recall and ranking quality improve through multi-path recall (such as hybrid retrieval, HyDE query rewriting, and dynamic reranking). Retrieval results are refined further through context enhancement: supplementing relationships from the knowledge graph, and generating prompts dynamically with instruction-level RAG.
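
One way to merge the result lists from several recall paths is reciprocal rank fusion (RRF). The article does not name a specific fusion method, so RRF is an assumption here, chosen because it needs only the rank positions and not comparable scores. The document IDs and the two result lists below are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical keyword-search results
vector_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical vector-search results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

The constant k dampens the influence of the very top ranks; 60 is a commonly used default, and a reranking model could then be applied to the fused list.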

3. Prompt Engineering Practice

1. Structured input design

Constrain the model by role and scenario, for example the role of a legal advisor in a contract terms consultation. Combined with a knowledge unit such as Article 580 of the Civil Code, the prompt follows the chain of thought "user question → retrieved knowledge → logical association → generated answer", and the answer is explained point by point with its reference sources marked.
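
A structured prompt of this kind might be assembled as follows. This is only a sketch: the build_prompt function, the field labels, and the sample question are hypothetical, and the text of the retrieved article is left as a placeholder rather than quoted.

```python
def build_prompt(role: str, scenario: str, question: str, sources: list[dict]) -> str:
    """Assemble a role- and scenario-constrained prompt with numbered sources."""
    numbered = "\n".join(
        f"[{i}] {s['title']}: {s['text']}" for i, s in enumerate(sources, start=1)
    )
    return (
        f"Role: {role}\n"
        f"Scenario: {scenario}\n"
        f"Retrieved knowledge:\n{numbered}\n"
        f"User question: {question}\n"
        "Instructions: reason step by step from the question to the retrieved "
        "knowledge, then answer point by point and cite sources as [n]."
    )


# Placeholder for the retrieved knowledge unit; a real system would insert
# the article text returned by the retriever.
sources = [{"title": "Civil Code, Article 580", "text": "<retrieved article text>"}]
prompt = build_prompt(
    role="Legal advisor",
    scenario="Contract terms consultation",
    question="What can I do if the other party does not perform the contract?",
    sources=sources,
)
print(prompt)
```

Numbering the sources in the prompt gives the model stable labels to cite, which makes the marked references easy to check afterwards.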

2. Output template control

A preset output template ensures format compliance, and a dynamic guardrail mechanism filters sensitive words and verifies factual consistency, keeping generated content safe and compliant.
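
As a rough sketch of such a guardrail, the check below validates a model response against a simple JSON template, filters blocked terms, and rejects citations of sources that were never retrieved. The template keys, the blocked-term list, and the source IDs are all hypothetical, and the consistency check is deliberately crude: it verifies citations only, not the facts themselves.

```python
import json

REQUIRED_KEYS = {"answer", "sources"}           # hypothetical output template
BLOCKED_TERMS = {"password", "guaranteed win"}  # hypothetical sensitive-word list


def check_output(raw: str, retrieved_ids: set[str]) -> tuple[bool, str]:
    """Return (ok, reason) for a model response expected to be a JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data.keys()):
        return False, "missing required keys"
    if not isinstance(data["sources"], list):
        return False, "sources must be a list"
    answer = str(data["answer"]).lower()
    if any(term in answer for term in BLOCKED_TERMS):
        return False, "contains a blocked term"
    # Crude consistency check: every cited source must have been retrieved.
    if not set(data["sources"]).issubset(retrieved_ids):
        return False, "cites a source that was not retrieved"
    return True, "ok"


response = '{"answer": "See the cited article.", "sources": ["civil_code_580"]}'
print(check_output(response, {"civil_code_580"}))
```

A response that fails the check can be regenerated or replaced with a fallback message instead of being shown to the user.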

2. Knowledge Base and Knowledge Graph

What is a knowledge base? A knowledge base is a structured, easy-to-operate collection of knowledge that systematically integrates domain-related knowledge (such as theories, facts, and rules) to provide a foundation for problem solving, decision support, and knowledge sharing.

The core of building a knowledge base for RAG lies in combining external knowledge retrieval with the generation capabilities of a large language model: efficient retrieval supplies context for generation, which improves the accuracy and timeliness of answers. In practice, the work centers on text chunking and vector embedding.

1. Chunking

Text chunking is the process of breaking long text into smaller, manageable segments for more efficient processing and analysis.
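
The simplest form of chunking is a fixed-size sliding window with overlap, sketched below in plain Python. The function name and the default sizes are hypothetical; real pipelines usually split on sentence or paragraph boundaries instead of raw character offsets.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into character windows where neighbors share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop before a window that would contain only already-covered overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = (
    "Text chunking breaks long text into smaller, manageable segments "
    "for more efficient processing and analysis."
)
for i, chunk in enumerate(chunk_text(sample), start=1):
    print(f"Chunk {i}: {chunk!r}")
```

The overlap keeps a sentence that straddles a boundary at least partly visible in both neighboring chunks, at the cost of storing some text twice.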

2. Vectorization (Embedding)

Vectorization is the process of mapping text or data into numerical representations in a high-dimensional vector space to capture semantic features.
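
To make the idea concrete, the toy example below maps text to word-count vectors over a tiny fixed vocabulary and compares them with cosine similarity. This is an illustration only: the vocabulary and sentences are hypothetical, and real embedding models produce dense learned vectors, not word counts.

```python
import math
import re
from collections import Counter


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: one dimension per vocabulary word, valued by word count."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocabulary = ["graph", "vector", "retrieval", "entity"]  # hypothetical vocabulary
doc_vec = embed("Vector retrieval finds similar vector entries", vocabulary)
query_vec = embed("vector retrieval", vocabulary)
other_vec = embed("An entity in a graph", vocabulary)

print(cosine_similarity(doc_vec, query_vec))    # related texts: close to 1
print(cosine_similarity(query_vec, other_vec))  # no shared words: 0.0
```

Texts about the same topic end up pointing in similar directions, which is the property a vector database exploits when it searches for nearest neighbors.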

What is a knowledge graph? A knowledge graph is a semantic network built from entities and relationships that supports reasoning and complex queries, whereas traditional knowledge bases mostly store data in a flat, unlinked manner.
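
The difference from a flat store can be illustrated with a handful of (subject, relation, object) triples and a two-hop query that follows relationships between entities. The triples and function names below are illustrative assumptions, not output from any real system.

```python
from collections import defaultdict

# Illustrative triples: (subject, relation, object)
TRIPLES = [
    ("GraphRAG", "is_a", "RAG framework"),
    ("GraphRAG", "uses", "Leiden algorithm"),
    ("Leiden algorithm", "is_a", "community detection algorithm"),
]


def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up directly."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def two_hop(graph, start):
    """Return every (relation1, middle, relation2, end) path of length two."""
    paths = []
    for rel1, middle in graph.get(start, []):
        for rel2, end in graph.get(middle, []):
            paths.append((rel1, middle, rel2, end))
    return paths


graph = build_graph(TRIPLES)
print(two_hop(graph, "GraphRAG"))
```

The two-hop query links GraphRAG to a community detection algorithm even though no single triple states that connection, which is the kind of inference a flat store cannot provide directly.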

The core of building a knowledge graph for RAG is to integrate the structured and unstructured data of the external knowledge base into graph form by combining retrieval technology with a large language model (LLM). The knowledge graph gives the RAG system structured reasoning capabilities, allowing it to evolve from an "information retriever" into a "knowledge reasoning engine."

The key to building a knowledge graph for RAG lies in the collaboration between retrieval and generation. The process includes:

Data preprocessing: Split the document into chunks and extract entities and relations through named entity recognition (NER) and relation extraction.
Knowledge graph indexing: After the initial knowledge graph is constructed from the extracted entities and relationships, a clustering algorithm (such as the Leiden algorithm) divides the nodes of the graph into communities.
Retrieval enhancement: When a user submits a query, the context is enhanced through local search (based on entities) or global search (based on dataset topics) to improve the accuracy of the generated answer.
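
The process above can be sketched end to end in plain Python under heavy simplifications. The triples stand in for the output of entity and relation extraction, and connected components stand in for community detection; this is not the Leiden algorithm, which would need a dedicated graph library. All names and data are hypothetical.

```python
from collections import defaultdict

# Stand-in for step 1: triples that entity and relation extraction
# would produce from document chunks (hypothetical data).
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Carol", "studies", "Biology"),
]


def build_adjacency(triples):
    """Step 2a: build an undirected graph over the extracted entities."""
    adj = defaultdict(set)
    for subject, _, obj in triples:
        adj[subject].add(obj)
        adj[obj].add(subject)
    return adj


def find_communities(adj):
    """Step 2b: group nodes by connected component (simple stand-in for Leiden)."""
    seen, groups = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adj[current] - group)
        seen |= group
        groups.append(group)
    return groups


def local_search(entity, triples):
    """Step 3, local: collect the facts that mention the queried entity."""
    return [t for t in triples if entity in (t[0], t[2])]


def global_search(groups):
    """Step 3, global: one summary line per community."""
    return [f"Community {i}: {', '.join(sorted(g))}" for i, g in enumerate(groups, 1)]


groups = find_communities(build_adjacency(TRIPLES))
print(local_search("Acme", TRIPLES))
print(global_search(groups))
```

In a full system the community summaries would be written by the LLM, and the results of local or global search would be placed in the prompt as context before generation.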