NLP+Graph Technology: How to Create Efficient GraphRAG Applications at Low Cost?

Use NLP technology to build hybrid graphs and implement GraphRAG applications at low cost.
Core content:
1. Definition and application areas of GraphRAG
2. Microsoft GraphRAG method and its limitations
3. NLP optimization strategy and practical cases in GraphRAG
This article describes how to leverage the power of Natural Language Processing (NLP) to build hybrid graphs for Retrieval Augmented Generation (RAG) and GraphRAG applications.
What is GraphRAG?
What exactly is GraphRAG? What does it mean to you? The idea of combining standard RAG and GraphRAG into one package that can be switched at the flick of a switch is certainly an attractive one.
In fact, there is no specific, widely accepted definition of GraphRAG. Based on my experience, literature research, and industry interviews, I estimate (apologies to Steven D. Levitt, I know this is not a rigorous way to present statistics):
• 90% associate GraphRAG with Microsoft's approach to building graphs (or variations thereof) and enabling search on them.
• 8% define GraphRAG as using LLM-generated queries (text-to-Cypher, text-to-SPARQL, or any other graph query language) to query LPG (Labeled Property Graph) or RDF (Resource Description Framework) graphs.
• The remaining 2% are either unsure or exploring different possibilities.
Personally, I don’t fully agree with either of the first two definitions, and I want to explain why.
First, I must admit that I think Microsoft's GraphRAG is a very cool idea. In the next five years or so, it will probably be widely adopted and even become the mainstream choice among GraphRAG methods.
However, today it is still too expensive and impractical for large-scale industrial use. The reality is that most companies lack the time, budget, and confidence to adopt this approach; instead, they choose standard "vanilla" vector databases, which are more feasible given the current constraints. Confidence matters because there are not yet thousands of examples of GraphRAG running in production (probably for the reasons just mentioned).
In my opinion, text-to-Cypher or text-to-SPARQL techniques are a great alternative to Microsoft GraphRAG (although the two can also be used together), and I have seen some very good use cases. However, there are disadvantages. First, generating queries requires many expensive LLM calls. Second, there is always a layer of uncertainty between you and your knowledge base: you rely on the quality of your prompts and on the chosen model's ability to build and execute Cypher or SPARQL queries. In addition, the extra processing steps increase response time, and the higher implementation complexity adds challenges. All in all, this technology is promising and powerful for certain applications, but its applicability depends on the specific use case.
Efficiency Optimization Dilemma
As a consultant and GenAI solution developer, my goal is to deliver GraphRAG at any scale - from small implementations to large enterprise-level solutions.
Scaling often comes with trade-offs, especially in terms of accuracy or efficiency. Therefore, it is valuable to explore more efficient solutions, provided complexity and cost are manageable. If a less complex and more cost-effective solution still provides satisfactory results, it is worth keeping in the toolbox, right?
With this in mind, the approach proposed in this article is to leverage the power of graphs for RAG (retrieval-augmented generation) without paying the high cost of graph creation itself. The challenge is to build and maintain a useful graph while minimizing dependencies on LLMs—or, ideally, using a small local LLM instead of calling expensive large cloud model APIs.
Fixed entity architecture
Some time ago, I published two Medium articles introducing a new approach for constructing RAG graphs, called Fixed Entity Architecture [1-2].
The core idea is to build a hierarchical graph:
• First layer: Ontology layer - defines the domain ontology. Since the scope of an ontology is usually limited, the size of this layer remains fixed or nearly fixed.
• Second layer: Document layer - consists of document chunks, similar to what you would find in any vector database. Applying a vector index to this layer and querying it directly will result in a standard vector database search.
• Third layer (optional): Entity layer - This layer consists of entities extracted from each document chunk (e.g., using spaCy). Since these entities often recur in documents, they act as a “glue” layer, thus enhancing search results.
In both articles, I demonstrated a way to create graphs without relying on LLMs. However, a major challenge with this approach is building the ontology layer. Consider the following facts:
• Not all data sets fall into well-defined domains.
• Subject matter experts (SMEs) cannot always help build an ontology.
Because of these limitations, I began exploring ways to eliminate the need for a fixed ontology layer.
Why use layered graphs?
Neo4j allows a vector index to be created only for a single node label. If nodes have different labels, you need a separate index for each label, which is not always practical when performing vector searches.
Of course, in some cases it makes sense to have more node types, for example when strict ontology differentiation or filtering is required. However, in my cases it has not been necessary so far; usually a design with two or three layers is reasonable. A workaround for the label-index limitation is therefore to assign the same internal label to all nodes within a layer, while storing the actual label, name, and metadata as node attributes.
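For illustration, here is a minimal sketch (mine, not the article's) of what this workaround looks like in Cypher; the property names actual_type, name, and source are placeholders:
query = '''
MERGE (n:Document {chunkID: $chunk_id})
SET n.label  = $actual_type,  // the "real" label lives in a property
    n.name   = $name,
    n.source = $source        // any further metadata stays in properties
'''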
The Power of NLP
How do you extract information from text without relying on your own brain or a trillion-parameter LLM? This is where classic NLP (Natural Language Processing) can be a valuable tool.
It's worth mentioning that when I surveyed the best NLP libraries and models, both from before and after the GPT-3.5 era, I was shocked: many of them have stopped being maintained. They seem abandoned and almost forgotten, which is a shame, because they still have great potential.
Nevertheless, driven by real-world industry needs and practical constraints, I decided to take up the challenge of exploring NLP-driven approaches. My goal was to build a graph that would enhance the performance of standard vector databases.
A quick note: this technique has so far been explored only to a limited extent, so I strongly encourage readers to experiment further. Everything I have done so far only scratches the surface of the full potential of NLP-driven graph architectures.
GraphRAG and its potential applications
Before diving into the implementation of NLP-driven RAG graphs and discussing the results, I would like to first provide my thoughts on different GraphRAG types and their applications.
When I refer to Microsoft GraphRAG, I include not only the original approach published by Microsoft Research in [3], but also the various lighter-weight adaptations that have appeared since then, such as [4-5].
These methods typically involve:
• Extracting entities and relations from large text corpora with an LLM
• Summarizing the extracted information with an LLM
• Letting users query the summaries and/or community-based summaries
While different implementations exist, the basic principle remains the same: using LLMs to build a knowledge graph from text.
The following diagram (Figure 1) shows my thoughts from an industry perspective on when and why to use different graph-based vector searches for RAG systems.
When deciding whether to use a graph or standard vector database, there are some guidelines on when to choose one over the other [6-7].
This infographic is applicable once you decide to go with the GraphRAG solution. Here I have highlighted the key factors to consider before building your graph.
1. Data volume — How much data exists in your knowledge base?
2. Budget constraints — How tight is your budget for building the graph?
3. Ontology availability:
• Do you have a clear, structured ontology?
• Is your knowledge base in a fixed domain where you can build a strong ontology layer?
• Or is your data diverse, fragmented, and lacking well-defined domain knowledge?
These factors can significantly impact the design, feasibility, and efficiency of your GraphRAG solution.
Once you answer the three key questions — data volume, budget constraints, and ontology availability — you can determine the GraphRAG approach that’s right for your use case.
It is important to note that Figure 1 does not cover all possible scenarios. Some hybrid approaches are possible and the boundaries between techniques are not strictly fixed.
Nonetheless, I've observed the following trend: the more data you have, the more carefully you need to evaluate your investments. If you have a large budget and need very high accuracy, Microsoft's solution is a strong choice.
However, if budget constraints are an issue (which is almost always the case), you may need to sacrifice some accuracy and opt for a solution that uses little to no LLM. In this case, the best approach is to build an ontology layer and construct a fixed entity architecture graph.
If you have difficulty defining an ontology, lack a deep understanding of your data, or face high data complexity, I recommend building an NLP-driven graph. In the following sections, I will demonstrate how to do this.
Unleashing the power of NLP
Next, let's get to work and build a graph for the price of a chocolate bar (counting the electricity involved).
Technical Setup
For this project I used:
• A business laptop with 32 GB RAM and a 6 GB onboard GPU.
• Neo4j Community Edition running on WSL (Ubuntu) as a Docker container.
• A dataset of 660 PDF files and a data preprocessing pipeline with some modifications taken from NVIDIA RAG Blueprint (https://github.com/NVIDIA-AI-Blueprints/rag/tree/v1.0.0).
NLP-driven graph methods
As mentioned before, the NLP-driven graph is derived from the fixed entity architecture, but with one key difference - I dropped the ontology layer.
This means that the graph will contain:
1. Document layer - contains the document chunks, just like a standard vector database
2. Token layer - extracted tokens act as additional connecting nodes, improving search performance
This approach can significantly reduce costs by relying on NLP instead of heavy LLM processing.
Data preprocessing pipeline
The data preprocessing pipeline follows these key steps:
1. Chunking – I used the pre-written functions in the NVIDIA RAG Blueprint to split the documents into smaller chunks.
2. Embedding – I used the Hugging Face model "intfloat/e5-base-v2" to embed the chunks instead of the default NVIDIA method. This is the only modification to the Blueprint preprocessing pipeline I mentioned earlier (see the sketch after this list).
3. Graph construction – Once the data was processed, I built the first layer in Neo4j (as shown in Figure 2 below), where all chunk nodes are labeled Document.
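To make the embedding step concrete, here is a minimal sketch, assuming the sentence-transformers library (the article does not show its wrapper code, so the helper below is my own; note that e5 models expect "passage: "/"query: " prefixes and produce 768-dimensional vectors, matching the index created later):
from sentence_transformers import SentenceTransformer

emb_model = SentenceTransformer("intfloat/e5-base-v2")

# e5 convention: "passage: " prefix for documents, "query: " for queries
chunk_embeddings = emb_model.encode(
    ["passage: " + chunk for chunk in chunks],
    normalize_embeddings=True,
)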
Below you will find a code example for populating a Neo4j database with a document layer. The following code demonstrates how to add document chunks to a Neo4j database and establish NEXT and PREV relationships between adjacent chunks.
def add_chunks_to_db(chunks, doc_name):
    prev_node_id = None
    for i, chunk in enumerate(chunks):
        # Create the chunk node; query parameters avoid the manual
        # quote-escaping that string interpolation would require
        query = '''
        MERGE (d:Document {
            chunkID: $chunk_id,
            docID: $doc_id,
            full_text: $full_text,
            embeddings: $embeddings
        })
        RETURN elementId(d) AS id
        '''
        params = {
            'chunk_id': f"chunk_{i}",
            'doc_id': doc_name,
            'full_text': chunk,
            'embeddings': embeddings.embed_documents(chunk).tolist(),
        }
        result = run_query(query, params)
        chunk_node_id = result[0]['id']
        # If this is not the first chunk, link it to the previous chunk
        if prev_node_id is not None:
            query = '''
            MATCH (c1:Document), (c2:Document)
            WHERE elementId(c1) = $prev_node_id AND elementId(c2) = $chunk_node_id
            MERGE (c1)-[:NEXT]->(c2)
            MERGE (c2)-[:PREV]->(c1)
            '''
            run_query(query, {'prev_node_id': prev_node_id,
                              'chunk_node_id': chunk_node_id})
        prev_node_id = chunk_node_id
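The run_query helper used in these snippets is never defined in the article; here is a minimal sketch of what it might look like, assuming the official neo4j Python driver (connection details are placeholders):
from neo4j import GraphDatabase

# Hypothetical helper: runs one Cypher statement and returns the records as dicts
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def run_query(query, params=None):
    with driver.session() as session:
        return [record.data() for record in session.run(query, params or {})]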
Notice that I'm building chains of chunks here. For each document I add its chunks, and these chunks are connected by edges in both directions: one called NEXT, pointing to the next chunk, and one called PREV, pointing to the previous chunk. The result is a graph like the one in Figure 2.
Here you can identify 4 of the 660 PDFs I added to the graph. Each chain starts at chunk_0 and ends at chunk_n.
With the first layer in place, you can easily apply your first vector and text indexes to it, for example:
query = '''
CREATE VECTOR INDEX vector_index_document
IF NOT EXISTS
FOR (d:Document)
ON (d.embeddings)
OPTIONS {indexConfig: {
`vector.dimensions`: 768,
`vector.similarity_function`: 'cosine'
}}
'''
And the text index:
query = '''
CREATE FULLTEXT INDEX text_index_document FOR (n:Document) ON EACH [n.full_text]
'''
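The full-text index can then be queried with Neo4j's standard db.index.fulltext.queryNodes procedure; a small usage sketch (mine, not the article's):
query = '''
CALL db.index.fulltext.queryNodes('text_index_document', $search_terms)
YIELD node, score
RETURN node.docID, node.full_text, score
ORDER BY score DESC
LIMIT 10
'''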
Now, one can use this graph as a standard vector database. All you have to do is:
def pure_rag(my_query):
    my_query_emb = emb.embed_query(my_query)
    query = """
    CALL db.index.vector.queryNodes('vector_index_document', 10, $user_query_emb)
    YIELD node AS vectorNode, score AS vectorScore
    WITH vectorNode, vectorScore
    ORDER BY vectorScore DESC
    RETURN elementId(vectorNode), vectorNode.docID, vectorNode.full_text AS document_text, vectorScore
    LIMIT 10
    """
    params = {'user_query_emb': my_query_emb.tolist()}
    results = run_query(query, params)
    return pd.DataFrame(data=results)
That’s it! Let’s try it out on the NVIDIA dataset and query a few things based on the data it contains.
The LLM I used is the NVIDIA NIM model "meta/llama-3.3-70b-instruct" from the Try NVIDIA NIM APIs catalog (https://build.nvidia.com/explore/discover). Note that I did not design any elaborate prompt template; I simply passed the user question along with the top 10 retrieved passages.
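For completeness, here is a hedged sketch of that generation step. build.nvidia.com exposes an OpenAI-compatible endpoint, so the openai client can be used; the prompt wording and the answer helper are my own assumptions, not the article's code:
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                api_key="nvapi-...")  # placeholder key

def answer(question):
    # Retrieve the top 10 passages with the pure RAG function above
    passages = pure_rag(question)['document_text'].tolist()
    prompt = ("Answer the question using only the context below.\n\n"
              "Context:\n" + "\n---\n".join(passages) +
              f"\n\nQuestion: {question}")
    response = client.chat.completions.create(
        model="meta/llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content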
But we built graphs for more than just pure standard vector database functionality, right? Let’s get more out of it!
Unleashing the power of graphs
Graphs add semantic reasoning to data. Even without classic RDF-world semantic reasoning, graphs, by connecting entities, help you gain a deeper understanding of the data. Furthermore, I hypothesized in my previous article that there is always some kind of search asymmetry at play. This search asymmetry is also called magnitude sensitivity: the dot product is affected by the magnitude of the vectors, which means it may not reliably represent similarity when the compared vectors have significantly different magnitudes [9].
After creating the graph with the document layer, we need a way to create the "glue" between the text chunks. We don't have an ontology, and our assumptions, while simplistic, reflect reality: we have a lot of data, we don't know exactly what it's about, and we want to extract the most value from it. Our goal is to build a vocabulary graph that delivers the benefits of GraphRAG without spending much money in the process.
I recommend leveraging NLP techniques for this. First, let's extract tokens, bigrams, and trigrams from each chunk of text. I used the Spark NLP library (https://sparknlp.org/models?task=Named+Entity+Recognition), which lets you harness your local GPU to process large numbers of documents. Below is the code snippet I used for token extraction.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, Finisher
from sparknlp.annotator import Tokenizer, NGramGenerator
import sparknlp

# Initialize the Spark session
spark = sparknlp.start()

# Create a DataFrame from the list of documents
data = spark.createDataFrame([(i, doc) for i, doc in enumerate(documents)],
                             ["id", "text"])

# Document assembler
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Tokenizer
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# N-gram generator for bigrams
bigram_generator = NGramGenerator() \
    .setInputCols(["token"]) \
    .setOutputCol("bigrams") \
    .setN(2)

# N-gram generator for trigrams
trigram_generator = NGramGenerator() \
    .setInputCols(["token"]) \
    .setOutputCol("trigrams") \
    .setN(3)

# Finisher to convert annotations to plain strings
finisher = Finisher() \
    .setInputCols(["bigrams", "trigrams"]) \
    .setOutputCols(["finished_bigrams", "finished_trigrams"]) \
    .setCleanAnnotations(False)

# Build, fit and run the pipeline
pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    bigram_generator,
    trigram_generator,
    finisher,
])
model = pipeline.fit(data)
result = model.transform(data)

# Collect the results
pandas_df = result.select("text", "finished_bigrams", "finished_trigrams").toPandas()

# Stop the Spark session
spark.stop()
After creating the token entities, you can add them to the graph, connecting them to the chunks they were extracted from. This approach is simple and robust, and you can again apply the two indexes to this layer, as demonstrated before. In this way, we create a second layer in which all nodes carry the internal label "Token". I store "token", "bigram", or "trigram" in a label attribute, the token text itself in a name attribute, and the associated embeddings alongside. The following example shows the Cypher query used to create the token nodes, followed by the query that builds the corresponding vector index:
# Create a token node; $label holds 'token', 'bigram' or 'trigram'
query = """MERGE (t:Token {label: $label,
                            name: $token,
                            embeddings: $token_embeddings
           }) RETURN elementId(t) AS token_node_id"""
Do this for bigrams and trigrams as well.
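The snippet above creates the token node itself; the article does not show the linking step, but the retrieval query later traverses a CONTAINS relationship between Token and Document nodes, so the connection presumably looks something like this sketch:
# Hypothetical linking step: connect each token node to the chunk it came from
query = """
MATCH (d:Document) WHERE elementId(d) = $chunk_node_id
MATCH (t:Token)    WHERE elementId(t) = $token_node_id
MERGE (t)-[:CONTAINS]->(d)
"""
run_query(query, {'chunk_node_id': chunk_node_id,
                  'token_node_id': token_node_id})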
Next, create the index:
# Create a vector index on the token embeddings
query = '''CREATE VECTOR INDEX vector_index_token IF NOT EXISTS
FOR (n:Token)
ON (n.embeddings)
OPTIONS {indexConfig: {
    `vector.dimensions`: 768,
    `vector.similarity_function`: 'cosine'
}}
'''
An example of a created bigram node is shown in Figure 4. Note that the entire layer containing tokens, bigrams, and trigrams shares the internal label "Token", allowing a single vector index to cover all of its nodes at once.
So far so good: we have tokens that are partially shared across documents, which makes everything connected to some extent. Unfortunately, though not surprisingly, this first attempt did not give better results than pure RAG.
What we need to unlock the full potential of the graph is to connect entities to each other using context, logic, and semantics. This task is challenging: we need to avoid relying on GPT or other very large language models. Given that we already have more than 262,000 nodes, using large models would significantly increase the computational cost and exceed our budget.
Triples
There are many good open-source models available, but triple extraction is a challenging task. The best approach is to take a smaller Transformer model and fine-tune it for this specific task. Fine-tuning the model yourself would be ideal, but for this demonstration I used a pre-trained model from Hugging Face: bew/t5_sentence_to_triplet_xl (https://huggingface.co/bew/t5_sentence_to_triplet_xl), a fine-tune of FLAN-T5-XL. This model is about 600 times smaller than GPT-4, so it fits on my machine without any problems, and it is tuned specifically to extract triplets from text. According to the model's owner, Brian Williams, it is not yet perfect, and indeed the results are not always as accurate as I would like, but our goal is not the highest accuracy: very good accuracy at minimal cost is enough.
I passed the text chunks to the model. It produced many triplets (i.e., subject-predicate-object combinations), which were then mapped onto the token nodes, resulting in more than 650,000 edges in the graph.
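Before the mapping code, here is a hedged sketch of the extraction step itself, using the transformers library; the exact input format the model expects is an assumption on my part, so check the model card before reusing this:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

triplet_tokenizer = AutoTokenizer.from_pretrained("bew/t5_sentence_to_triplet_xl")
triplet_model = AutoModelForSeq2SeqLM.from_pretrained("bew/t5_sentence_to_triplet_xl")

def extract_triplet(chunk_text):
    # Encode the chunk and generate a subject-predicate-object string
    inputs = triplet_tokenizer(chunk_text, return_tensors="pt", truncation=True)
    output_ids = triplet_model.generate(**inputs, max_new_tokens=64)
    return triplet_tokenizer.decode(output_ids[0], skip_special_tokens=True)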
Here is a snippet of code for triple mapping:
def process_triplet(triplet):
    subject, predicate, object_ = triplet
    subject_emb = embed_query_on_gpu(subject)
    predicate_emb = embed_query_on_gpu(predicate)
    object_emb = embed_query_on_gpu(object_)
    params = {'subject_emb': subject_emb.tolist(),
              'predicate_emb': predicate_emb.tolist(),
              'object_emb': object_emb.tolist(),
              'subject': subject,
              'predicate': predicate,
              'object': object_}

    similarSubjects_query = """
    CALL () {
        // Search for the subject duplicates
        CALL db.index.vector.queryNodes('vector_index_token', 10, $subject_emb)
        YIELD node AS vectorNode, score AS vectorScore
        WITH vectorNode, vectorScore
        WHERE vectorScore >= 0.96
        RETURN collect(vectorNode) AS similarSubjects
    }
    WITH similarSubjects
    OPTIONAL MATCH (n:Token {name: toLower($subject)})
    WITH similarSubjects + CASE WHEN n IS NULL THEN [] ELSE [n] END AS allSubjects
    UNWIND allSubjects AS subject
    RETURN collect(subject) AS similarSubjects
    """
    similarSubjects = run_query(similarSubjects_query, params)[0]['similarSubjects']

    similarPredicates_query = """
    CALL () {
        // Search for the predicate duplicates
        CALL db.index.vector.queryNodes('vector_index_token', 10, $predicate_emb)
        YIELD node AS vectorNode, score AS vectorScore
        WITH vectorNode, vectorScore
        WHERE vectorScore >= 0.96
        RETURN collect(vectorNode) AS similarPredicates
    }
    WITH similarPredicates
    OPTIONAL MATCH (n:Token {name: toLower($predicate)})
    WITH similarPredicates + CASE WHEN n IS NULL THEN [] ELSE [n] END AS allPredicates
    UNWIND allPredicates AS predicate
    RETURN collect(predicate) AS similarPredicates
    """
    similarPredicates = run_query(similarPredicates_query, params)[0]['similarPredicates']

    similarObjects_query = """
    CALL () {
        // Search for the object duplicates
        CALL db.index.vector.queryNodes('vector_index_token', 10, $object_emb)
        YIELD node AS vectorNode, score AS vectorScore
        WITH vectorNode, vectorScore
        WHERE vectorScore >= 0.96
        RETURN collect(vectorNode) AS similarObjects
    }
    WITH similarObjects
    OPTIONAL MATCH (n:Token {name: toLower($object)})
    WITH similarObjects + CASE WHEN n IS NULL THEN [] ELSE [n] END AS allObjects
    UNWIND allObjects AS object
    RETURN collect(object) AS similarObjects
    """
    similarObjects = run_query(similarObjects_query, params)[0]['similarObjects']

    query = """
    UNWIND $similarSubjects AS subject
    UNWIND $similarPredicates AS predicate
    UNWIND $similarObjects AS object
    WITH subject.name AS subjectName, predicate.name AS predicateName, object.name AS objectName, subject, predicate, object
    MERGE (subjectNode:Token {name: toLower(subjectName)})
    ON CREATE SET subjectNode.embeddings = $subject_emb, subjectNode.triplet_part = 'subject'
    ON MATCH SET subjectNode.triplet_part = 'subject'
    //MERGE (predicateNode:Token {name: toLower(predicateName)})
    //ON CREATE SET predicateNode.embeddings = $predicate_emb, predicateNode.triplet_part = 'predicate'
    //ON MATCH SET predicateNode.triplet_part = 'predicate'
    MERGE (objectNode:Token {name: toLower(objectName)})
    ON CREATE SET objectNode.embeddings = $object_emb, objectNode.triplet_part = 'object'
    ON MATCH SET objectNode.triplet_part = 'object'
    MERGE (subjectNode)-[r:predicate {name: toLower(predicateName)}]->(objectNode)
    ON CREATE SET r.label = 'triplet', r.embeddings = $predicate_emb
    ON MATCH SET r.label = 'triplet'
    RETURN subjectName AS subject, predicateName AS predicate, objectName AS object
    """
    final_params = {
        'similarSubjects': similarSubjects,
        'similarPredicates': similarPredicates,
        'similarObjects': similarObjects,
        'subject_emb': subject_emb.tolist(),
        'predicate_emb': predicate_emb.tolist(),
        'object_emb': object_emb.tolist()
    }
    results = run_query(query, final_params)
    print(f"Processed triplet: {triplet}")
    return results
NLP-enabled GraphRAG
Figure 7 shows the results of the hybrid RAG/GraphRAG approach on the same question that was answered with document-layer-only retrieval, i.e., pure RAG (Figure 3). The answers are more comprehensive and in-depth.
Note that I did not perform any entity resolution or entity linking; that would definitely be the next step and would most likely improve performance. Also, in both retrieval tests I passed 10 retrieved text passages. GraphRAG took almost twice as long as RAG: we sacrificed some latency but gained better answer accuracy.
The retrieval function that uses the triplet relations is given below.
def triplets_driven_retrieval(my_query):
    my_query_emb = emb.embed_query(my_query)
    query = """
    CALL db.index.vector.queryNodes('vector_index_token', 300, $user_query_emb)
    YIELD node AS token, score AS tokenScore
    CALL (token, tokenScore) {
        MATCH (token)
        WHERE token.triplet_part IS NOT NULL
        OPTIONAL MATCH (token)-[:predicate]->(object)
        OPTIONAL MATCH (object)-[:predicate]->(subject)
        OPTIONAL MATCH (subject)-[:CONTAINS]->(doc:Document)
        RETURN DISTINCT doc, tokenScore AS score, 1 AS isTripletPath
        ORDER BY tokenScore DESC
        LIMIT 200
        UNION
        MATCH (token)
        WHERE token.triplet_part IS NULL
        MATCH (token)-[:CONTAINS]-(doc:Document)
        RETURN DISTINCT doc, tokenScore AS score, 2 AS isTripletPath
        ORDER BY tokenScore DESC
        LIMIT 200
    }
    RETURN DISTINCT doc.full_text AS document_text, score, isTripletPath
    ORDER BY score DESC
    LIMIT 100
    UNION
    CALL () {
        CALL db.index.vector.queryNodes('vector_index_document', 10, $user_query_emb)
        YIELD node AS doc, score AS vectorScore
        WITH doc, vectorScore
        ORDER BY vectorScore DESC
        RETURN DISTINCT doc, vectorScore AS score, 3 AS isTripletPath
        ORDER BY vectorScore DESC
        LIMIT 10
    }
    RETURN DISTINCT doc.full_text AS document_text, score, isTripletPath
    ORDER BY score DESC
    LIMIT 10
    """
    params = {'user_query_emb': my_query_emb.tolist()}
    results = run_query(query, params)
    return pd.DataFrame(data=results)
By optimizing the query logic, you can traverse the graph structure in the way that suits you best. Let's look at what the GraphRAG Cypher query introduced above is doing. The query is built in several steps. First, we match the user query against the token nodes using the vector index. We check whether a token has the triplet_part attribute (these are tokens mapped from the generated triplets). When we traverse a triplet and reach the subject node, we collect all object nodes pointing to it, select all document chunks attached to these nodes, sort them, and limit the search. If a token has no triplet part, we traverse directly to its corresponding document chunks. In the second part of the query, we perform a standard RAG search and select documents using the vector index.
There is still room to optimize this query further. As a side note, I also used spaCy's named entity recognition to extract token classification labels like ORG, DATE, etc. (see the red nodes in the title image). However, the results were not very good, so I stuck with the two-layer architecture.
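For reference, a minimal sketch of that spaCy side experiment (the model choice is mine, not the article's):
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_entities(chunk_text):
    # Returns pairs like ('NVIDIA', 'ORG') or ('2024', 'DATE')
    doc = nlp(chunk_text)
    return [(ent.text, ent.label_) for ent in doc.ents]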
It is interesting to see what the subgraph of the Cypher query looks like for a simple user question, “What companies are mentioned in the system?” (Figure 8). The subgraph shows the structure of the subject, object, and their associations built based on the user question.
The results consist of two distinct parts: a mostly disjoint set of text chunks retrieved by the standard RAG part, and a set of nodes around a main "subject" triplet node, in this case "company". This representation can significantly help when optimizing the retrieval queries.
Discussion
The main goal of this approach is to showcase hybrid Graph RAG: it makes it possible to create graphs with standard RAG functionality and enhance them with the "power of the graph", i.e., to use the semantics of the content, traverse entity relationships, and retrieve information in whatever ways you define. The literature shows that each technique, RAG and GraphRAG, performs better on certain tasks [6-7]. This application is mainly intended to showcase hybrid Graph RAG, combining classic RAG with GraphRAG, but it can also be used as a standard RAG approach.
As an idea, an agent could later filter out specialized questions that ask for specific facts and answer them with classic RAG queries, bypassing the first Cypher part shown above, while questions that require multi-hop reasoning or some overarching context could be redirected to the GraphRAG side. All of this is possible but not mandatory; you can always choose one over the other. A sketch of such a router is shown below.
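A minimal sketch of that routing idea, with a deliberately naive keyword heuristic standing in for a real classifier (the cue list is a placeholder, not the article's logic):
def route(question):
    # Send multi-hop / overview questions to GraphRAG, the rest to pure RAG
    multi_hop_cues = ('compare', 'relationship', 'overall', 'summarize', 'across')
    if any(cue in question.lower() for cue in multi_hop_cues):
        return triplets_driven_retrieval(question)
    return pure_rag(question)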
Most importantly, the NLP-driven architecture presented above gives you the flexibility to choose your RAG approach and opens up new horizons for RAG solutions.
Conclusion
In summary, this article introduced an NLP-driven approach to building knowledge graphs that enables hybrid RAG/GraphRAG applications without over-relying on LLMs. The approach uses hierarchical graphs and does not require a fixed ontology.
Preliminary results show that questions answered with this hybrid retrieval receive more comprehensive and insightful answers, laying the foundation for further exploration and potential application in large-scale GenAI projects.