Knowledge graphs are hard to implement; the simple way is best. The 80/20 rule: RAG + Agent

Written by
Clara Bennett
Updated on: July 9, 2025
Recommendation

An analysis of the practical difficulties of combining knowledge graphs with RAG, and where the technology is likely to go next.

Core content:
1. Why knowledge-graph-based RAG is difficult to put into practice, and where it stands today
2. The development direction of RAG and how the 80/20 rule applies to it
3. Strategies for coordinating preprocessing, postprocessing, and the LLM, and the market opportunities in each

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)
Recently there have been constant calls to combine RAG with knowledge graphs. Yet GraphRAG, with Microsoft's implementation as the representative, is extremely difficult to put into production, and hardly anyone actually uses it. Why? A great deal of community detection and community summarization was built to solve the summarization and entity-expansion problems, but the cost is too high and the payoff too small, so most teams have abandoned it. What they fall back on is pure keyword-based graph construction, and that is all.
  • As for KAG, it brings the full KG apparatus in to do logical reasoning, but such reasoning is a long-tail need [private-domain deployments are another matter], so it too has been set aside.

  • Then there is PikeRAG. The work is very comprehensive and covers every aspect, but there are too many stages, too many details, even answer grading. Although it beats vanilla RAG and GraphRAG on some benchmark leaderboards, note that the evaluation datasets used are far removed from real deployment problems, so it remains purely academic; I would not dare use it in production.

  • The real contribution of these GraphRAG frameworks is ideas, not deployable solutions. Those are two different things, and everyone should be clear about the distinction.

So how should we view RAG correctly? RAG should become simpler and simpler, not more bloated and complicated. The ultimate solution is what everyone hopes for: throw everything into an LLM with a super-long context window and just ask. That is not feasible today, and there is still a long way to go.
    My strategy is preprocess + LLM + postprocess. Do nothing in the middle, because that is where you get trapped. Downstream, build only supporting components, because the LLM will keep changing; adapt to it and be ready to remove them. Preprocessing is where the effort should go: build things that help the LLM. Feed those results into the LLM and its output improves. This guarantees that as LLMs grow stronger [an inevitable trend], your results keep getting better, which is a positive feedback loop. Postprocessing, which could also be called the application layer, works well once the first two are in place, but its risk is dependency: it is relatively passive, and it will be exhausting. Either build applications, like the recent wave of DeepSeek+ products, to make quick money, or, like traditional companies, do data processing, as Hehe Information and Pao Ding do. You can already see this differentiation: the big head companies and the top LLM companies build the LLMs themselves.
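The preprocess + LLM + postprocess strategy above can be sketched as a minimal pipeline. This is an illustrative skeleton, not the author's actual system: every function name is a hypothetical placeholder, the "retriever" is a naive keyword scorer standing in for a real one, and the LLM call is a stub to be replaced with any model API.

```python
def preprocess(raw_docs: list[str], query: str) -> str:
    """The layer worth investing in: cleaning, chunking, retrieval.
    Its only job is to hand the LLM better input."""
    chunks = []
    for doc in raw_docs:
        chunks.extend(c for c in doc.split("\n\n") if c.strip())
    # Naive keyword overlap stands in for a real retriever (assumption).
    scored = sorted(
        chunks,
        key=lambda c: -sum(w in c.lower() for w in query.lower().split()),
    )
    return "\n".join(scored[:3])

def call_llm(prompt: str) -> str:
    """Thin, swappable wrapper: the LLM is a black box that keeps
    improving, so no custom logic lives in this middle layer."""
    return f"[LLM answer for: {prompt[:40]}...]"  # stub, not a real model call

def postprocess(answer: str) -> str:
    """Light supporting components only (formatting, cleanup), kept
    loosely coupled so they can be dropped as the LLM improves."""
    return answer.strip()

def rag_answer(raw_docs: list[str], query: str) -> str:
    context = preprocess(raw_docs, query)
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return postprocess(call_llm(prompt))
```

The point of the shape is that `call_llm` stays trivial and replaceable, while `preprocess` is where the durable engineering value accumulates.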
    So the priority is preprocess > postprocess > LLM. Preprocessing often feels the most promising because it is hard currency.
    The reasoning is this. First, large companies are not interested: the work is too fragmented and labor-intensive to be economical, and they have their own business lines. Second, small companies have no business of their own right now, and building components for the large ones is a viable way out. The other option is postprocessing, which is hard for small companies to break into, especially on the C-end; the B-end and GJ-end (government) are better, though you also need the stamina to actually collect payment.