Two years on, has your RAG knowledge base actually shipped? A multi-million-dollar AI "IQ tax" experiment

Misunderstandings and reflections from AI technology exploration: the truth behind RAG knowledge base implementations.
Core content:
1. The implementation status of RAG knowledge base projects, with reflections
2. The limitations of RAG technology and its practical difficulties
3. Future prospects and optimization suggestions for RAG knowledge bases
Have we spent two years of the company's project budget on so-called technical research?
These past few days I came across a slide deck I made two years ago, when ChatGPT first became popular, and it left me with mixed feelings. Yes, two years ago, when GenAI first broke out, the words fine-tuning, vector, and knowledge base had already entered everyone's field of vision. Two years have passed, and AI technology has indeed matured considerably, so let me ask: has your RAG knowledge base project been built? Does it work better than the previous generation of knowledge bases built on full-text retrieval? Has it been recognized by customers?
After handling optimization consultations for several enterprise-level RAG projects, I began to wonder: have we collectively fallen into a conceptual technology trap? We assumed that the GenAI era had to use semantic matching, i.e., RAG (Retrieval-Augmented Generation), and outright abandoned mature knowledge bases built on search technology, because "every application is worth redoing with AI" [tongue firmly in cheek].
Two years on, though, how many front-line applications have actually put RAG into production? How many have achieved the results the project promised? When selecting technologies and implementing projects, we may need to think more calmly: do we really need a RAG-based knowledge base? Can it really deliver the return on investment users expect?
Here is a self-assessment checklist; see where you land:
1. "The answer is clearly written in the document. How many times can your RAG hit it?"
2. "Look up COVID-19/Covid-19/SARS-CoV-2... Can you figure out that they all mean the same thing?"
3. "I'll give you the company's Q1 and Q2 financial reports. Can you calculate the quarter-on-quarter growth rate?"
4. "Compare the performance parameters of the company's product A and product B. Can you make a recommendation to the user?"
5. "How many GPUs did you burn to improve the accuracy by 5%? How many man-days did you spend? What is the maintenance cost?"
Let me state my position clearly:
The RAG knowledge base is an immature laboratory prototype, not a mature solution ready for engineering deployment!
What were our initial expectations for the RAG knowledge base? First, a vertical knowledge base that supplements the large model's knowledge in specific domains. Second, we expected it to "internalize" the content of the knowledge base and perform deeper reasoning, comparison, and generation. Take financial-report interpretation as an example: we expected the knowledge stored in RAG to work like what a person reads, understands, and remembers, automatically grasping the key figures in a report, such as revenue, cost, profit, assets, liabilities, and cash flow, while discarding the mass of irrelevant noise.
In reality, you have almost certainly discovered, to your dismay, that it simply can't do that!
1. The hit rate is frustrating. The content is clearly in the knowledge base, yet the system keeps telling you it doesn't exist.
2. Associative recall barely exists. You ask about Covid-19 and hope it finds everything related to Covid-19 and the novel coronavirus, but it can't.
3. You assumed that after spending several times more tokens on GraphRAG to build a knowledge graph, the problem would be solved, right? Still dead in the water...
4. So you searched the community, and someone told you how to optimize document slicing and semantic segmentation. Then you noticed you could use users' actual frequent questions to tune the matching and improve the hit rate. You tried every trick; it wasn't entirely ineffective, but at best you saw a glimmer of hope.
5. After running for a while, you take Q&A data from the common failure scenarios and use it for fine-tuning. Since you can't afford to fine-tune a full-scale 670B model, you compromise on 70B. After stepping into plenty of pitfalls and burning plenty of compute, the hit rate on those questions does improve, but users soon start complaining that answers they used to be happy with have become noticeably dumber.
6. Returning to the original expectations: when a question requires association and reasoning across multiple document fragments and entities (so-called multi-hop question answering), or complex comparison, analysis, summarization, or creative integration of knowledge, you find the RAG system is almost completely ineffective.
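The "document slicing" optimization mentioned above can be sketched as a sentence-boundary splitter with overlap. This is a deliberately minimal toy, not any particular framework's chunking API; real pipelines split on token counts and smarter semantic boundaries.

```python
import re

def chunk_text(text, max_chars=200, overlap_sents=1):
    """Split text into chunks on sentence boundaries, with overlap.

    Toy illustration of document slicing: overlapping the last
    sentence(s) keeps context that would otherwise be cut at a
    chunk edge, one of the tricks used to raise retrieval hit rate.
    """
    # Naive sentence split on ., ?, ! followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
    chunks, current = [], []
    for sent in sentences:
        if current and len(" ".join(current)) + len(sent) + 1 > max_chars:
            chunks.append(" ".join(current))
            # Carry the trailing sentence(s) into the next chunk.
            current = current[-overlap_sents:]
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = ("Revenue grew in Q1. Costs rose faster than revenue. "
       "Net profit therefore declined. Cash flow remained stable. "
       "The board approved a new buyback plan.")
for c in chunk_text(doc, max_chars=80):
    print(c)
```

Note that every tuning knob here (chunk size, overlap) changes retrieval behavior globally, which is part of why chunking fixes tend to trade one failure mode for another.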
After all that effort, you end up with a clumsy data finder, not a research assistant that can understand the data and think with it. So what did you actually get?
Now tell me: how exactly is your so-called RAG superior to the traditional, mature knowledge base solutions based on full-text retrieval? Superior in having poured time, algorithm experts, engineers, and hardware into research? In making an outstanding contribution to employment? Or superior in letting you write in a slide deck that "we applied the most advanced GenAI technology and a RAG knowledge base"?
So, after two years of struggling, is it time to reflect on whether RAG was the right choice? Are you doing scientific research or an engineering project? Is your company's technical team spending the project budget on pro bono research for the good of society?
What is the truth?
In fact, these complaints target RAG in the narrow sense, that is, vector-based RAG. The root cause of the problems is that we held unrealistic technical fantasies about the combination of a vector database and a large model. A vector database is a storage and retrieval method based on semantic matching via vector similarity. Its advantage is that it can recognize closeness of meaning even when the vocabulary is completely different (synonyms, near-synonyms, hyponyms, and so on), without requiring string matching the way full-text search does.
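The contrast can be made concrete with a toy example. Here a hand-written alias table stands in for what an embedding model learns; real systems compare dense vectors, not alias lists, so treat this purely as an illustration of why literal keyword matching misses synonyms while semantic matching catches them.

```python
# Toy contrast: literal keyword matching vs. "semantic" matching.
# The ALIASES table is a stand-in for embedding similarity; it is
# NOT how a vector database works internally.
ALIASES = {
    "covid-19": "sars-cov-2",
    "covid19": "sars-cov-2",
    "novel coronavirus": "sars-cov-2",
    "sars-cov-2": "sars-cov-2",
}

docs = [
    "SARS-CoV-2 transmission peaks indoors in winter.",
    "Quarterly revenue grew 12% year over year.",
]

def keyword_match(query, doc):
    # Full-text style: literal substring match.
    return query.lower() in doc.lower()

def semantic_match(query, doc):
    # Vector-DB style (simulated): resolve the query to a canonical
    # concept, then accept any known surface form of that concept.
    canonical = ALIASES.get(query.lower(), query.lower())
    doc_l = doc.lower()
    return canonical in doc_l or any(
        a in doc_l for a, c in ALIASES.items() if c == canonical
    )

print(keyword_match("Covid-19", docs[0]))   # literal match fails
print(semantic_match("Covid-19", docs[0]))  # concept match succeeds
```

This is exactly the COVID-19/SARS-CoV-2 checklist item from earlier: the semantic layer solves vocabulary mismatch, but nothing more.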
But fundamentally, both full-text retrieval knowledge bases and vector knowledge bases are just document retrieval technologies: they return information that matches a query. The deep understanding, analysis, and reasoning everyone expects come from the large model, not from the knowledge base. The semantic matching a vector database provides is nowhere near the deep understanding and reasoning people expect of it.
When you are asked to "optimize the RAG effect", you think you are only tuning the vector database, but in fact much of the work is compensating for the limitations of the large model itself, a problem that giant companies and the entire industry have yet to solve. Now do you see why there is such a gap between your investment and your results?
What is the more reliable solution at the moment?
In fact, in many scenarios such as enterprise knowledge-base Q&A and customer-service assistants, users mostly need to look up specific information, understand a concept, or solve a concrete problem. Accurately and concisely extracting and presenting answers from existing knowledge is usually enough.
Complex reasoning, multi-document correlation analysis, and creative generation, by contrast, are not universal needs, or at least not high-ROI project choices for enterprises at this stage. For the common scenarios, I recommend: question preprocessing based on a large model + a full-text retrieval knowledge base + an answer generation module based on a large model.
Full-text retrieval has nearly 20 years of technical accumulation behind it and is very mature in both algorithms and engineering, which means lower costs and more predictable returns.
1. Question preprocessing based on the large model. This module performs semantic decomposition of the user's original request. A ReAct-style loop can process the original question: intent recognition, entity extraction (identifying key entities such as product name, model, part number, date, location, person), and further keyword/constraint extraction to generate several groups of search keywords. This essentially simulates how people formulated queries in the search-engine era.
2. Full-text retrieval knowledge base. This can be a private knowledge base inside the enterprise, a general search engine such as Google or Baidu, or an industry vertical knowledge base. For example, I have connected PubMed from the U.S. National Library of Medicine to an AI doctor, with very good results.
3. Answer generation based on the large model. This module collects all the information returned by retrieval as context and then answers the user's question; it can be further tuned to the enterprise's business scenarios.
4. You can also optimize the question-preprocessing stage using historical frequent questions, improving retrieval accuracy without any fine-tuning training, and still get better results.
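The three modules above can be sketched as follows. All function names are hypothetical, and the two "LLM" steps are stubbed with simple rules so the end-to-end flow runs without any model or search backend; a real deployment would replace them with model calls and a proper full-text engine.

```python
# Sketch of the pipeline: LLM question preprocessing ->
# full-text retrieval -> LLM answer generation. Stubs only.

DOCS = {
    "doc1": "Product A part P-100 supports 4K output and draws 15W.",
    "doc2": "Product B part P-200 supports 8K output and draws 25W.",
    "doc3": "Warranty policy: all products carry a two-year warranty.",
}

def preprocess_question(question):
    """Module 1: stand-in for LLM intent/entity/keyword extraction."""
    stopwords = {"the", "of", "a", "and", "what", "is", "does", "how"}
    return [w.strip("?.,").lower() for w in question.split()
            if w.strip("?.,").lower() not in stopwords]

def fulltext_search(keywords, top_k=2):
    """Module 2: toy term-frequency scoring over a full-text index.
    A real system would use BM25 in Elasticsearch, Lucene, etc."""
    scores = {
        doc_id: sum(text.lower().count(k) for k in keywords)
        for doc_id, text in DOCS.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, s in ranked[:top_k] if s > 0]

def generate_answer(question, doc_ids):
    """Module 3: stand-in for the LLM answer-generation step."""
    context = "\n".join(DOCS[d] for d in doc_ids)
    return f"[LLM answer to {question!r} grounded in]\n{context}"

q = "How much power does part P-200 draw?"
hits = fulltext_search(preprocess_question(q))
print(generate_answer(q, hits))
```

The key design point is that both "hard" steps (understanding the question, composing the answer) are delegated to the large model, while retrieval itself stays on boring, mature full-text technology.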
Moreover, in practical use, vector-based solutions tend to return only narrowly matched fragments, whereas full-text retrieval systems were designed to find documents, so they typically return whole documents or large passages and pages. In terms of actual results, this can make them significantly better than vector-based solutions (though it may also introduce more noise).
Outlook
I am not questioning RAG as a direction. It is a correct direction that the whole industry, especially large companies, should keep exploring, but it is definitely not a suitable engineering solution right now. In terms of trends, combining full-text retrieval with vector search has become mainstream under the name Hybrid Search. To truly reach the deep understanding, analysis, and reasoning we expect, we will likely need agent-style orchestration, namely Agentic RAG: task decomposition -> knowledge retrieval -> information integration -> thinking and action planning -> result evaluation and optimization. The DeepResearch products currently being launched across the industry are exactly this kind of solution, but do enterprises really need such a knowledge base?
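The Agentic RAG loop above can be caricatured as a simple control loop. Every step here is a stub standing in for an LLM or retriever call; no real agent framework is implied, and the names are invented for illustration.

```python
# Caricature of the Agentic RAG loop: decompose -> retrieve ->
# integrate -> evaluate, repeating until the self-check passes.

def decompose(task):
    # Stand-in for LLM task decomposition into sub-questions.
    return [f"{task} (sub-question {i})" for i in (1, 2)]

def retrieve(sub_q):
    # Stand-in for a knowledge-base or search lookup.
    return f"evidence for: {sub_q}"

def integrate(evidence):
    # Stand-in for LLM information integration.
    return " | ".join(evidence)

def evaluate(answer, attempt):
    # Stand-in for LLM self-critique; here it "passes" on the retry,
    # mimicking the evaluate-and-retry behavior of such systems.
    return attempt >= 2

def agentic_rag(task, max_rounds=3):
    for attempt in range(1, max_rounds + 1):
        sub_questions = decompose(task)
        evidence = [retrieve(q) for q in sub_questions]
        answer = integrate(evidence)
        if evaluate(answer, attempt):
            return answer, attempt
    return answer, max_rounds

answer, rounds = agentic_rag("compare product A vs B")
print(rounds, answer)
```

Even in caricature, the cost structure is visible: every round multiplies LLM and retrieval calls, which is why the question of whether enterprises actually need this matters.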
Colleagues are welcome to share your RAG experiences in the comments, whether successes, pitfalls, or thoughts about the future. Let us push this technical route, and the whole field of knowledge intelligence, toward true maturity and broad accessibility together.