10 thoughts from RAG founder on RAG Agent (Part 1)

Written by
Audrey Miles
Updated on: June 28, 2025
Recommendation

How do RAG agents improve enterprise AI adoption? Here are 10 hard-won lessons from Douwe Kiela.

Core content:
1. System optimization matters more than any single model: a well-tuned RAG system can outperform a top LLM used in isolation
2. Professional knowledge is the core of AI's value: an enterprise's internal knowledge is what drives AI
3. Enterprise scale and data are competitive advantages: unique data builds competitive barriers

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

RAG (Retrieval-Augmented Generation) is now the most common way to put AI applications into production, and combining it with agents broadens RAG's application scenarios further. Many companies, mine included, are using it to improve how AI lands in the enterprise. Recently, Douwe Kiela, one of the creators of RAG, shared on LinkedIn 10 lessons from deploying RAG agents in the enterprise, which I found very inspiring. Below I share his points together with my own hands-on experience.

1. A better LLM is not the (only) answer:

The LLM is only a small part (roughly 20%) of the whole AI system; a RAG system also includes extraction, retrieval, generation, and their joint optimization. A good RAG pipeline with an ordinary LLM can outperform a top LLM sitting in a bad RAG pipeline. The key is to optimize the system, not the model in isolation.

Practice Sharing

I once helped optimize a RAG-based knowledge-base question-answering system. In testing, the team found that GPT-4 did better than GPT-3.5, but answer accuracy was still below 50%. After the following adjustments, accuracy rose above 80%:

  • Before ingesting enterprise data into the vector database, normalize its format, for example by converting it to Markdown, and strip invalid content such as HTML tags, headers, footers, tables of contents, and references.
  • At retrieval time, use multi-way recall: combine vector search with keyword search and re-rank the merged results. This improves the recall rate.
  • When generating the final answer, use prompt engineering to restrict the model to the recalled passages: answer only from the retrieved content, and if the answer is not there, respond that the question cannot be answered. This avoids hallucinations.
  • Use prompt-engineering techniques such as formatting instructions and few-shot examples to guide the model toward better output.
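The multi-way recall and grounded-prompt steps above can be sketched as follows. This is a toy, self-contained illustration: the "embedding" here is a bag-of-words cosine stand-in for a real embedding model, and the merge uses Reciprocal Rank Fusion, one common way to combine rankings from multiple retrievers (the refusal wording and `rrf_k` value are illustrative assumptions, not the project's actual configuration):

```python
from collections import Counter
import math

def keyword_score(query, doc):
    # Keyword-style recall: count how many query tokens appear in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def vector_score(query, doc):
    # Toy "vector" recall: bag-of-words cosine similarity. A real system
    # would call an embedding model and a vector index here.
    qc, dc = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(qc[t] * dc[t] for t in qc)
    norm = (math.sqrt(sum(v * v for v in qc.values()))
            * math.sqrt(sum(v * v for v in dc.values())))
    return dot / norm if norm else 0.0

def hybrid_recall(query, docs, k=3, rrf_k=60):
    # Rank documents under each retriever, then merge with Reciprocal
    # Rank Fusion: fused(d) = sum over retrievers of 1 / (rrf_k + rank(d)).
    rankings = []
    for scorer in (keyword_score, vector_score):
        ranked = sorted(docs, key=lambda d: scorer(query, d), reverse=True)
        rankings.append({doc: r for r, doc in enumerate(ranked, start=1)})
    fused = {doc: sum(1.0 / (rrf_k + ranks[doc]) for ranks in rankings)
             for doc in docs}
    return sorted(docs, key=lambda d: fused[d], reverse=True)[:k]

def grounded_prompt(query, passages):
    # Constrain the model to the recalled passages, and tell it to refuse
    # rather than hallucinate when the answer is absent.
    context = "\n---\n".join(passages)
    return ("Answer ONLY from the context below. If the answer is not in "
            "the context, reply 'I cannot answer from the knowledge base.'\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```

In a production system the two scorers would be a vector index and a keyword engine respectively, and a dedicated reranker model would typically replace or follow the rank fusion.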

With the above in place, we replaced GPT-4 with the cheaper GPT-3.5 Turbo and, later, with the newly released GPT-4o mini. Although the model was not the latest and most advanced version, the overall result was the best. That is exactly Douwe Kiela's point: the RAG system's overall effect matters more than the effect of any single model.

2. Expertise is your fuel:

The expertise and institutional knowledge accumulated inside an enterprise, often buried in documents and data, is the core fuel that drives AI's value. That expertise must be unlocked.

Practice Sharing

Every company has rich domain knowledge in its field. In one project, a school built an AI private tutor to give each student a personalized learning plan. The RAG data the school prepared was its own teaching materials and its unique teaching methods, which are precisely what distinguish it from its peers and form its industry barrier. An AI assistant built on this data can solve problems in this domain, whereas a general-purpose large model can hardly obtain such specialized knowledge.

3. Enterprise scale is your moat:

An enterprise's core competitiveness lies in its unique data. The real challenge is using that data at scale: getting AI to process large volumes of real, often "noisy" data. Done well, this becomes a competitive moat.

Practice Sharing

In projects I have worked on, the early phase did involve a large, complex data-cleaning effort, in order to feed the AI high-quality data and improve RAG recall and answer quality. But as a pilot moves into the rollout phase, you discover that a huge share of enterprise data is "noisy", and cleaning all of it becomes infeasible in both workload and turnaround. So rather than sinking time into data cleaning, it is more practical to find ways for the AI to tolerate the existing noisy data.
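One pragmatic middle ground is a cheap, tolerant normalization pass instead of exhaustive hand-cleaning: strip the noise that is trivial to remove and let the retriever and reranker absorb the rest. A minimal sketch (the 20-character threshold is an illustrative assumption, not a recommendation from the original project):

```python
import re

def light_clean(raw):
    # Minimal, cheap normalization: strip HTML tags, collapse whitespace,
    # drop trivially short fragments. Deliberately tolerant -- leftover
    # noise is left for retrieval and reranking to absorb rather than
    # being hand-cleaned document by document.
    text = re.sub(r"<[^>]+>", " ", raw)       # remove HTML tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text if len(text) >= 20 else ""    # skip trivial fragments
```

The point is the cost profile: a pass like this runs over the whole corpus in minutes, whereas per-document manual cleaning does not survive contact with rollout-scale data.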

4. The gap between pilot and production is always larger than expected:

It is relatively easy to set up a small-scale pilot (few documents, few users, a single scenario, low risk), but very challenging to scale it to production (massive document sets, many users, multiple scenarios, high security risk, SLA requirements, and so on).

Practice Sharing

This was again an internal enterprise knowledge-base project. The pilot used a small corpus, 20 to 50 documents of 50 KB to 100 KB each, as the knowledge base, with good recall and speed. But real enterprise data varies widely in volume and single-file size. After rollout, before we even got to RAG recall and accuracy, indexing and retrieval speed themselves became the problem. So in the pilot stage you should already design for the later large-scale system and prepare a response plan, rather than sizing only for the current volume; how far to take that is a balance that depends on the actual project.
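For instance, indexing throughput at scale often hinges on batching: embedding one chunk per request works fine for 50 documents and falls over at rollout volume, because per-chunk round trips to the embedding service dominate. A minimal sketch, assuming a hypothetical `embed_batch` callable that turns a list of text chunks into vectors in one request (not a real library API):

```python
def batched_index(chunks, embed_batch, batch_size=64):
    # Index chunks in batches instead of one request per chunk: at
    # production scale, per-chunk round trips to an embedding service
    # dominate indexing time. Returns (chunk, vector) pairs.
    index = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        vectors = embed_batch(batch)  # one round trip per batch
        index.extend(zip(batch, vectors))
    return index
```

The same shape applies at query time: a production design budgets for batch size, concurrency, and incremental re-indexing from the pilot onward, even if the pilot corpus never needs them.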

5. Speed is more important than perfection:

Don’t aim for perfection from the start. Give the system (even if it’s not perfect) to real users as early as possible to get feedback and iterate quickly. Reach your goal by “climbing the mountain” through iterations, rather than trying to design a perfect solution all at once.

Practice Sharing

This seems to conflict with point 4. How do you balance speed and perfection? Here is a situation from a real project:

  • Find the single problem the enterprise most needs AI to solve, ideally just one, to reduce complexity. Implement it quickly so the customer can see results and the enterprise recognizes the value. At this stage the data volume does not need to be large; the focus is whether the enterprise can see AI's value.
  • Design the system to production-environment requirements while still working on the pilot's single problem, then scale up and see whether it meets the enterprise's needs. For example, in a project with a primary school to build an AI private tutor, we initially covered three subjects the customer wanted: mathematics, Chinese, and natural science. The data provided was limited to 5 MB, and 20 students and teachers took part in the acceptance experience to validate feasibility. This process took 1 month. After getting a positive result, the textbook data was expanded from 5 MB to 200 MB and participation grew to 300 teachers and students; this took 2 months, with multiple rounds of UAT along the way to collect feedback. Once all of that passed, further evolution began: model optimization, data optimization, and system throughput optimization. Maintaining rapid delivery and user participation is therefore the key to landing AI projects.