Stop believing in RAG, especially when it comes to building autonomous programming agents

Written by
Iris Vance
Updated on: June 19, 2025

The real drawbacks of RAG in building autonomous programming agents, and what to use instead.

Core content:
1. The role and limitations of RAG in early large-scale model applications
2. Why RAG may become a black hole in the field of autonomous programming
3. The correct mindset and alternatives for building programming agents

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)

 




In the early days of large language models (LLMs), RAG (Retrieval-Augmented Generation) was a lifesaver. Back then, context windows were small, and RAG helped us simulate memory. I was an early supporter of RAG myself, building a popular open source RAG system and helping high-growth AI companies optimize their retrieval systems.

But today, I would say that especially when it comes to building agents that can program autonomously like experienced engineers, you should stay away from RAG.

Why? Because quality matters.

If you are looking for low cost and trying to reduce token usage to stay profitable, then RAG may still come in handy. Cut the code base into chunks, embed them, compute cosine similarity, and perhaps re-rank to pick out the best-matching fragments: this is what companies like Cursor and Windsurf do, and they have indeed built a business on it.
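The pipeline just described (chunk, embed, cosine similarity, re-rank) can be sketched in a few lines. This is an illustrative sketch only, not Cursor's or Windsurf's actual implementation; the `embed` function below is a toy hashed bag-of-words stand-in for a real embedding model, and all function names are hypothetical:

```python
import math
import re
import zlib

DIM = 64  # toy embedding dimensionality

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: a hashed bag-of-words
    # vector, unit-normalized. A production pipeline would call an
    # embedding model or API here instead.
    v = [0.0] * DIM
    for tok in re.findall(r"\w+", text.lower()):
        v[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]  # a cross-encoder re-ranker would refine this ordering

chunks = [
    "parse_config: read and validate the YAML config file",
    "RetryPolicy: exponential backoff for failed requests",
    "load_user: fetch a user row from the session store",
]
print(retrieve("where is the config file parsed", chunks, k=1))
```

The point of the sketch is how little the retrieval step "understands": it matches surface tokens, which is exactly why multi-step reasoning over a code base falls outside what one retrieval pass can do.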

But if you are pursuing top-level quality and want to build an intelligent agent that can truly understand and write complex code, then RAG is a black hole that will devour your resources, time, and seriously drag down the model's reasoning ability.

Aman Sanger (co-founder of Cursor) said: "The hardest questions and queries in a codebase often require several steps of reasoning. Traditional search can only handle one step."

If you are building a programming agent, think twice about RAG

I have helped many fast-growing AI companies optimize their RAG processes and achieved substantial revenue growth. I have also written many articles sharing the secrets. But even so, I still have to tell you: RAG is a bottomless pit. You can invest endless resources, time, and talent, and the result may be only marginal improvements. If your product really cannot do without RAG, it may be worth it.

But for autonomous programming agents, RAG is a huge distraction, not only for your team, but also for the agent itself.

Quinn Slack (Sourcegraph CEO) also pointed out: "Most people in the industry have never really done RAG well... 99% of RAG practices have not even been properly sliced and diced, or evaluated in a rigorous way... And now, we are long past the stage where RAG is enough."

There is something strangely deceptive about the RAG narrative, like a “thought virus”. It’s as if people got RAG into their heads in 2022 and then tried to shoehorn it into everything, expecting it to make things better. Quite the contrary, it’s a huge risk that will mislead your models and dilute their judgment with a bunch of disorganized code snippets.

I was in a meeting with a potential customer at a large enterprise. They were evaluating different programming agents to see which would work best for their team of engineers. This was a serious company with thousands of engineers, but their decision process was surprisingly naive. They seemed to just want a list of features, and RAG was one they were very persistent about. When I asked them why they valued RAG so much, they just said, “I don’t know, isn’t it great for large code bases?”


Think about it, when a senior engineer joins the team and opens a huge code base, what will they do? They will not read a bunch of isolated code snippets. They will browse the folder structure, look at the files, analyze the import relationships, and then go deeper to read more files.

This is the mindset your programming agent needs to replicate, not a disjointed map of code snippets clustered by high-dimensional vector similarity.

A good programming agent should explore the code base as naturally as a human

Cline is one of the most expensive programming agents on the market. But it reads code like a real person: it traverses the folder tree, analyzes import relationships, parses the abstract syntax tree (AST), and uses all of this to decide where to go next. Cline relies on a lot of context; it is agentic, and it is expensive. But this is the price of intelligence. Retrieval is passive; autonomy is active.
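The traversal strategy described above (walk the tree, read the imports, use them to decide where to look next) can be sketched with Python's standard ast module. This is an illustrative sketch under stated assumptions, not Cline's actual implementation; `map_imports` is a hypothetical helper name:

```python
import ast
import os

def map_imports(root: str) -> dict[str, list[str]]:
    """Walk a Python code base and record what each file imports,
    mimicking how an engineer orients themselves before reading deeply."""
    graph: dict[str, list[str]] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    tree = ast.parse(f.read(), filename=path)
            except (SyntaxError, UnicodeDecodeError):
                continue  # skip files that do not parse
            imports: list[str] = []
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    imports.extend(alias.name for alias in node.names)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    imports.append(node.module)
            graph[os.path.relpath(path, root)] = sorted(set(imports))
    return graph
```

An agent can use a map like this the way a new hire uses a mental model of the repository: follow the edges from the file it is editing to the modules that file depends on, rather than pulling in unrelated snippets that merely look similar.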

Ty Dunn (founder of Continue) also mentioned: "It's very difficult to build a system that can automatically determine which contexts are relevant on the fly... and RAG may require a large number of integrations that you have to maintain forever."

With the advent of Claude 3.5 Sonnet and even newer models, context window size and reasoning ability are no longer the bottlenecks. The quality of the context is.

So at Cline, we give the agent all the tools a real engineer has, and then let it explore freely.

“Forget RAG. New architectures like MCP enable more autonomous, context-aware, and efficient workflows,” Shridhar Belagatti wrote of Cline’s autonomous memory system.

RAG is not dead, but it is misplaced, at least for programming agents


RAG can still shine in some areas - such as customer service robots, document question answering, general knowledge memory, etc. But if you are serious about building a high-performance autonomous programming agent, I suggest you take a step back and look at the problem from a completely different perspective.

Rather than building systems that you have to maintain forever and that confuse your agent's judgment, start by removing as much as possible. Remove all the unnecessary scaffolding that gets in the way of flagship models.

Build memories, build tools. Give your agents the same resources a real engineer would use. Then let them work like humans.

We have finally reached the stage where the models are capable of doing this, so let’s just let them do it.

Original link: https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-for

