AutoDev Pre-context Engine: Pre-generating code semantic information and building a knowledge base for AI programming

The AutoDev pre-context engine opens a new era of AI programming and significantly improves code retrieval efficiency.
Core content:
1. Pre-generated context technology: offline construction of code semantic data to accelerate code agent response
2. AutoDev Context Worker tool: one-click generation of code repository context to improve RAG performance
3. Application and advantages of pre-generated context in fixed knowledge scenarios such as internal frameworks and SDKs
Pre-generated context means that, before a user initiates a query or a generation request, the system builds a set of semantic context data offline for a specific code repository, document, or SDK. This context has already been understood, processed, and organized, so it can be quickly retrieved and referenced at runtime, improving the accuracy, relevance, and response speed of the code agent when generating, explaining, or retrieving code.
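As a rough illustration of what such a pre-generated record might contain (the interface below is a hypothetical sketch, not AutoDev's actual schema):

```typescript
// Hypothetical sketch of a pre-generated context record.
// Field names are illustrative, not AutoDev's actual schema.
interface PreGeneratedContext {
  filePath: string;          // source file the entry was extracted from
  symbol: string;            // e.g. "UserService.createUser"
  kind: "class" | "interface" | "function" | "method";
  signature: string;         // full declaration of the symbol
  summary: string;           // doc- or LLM-derived description of what the symbol does
  relations: {
    calls: string[];         // symbols this entry calls
    implements?: string[];   // interfaces it implements, if any
  };
}
```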
We have implemented both the analysis side and the backend of this idea in AutoDev Workbench; the analysis side is AutoDev Context Worker. You can use this tool to generate context for your code repository:
```bash
npx @autodev/context-worker@latest
```
PS: npx is a package runner that ships with npm and lets you execute commands from npm packages directly, without installing them globally.
Introduction: Finding a more efficient RAG method
RAG is an important topic in AI applications. Although we built a variety of vectorized RAG methods in the VSCode version of AutoDev, I have long believed that, for AI programming, vectorized RAG is not cost-effective for many projects.
Therefore, it is necessary to re-examine the RAG approach, especially for code retrieval.
Why vectorized code retrieval is not cost-effective
I think vectorized code retrieval is not cost-effective for the following reasons:
1. Vectorized indexing is an expensive process, whether you run it locally or in the cloud.
2. Keeping the vector index refreshed in real time is another headache, and it puts noticeable strain on your local machine.
3. A code base contains relatively little documentation knowledge; most of what matters is the semantic information of the code itself.
Judging from the technology trends of 2025, vectorized RAG has become a secondary option: tools such as Cursor fall back to vector-based retrieval only when the AI cannot find relevant information by other means.
Pre-generation of fixed knowledge such as internal frameworks
On the other hand, as an AI programming tool, we also need to handle the pre-generation of a large amount of relatively fixed knowledge such as internal frameworks, SDKs, and APIs. Because this knowledge is relatively fixed, users' questions are usually quite specific: they will not ask "How do I use Spring Boot to build a web application?"; instead, they will ask "How do I use the xx framework in Spring Boot?"
For this type of scenario, where fixed knowledge is required:
- Internal development frameworks: questions may involve a single component or several components used together, and troubleshooting usually requires some code-level information.
- SDKs, APIs, etc.: users usually do not want to understand the implementation details; they want to know how to use them to complete specific tasks.
- Other public code libraries: for example, how to use a public library to complete a specific task.
We can pre-generate context for this kind of knowledge in a more efficient way.
Context Worker: Pre-generated code context
As mentioned above, for this type of scenario we can use pre-generated context to improve the effectiveness of RAG. Context Worker is designed for this purpose.
AutoDev Context Worker is a tool for deep parsing and analyzing code, designed to provide developers with better contextual understanding and intelligent code processing capabilities. It can help developers understand and use code bases more efficiently.
Design and goals of Context Worker
Context Worker is developed from our earlier VSCode version: we extracted its core code parsing and analysis functions and built them into an independent tool. We have further expanded its multi-language parsing support, and it now supports more than a dozen mainstream languages, including Java, JavaScript, TypeScript, Python, Golang, Rust, C/C++, Ruby, and C#.
Combined with the server-side functionality of AutoDev Workbench, Context Worker can provide developers with the following capabilities:
- Deep project analysis and structured AST construction: Context Worker performs a deep analysis of the entire project (or a specified module scope). This includes building a complete AST and identifying all functions, classes, interfaces, their signatures, and their comments (docstrings). At the same time, it analyzes project dependencies (internal modules and external libraries) and builds a preliminary dependency graph.
- Automatic code summaries and "intention" annotations: for code blocks (functions, complex logic sections) that lack good comments, it tries to use an LLM to pre-generate concise summaries or "intention descriptions". Key architectural components or core algorithms can also be pre-tagged with specific labels or metadata.
- A project-level knowledge graph: it parses code entities (classes, functions, variables, etc.) and their relationships (calls, inheritance, implementation, references, etc.), builds a knowledge graph around the domain model, and annotates entities with semantic and contextual information; a rough sketch of such a structure follows below.
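As an illustration of how such a knowledge graph could be represented (the type and field names below are hypothetical, not Context Worker's actual data model):

```typescript
// Hypothetical sketch of a project-level knowledge graph.
// Type and field names are illustrative, not Context Worker's actual model.
type EntityKind = "class" | "interface" | "function" | "variable";
type RelationKind = "calls" | "inherits" | "implements" | "references";

interface CodeEntity {
  id: string;          // e.g. "com.example.UserService"
  kind: EntityKind;
  summary?: string;    // pre-generated "intention" description, if available
}

interface Relation {
  from: string;        // id of the source entity
  to: string;          // id of the target entity
  kind: RelationKind;
}

// e.g. "UserController references UserService, which implements IUserService"
const entities: CodeEntity[] = [
  { id: "UserController", kind: "class" },
  { id: "UserService", kind: "class", summary: "Handles user CRUD operations" },
  { id: "IUserService", kind: "interface" },
];

const relations: Relation[] = [
  { from: "UserController", to: "UserService", kind: "references" },
  { from: "UserService", to: "IUserService", kind: "implements" },
];
```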
Using AutoDev Context Worker
Using Context Worker is very simple. You just need to run the following command:
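```bash
npx @autodev/context-worker@latest
```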
It will start Context Worker, which parses and analyzes the code in the current directory and generates contextual data for the repository.
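The exact output depends on the Context Worker version and your repository; as a purely hypothetical illustration, reusing the sketch interface from earlier, a generated entry might look roughly like this:

```typescript
// Hypothetical example of a generated context entry (not actual Context Worker output).
const entry: PreGeneratedContext = {
  filePath: "src/user/UserService.ts",
  symbol: "UserService.createUser",
  kind: "method",
  signature: "createUser(dto: CreateUserDto): Promise<User>",
  summary: "Validates the DTO, persists a new user, and returns the created entity.",
  relations: {
    calls: ["UserRepository.save", "UserValidator.validate"],
    implements: ["IUserService"],
  },
};
```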
Context Worker automatically analyzes the interfaces, implementation classes, methods, and other information in the code and generates the corresponding contextual knowledge. You can store this information in a knowledge base and then have the AI generate the corresponding names and descriptions for use in different scenarios.
Using MCP to obtain contextual knowledge
Combined with the MCP (Model Context Protocol) service we also provide in AutoDev Workbench, you can use AI programming tools to obtain the contextual knowledge required for known problems through MCP.
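As a rough sketch of what consuming such a service programmatically could look like, using the public MCP TypeScript SDK (the server URL and the tool name "get-context" are placeholders, not AutoDev Workbench's actual endpoint or tool names; check the Workbench documentation for what it really exposes):

```typescript
// Rough sketch: connect to an MCP server and request context for a known problem.
// The URL and tool name below are assumptions for illustration only.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

const client = new Client(
  { name: "example-client", version: "1.0.0" },
  { capabilities: {} }
);
await client.connect(new SSEClientTransport(new URL("http://localhost:3000/sse")));

// Discover what the server offers, then ask for context about a known problem.
const tools = await client.listTools();
console.log(tools.tools.map((t) => t.name));

const result = await client.callTool({
  name: "get-context", // hypothetical tool name
  arguments: { query: "How do I use the xx framework in Spring Boot?" },
});
console.log(result);
```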