From RAG to CoT to MCP: Understanding the Difficulties of AI Agent Implementation | Large Model Research

Written by Jasper Cole
Updated on: June 24, 2025

Recommendation

Explore the difficulties in implementing AI agents and gain insight into the future direction of large model technology.

Core content:
1. The autonomy, interactivity and adaptability of AI agents
2. The challenges and breakthroughs of RAG technology in the application of AI agents
3. The core challenges and future development prospects of AI agent technology



Generative AI has been changing our digital world at an unprecedented speed. From ChatGPT to Midjourney, from Claude to Gemini, these large language models have demonstrated amazing capabilities. However, when we try to transform these models from simple conversation tools to intelligent agents that can make decisions and perform tasks autonomously, a series of complex technical problems begin to emerge.

Reality is always far from the ideal. You assume you can quickly build a useful intelligent service on top of a large model simply by injecting your own data. In practice, however, newly added data is hard to aggregate and classify automatically no matter how skilled the engineers are, let alone distill into usable model parameters, and it can even degrade the original large model to the point of being unusable.

RAG (retrieval-augmented generation) and AgentQ are both very useful and have solved many hard problems, but they only guarantee that users receive an answer; raising the quality of that answer is another challenge entirely.

Anthropic's MCP is very popular, and Google's A2A has also spawned a dazzling array of delivery solutions, but these only refine the division of labor within the industry. If what is ultimately delivered is still a mess, customers will find out eventually.

How do we solve these technical problems?

In 2025, widely called the "first year of AI agents", technology giants and startups alike are racing to put AI agents into production. According to Gartner's forecast, by 2028 the share of enterprise software incorporating autonomous AI will jump from less than 1% in 2024 to 33%, and more than 15% of day-to-day work decisions will be made autonomously by AI agents.

As the "AI epiphany moment" arrives, we take a deep look at the core challenges of GenAI large-model agent technology, from RAG (retrieval-augmented generation), vector databases, embedding techniques, and Post Training to CoT (chain-of-thought) reasoning; analyze key problems such as convergence, generalization, discretization, and clustering encountered during deployment and training; and look ahead to the technological breakthroughs that future development will require.



▍Part 1: Technical foundations and difficulties of GenAI large-model agents

1.1 From Large Models to Intelligent Agents: Concepts and Architecture

An AI agent is an intelligent entity that can perceive the environment, plan autonomously, make decisions, and perform actions to achieve its goals. The main difference from traditional AI systems or large models is that an AI agent has characteristics such as autonomy, interactivity, responsiveness, and adaptability.

Architecturally, modern AI agents are usually built around large language models (LLMs), forming a system with three core components:

  1. Model layer: a large language model serves as the intelligent core, providing understanding, reasoning, and generation capabilities.

  2. Tool layer: various APIs and functional modules that extend the agent's ability to interact with the outside world.

  3. Coordination layer: responsible for organizing the reasoning process, planning decisions, and guiding the execution of actions.

This architecture enables an agent to decompose complex tasks into subtasks and, by using tools and calling external resources, complete work that a single direct instruction to the model could not accomplish.
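To make the three layers concrete, here is a minimal sketch of the loop in Python. Everything in it, including the call_llm stub, the toy tool registry, and the TOOL/FINAL text convention, is an illustrative assumption rather than a reference implementation of any particular framework.

```python
# Minimal agent-loop sketch: model layer, tool layer, coordination layer.
# All names here (call_llm, TOOLS, the TOOL/FINAL convention) are illustrative
# assumptions, not the API of any specific framework.

from typing import Callable, Dict

def call_llm(prompt: str) -> str:
    """Model layer: stand-in for a real LLM API call."""
    raise NotImplementedError("plug in your model provider here")

# Tool layer: plain functions the agent may invoke by name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_docs": lambda query: f"(top documents for: {query})",
    "get_time": lambda _arg: "2025-06-24T12:00:00Z",  # toy tool for illustration
}

def run_agent(task: str, max_steps: int = 5) -> str:
    """Coordination layer: alternates model reasoning with tool calls."""
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = call_llm(
            history + "Respond with 'TOOL <name> <input>' or 'FINAL <answer>'.\n"
        )
        if reply.startswith("FINAL"):
            return reply[len("FINAL"):].strip()
        if reply.startswith("TOOL"):
            _, name, arg = reply.split(" ", 2)
            observation = TOOLS.get(name, lambda _: "unknown tool")(arg)
            history += f"{reply}\nObservation: {observation}\n"
    return "No final answer within the step budget."
```

The coordination layer here is deliberately simple: a bounded loop with a plain-text protocol. Production frameworks add structured tool schemas, memory, and error handling, but the division of responsibilities is the same.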

1.2 RAG (retrieval-augmented generation) technology and its challenges

RAG (Retrieval-Augmented Generation) is the mainstream technique in current large-model applications. It substantially improves the accuracy and timeliness of a model's answers by retrieving relevant information from external knowledge bases and then generating answers grounded in that information.
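The basic flow is easy to sketch. The snippet below is a minimal illustration only; embed(), call_llm(), and the prompt wording are assumptions standing in for whatever embedding model, LLM, and template a real system would use.

```python
# Minimal RAG sketch: embed the question, retrieve the nearest chunks by cosine
# similarity, then generate an answer grounded in them. embed() and call_llm()
# are assumed stubs for an embedding model and an LLM API.

import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for an embedding model (e.g. a sentence-transformer or a hosted API)."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Stand-in for a generation call to a large language model."""
    raise NotImplementedError

def rag_answer(question: str, chunks: list[str], chunk_vectors: np.ndarray, k: int = 3) -> str:
    q = embed(question)
    # Cosine similarity between the question vector and every stored chunk vector.
    sims = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    top_chunks = [chunks[i] for i in np.argsort(-sims)[:k]]
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(top_chunks) +
        f"\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

Every difficulty discussed below lives somewhere in this short pipeline: in the embedding step, in the similarity search, or in how the retrieved context is assembled into the prompt.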

Core challenges facing RAG technology:

  1. Information loss in data vectorization

    In order to achieve efficient retrieval, text data needs to be converted into vectors, a process that inevitably causes some information loss. Current embedding models (such as OpenAI's text-embedding-ada-002) show obvious limitations when handling specialized vocabulary or multilingual content.

  2. The semantic search accuracy problem

    The key to a RAG system is locating the most relevant content for the user's question. When the user's question is phrased differently from the wording in the knowledge base, retrieval based on vector similarity often fails. For example, the user asks "how to improve running speed" while the relevant document in the knowledge base is titled "methods to enhance sprinting explosiveness"; the sketch after this list shows one common mitigation.

  3. Difficulty in searching for proper nouns

    Proper nouns and organization-specific terms tend to lose their distinctiveness during vectorization, which hurts the accuracy of the resulting vectors and, in turn, the quality of the model's output.

  4. Contextual understanding and information synthesis

    A RAG system also needs to correctly understand the context of the retrieved content and its relevance to the user's question, which demands strong contextual understanding and information-synthesis capabilities from the model.
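As noted in challenges 2 and 3 above, purely vector-based retrieval struggles with paraphrased questions and with proper nouns. One widely used mitigation is to combine lexical and vector retrieval; the sketch below shows a simple, assumed variant that fuses the two rankings with reciprocal rank fusion. The search functions are placeholders, not any specific library's API.

```python
# Hybrid-retrieval sketch: fuse a lexical (keyword) ranking with a vector ranking
# using reciprocal rank fusion (RRF). keyword_search() and vector_search() are
# assumed stubs that return document ids, best match first.

def keyword_search(query: str) -> list[str]:
    """Stand-in for BM25/inverted-index search; keeps exact matches on proper nouns."""
    raise NotImplementedError

def vector_search(query: str) -> list[str]:
    """Stand-in for embedding-based ANN search; handles paraphrased questions."""
    raise NotImplementedError

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists into one by summing 1/(k + rank) scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(query: str, top_n: int = 5) -> list[str]:
    return reciprocal_rank_fusion([keyword_search(query), vector_search(query)])[:top_n]
```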

1.3 Technical difficulties of vector databases

A vector database is a core component of a RAG system: it stores and retrieves vector representations of text and other data.

The main technical challenges include:

  1. The "curse of dimensionality" of high-dimensional data

    As the vector dimension increases, the distance differences between data points become blurred and retrieval accuracy drops. Most embedding vectors have between 768 and 1,536 dimensions, which poses a serious challenge for efficient indexing and retrieval.

  2. Balance between indexing and retrieval efficiency

    Vector databases must balance the space complexity of the index against the time complexity of retrieval. Mainstream approximate nearest neighbor (ANN) approaches such as HNSW, and the index families implemented in libraries like FAISS, still have limitations in specific scenarios; see the indexing sketch after this list.

  3. The "thick storage" vs. "thin storage" trade-off

    Vector databases face the dilemma of choosing between "thick storage" and "thin storage". Thick storage stores a large amount of raw data, providing richer context but increasing storage costs; thin storage only stores necessary information, reducing storage space but potentially losing context.

  4. Multimodal data processing

    Processing vector representations of multimodal data such as images, audio, and video and achieving cross-modal retrieval are major challenges facing current vector databases.
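To make the index-versus-retrieval trade-off concrete, here is a minimal sketch using FAISS's HNSW index on synthetic data. The parameter values (M, efConstruction, efSearch) are illustrative knobs that trade memory and build time against recall and latency; any comparable ANN library would make the same point.

```python
# ANN indexing sketch with the FAISS library's HNSW index (assumes `pip install faiss-cpu`).
# The idea: spend memory and build time on a navigable graph so that search time
# grows sub-linearly with corpus size. The data here is random and purely illustrative.

import numpy as np
import faiss

d = 768                                                  # embedding dimension
vectors = np.random.rand(10_000, d).astype("float32")    # stand-in for real embeddings

index = faiss.IndexHNSWFlat(d, 32)                       # 32 = graph neighbors per node (M)
index.hnsw.efConstruction = 200                          # build-time accuracy/speed trade-off
index.add(vectors)

query = np.random.rand(1, d).astype("float32")
index.hnsw.efSearch = 64                                 # search-time accuracy/speed trade-off
distances, ids = index.search(query, 5)                  # 5 approximate nearest neighbors
print(ids[0])
```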

1.4 Bottlenecks of embedding technology

Embedding technology converts natural language, images, and other data into high-dimensional numerical vectors, and it is the key link between large models and RAG systems.

The main problems facing current embedding technology:

  1. Semantic Preservation and Model Selection

    Different embedding models perform differently on different tasks. How to select an embedding model suitable for a specific field and retain the most important semantic information is the primary challenge.

  2. Embedding vector dimension selection

    The higher the vector dimension, the stronger the expressive power, but also the higher the computational and storage costs; too low a dimension can lose information. In practice, the choice is a trade-off between specific needs and resource constraints; see the back-of-the-envelope sketch after this list.

  3. Technical Difficulties of Embedding Process

    The training and optimization of embedding models require a large amount of high-quality data and computing power support, and different types of data (such as long texts, short sentences, and professional terms) have different requirements for embedding quality.
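To put the dimension trade-off in concrete terms, the arithmetic below estimates raw vector storage for a hypothetical corpus of ten million chunks at several common embedding dimensions. The corpus size is an assumption for illustration, and semantic fidelity at each dimension would still have to be measured on real retrieval data.

```python
# Back-of-the-envelope sketch of the dimension trade-off: raw storage for an
# embedding corpus scales linearly with vector dimension. The corpus size and
# float32 assumption below are illustrative, not taken from the article.

N_CHUNKS = 10_000_000        # assumed number of document chunks in the corpus
BYTES_PER_FLOAT = 4          # float32

for dim in (256, 768, 1024, 1536, 3072):
    gib = N_CHUNKS * dim * BYTES_PER_FLOAT / 1024**3
    print(f"dim={dim:5d} -> raw vector storage ~ {gib:6.1f} GiB (before index overhead)")
```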

1.5 Technical challenges of Post Training and CoT (chain of thought)

Post Training and CoT (chain of thought) are key technologies for improving the reasoning ability and adaptability of large models.

Main technical difficulties:

  1. Post-training sample construction

    Constructing high-quality post-training samples is a challenge: fine-tuning data requires positive examples similar to the query and negative examples that are not, and assembling them is both time-consuming and demands domain expertise.

  2. The chain-of-thought convergence problem

    CoT may run into convergence difficulties on complex reasoning tasks, especially during problem decomposition and multi-step reasoning. Ensuring the correctness of each step and ultimately reaching an accurate conclusion is a key challenge; a prompt-level sketch follows this list.

  3. The balance between reasoning and generalization

    Enhancing the model's reasoning ability in specific fields may lead to a decrease in the model's generalization ability in other fields. How to maintain the overall generalization of the model during the Post Training process is a difficult problem.

  4. The balance between reasoning depth and response speed

    CoT requires the model to perform multi-step reasoning, which deepens its reasoning but also lengthens response time and can degrade the experience in real-time interaction scenarios.
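To ground the CoT discussion, here is a minimal, assumed prompting sketch: the model is asked to number its steps and emit a single final line that downstream code (or a verifier) can check, which is one common way to keep multi-step reasoning from drifting. The prompt wording and the call_llm stub are illustrative, not drawn from the article.

```python
# Chain-of-thought sketch: elicit step-by-step reasoning, but only trust a
# machine-checkable final line. call_llm is an assumed stub for a real LLM API.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call."""
    raise NotImplementedError

COT_TEMPLATE = (
    "Solve the problem step by step and number each step.\n"
    "Finish with exactly one line of the form 'ANSWER: <value>'.\n\n"
    "Problem: {question}\n"
)

def solve_with_cot(question: str) -> tuple[str, str]:
    """Return (final_answer, full_reasoning_trace)."""
    trace = call_llm(COT_TEMPLATE.format(question=question))
    answer = ""
    for line in trace.splitlines():
        # Keep the reasoning trace for auditing, but extract only the final ANSWER line.
        if line.strip().upper().startswith("ANSWER:"):
            answer = line.split(":", 1)[1].strip()
    return answer, trace
```

Separating the auditable trace from the checked answer also illustrates the depth-versus-latency tension: every extra numbered step buys reasoning depth at the cost of response time.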



▍Part 2: Practical challenges and thresholds for AI Agent implementation

2.1 The technology deployment threshold

Complex architecture and integration challenges

Building an AI Agent system requires integrating multiple technical components, including large models, RAG pipelines, vector databases, and tool calling, and coordinating these components is itself a challenge. According to a report by the Institute of Automation of the Chinese Academy of Sciences, companies that have successfully deployed AI Agent systems usually invest substantial resources in selecting and integrating technical components.

System reliability and stability

AI Agents need to maintain reliable performance in complex and dynamic environments. According to 36Kr, most companies are still in the exploratory stage of large-scale model applications, and system stability is one of the main concerns.

The balance between domain knowledge and general capabilities

AI Agents need to master industry-specific domain knowledge while retaining general capabilities. Even if the base model exhibits "emergent" abilities, a lack of industry-specific data still leaves obvious gaps in its understanding of that industry.

2.2 Talent and capability threshold

Scarcity of interdisciplinary talents

The development and deployment of efficient AI Agents requires compound talents with expertise in machine learning, software engineering, product design, and specific fields. According to the research report on technical positions and ability training for large artificial intelligence models, large model professionals need to master deep learning theory, programming skills, algorithm design, and domain knowledge.

Combining technical capabilities with business understanding

An AI Agent development team needs to understand both the technology and the business needs. Large-model work tests full-stack R&D capability, spanning data management, computing infrastructure engineering, and low-level system optimization.

Continuous learning and tuning capabilities

AI Agent technology is evolving rapidly, and technical teams need to keep learning and tuning. The AWS case shows that moving from an initial exploratory project to a mature application requires sustained trial and error, continuously optimizing RAG, Workflow, and Agent capabilities.

2.3 Data Quality and Limitations

Scarcity of high-quality data

High-quality, industry-specific training and fine-tuning datasets are key to improving the capabilities of AI agents.

Data Bias and Representativeness

Biases in training data can cause AI agents to make biased decisions. This is particularly important in sensitive fields such as finance and healthcare. Addressing these biases requires building more balanced and diverse training data sets.

Data Privacy and Security

The data processed and stored by AI Agents may involve privacy-sensitive information. Research on large-model security challenges and attack testing shows that core security threats such as prompt injection and data leakage can be identified through offensive testing of LLMs.

2.4 Computing cost and resource limitations

High costs of training and deployment

The training and deployment of large-scale AI Agent systems require a large amount of computing resources. According to the IEI report, the development of large models has entered the "million-card era", which brings high costs in technology, operations, and manpower.

Uneven distribution of computing resources

High-performance computing resources are unevenly distributed around the world. According to Sun Ninghui from the Institute of Computing Technology of the Chinese Academy of Sciences, the large-scale promotion of artificial intelligence technology must solve the problem of long-tail applications and provide low-cost computing power and low-threshold services to 80% of small and medium-sized enterprises.

Ongoing operating costs

In addition to the initial training costs, the continued operation of AI Agents also requires a lot of computing resources. The large model industry faces problems such as computing power bottlenecks and mainstream architecture limitations, which may have a certain impact on the industry's growth rate.