With MCP, do we still need to study Agents in depth?

Written by Iris Vance
Updated on: July 1, 2025

Explore the differences and connections between the MCP protocol and Agent technology.

Core content:
1. The working principle and limitations of the MCP protocol
2. The advantages of Agent technology in understanding user intentions and task planning
3. Possible future protocol architectures and application prospects of Agent technology


As a universal protocol for connecting large models with tools, MCP gives users the opportunity to connect the large model they are already using to excellent software from all over the world, so that intelligent task processing (calling tools along the way to achieve specific goals) can be completed inside chat applications such as ChatGPT.

Once we wire a tool system into the chat interface, a question naturally arises: this looks almost identical to the intelligent agents we already know. Is it still necessary to study Agent technology in depth?

Today I would like to discuss this question and, drawing on my own implementation scenarios, sketch a possible protocol architecture for the future.

Unresolved issues with MCP

My previous article gave a rough picture of how MCP works. From it, we know that MCP solves a problem similar to Function Calling: it lets the large model return, during its conversation with the user, the metadata needed to call a tool, so that the tool can be invoked at the chat-application level. However, this only addresses the problem of the model calling a tool. MCP does not overturn the existing pattern of program interaction; it merely standardizes an existing de facto practice so that any large-model application can call tools in the same way. Compared with an Agent, MCP still leaves many problems unconsidered and unsolved.
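
To make that concrete, here is a minimal sketch of the metadata involved, assuming a hypothetical `get_weather` tool: MCP messages are JSON-RPC requests, and when the model decides to use a tool, the chat application wraps that decision in a `tools/call` request and sends it to the MCP server that exposes the tool.

```python
import json

# Hypothetical example: the model has decided to call a tool named "get_weather".
# The chat application packages that decision as an MCP (JSON-RPC) "tools/call"
# request and forwards it to the MCP server; the model itself never runs the tool.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",              # tool chosen by the model
        "arguments": {"city": "Beijing"},   # arguments extracted from the chat
    },
}

print(json.dumps(tool_call_request, indent=2))
```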

Understanding the user's true intention

Although large models keep getting smarter over time, the pace of improvement is slowing as global training data is exhausted. And even with ever-higher intelligence, a model still cannot ideally understand the true intent behind user input. The reasons include, but are not limited to, the following:

  • The ambiguity of human language: the same sentence means different things in different contexts
  • Model hallucination: the model automatically fills in the user's intention, which may deviate from what the user actually wants
  • The user's true intention cannot be fully expressed in words
  • The user simply expresses the requirement incorrectly

How does an Agent address this problem? Mainly through the following mechanisms:

  • Memory: long-term memory accumulates between the agent and an individual user, so the agent can judge the context of the current request and understand the user's intent more accurately.
  • Multi-turn communication: the agent goes back and forth with the user, letting the user add the missing details of the requirement.
  • Verification: some agents produce something like "test cases" and use them to verify the user's requirements.
  • Dynamic adjustment: while the agent is executing a task, the user can intervene based on intermediate results, avoiding large deviations.

It can be seen that an Agent, as an upper-layer application, does not rely entirely on the large model to solve these problems; it uses deliberate architectural design to make the program execute the user's goal more accurately.
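
As a rough illustration of how these four mechanisms can fit together at the application layer, here is a minimal sketch; it is not taken from any real framework, and the `llm_*` functions are hypothetical stand-ins for model calls.

```python
def llm_needs_clarification(request, context):
    # placeholder: a real agent would let the model decide whether anything is unclear
    return len(context) == 0

def llm_execute(intent):
    return f"result for: {intent}"            # placeholder for planning and tool calls

def llm_passes_checks(result):
    # placeholder "test cases": pretend the first attempt fails verification once
    return "[revised]" in result

def handle_request(request, memory, ask_user):
    context = memory.get(request, [])                        # 1. long-term memory
    while llm_needs_clarification(request, context):         # 2. multi-turn dialogue
        context.append(ask_user("Can you add more detail about what you need?"))
    intent = f"{request} ({'; '.join(context)})"
    result = llm_execute(intent)
    if not llm_passes_checks(result):                        # 3. verification
        result = llm_execute(intent + " [revised]")          # 4. dynamic adjustment
    memory[request] = context                                # remember for next time
    return result

answers = iter(["a weekly sales summary grouped by region"])
print(handle_request("Write a report", {}, lambda question: next(answers)))
```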

Planning tasks more rationally

Although large models such as DeepSeek perform very well at planning, the problem is that a large model generates the task list in one shot; if the plan is to fully match the user's goal, a human has to step in and the whole plan has to be regenerated. Moreover, because it is just a dialogue, a large-model chat application cannot take feedback and revise the plan in place. An Agent, by contrast, can apply planning strategies at the application level, for example finalizing the task list and its stages through interaction with the user, or giving specific types of tasks a fixed planning framework according to system settings.
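
Below is a minimal sketch, under my own assumptions, of what that can look like at the application level: one task type gets a fixed planning framework, and the final task list is settled through a confirmation loop with the user rather than generated in a single shot. The `revise_plan` function stands in for a model call.

```python
FIXED_FRAMEWORKS = {
    # hypothetical fixed framework for one kind of task
    "software_feature": ["understand requirements", "design architecture", "prototype",
                         "build MVP", "write code", "test"],
}

def revise_plan(goal, steps, feedback):
    # placeholder for a model call that rewrites the plan according to user feedback
    return steps + [f"extra step requested: {feedback}"]

def plan(goal, task_type, get_feedback):
    steps = list(FIXED_FRAMEWORKS.get(task_type, [f"draft a plan for: {goal}"]))
    while True:                                   # keep refining until the user accepts
        feedback = get_feedback(steps)
        if not feedback:
            return steps
        steps = revise_plan(goal, steps, feedback)

feedback_rounds = iter(["add a code review step", ""])        # simulated user replies
print(plan("ship a login page", "software_feature", lambda steps: next(feedback_rounds)))
```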

Phased spiral task execution

For a large model, calling a tool is a one-shot process: the user has to judge whether the right tool was chosen and whether the result meets expectations. A large-model application can usually execute only one step at a time, that is, select a tool based on the previous result and return that result to the user.

An Agent is far more controllable when performing tasks. It can work in stages, executing step by step along the planned task list. During execution it can interact with the user, run tasks serially or in parallel under a multi-agent framework, and repeatedly confirm and verify. Even if the result of a stage is unsatisfactory, it can dynamically adjust the overall requirement and advance toward the goal little by little, in a spiral, evolutionary way.

Take GenSpark's programming workflow as an example: when we initiate a programming task in GenSpark, it moves from understanding the requirements to architectural design, then to prototyping, building an MVP, writing code, and testing. These phased steps mean the Agent does not rush to a final result; instead, it works to achieve the goal of each individual step.
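
A minimal sketch of that execution style, with hypothetical execute/verify hooks, might look like this: each stage is executed and verified before the agent advances, and a failing stage is revised and retried instead of derailing the whole task.

```python
def execute_stage(stage, prior_results):
    return f"output of '{stage}'"               # placeholder: tools, sub-agents, ...

def verify_stage(stage, result):
    # toy check: pretend the prototyping stage fails once before its revision passes
    return "revised" in stage or "prototype" not in stage

def run_in_stages(stages, max_retries=2):
    results = []
    for stage in stages:
        for attempt in range(max_retries + 1):
            result = execute_stage(stage, results)
            if verify_stage(stage, result):     # confirm before moving on
                results.append(result)
                break
            # spiral: adjust the stage and try again instead of abandoning the task
            stage = f"{stage} (revised, attempt {attempt + 2})"
        else:
            raise RuntimeError(f"stage kept failing: {stage}")
    return results

print(run_in_stages(["understand requirements", "design", "prototype", "build MVP", "test"]))
```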

Automatic execution

Another major feature of an Agent is automatic execution: under certain conditions it can make decisions and act on its own, without human intervention. This is different from traditional RPA software. In RPA, a person defines the execution path in advance and the software follows that fixed path. An Agent's automatic execution requires no predefined path; the Agent judges and decides dynamically as it goes. For people, this means they only need to state a requirement and receive a result (which can, of course, be discarded if it does not meet expectations) without participating in the process, which matters enormously for saving time and cost and improving efficiency.
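
The contrast can be sketched roughly like this (the tools and the `choose_next_action` policy are hypothetical placeholders): in RPA the path is fixed by a person in advance, whereas the agent picks its next action dynamically at each step.

```python
def open_invoice():   return "invoice opened"
def export_report():  return "report exported"

def rpa_run():
    # fixed, human-authored path: always the same steps, in the same order
    return [open_invoice(), export_report()]

def choose_next_action(goal, state, tools):
    # placeholder for a model call that decides what to do next given the current state
    return "export_report" if "invoice opened" in state else "open_invoice"

def agent_run(goal, tools, max_steps=5):
    state = []
    for _ in range(max_steps):
        if "report exported" in state:          # goal check
            return state
        action = choose_next_action(goal, state, tools)
        state.append(tools[action]())           # the path emerges at run time
    return state

tools = {"open_invoice": open_invoice, "export_report": export_report}
print(rpa_run())
print(agent_run("send the monthly report", tools))
```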

Relationship between MCP and Agent

MCP can help Agents improve their tool-calling capabilities. On the one hand, the MCP community will offer a large number of MCP Servers that can serve as ready-made tools; on the other hand, an Agent can use a more intelligent large model as the "brain" for tool scheduling, making tool calls more accurate.

However, MCP was not designed specifically for Agents. When we develop an Agent, we still need to build the infrastructure that connects it to MCP.

An Agent is a system comprising a scheduling system, an execution system, and a perception system; MCP is only one small point within the execution system. In addition, the perception system is rarely discussed in the market. My personal understanding is that perception also relies on tools, such as cameras and sensors, but unlike calling a tool and waiting for its result, perception is more about subscribing to a remote server and receiving message notifications from it. From a security perspective, however, such a remote-dependent design harms the local system when the remote end stops working. This is probably a deeper topic, and also the reason it has not become a mainstream discussion in the industry.
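
Here is how I would sketch those three subsystems, purely as an illustration of the architecture described above; every class is a hypothetical placeholder rather than a real SDK. The MCP client sits inside the execution system as just one route to tools, while the perception system subscribes to external events and reacts to notifications instead of polling tools.

```python
class FakeMCPClient:
    def call(self, name, arguments):
        return f"tool {name} called with {arguments}"   # stands in for a "tools/call" request

class ExecutionSystem:
    def __init__(self, mcp_client):
        self.mcp = mcp_client                           # one of possibly many tool backends
    def call_tool(self, name, **arguments):
        return self.mcp.call(name, arguments)

class PerceptionSystem:
    def __init__(self):
        self.handlers = []
    def subscribe(self, handler):
        self.handlers.append(handler)                   # e.g. camera / sensor event streams
    def on_event(self, event):                          # pushed by a remote source
        for handler in self.handlers:
            handler(event)

class Scheduler:
    def notify(self, event):
        print("re-planning because of event:", event)   # events can trigger re-planning

execution = ExecutionSystem(FakeMCPClient())
perception = PerceptionSystem()
perception.subscribe(Scheduler().notify)

print(execution.call_tool("get_weather", city="Beijing"))
perception.on_event("motion detected by camera 3")
```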

Interaction protocol between agents

This points to a new protocol architecture. Until now there has been no formal interaction protocol between agents from different vendors. Yesterday, Google announced its A2A protocol at a press conference, a solution that aims to standardize interaction between agents. However, judging from Google's tone, I doubt they can push this standard to universal adoption.

Even so, an A2A-style protocol standard will inevitably emerge. I have mentioned in previous articles that in the future a user may be served by a single agent, yet a single agent often cannot solve every problem. The natural solution is to call other agents for specific problems: instead of buying new agents, you connect to another agent to accomplish a specific purpose and release or disconnect it as soon as the task is done, which is both environmentally friendly and convenient.
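
Purely as an illustration of that connect-use-release pattern (not the actual A2A API), the delegation could be sketched like this: the primary agent borrows a specialist agent for one task and disconnects as soon as the result comes back.

```python
import contextlib

class FakeSpecialistAgent:
    def handle(self, task):
        return f"handled: {task}"
    def disconnect(self):
        print("specialist released")

class FakeRegistry:
    def connect(self, capability):
        print(f"connected to a '{capability}' agent")
        return FakeSpecialistAgent()

@contextlib.contextmanager
def borrow_agent(registry, capability):
    remote = registry.connect(capability)
    try:
        yield remote
    finally:
        remote.disconnect()          # release the connection once the task is finished

with borrow_agent(FakeRegistry(), "tax-filing") as specialist:
    print(specialist.handle("file the quarterly VAT return"))
```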

At present the MCP ecosystem is booming, which makes it easier for us to call tools and obtain specific results. But there is a problem: throughout that process, the human still plays the leading role. Agents, by contrast, leave the decisions to the machine; the human is only the demand side and the agent is the delivery side. As the agent network matures, it could even operate outside human control and form a self-deciding, self-executing network system, the kind of thing described in science fiction, where multiple agents form a social network, each playing a different role, communicating with each other and exchanging data without human intervention.

Conclusion

This article has briefly discussed the differences and connections between MCP and Agents, so that readers can understand, in plain language, what knowledge and technology the path from MCP to Agent requires. With 2025 shaping up as the year Agents take off, I have seen many Agent developers in the market already turning a profit, yet the results Agents deliver are still some distance from what people ideally want. As expectations for execution quality keep rising, developers will certainly continue optimizing Agents to narrow the gap between results and expectations.