Explain clearly in one breath: the history of AI Agent development

Written by

Jasper Cole

Updated on:June-24th-2025

The term Agent is no stranger to technical students.

In the field of IT technology, Agent refers to the "agent" capability, which can be generally divided into three parts: the ability to perceive the environment, make autonomous decisions, and execute tasks.

For example, many tasks in the CICD pipeline are automatically triggered and executed by the Agent according to the configured logical rules, including different branches using different test environments, which technical components are called, and notification of the results of executing tasks.

In the field of AI, Agent refers to an intelligent body, which also has the ability to perceive the environment, make decisions and perform tasks, and achieve goals through perception, decision-making and action . Its core features include:

Perceptual ability: obtaining information about the external environment such as vision and hearing.
Decision-making ability : Reasoning and planning based on perceived information to select appropriate action strategies.
Action capability : Perform specific tasks or operations and influence the environment.
Learning ability: Continuously improve strategies through interaction with the environment.

AI Agent can also be understood as a combination of "big model + plug-in + execution process", corresponding to the control end, perception end and execution end respectively.

Up to now, the development history of AI Agent can be divided into four stages: naked large model call, simple Chatbot, multi-agent, and task agent .

1. Naked large model call

Simply put, it is similar to the backend interface call, which directly returns the response body . The processing logic is shown in the following figure:

2. Simple Chatbot

Chatbot, that is, chat robot, the most famous should be ChatGPT, which came out at the end of 2022 and started the wave of global AI acceleration.

The implementation principle of Chatbot is actually to encapsulate a layer on the naked big model call, and turn it from a backend interface call into a Chatbot with a visualized interface . Of course, each round of Chatbot dialogue will include system prompt words + historical dialogue + the latest round of user prompt words. The processing logic is shown in the figure below:

3. Multi-agent

The so-called multi-agent, that is, Multi-agent, Manus uses this technology to implement the architecture.

Multi-agent can be understood as multiple Agent processes/threads working in parallel, and they communicate with each other through a communication mechanism (such as TCP) . For example, metagpt is a typical multi-role collaboration (multi-agent parallel) work.

For more technical details about Manus, please refer to this article: A picture explains it clearly: Manus's technical architecture

IV. Task Agent

Task agents can currently be roughly divided into two types: short-task agents and long-task agents.

Short-task agents : pursue faster response times, such as virtual humans, smart speakers, and in-vehicle smart cockpits.
Long-task agents : agents that require longer steps or time to complete tasks, usually require AgentFlow for orchestration.

Long-task agents can also be divided into two types:

Copilot class : Common in the field of AI IDE, it allows manual intervention and parameter modification, as well as autonomous selection of reference materials and even routing.
Agentic type : This type pursues a higher level of automation and intelligence, and requires very little human intervention, such as Manus, metagpt, autogpt, etc.

The following is a schematic diagram of the autogpt workflow:

There are some concepts about AI Agent that need to be clarified to avoid confusion.

The concept of intelligent agent originally originated from Langchain. Langchain is a very old intelligent agent project. Its significance lies in proposing the concept and components of intelligent agent. Most of the intelligent agents that came out later have the shadow of Langchain.

The main features of intelligent agents include the following aspects:

Agent : An intelligent program that contains AI steps and can automatically complete multiple tasks.
Step/Chain : A step chain with input and output that will perform task processing.

Typical case : LLMChain, that is, the large model step, of course, also includes other forms of task processing.

Router : Routing rules determine which step to execute next.

The judgment condition can be some values or conditions, or it can be LLMRouter, which directly asks the large model which step to take next.

Tool : A basic tool call box, such as date, search, calculation and other basic functions.
The difference between Tool and Chain is that the tool will return to the step after the link is called.
Run concepts : context, status, etc.

Finally, it should be made clear that intelligent agents and large models are typical upstream and downstream concepts.

There is no competition or opposition between Manus (Agent) and DeepSeek (LLM). Instead, Manus competes with the DeepSeek application (chatbot).

The AI+testing full-link implementation practice technology training camp will start soon. The course outline is as follows: