AI Agent Explosion: Who Can Become Your First-Choice "Super Assistant"?

Written by

Jasper Cole

Updated on: June 18, 2025

2025 is widely regarded as the first year of the Agent era. This year, Agents of every kind have been springing up like mushrooms after rain, reshaping how people think about AI assistants and automated productivity tools. Looking back over the past year, several Agent products stand out as especially representative.

1. OpenAI released its first Agent product, Operator, in January this year

Operator is an AI agent that can autonomously operate a browser to complete tasks. Inside the browser, it interacts with web pages by typing, clicking and scrolling like a human, without relying on custom API integrations. For example, it can automate online ordering and shopping on Instacart.

2. Then in February, OpenAI released Deep Research

Deep Research is an AI agent designed for in-depth research in finance, science, policy, engineering and other fields. It provides comprehensive and accurate research support to meet the needs of high-intensity knowledge work. OpenAI claims that it can produce an expert-level research report in 5-30 minutes.

3. In March, an Agent product called Manus sparked heated discussions on social media

Manus focuses on automatically decomposing and efficiently executing complex tasks. It combines a large language model with several types of intelligent agents to support the complete "task planning - assignment - execution - result summary" process. Given a single natural language instruction, Manus automatically breaks the task down, calls browser, search, programming and other agents to complete all the subtasks, and outputs a structured result report. Invitation codes for its closed beta were hard to come by and were even resold for tens of thousands of yuan.

So what is an Agent? A simple and clear definition:

An Agent is an intelligent product built on a large language model that actively uses various tools to complete tasks autonomously, adjusting to real-time feedback from its environment.

A large language model (LLM) is inherently strong at semantic understanding and text generation, but it has many limitations. For example, an LLM on its own can only answer input with text; it cannot "act", that is, it cannot autonomously perform operations or interact with the external environment. As a result, users working with an LLM often get only static text output and cannot directly drive task automation or closed-loop processing.

The Agent adds a layer of "scaffolding" around the LLM, which is equivalent to giving it the ability to act autonomously. By integrating with various tools, APIs or environments, an Agent extends the LLM's understanding and decision-making to the level of actual operations, closing the loop of "perception - thinking - action". In other words, an Agent can not only communicate with users in natural language but also automatically carry out the follow-up steps, greatly improving the efficiency and intelligence of human-machine collaboration.
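The "perception - thinking - action" loop can be sketched in a few lines. This is only an illustrative toy, not any product's actual implementation: `fake_llm`, `TOOLS` and `run_agent` are hypothetical names, and the scripted `fake_llm` stands in for a real model call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real agent would wrap browsers, search APIs, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}

def fake_llm(goal: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: picks the next action from what was observed so far."""
    if not observations:
        return ("add", "2+3")
    if len(observations) == 1:
        return ("upper", f"result is {observations[0]}")
    return ("finish", observations[-1])

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        action, arg = fake_llm(goal, observations)   # thinking
        if action == "finish":
            return arg
        result = TOOLS[action](arg)                  # action
        observations.append(result)                  # perception of the result
    return "gave up"                                 # step budget exhausted

print(run_agent("add two numbers and shout the result"))
```

The step budget (`max_steps`) is a design choice in this sketch so that a model that never says "finish" cannot loop forever.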

Let me give you a few simple examples:

1. Code generation, such as Cursor and Windsurf

An LLM can generate code from the user's prompt, but it cannot run or debug that code. The user has to paste the code into an IDE by hand, run it, and send any error messages back to the LLM, which revises the code before the cycle starts over. These repeated round trips are lengthy and cumbersome. Programming agents such as Cursor and Windsurf address these pain points well. They can not only generate, execute and debug code automatically, but also iterate and correct it on their own when problems arise, automating the whole process so that developers can focus on core requirements and work much more efficiently.
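The generate - run - feed back errors - revise cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not how Cursor or Windsurf work internally: `fake_codegen`, `run_and_check` and `code_until_green` are hypothetical names, and `fake_codegen` is a scripted stand-in for a model that returns a buggy draft first and a fix once it sees an error.

```python
import traceback
from typing import Optional, Tuple

def fake_codegen(prompt: str, error: Optional[str] = None) -> str:
    """Stand-in for an LLM: the first draft has a bug, the revision fixes it."""
    if error is None:
        return "def mean(xs):\n    return sum(xs) / len(x)\n"   # bug: undefined name x
    return "def mean(xs):\n    return sum(xs) / len(xs)\n"

def run_and_check(source: str) -> Optional[str]:
    """Run the candidate plus a tiny test; return None on success, else the error text."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        assert namespace["mean"]([1, 2, 3]) == 2
        return None
    except Exception:
        return traceback.format_exc()

def code_until_green(prompt: str, max_rounds: int = 3) -> Tuple[str, int]:
    error: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        source = fake_codegen(prompt, error)   # generate, or revise using the last error
        error = run_and_check(source)          # execute and test
        if error is None:
            return source, round_no
    raise RuntimeError("no working code within max_rounds")

source, rounds = code_until_green("write a mean function")
print(f"passed after {rounds} round(s)")
```

A real agent would run untrusted code in a sandbox rather than with `exec` in the same process; that detail is omitted here to keep the sketch short.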

2. PPT generation, such as Gamma

An LLM by itself can produce slide content, but it is still very hard to get a PPT that meets our requirements. First, the LLM's output is mostly long-form text, which it cannot structure by content and map onto slides. It also cannot generate charts, the templates are relatively simple, and every change to the PPT requires another round of prompts.

Gamma is a content creation and visualization tool built on LLMs. With just one sentence or a short text description, users can generate structured presentations, reports, web pages and other content. It integrates ChatGPT-based data visualization tools that generate charts from text, and it has developed its own intelligent typesetting engine that automatically divides content into blocks, groups and pages and arranges them attractively. Behind the scenes, it may call a web front-end UI layout engine or a self-developed typesetting algorithm to "intelligently design" the content. It also ships with a large number of templates, color schemes and style themes that users can switch with one click, which involves template retrieval and rendering logic.

From the two examples above, we can see that today's mainstream agents are basically a GPT base + automated scenario workflow + interface/interaction optimization. They can be understood as a "shell" around GPT ("shelling" means implementing a specific function on top of the GPT model through customized prompts, fixed code processes and tools).
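To make the "shell" idea concrete, here is a minimal sketch of a customized prompt combined with a fixed code process, using slide outlines as the scenario. It is an assumption-laden toy and not Gamma's implementation: `PROMPT_TEMPLATE`, `fake_model`, `parse_slides` and `make_deck` are hypothetical names, and `fake_model` returns canned text in place of a real model call.

```python
from typing import Callable, List

# Customized prompt: pins down the output format the fixed code expects.
PROMPT_TEMPLATE = (
    "Write a slide outline about: {topic}\n"
    "Use one '# Title' line per slide, followed by '- bullet' lines."
)

def fake_model(prompt: str) -> str:
    """Stand-in for the base model call; returns a canned outline."""
    return (
        "# Why agents\n- LLMs cannot act\n- Tools close the loop\n"
        "# Examples\n- Operator\n- Manus\n"
    )

def parse_slides(text: str) -> List[dict]:
    """Fixed code process: turn the model's long text into structured slides."""
    slides: List[dict] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# "):
            slides.append({"title": line[2:], "bullets": []})
        elif line.startswith("- ") and slides:
            slides[-1]["bullets"].append(line[2:])
    return slides

def make_deck(topic: str, model: Callable[[str], str] = fake_model) -> List[dict]:
    prompt = PROMPT_TEMPLATE.format(topic=topic)
    return parse_slides(model(prompt))

for slide in make_deck("AI agents"):
    print(slide["title"], slide["bullets"])
```

Swapping `fake_model` for a different model leaves the prompt and the parsing code untouched, which is the sense in which the product is a shell around the model.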

3. Let’s look at Manus again

The core process of Manus can be roughly divided into the following steps:

Task planning: An advanced LLM such as Claude 3.7 receives the user's question and plans a detailed to-do list. For example, if a user enters "Help me analyze and summarize the recent hot trends in the field of AI", the system automatically breaks it down into subtasks such as "collect the latest AI-related news, search for relevant papers, summarize the main points, and write a trend report".

Task distribution: Manus then uses a lighter-weight large model to decide which specialized agent should handle each subtask. For example, data collection tasks can go to the browser operation agent, code analysis tasks to the programming agent, and information retrieval tasks to the search API agent, so that tasks are dispatched automatically to the best-suited agent.

Task execution: Each subtask is executed automatically by the corresponding agent. Manus currently relies mainly on three types of core agents:

Browser operation agent (simulates manual web browsing and operation, similar to Operator)
Search API agent (quickly retrieves information from the web through search API calls)
Code-writing agent (automatically generates, debugs and runs code, and handles related technical tasks)

Result summary: Once the subtasks are completed, a summary generator (probably also Claude) reads the to-do list and the result of each subtask, integrates them into the final output, and generates documents in different formats.
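The four steps above (planning, distribution, execution, summary) can be sketched as a small pipeline. This is a hedged illustration only and does not reflect Manus's actual code: every function name here is hypothetical, `plan` and `route` are hard-coded stand-ins for the planning and routing models, the three agents are stubs, and the third subtask is an invented example chosen to exercise the code agent.

```python
from typing import Callable, Dict, List

def plan(request: str) -> List[str]:
    """Stand-in for the planning LLM: returns a fixed to-do list."""
    return [
        "collect the latest AI-related news",
        "search for relevant papers",
        "chart paper counts by month",   # hypothetical subtask for the code agent
    ]

def route(subtask: str) -> str:
    """Stand-in for the lighter routing model: simple keyword rules."""
    if "search" in subtask:
        return "search"
    if "chart" in subtask:
        return "code"
    return "browser"

# Stub agents; real ones would drive a browser, call a search API, or run code.
AGENTS: Dict[str, Callable[[str], str]] = {
    "browser": lambda task: f"[browser] done: {task}",
    "search": lambda task: f"[search] done: {task}",
    "code": lambda task: f"[code] done: {task}",
}

def summarize(todo: List[str], results: List[str]) -> str:
    """Stand-in for the summary generator: merges the to-do list with the results."""
    return "\n".join(
        f"{i}. {task} -> {result}"
        for i, (task, result) in enumerate(zip(todo, results), start=1)
    )

def run_pipeline(request: str) -> str:
    todo = plan(request)                                   # task planning
    results = [AGENTS[route(task)](task) for task in todo] # distribution + execution
    return summarize(todo, results)                        # result summary

print(run_pipeline("Analyze and summarize recent hot trends in AI"))
```

Keeping routing separate from planning mirrors the article's point that a cheaper model can handle dispatch while the stronger model is reserved for planning and summarizing.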

At this point a question naturally arises: where is the moat for agents like Manus and Gamma? Will they be easily replaced as soon as large-model vendors upgrade their models or more competing products launch?

The core moat of agent-type products is not simply the underlying model or general-purpose technology. What really determines their competitiveness is "product experience" and "user mindshare". The key reason these agent products were able to accumulate users so quickly is that they built an efficient, smooth and innovative workflow around real needs, and used intelligent means to solve the "last mile" problem that an LLM alone struggles with. For example, Manus automates the decomposition of complex tasks and multi-agent collaboration, while Gamma greatly lowers the barrier to structuring content and turning it into visuals.

Furthermore, the moat is also reflected in the product team's deep insight into user needs and relentless attention to detail. Continuously iterating and optimizing processes based on user feedback, and establishing a distinctive interaction paradigm and service ecosystem, are things that latecomers find hard to match in the short term.

Take ubiquitous, nation-scale apps such as WeChat: what really makes users "stick" to the platform may not be the most cutting-edge underlying technology, but an excellent product experience, a rich ecosystem and firmly established user habits. As Agent products keep refining their own workflows, they are likewise gradually raising barriers to entry: whoever first captures user mindshare and shapes the industry paradigm will become the gateway brand of the new AI wave.

Therefore, technological progress is certainly important, but the deeper moat often lies in product experience: being enjoyable to use, highly efficient and able to solve practical problems. Only by continuously optimizing and evolving around user needs can a product truly accumulate competitive barriers and build up brand value.