Two Failures in Three Tasks: For Now, General-Purpose Agents Belong at the Children's Table

Written by
Caleb Hayes
Updated on: June 27, 2025
Recommendation

The real-world performance of general-purpose agents is surprising, and hands-on tests reveal their limitations in practical applications.

Core content:
1. Analysis of general-purpose agents' failures on real tasks
2. The current state of general-purpose agent technology, and why it is still immature
3. Future directions and application scenarios for general-purpose agents

Yang Fangxian
Founder of 53A / Tencent Cloud Most Valuable Expert (TVP)

"General-purpose agents will completely change the way humans work!" "Agents are finally unlocking the commercialization of AI!" "A universal assistant that gets things done while you lie back!" Lately, headlines like these have flooded every major platform. It seems that if you don't call your product a "general-purpose agent," you are too embarrassed to show your face. But after trying these products myself, I have just one thing to say: general-purpose agents currently belong at the children's table.

I recently tested two of the most popular products, Coze Space and Manus. The results? Of three real tasks, two failed outright and one barely passed.

Task 1: Crawling WeChat Official Account articles (complete failure)

The first task was simple: fetch the 20 most recent articles from a WeChat Official Account.

WeChat Official Account articles do not allow external crawling, so to be fair, this task is genuinely hard. I watched this "general-purpose agent" struggle to open Sogou WeChat Search (a path long since well-worn in online tutorials), and then, somehow, it turned up old articles from 2020.

I thought it might correct itself. It didn't. It kept going further and further down the wrong path, like a lost child who is still full of confidence.

The whole process burned through a lot of tokens. It's like hiring a self-described "senior assistant" to find some information for you, only for him to fetch the wrong file and run up a big printing bill along the way.

Task 2: Summarizing emotional headlines on Xiaohongshu (barely passed)

The second task went relatively well: summarizing the characteristics of emotionally charged titles on Xiaohongshu.

This time both products completed the task, but with the most primitive method available: "browser search + article compilation." In essence, they reorganized analysis articles that other people had already written.

Efficiency? Not even as good as Yuanbao. The speed was slow and the analysis shallow, like an intern who has just entered the industry: the job gets done, but nothing about it stands out.

Task 3: Analyzing Beijing's housing market (a disaster)

The third task was to analyze the Beijing housing market after its "little spring" rebound. This is where the truth came out.

A general-purpose agent will only look for information in public web sources, without knowing how unreliable public information in the real estate sector is. The result was predictable: an analysis report full of outdated data, misconceptions, and a superficial market "consensus."

It's like asking someone who has only read "Introduction to Stock Trading" for investment advice. Such a person doesn't understand how the real estate market actually works and doesn't know which sources are more reliable. Interpreting what the data really means is even further out of reach.

Why are general-purpose agents "general but not specialized"?

The surface-level reason is immature technology, but the fundamental problem is that these agents underestimate the professional depth and complexity of vertical domains.

Take real estate analysis as an example. It requires a comprehensive understanding of multi-dimensional data such as regional policy, historical transactions, supply and demand, the land market, bank credit, and population movement. Expecting valuable analysis from searching a few online articles is simply a pipe dream.

It's like medical diagnosis, where diseases with similar symptoms may call for completely different treatments. Without deep professional grounding and a thorough understanding of the specific field, a general-purpose agent will only paddle around in shallow water, however "smart" it is.

What should a real agent look like?

The value of a general-purpose agent lies not in "knowing how to do everything" but in "knowing who knows how to do each thing."

A truly valuable agent architecture pairs an efficient coordinator with experts in multiple vertical domains. Think of an excellent CEO. He does not need to write code, design products, or run marketing himself, but he knows which tasks to assign to whom and how to coordinate resources to reach the goal.
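The following minimal Python sketch makes the "coordinator + experts" idea concrete under loose assumptions. The `Coordinator` and `Specialist` names, the keyword-based routing, and the placeholder handlers are all hypothetical illustrations. They do not describe how Coze Space, Manus, or any other product actually works. The sketch shows only the dispatch step: match a task to the best-suited specialist, and decline rather than improvise when no specialist fits.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Specialist:
    """Hypothetical domain expert: a name, trigger keywords, and a handler.

    Keywords are assumed to be lowercase. In a real system the handler
    would call a domain-specific model or tool; here it is a placeholder.
    """
    name: str
    keywords: List[str]
    handle: Callable[[str], str]


@dataclass
class Coordinator:
    """Hypothetical coordinator that only dispatches and never does the work itself."""
    specialists: List[Specialist] = field(default_factory=list)

    def route(self, task: str) -> str:
        # Score each specialist by how many of its keywords appear in the task.
        # Keyword counting is a deliberately naive stand-in for a real intent classifier.
        text = task.lower()
        best: Optional[Specialist] = None
        best_score = 0
        for s in self.specialists:
            score = sum(kw in text for kw in s.keywords)
            if score > best_score:  # strict '>' means the first specialist wins ties
                best, best_score = s, score
        if best is None:
            # Admit ignorance instead of pretending to know.
            return "no specialist available for this task"
        return best.handle(task)


if __name__ == "__main__":
    coordinator = Coordinator([
        Specialist("real_estate", ["housing", "property"], lambda t: f"[real_estate] {t}"),
        Specialist("scheduler", ["meeting", "schedule"], lambda t: f"[scheduler] {t}"),
    ])
    print(coordinator.route("Analyze the Beijing housing market"))
    print(coordinator.route("Schedule a meeting with the team"))
    print(coordinator.route("Write a poem"))
```

The keyword matching is the weakest part of this design. A real coordinator would more likely use a classifier or a language model to choose the specialist. The point is the structure: the coordinator holds no domain knowledge itself, and the fallback branch says "I don't know" rather than guessing.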

But while specialized models are still in short supply, the journey may be bumpy whichever route a general-purpose agent takes. For now, it is probably best suited to highly standardized tasks that need no deep expertise: scheduling meetings, writing simple emails, organizing information, and so on. These tasks are simple, but they do save time. For any task that calls for professional judgment, whether market analysis, content creation, or technical problems, a general-purpose agent works better as an entry point or dispatcher than as the final solution.

Are general-purpose agents reliable?

Don't be fooled by the hype. AGI will not materialize out of thin air. It has to develop gradually, moving from shallow capabilities to deep ones.

In the short term, general-purpose agents cannot replace experts in vertical domains, but they can serve as a bridge between users and those experts. Like an efficient front-desk receptionist, an agent does not need to understand every part of the business, but it does need to know where to refer each customer.

The real intelligence revolution is not about replacing every professional tool with one "universal" tool. It is about letting each tool deliver its full value in its own area of expertise, and building a mechanism for those tools to collaborate efficiently.

Today's general-purpose agents really do belong at the children's table. But there is no need for pessimism, since every adult was once a child. The precondition is that the agent must first learn to admit what it doesn't know instead of pretending to know everything.