After Manus went viral almost overnight, information of every kind was flying around. There were serious analyses and interpretations, but not many; far more common were bloggers (especially short-video bloggers) cheering in shock.
So, amid the flood of information about Manus, Pan Han tried to extract what was genuinely valuable and distilled it into six questions.
The aim is to help readers understand not just the "what" but the "why" behind it, and at the same time shake off the FOMO anxiety that comes with the dazzling explosion of AI news.
1. Why doesn’t Manus’ underlying model use DeepSeek?
What models power Manus's various impressive tasks?
According to information made public by relevant bloggers, it was disclosed at a small-scale official product briefing on the morning of March 6 that Manus mainly uses Claude (via API) together with models based on Alibaba's Qwen that the team fine-tuned itself.
Many readers may wonder why Manus doesn't use DeepSeek, which is both powerful and cheap.
Why?
On February 22, Manus co-founder Ji Yichao spoke at Chaos Academy, where he shared the following views:
"DeepSeek is not a panacea. Specific problems need to be analyzed specifically. For example, if you want to make function calls, the Qwen model may be more appropriate."
"DeepSeek's models (whether V3 or R1) focus more on reasoning capabilities, and are not outstanding in multimodality, function calls, long-term planning and other capabilities."
"From DeepSeek's recent V3 paper, its architecture is significantly different from traditional models, but except for the official ones, domestic inference vendors' Infra optimization is generally insufficient, and a lot of work is still needed." (At that time, DeepSeek Open Source Week had not yet started)
Readers should not mindlessly pit Manus against DeepSeek on this basis (go back and read the original remarks carefully). Ji Yichao is simply explaining the logic behind the technical route they chose.
In short, Manus did not choose DeepSeek as its underlying model because DeepSeek excels at reasoning but is weaker at function calling, multimodality, and long context.
These three capabilities are exactly what highly automated agent products such as Manus value most (a short code sketch follows these points):
Function calling lets an agent perform a far wider variety of tasks than a traditional chatbot.
Long context lets the agent complete complex multi-step tasks. If a task has 7 steps and the context window is too small, the agent will have forgotten steps 1 and 2 by the time it reaches step 6, and many tasks will fall apart within minutes.
Multimodality is the easiest to understand: many of Manus's tasks involve reading web pages and watching videos, and multimodality gives the agent eyes.
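To make these points concrete, here is a minimal, illustrative Python sketch of an agent loop built around function calling, where the growing message history stands in for long context. This is emphatically not Manus's implementation: `call_model`, `search_web`, and `read_page` are hypothetical stubs standing in for a tool-capable LLM API and real tools.

```python
# Minimal agent-loop sketch (illustrative only, not Manus's code).
# call_model, search_web and read_page are hypothetical stubs.
import json

def search_web(query: str) -> str:
    return f"(stub) top results for: {query}"

def read_page(url: str) -> str:
    return f"(stub) extracted text of {url}"

TOOLS = {"search_web": search_web, "read_page": read_page}

def call_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM call; a real model would pick the next action."""
    return {"action": "finish", "answer": "(stub) final answer"}

def run_agent(task: str, max_steps: int = 10) -> str:
    # The full history is kept and re-sent each step: this is where
    # long context matters, otherwise early steps get forgotten.
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_model(messages)
        if decision["action"] == "finish":
            return decision["answer"]
        # Function calling: the model names a tool, the runtime executes it
        # and feeds the result back, letting the agent act beyond chat.
        result = TOOLS[decision["action"]](**decision.get("args", {}))
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    return "stopped: step budget exhausted"

print(run_agent("Summarize the latest news about LEGO"))
```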
Of these three, the long-horizon planning ability driven by long context matters most. Xiao Hong, founder of Monica (the company behind Manus), elaborated on this in detail in Zhang Xiaojun's podcast interview:
"After testing, only Claude 3.5 Sonnet in the world can run the architecture we just described; internally we call this the Agent capability. Traditional chatbots are trained on the assumption that a single round of conversation should try to solve all of your problems. Only Claude 3.5 Sonnet can plan over a long horizon and solve the problem step by step."
According to Twitter blogger Alexander Doria:
The reason OpenAI's DeepResearch performed so impressively is that it did not simply bolt an external search function onto GPT; instead, OpenAI trained a brand-new model that learned basic browsing skills (searching, clicking, scrolling, file parsing) and, through reinforcement learning, how to synthesize large amounts of web information into a well-structured research report with reliable sources.
In this sense, domestic models still have huge room for improvement on the capabilities agents require.
2. Who is Devin, the inspiration for Manus?
Devin is an automated agent product that became very popular in AI programming last year. It focuses on coding and is billed as being able to replace a programmer.
It has three characteristics (a rough sketch follows this list):
First, it is asynchronous: you don't need to interact with it continuously the way you do with a chat product; you can hand it a task and go do other things.
Second, Devin runs in a cloud virtual machine, which is very important: it keeps working even after your computer is switched off, and it never touches the user's machine.
Third, it can be interrupted at any time, and it asks the user for help or confirmation when it needs to, accumulating know-how from the user's feedback.
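A rough sketch of those three traits, under loose assumptions (this is not Devin's or Manus's code; `ask_user` is a hypothetical callback that would reach the user through, say, a push notification):

```python
# Illustrative sketch: the task runs asynchronously, the "work" happens
# server-side rather than on the user's machine, and the agent pauses
# to ask for confirmation when a step looks risky.
import asyncio

async def ask_user(question: str) -> str:
    """Hypothetical channel back to the user (e.g. a push notification)."""
    print(f"[agent asks] {question}")
    return "approved"

async def run_task(steps: list[str]) -> None:
    for step in steps:
        if "publish" in step:  # risky step: interrupt and confirm
            if await ask_user(f"OK to proceed with '{step}'?") != "approved":
                print("aborted by user")
                return
        await asyncio.sleep(0.1)  # stands in for real work in a cloud VM
        print(f"done: {step}")

async def main() -> None:
    # Fire-and-forget: the user could close their laptop while this runs remotely.
    job = asyncio.create_task(run_task(["gather sources", "draft report", "publish page"]))
    await job

asyncio.run(main())
```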
Manus has all three of these characteristics, and Xiao Hong has acknowledged that the team drew inspiration from Devin and Cursor.
Devin is quite advanced in how it defines the product, going beyond the development-platform concepts of products such as Coze and Dify, and it is also different from programming products such as Cursor.
Xiao Hong commented on Devin like this:
"It's a pity for Devin. It directly chose the most hardcore group of engineers. It's not a pity. I believe it is also a very well-developed company in the United States. But I would rather choose a general one rather than a specific industry. I think this architecture is in line with my imagination of Agent, and it should be a product that ordinary users can use."
Dai Yusen is a partner at ZhenFund, an investor in Manus. He used a very exaggerated statement to evaluate Devin in the podcast Crossroads:
"Its appearance may mark an important moment in human history."
I was quite surprised at the time: why would Dai Yusen use such strong words?
His own explanation goes like this:
Many tools have been invented in human history; some even say that humans are the animal that uses tools. Tools can basically be divided into two types:
The first type is tools that require sustained attention, such as electric drills and hammers.
The second type is mechanical, repetitive automation tools, such as washing machines, vending machines, and assembly lines. They do not require my attention, but they can only solve repetitive problems.
Humans have long been looking for a third type: one that does not require continuous attention but can still plan on its own to solve problems.
Devin is a truly autonomous agent, an agent in the genuine sense of the word.
Indeed, the significance of engineering-level innovation is underestimated. The field most profoundly changed by this round of AI is AI coding, and the key variable behind that change is application-layer product innovation such as Cursor.
Devin is undoubtedly an innovative product and has been widely covered, so the question is: why was it Monica's team that thought to learn from it and was the first to build something similar?
This is something many teams at large and mid-sized Chinese tech companies need to reflect on.
3. How should we understand the "Andy and Bill's Law" of the AI era?
Zhang Tao from the Manus team posted this on Jike:
“At the end of last year, I began to advocate that the demand for inference computing power this year should not increase tenfold, but a thousandfold. Most of my friends thought I was crazy. This non-consensus put us on a very different path.”
The exponential growth in agents' token consumption matches what Xiao Hong, in an interview, called the "Andy and Bill's Law" of the AI era: "What Andy giveth, Bill taketh away." In the PC era, whenever Intel's Andy Grove doubled chip performance, Microsoft's Bill Gates would double software complexity to absorb it.
The same holds in the AI era: LLM vendors cut the cost of token generation tenfold through performance optimization, while agents like Manus increase token consumption tenfold.
Unlike the PC era, the law plays out faster in AI. The PC hardware and software upgrade cycle was typically 18 months, while the AI model and application iteration cycle has shrunk to 2-6 months.
The development of agents like Manus also follows the "Jevons Paradox": improvements in steam engine efficiency stimulated an increase in total coal consumption. In other words, when a resource is used more efficiently, total consumption of it does not fall but rises.
Based on the above deduction, from a longer-term perspective, I remain rationally optimistic about the development of domestic Agents.
Even if, as Zhang Tao predicts, demand for inference compute grows a thousandfold, then given China's well-honed tradition of competing relentlessly on price and cost, the "Chinese kings of cost-cutting" can be expected to gain structural cost advantages in the next round of agent wars.
4. Is it easy for big companies to copy Manus’s work?
Many people say that what this team of a few dozen people built can be quickly copied by large companies.
In fact, I think it is quite difficult to replicate, for three reasons:
1. Composite experience:
The Monica team's experience is unusually broad. Monica's products connect to a wide range of models, so the team has a deeper feel for the capability boundaries of different models.
At the same time, years of work on browser plug-ins has made the team extremely familiar with browser products; it has even run projects exploring AI browsers similar to Arc. Manus's co-founder Ji Yichao has been immersed in NLP and machine vision for many years.
Monica is positioned in overseas markets, which makes it easier to keep a close eye on the progress of overseas agent products.
2. Understanding of requirements:
In a sense, this is in the Monica team's DNA. This down-to-earth grasp of product needs was already visible in founder Xiao Hong's early products "Yiban Assistant" and "Weiban Assistant". I used "Yiban Assistant" in the early days of WeChat official accounts, and it was indeed convenient and practical.
If you have used Monica in depth, you can feel its product craft more directly: for example, you can pull up answers from multiple models with a single command, or quickly invoke your frequently used prompts, not to mention its many small but very practical tools.
PMs working on AI applications in China would do well to study Monica's features, and the product thinking behind them, in depth.
For example, the entire Manus workflow can be shared. Sharing by itself is nothing special, but Manus's sharing supports replaying the whole process, and the web pages Manus generates are directly publicly accessible. These small details are an intuitive reflection of the team's deep product craft.
Judging from how homogeneous the features of Doubao, Kimi, Yuanbao, and Wenxin Yiyan are, I don't think large companies can significantly improve their capacity for innovation in a short period of time.
3. Team Agility:
Manus shared a point of view at yesterday's closed-door meeting: "The iteration cycle of AI products exceeds the OKR assessment cycle of large companies."
Behind this statement is the point that, compared with large companies, the agility of small, flat teams is critical to building AI products.
The Manus team is not large. Product lead Zhang Tao was previously a product manager at Lightyear Away; he has worked on consumer products for 8 years and business products for 5. Manus co-founder Ji Yichao (the one who appears in the product video) is also a technical expert: more than ten years ago, while still in high school, he single-handedly built a product called Mammoth Browser.
The fact that founder Xiao Hong was able to bring these people together is itself a testament to his industry insight and team-building ability. This kind of invisible soft power cannot be replicated simply by throwing headcount at the problem. Don't forget, DeepSeek has only a little over 100 people.
These three points explain why it was Monica's team that built Manus first. For a large company, it is not easy to assemble, in a short time, an agile team with such well-rounded talent.
5. Will Manus pursue large-scale financing next?
I think there is a high probability that it will happen.
The logic: although it is not easy for large companies to catch up in a short period of time, they will definitely enter the market.
Monica's original "All in One" aggregation product could find a relative comfort zone in a niche the large companies disdained; but agents are the main battlefield aligned with where the AI industry is heading, and there is no way the giants will sit that one out.
Think of Coze and Yuanqi, built to rival Dify; think of Trae, built to rival Cursor.
Don't doubt the resources and determination of the big companies.
DeepSeek has undoubtedly shaken many large and mid-sized companies' resolve to train their own base models, but at the application layer their push has only escalated. Tencent's extremely fast response and saturation-level promotion of Yuanbao is clear proof.
Let’s take a look at Devin mentioned above. What is its valuation?
The answer is 2 billion US dollars.
I don’t know the valuation of Monica’s team, but based on previous public information, it is most likely not at the level of 2 billion US dollars.
This means that Manus probably does not have the financial ammunition to fight a big battle with the giants at the moment.
Think of ByteDance's bulldozer-like app factory, and of the domestic cloud vendors' ruthless price-cutting machines.
If Manus one day faces a big-company competitor at one-tenth the price with 80% of the experience, user loyalty will undoubtedly be tested, and a price war is just the most routine move in a big company's playbook.
So by this simple logic, although the Monica team has been relatively conservative about financing (Monica has healthy cash flow) and Xiao Hong has repeatedly said he would keep to his own pace, the team will most likely raise a new round to stockpile ammunition for Manus.
Fortunately, this surge in popularity should give it a relatively favorable financing environment, letting the team raise sufficient funds with less equity dilution to withstand future competition from the giants.
6. How should we evaluate Manus’ performance?
We have indeed seen plenty of impressive examples, both in the official demos and from various bloggers.
As for its actual performance, I think three dimensions can help us judge it more objectively:
1. Its GAIA benchmark score shows that its level is above the baseline.
GAIA was released in 2023 by Hugging Face and other research teams in collaboration with AutoGPT. It contains 466 multi-domain, multimodal questions with standard answers, and it specifically tests an agent's ability to solve real-world problems.
GAIA runs against the grain of traditional evaluation. Unlike conventional AI benchmarks (which target professional fields such as law and mathematics), GAIA focuses on open-ended questions that are simple for humans but difficult for AI, simulating everyday scenarios.
For example: "How many images are there on the latest LEGO Wikipedia page?"
Or: "In the astronomy picture published by NASA on January 21, 2006, two astronauts can be seen. How many minutes did the younger astronaut spend in space, rounded to the nearest minute?"
Manus achieved performance that surpassed OpenAI on all three difficulty levels, which is undoubtedly impressive.
2. High scores do not directly translate into good results in individual use.
For example, judging from the cases produced by bloggers Li Jigang and Lan Xi, its performance is strong across the board: whether generating interactive popular-science web pages or building text games, the results are highly complete.
However, judging from the experience of Huasheng and Guokr, some tasks were not completed well. For example, Huasheng asked it to generate a marketing plan for a book. The output was enormous but, on closer inspection, very generic, with little tailored content; it budgeted 540,000 yuan for marketing a single book, which made it basically unusable.
Guokr's tests also found that Manus had the following problems on some tasks: 1. over-reasoning and excessive divergence; 2. poor-quality information sources; 3. inability to deliver in the required format; 4. frequent need for human takeover.
3. Evaluating it correctly means understanding its boundaries and managing expectations.
Manus's current architecture confines it to in-depth research and light information synthesis. It is a lightweight patchwork of Artifacts + DeepResearch + Operator + Claude Computer Use, and it is not yet complete.
For example, the core of what its virtual machine can operate is still the browser. Asking it to edit a video or play "Elden Ring" for you just shows you don't understand what Manus can actually do.
For agents, a highly simplified mental model is that the overall task success rate equals the single-step success rate compounded over the number of steps.
If a task is split into three steps and the underlying model succeeds at each step 90% of the time, the overall success rate is 90% × 90% × 90% = 72.9%. Likewise, at a 70% single-step success rate, the overall rate drops to 70% × 70% × 70% = 34.3%.
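A two-line check of that compounding logic, using the same numbers as above:

```python
# Overall success rate = single-step success rate raised to the number of steps.
def overall_success(per_step: float, steps: int) -> float:
    return per_step ** steps

for p in (0.9, 0.7):
    print(f"single-step {p:.0%}, 3 steps -> overall {overall_success(p, 3):.1%}")
# single-step 90%, 3 steps -> overall 72.9%
# single-step 70%, 3 steps -> overall 34.3%
```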
One important reason AutoGPT was unusable two years ago was that the models' single-step success rate was too low, which dragged the overall success rate down to an unusable level.
Manus's release rests on the rapid progress of base models over the past two years, which has pushed the single-step success rate on certain tasks to the point where the overall result is basically usable.
Manus seized this opportunity at the right time and brought it to the consumer market.
One factor that is easily overlooked about the Manus experience is the price.
According to the official closed-door meeting, the cost of a single task is US$2, which means that it costs about 14 yuan to complete a task.
This means that if Manus is priced at cost plus margin, users will end up paying more than that. At such a price point, we undoubtedly need to re-weigh the quality of its output.
That will have to wait for the market to test.
Conclusion
I saw this tweet yesterday:
Yes, I broadly agree with the view that "if you don't embrace change, change will slam you into the wall," but I am against manufacturing anxiety.
Unless you work in AI media, trying to chase every new hot topic the moment it appears means you will never keep up.
The better approach is to let something new fly for a while, then judge its value by pulling together information from multiple angles and decide how far to recalibrate your understanding and your actions.