Further Thoughts on Manus

Hands-on experience with the Manus virtual machine, and some deeper thinking about the future form of the AI assistant.
Core content:
1. Hands-on experience with the Manus virtual machine and the new thoughts it prompted
2. The ceiling and value of AI + virtual machine
3. Diverse use cases for AI assistants in a virtual machine environment
By chance, I recently got to use Manus. Going beyond the use cases I had read about before, I tried a few scenarios myself and experienced the corresponding features.
After the hands-on experience I had some new thoughts and different impressions, so this is a supplement to the previous article. In places my views differ from what I wrote there, which can be seen as a kind of cognitive iteration. That is also the benefit of writing things down continuously: you can observe your own thinking and notice the problems in your reasoning and approach.
The value of AI + virtual machine
The virtual machine is a major feature, even a selling point, of Manus. In my previous analysis I skipped over it without thinking deeply about its value. Then in April we saw agents in other product forms, such as Fellou, which takes the browser as its entry point. That sharpened the question: if an AI assistant has a specific form and carrier, what should it be? An app, a browser, or a virtual machine?
The virtual machine is the heaviest of these, but it also has the most comprehensive capabilities and therefore the highest ceiling. Its core capabilities come down to two parts: a browser and a code environment. The browser handles access to information and knowledge, and the code environment handles hands-on execution and solving concrete problems.
The ceiling of these two parts combined is very high, and we may have underestimated its value. Imagine that you have invited a very smart intern into your company or home, set up a computer for them, and explained the working environment.
Chatting with the chatbots we used before is more like an online meeting, where the person on the other side can only "use their brain". Manus brings that person into the actual working and living environment, so they can "use both brain and hands". Put more abstractly, the AI's hands and feet are freed, and perhaps only energy consumption and permissions hold it back.
Some people may not feel this difference, so here are some examples of what Manus could actually do (AI helped me think of them). Many of the tasks we see agents complete are relatively static: writing a piece of code or a web page, or researching and analyzing a topic in depth. These are the ones that come to mind most easily.
But with a virtual machine environment, AI can do much more than that. It can read third-party API documentation on its own, teach itself a new language or tool, call third-party APIs to complete tasks, and carry out more complex automated tasks that span multiple web pages and environments.
For example, a market analyst needs to collect the latest industry data. In the past, the analyst had to manually visit dozens of industry websites, register accounts, apply for API access, crawl data, clean it, store it, analyze it with Python, and keep iterating on the scripts. Now the AI assistant goes online autonomously inside the virtual machine. Following instructions, it registers accounts, receives verification codes, calls APIs, crawls data, and writes it to a local database. It then generates a preliminary analysis report and data visualizations with libraries such as pandas, and offers a few insights of its own.
Or take an entrepreneur who wants to launch a new product. They need to buy CDN from Alibaba Cloud, pull open source code from GitHub, configure AWS email push, register with Stripe for payments, set up Notion for document management, and even go to Zhihu to find experts to help with promotion. Every platform requires logging in, verification, reading policies, filling out forms, catching errors, and even translating between Chinese and English. In the AI + virtual machine form, users only need to describe their goals (such as "put this SaaS website online + implement email push + complete online notification + capture the functional structure of the top three websites in the same industry"). The AI breaks the goal down into tasks and works through application, configuration, testing, and integration step by step. Whenever a critical step hits a permission barrier, it pushes an authorization request to the user.
The examples above may not be executable with this precision at the current product level, but they show how much more room for imagination the AI + virtual machine architecture gives us.
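To make the "break the goal into tasks, and ask the user only at permission barriers" idea a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how Manus works internally: `Step`, `RunLog`, `run_plan`, and `demo_plan` are hypothetical names, and the steps are stand-ins that do no real work.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], str]          # stand-in for the real work; returns a short result
    needs_authorization: bool = False  # True when the step hits a permission barrier


@dataclass
class RunLog:
    completed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def run_plan(steps: List[Step], authorize: Callable[[str], bool]) -> RunLog:
    """Run steps in order; the user is consulted only at permission barriers."""
    log = RunLog()
    for step in steps:
        if step.needs_authorization and not authorize(step.name):
            # The user declined: record the step as skipped and move on.
            log.skipped.append(step.name)
            continue
        step.action()
        log.completed.append(step.name)
    return log


def demo_plan() -> List[Step]:
    # Hypothetical breakdown of "put this SaaS website online".
    return [
        Step("pull open source code", lambda: "cloned"),
        Step("register payment account", lambda: "registered", needs_authorization=True),
        Step("configure email push", lambda: "configured", needs_authorization=True),
        Step("run integration tests", lambda: "passed"),
    ]
```

Calling `run_plan(demo_plan(), authorize=...)` with a callback that asks the user returns a log of which steps ran and which were declined. A real agent would also have to decide what to do with steps that depend on a declined one; this sketch simply continues.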
Let's think about something a little more sci-fi. In theory, could AI also use other large-model products inside the virtual machine, such as using ChatGPT to do deep research, or even connect to a server to train models? What about iterating and evolving on its own 24/7? That is certainly sci-fi for now, but the boundary and ceiling are indeed much higher than we imagined.
Based on this framework and the additional user cases I have seen, I think it is entirely fair to call it a general-purpose agent. This is where my judgment in the previous article went wrong. Back then I relied mostly on the official use cases. While the technology is still immature and the quality of completed tasks is not that high, those may be the only cases that can be shown.
However, starting from the underlying logic, this product framework and form may well extend to more general scenarios. It is also a design that fits rapid technological iteration. I believe models will keep getting smarter and cheaper, and as model capabilities improve, the product experience will get better and better instead of the product being replaced.
The core value of the product
The previous article was mostly about how the product is built and said less about its core value to users. After using it this time and looking at more user cases, I see this question differently.
In the few years since AI took off, the phrase we have heard most is probably "cost reduction and efficiency improvement". That is value AI brings, but it is not the greatest or most central value. Many earlier tools and SaaS products could reduce costs and improve efficiency too.
The greatest value of AI lies in expanding the boundaries of each of us, and this is particularly evident in Manus.
In more and more use cases, users rely on Manus to accomplish things they could not have done before. What they think of can be realized, not just thought about.
It is not like SaaS, which can only do one thing, nor like traditional RPA, which automates rigid processes. It can proactively understand the goal, keep adjusting, and push the task all the way to the end.
More important is the relationship between AI and humans, which keeps evolving from tool to collaborative partner. Agents break through our physical and information-gathering bottlenecks, letting people move on to deeper questions. AI and humans are in a collaborative relationship in which both grow.
The agent is your personalized intelligent assistant. It does more than finish chores: it completes repetitive, automatable tasks while continuously evolving based on your tasks and habits. This accelerates the whole closed loop of "people-goals-execution-feedback-iteration", because the AI can now take over the execution in the middle, and you are no longer limited by your existing abilities, knowledge, and information. You keep evolving in the process too. Your energy is freed up, and thinking about "what to do" and "why" matters more than "how to do it". As you collaborate, you also keep learning from how the agent breaks down goals and completes them. In the end, the only boundaries and limits may be ourselves.
So now you may understand why they chose the name Manus. The value of action is huge.
Manus, which means "hand" in Latin, is a universal AI assistant that turns your thoughts into actions.
Once AI has hands, it evolves from a simple tool into actual productivity. Previously, AI was more about providing answers, information, and suggestions, but Manus actually takes action and delivers results directly.
Each of us used to be limited by our own abilities and information. Now that the barriers of information and tools have come down, each of us has a chance to break through those limits. People are freed from tedious repetition. Greater efficiency not only helps you do things faster, it also helps you do things you could not do before. You spend your time and energy on what to do and on the ultimate goal. You change from an executor into a commander, and you gain the sense of achievement that comes from creating.
Continuous evolution, based on the user's tasks, behavior, and ongoing interaction, provides a deeply personalized experience. Your preferences, style, and decision-making habits accumulate over time, and you get an AI that understands you better the more you use it.
Core experience of the product
Based on the value to users described above and my own hands-on experience, I have summarized several core experiences that this type of agent needs to provide.
Intelligent decomposition of task goals
To execute a task well, the initial alignment with the user is crucial. Otherwise the AI may put great effort into completing the task, only for the user to find that it is not what they wanted or that the request was misunderstood. In many cases the delivered result is not bad; it is just not what the user wanted.
And judging by how many people currently use AI, it seems everyone wants to test how smart the AI on the other side is, so they are very sparing with words (just a joke).
Against this background, the product needs to understand the user's true intention accurately, even when the user's wording is vague, colloquial, or relies on implicit context.
This feeling of being understood is one of the most enjoyable parts of using AI. It is like having a colleague at work who knows you very well: one glance from you and they get the point.
For example, if you ask for a market analysis report, it infers from what you asked earlier that you need slides for a presentation, then breaks down and executes the tasks toward that goal. This experience brings great user value.
You tell it the general idea and it understands, even thinking of details you had not considered. That builds a strong sense of trust and reliance.
This also implies some specific requirements for product design:
- The secondary confirmation should not be too rigid. When the task is already clear enough, a rule-driven confirmation every time is annoying. What gets confirmed, and along which dimensions, should be adjusted to the task and the current context instead of being repeated mechanically.
- It should keep improving. If a habit or template was mentioned in an earlier task, the product should remember it instead of asking for confirmation again. If the user keeps raising the same requirement across different tasks, it can suggest solidifying that requirement into a fixed process.
- It should be more flexible in form. Confirmation does not have to happen through text. Reading and typing put a relatively high burden on the user, so message cards that let users choose with a click are worth considering. Text is also ambiguous and does not always convey the specific intent accurately.
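The first two requirements (do not re-ask what is already known, and offer to solidify a requirement the user keeps repeating) can be sketched in a few lines of Python. This is a rough illustration under my own assumptions: `ConfirmationPolicy`, its method names, and the repeat threshold are all hypothetical and do not describe any real product.

```python
from collections import Counter
from typing import Dict, List


class ConfirmationPolicy:
    """Decide what to ask before a task, using what earlier tasks already revealed."""

    def __init__(self, solidify_after: int = 3) -> None:
        self.remembered: Dict[str, str] = {}  # field -> value saved as a default
        self.seen: Counter = Counter()        # (field, value) -> times the user stated it
        self.solidify_after = solidify_after  # assumed threshold for suggesting a default

    def questions_for(self, required: List[str], given: Dict[str, str]) -> List[str]:
        """Fields still worth asking about: neither stated now nor remembered."""
        return [f for f in required if f not in given and f not in self.remembered]

    def record(self, given: Dict[str, str]) -> List[str]:
        """Count stated requirements; return the fields worth offering as a default."""
        suggestions = []
        for name, value in given.items():
            self.seen[(name, value)] += 1
            repeated = self.seen[(name, value)] >= self.solidify_after
            if repeated and self.remembered.get(name) != value:
                suggestions.append(name)
        return suggestions

    def solidify(self, name: str, value: str) -> None:
        """The user accepted the suggestion: remember the value from now on."""
        self.remembered[name] = value
```

The design choice here is that the product never solidifies anything silently: `record` only returns suggestions, and `solidify` runs after the user agrees, for example by clicking a message card.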
This part is the core of many AI products. Because large-model technology is inherently uncertain, product design has to add certainty. Manus exposing the virtual machine's execution process is one way of conveying certainty.
Intelligent task execution
The environment and process of executing tasks in real scenarios are quite complex. Often the obstacle is not a purely technical problem but a real-world environment problem.
For example, a verification code is required to log in, there is no permission to download a resource, an interface is out of date, the code environment is broken, or the network is restricted. These are exactly the problems a real intern would run into at work.
If the product gets stuck on a problem like this, the experience is poor. It is the same in real work: if an intern gets stuck on such a problem, you may feel they are a bit slow.
So the product needs to be designed to solve these problems effectively. How can it get smarter as it executes tasks, and how can it check in with the user promptly on environment-related problems without disturbing them too much?
For example, if an account has already been logged in, can it stay logged in afterwards, even for a different task?
For example, it should have suitable fault tolerance and retry mechanisms: if one method does not work, try another, or offer some options and suggestions and talk to the user in order to keep moving forward.
For example, if it has hit a problem before and found a solution, it should solve the problem itself next time instead of going through the whole process again.
The core is that it becomes more and more intelligent as it encounters more scenarios while performing tasks, which gives users a greater sense of certainty.
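The retry, fallback, and "remember what worked" ideas above can be sketched as follows. This is a minimal Python illustration under my own assumptions: `Executor`, `known_fix`, and the strategy names are hypothetical, and returning `None` stands in for handing the problem to the user.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (strategy name, function that returns a result or raises on failure)
Strategy = Tuple[str, Callable[[], str]]


class Executor:
    def __init__(self, max_attempts: int = 2) -> None:
        self.max_attempts = max_attempts
        self.known_fix: Dict[str, str] = {}  # problem -> strategy name that worked before

    def run(self, problem: str, strategies: List[Strategy]) -> Optional[str]:
        """Try strategies in order, retrying each; None means 'ask the user'."""
        # Move the strategy that solved this problem last time to the front.
        # sorted() is stable, so the remaining strategies keep their order.
        preferred = self.known_fix.get(problem)
        ordered = sorted(strategies, key=lambda s: s[0] != preferred)
        for name, attempt in ordered:
            for _ in range(self.max_attempts):
                try:
                    result = attempt()
                except Exception:
                    continue  # retry; once attempts run out, fall through to the next strategy
                self.known_fix[problem] = name  # remember what worked
                return result
        return None  # every strategy failed: time to involve the user
```

On the second encounter with the same problem, the executor goes straight to the strategy that worked the first time, which is the "solve it yourself next time" behavior described above.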
The last point is more proactive design and interaction, so that the product does more than ask the user for confirmation when something goes wrong, and collaborating with AI feels more comfortable.
It can make predictive suggestions and push notifications based on previous task records and the current context. Suppose the current task is a market analysis report, and the report mentions a product that the user has previously named as a major competitor. After completing the task, it can tell the user that this competitor has made new moves and ask whether to analyze them in depth.
It can also be set up to continuously monitor certain fixed data or information sources. When it finds information related to the user's previous records and context, it proactively analyzes how valuable that information is to the user, pushes it, and proposes some constructive follow-up tasks.
More flexible human-machine collaboration and feedback
To further strengthen the user's sense of security and certainty, the entire execution process should be traceable at any time, and it should even be possible to set breakpoints. In addition, raw instructions or code should not be what is shown to the user, because such content is often too technical for many users.
A more visual presentation may be needed, one that clearly shows what was completed at each node and lets the user go back to a specific point in time, modify or add information, and re-execute from there.
For example, imagine a very complex analysis report. If we gave the task to an intern, the more efficient approach would be to first ask for an outline, including the analysis structure and where the information will come from. We would give some feedback and then let the work continue.
Of course, we can ask more of an AI product. For example, even after all tasks are complete, I can ask to return to the snapshot of a node in the middle and make new requests from that point instead of starting over.
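Going back to a node's snapshot and continuing from there can be sketched like this. It is a minimal Python illustration under my own assumptions: `TaskRun`, `resume_from`, and the three demo nodes are hypothetical, and a real system would snapshot far more than a dictionary (files, browser sessions, and so on).

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Node = Tuple[str, Callable[[State], None]]  # (node name, function that mutates the state)


class TaskRun:
    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes
        self.snapshots: Dict[str, State] = {}  # node name -> state right after that node

    def execute(self, state: State, start: int = 0) -> State:
        for name, fn in self.nodes[start:]:
            fn(state)
            # Deep copy so that later nodes cannot alter the saved snapshot.
            self.snapshots[name] = copy.deepcopy(state)
        return state

    def resume_from(self, node_name: str, changes: State) -> State:
        """Restore the snapshot taken after node_name, apply changes, rerun the rest."""
        index = [name for name, _ in self.nodes].index(node_name)
        state = copy.deepcopy(self.snapshots[node_name])
        state.update(changes)
        return self.execute(state, start=index + 1)


# Hypothetical nodes for a report task: outline, then research, then writing.
def outline(state: State) -> None:
    state["outline"] = ["market size", "competitors"]


def research(state: State) -> None:
    state["notes"] = [f"notes on {topic}" for topic in state["outline"]]


def write(state: State) -> None:
    state["report"] = "; ".join(state["notes"])
```

For instance, after `run.execute({})` on `TaskRun([("outline", outline), ("research", research), ("write", write)])`, calling `run.resume_from("outline", {"outline": ["pricing"]})` reruns only the research and writing nodes with the edited outline, which mirrors the intern example: review the outline, adjust it, then let the work continue.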
The timing and form of confirmation with the user also leave room for more design thinking, in order to balance disturbing the user against leaving them feeling insecure.
A more personalized experience
A personalized experience is really a way of providing certainty, because to users personalization means familiarity: this AI has known me for a long time and we understand each other well. This is also the core value of the memory feature in ChatGPT. (I feel that the value of this function is also underestimated by everyone. ChatGPT has been online for almost two years.)
Back to the product itself: it needs to accumulate more of the user's habits and preferences. Manus did well here. It had a knowledge module from the first day of launch. The current experience may not be great yet, but this direction is definitely worth continued investment.
As users keep using the product, it accumulates preferences, habits, and patterns that are specific to each user.
- For example, all content generated and retrieved during user tasks is automatically saved to a knowledge base, which both allows later reuse and saves costs.
- For example, APIs called or programs written in earlier tasks could form a capability library that is reused in later tasks, extending even to the user's preferred search habits and tool usage habits.
- For example, a preference library built from the user's behavior: interaction rhythm, language habits, email wording for different scenarios, output style, code naming habits, commonly used code libraries, and so on.
These preferences and habits can help us keep giving each user a unique, personalized experience, and that experience keeps iterating as the user evolves. For example, if we notice that a user has a new habit, or that an operation differs from their previous habit, the product can point it out and make a suggestion. That turns it from a purely passive tool into a continuously evolving collaborative partner.
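The first item in the list, a knowledge base that saves retrieved content for reuse and lower cost, is the easiest to sketch. The Python below is a minimal illustration under my own assumptions: `KnowledgeBase` and `get_or_fetch` are hypothetical names, the `fetch` callback stands in for an expensive search or model call, and real matching would need something smarter than normalizing case and whitespace.

```python
from typing import Callable, Dict


class KnowledgeBase:
    """Keep what earlier tasks retrieved so later tasks can reuse it."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}
        self.hits = 0    # lookups answered from saved content
        self.misses = 0  # lookups that had to call fetch

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivially different queries share an entry.
        return " ".join(query.lower().split())

    def get_or_fetch(self, query: str, fetch: Callable[[str], str]) -> str:
        key = self._key(query)
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        self.entries[key] = fetch(query)  # the expensive call happens only on a miss
        return self.entries[key]
```

The `hits` and `misses` counters make the cost saving visible: every hit is a retrieval the product did not have to pay for again.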
Some thoughts on specific design
Back to specific product design, we can design more proactively (AI calls this "growth UI", which is quite interesting).
More proactive predictive pop-ups: information pushes and prompts based on the current user's behavior, the specific page, the task requirements, and other context, such as whether a certain process should become the default or whether a certain task should be saved as a template. (Manus already has some designs like this, and they feel very good.)
Just as in collaboration between people, AI can periodically invite the user to a "meeting" to review how things went, for example to settle on some preferences and habits and to correct the parts that went wrong. It can also summarize the data periodically, such as "processed XXX pieces of data this month", to convey a sense of control and of accomplishment in completing tasks together.
We can even consider more emotional design. Humans are emotional animals, after all, and we need emotional value at work too: tone of voice, encouragement when running into problems, rewards after completing tasks, and so on.
These may be kinds of value that pure tools found hard to provide in the past.
Thoughts after actual product experience
During hands-on use I found some functional details that I had not noticed when looking at use cases. Some of them match what was discussed above, and some prompted new ideas.
Secondary confirmation of the task
The secondary confirmation of the task is implemented through the knowledge module. This design is clever: the confirmation becomes one of the user's preferences instead of a separately configured rule. The product also proposes it proactively, after the first task is completed and before the second one begins. That timing is great too, because it embeds the capability in the actual workflow instead of abruptly interrupting it to make the user change a setting.
However, the knowledge entry that the system provides has some minor problems. The secondary confirmation it produces is a bit rigid: it pushes users to supply more information instead of tracing back to the original goal to fill in what is missing.
Say your request is short, such as asking for an analysis report, and the confirmation asks you for the time range, the specific field, the required depth, and so on. At first glance that seems fine, but on reflection a better question would be why the user needs this report. Starting from the goal may lead to a better answer.
The same logic applies in the workplace: S-level (top-tier) talent does not simply execute orders or complete tasks, but digs out the goal behind them and achieves it.
For now, this may also be something people need time to adapt to. Many are used to handing over a task and expecting it to be done without too many questions.
There is also a big question about how virtual machines are used. Why does a user have to open a different virtual machine for each task? Wouldn't context accumulate more easily if more information were kept in the same environment?
Isn't it wasteful to start a new environment every time? Configurations and information accumulated in earlier tasks cannot be effectively reused, and habits and templates cannot accumulate.
After thinking about it, I believe it comes down to cost. Keeping a virtual machine running all the time is probably quite expensive, and tasks do consume a lot of credits. Looking at the membership benefits, they probably do not cover many runs. Cost is likely a very important issue at the moment.
Use cases shared by other users come with a button for creating something similar, much like the one found on short videos. It reminds me of the viral effect of TikTok.
The idea is quite good. Users may not know what they can do with Manus, so they can refer to other people's cases. The implementation is interesting too. It seems that the original user's context plus the current user's own demands are fed in to run a new task, so users can still add their personalized needs.
The overall design is clever, but there are trade-offs. This method is flexible enough but not precise enough. For example, I may want to reuse the design and style of someone else's website case exactly and change only some of the content. That may not be possible, because the task is re-executed from scratch.
This leads to another question: can users only share an entire case? Could they share just part of a task, perhaps a template or a process? I wonder whether that could help build a community atmosphere.
To sum up
This round of AI may bring not only a technological revolution but also a revolution in thinking. People's ideas and ways of thinking are harder to change. Both the way we design products and the way users use them may face new changes.
Although Manus has great potential as a product form, many challenges remain to be solved in its actual implementation.
So-called model evolution alone is not enough to meet these challenges. They require us to go back to the starting point and think about what a fully intelligent AI product can bring us, how we should collaborate with AI, and how to rethink and redesign every detail and step. To bring users a new experience that is valuable in every respect, product design and details have become more important.