OWL team's 10,000-word sharing: the team that best reproduced Manus, on how they view the current state of Agentic AI

Written by
Iris Vance
Updated on: July 9, 2025
Recommendation

In-depth analysis of the latest developments in the field of AI Agents, revealing the technology and business logic behind the OWL project.

Core content:
1. The technical differences and market performance of the OWL project and Manus
2. The technical principles and commercial implementation status of AI Agents
3. The mission and future prospects of the CAMEL-AI open source community

The popularity of Manus not only made agents the hottest AI field in the first half of 2025, but also drew more developer attention to the open source agent projects that replicate Manus.

OWL, launched by the CAMEL-AI team within one day of Manus going online, is the most representative of these. In actual testing, the project reached the open-source performance ceiling on the GAIA benchmark at 58.18%, surpassing the 55.15% of Hugging Face's Open Deep Research.

In early March, Founder Park invited the OWL team to an online closed-door sharing session for an in-depth discussion of OWL's technical framework, the technical principles behind Manus and agents in general, current implementation logic, and the state of commercial adoption.

After redacting sensitive details, Founder Park compiled the content of that session into this article.

Guest introduction:

Li Guohao: Founder of the open source community CAMEL-AI.

Key messages:

  • The OWL project is not exactly the same as Manus. There are many technical differences, but they do similar things.

  • Because of the emergence of Manus, the public has seen the possibilities of AI technology, especially the practical application of agents now, which has ignited the wave of AI agent technology.

  • Replicating Manus is technically relatively simple; its strength lies more in product interaction and form. Moreover, Manus has a first-mover advantage, so it will be difficult for later products to replicate its success.

  • MCP is the future. It allows all frameworks to access the same tools. Projects like Cursor and ours can use tools that comply with the MCP standard and use a variety of open source tools to improve the agent.

  • For Agentic AI, base model + external engineering framework is not the future trend.

  • If the work in a vertical field can be easily replaced by a general agent, it means that the work in that vertical field is not “vertical” enough and has not solved the core pain points of this field.

Founder Park is building a developer community and invites developers and entrepreneurs who are actively experimenting with new models and technologies to join. Scan the QR code and fill in your product/project information; after review, the staff will add you to the group.
After joining the group, you will have the opportunity to get:
  • Focused discussion on development with mainstream models (such as DeepSeek);

  • Resource matchmaking, plus opportunities for direct communication and feedback with API providers, cloud vendors, and model vendors;

  • Founder Park will actively promote useful and interesting products/cases.



01

The origin of OWL, and how it differs from Manus

We are building an open source community called CAMEL-AI, and our mission is to "find the scaling laws of agents". Simply put, we believe AI agents have their own unique scaling laws, and our job is to find out what those laws are.

We have been focusing on the underlying technology and have done a lot of cutting-edge research, such as creating the world's first multi-agent framework, the first cross-platform control project (CRAB, which can control any app on phones and computers through the UI), and the world's first multi-agent system with one million agents, OASIS. These 0-to-1 achievements consumed a great deal of energy, time, and engineering research yet received relatively little attention, but we believe they will be important infrastructure for future agent applications.

CRAB: https://github.com/camel-ai/crab

OASIS: https://github.com/camel-ai/oasis

Specifically, we are mainly doing the following things:

The first is building infrastructure. This covers frameworks, data, agents and their communication protocols, as well as related applications. As an open source community, we also develop developer-oriented tools, mainly serving developers and researchers. At the same time, we conduct cutting-edge research, write papers together with the community, and do open research. The open source OWL project is both a piece of our academic research and a tool that developers can build on.

We firmly believe that AI agents follow certain laws, so we have carried out a wide range of research and built many tools. For example, CAMEL is one of our basic frameworks. With it, you can generate data, including CoT (chain-of-thought) data, instruction-following data, and alignment data. It can also be used for task automation, such as the modules and workflows used in the OWL project, and for UI automation, web automation, and similar functions. We also use large language models to simulate large-scale complex systems; for example, the OASIS project used one million agents to simulate social network behavior, including rumor propagation, herd effects, and opinion polarization, to explore whether these phenomena can be reproduced with AI.

Let me briefly talk about CAMEL. It is an agent framework. Unlike general frameworks, we pay great attention to being data-driven and build the framework from a data perspective so that AI can achieve self-improvement in the future. CAMEL integrates multi-agent capabilities, has data-generation pipelines, integrates almost all mainstream models at home and abroad, integrates a large number of tools, has short-term and long-term memory, and supports multiple storage backends. We also provide benchmarks for evaluating agents. It has multiple executable code interpreters and different data loaders, which are also used in this project. For retrieval, it supports both vector retrieval and BM25 retrieval, among other methods. This is why we could replicate Manus quickly: we have a very complete tool library, and with its help we can quickly build all kinds of applications.

The OWL project mainly replicates some of Manus' functions. In the project we proposed a technique called Optimized Workforce Learning for general multi-agent assistance, mainly handling real-world tasks such as web retrieval, reading PDFs, and generating code. Calling it a 0-day replica is a bit of a headline gimmick, because we had already been working on this project for a while, and most of that time went into performance improvement. It is a research project that my doctoral students participate in. Two weeks before Manus was released, we had already achieved the highest score among open source projects on the GAIA benchmark, but we held off releasing it because I was not happy with the name the project lead had chosen. The wave of attention around Manus simply pushed us to release this still-immature project ahead of schedule. After rapid iteration over the past few days, the project has become more and more complete.

The OWL project is not exactly the same as Manus. There are many technical differences, but they do similar things.

Let me describe the system framework. After the user inputs a command, it enters the multi-agent system, and the agents in the system are responsible for executing the task. We have an AI user agent and an AI assistant agent, which work together and play different roles to complete the task; this concept originated from the method we proposed in our paper two years ago, and the OWL project follows that idea. The two agents talk to each other, and the assistant agent can call various tools: the web agent controls the browser, the search agent performs Google or community searches, the coding agent generates and executes code to obtain results, the document agent reads and converts PDFs, and so on. Any tool can be connected to this base system.
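To make the loop concrete, here is a minimal, illustrative Python sketch of the user-agent / assistant-agent pattern described above. All names in it (llm_chat, TOOLS, run_role_play) are hypothetical placeholders I made up for this sketch, not the actual CAMEL/OWL API.

```python
# Illustrative sketch of the user-agent / assistant-agent loop described above.
# All names here (llm_chat, TOOLS, run_role_play) are hypothetical placeholders,
# not the actual CAMEL/OWL API.
import json

def llm_chat(system_prompt: str, messages: list[dict]) -> str:
    """Plug in a real chat-model call here (GPT, Claude, a local model, ...).
    Stubbed to end immediately so the sketch runs without credentials."""
    return "DONE"

TOOLS = {
    "search": lambda query: f"search results for {query!r}",   # search agent
    "browse": lambda url: f"page text of {url}",                # web agent
    "run_code": lambda code: "output of executed code",         # coding agent
    "read_pdf": lambda path: f"text extracted from {path}",     # document agent
}

def run_role_play(task: str, max_turns: int = 10) -> str:
    user_sys = "You decompose the task and give the assistant one instruction per turn."
    asst_sys = ('You carry out the instruction. To use a tool, reply with JSON '
                '{"tool": "<name>", "args": {...}}. Reply DONE when the task is finished.')
    history = [{"role": "user", "content": f"Task: {task}"}]

    for _ in range(max_turns):
        instruction = llm_chat(user_sys, history)             # AI user agent speaks
        history.append({"role": "user", "content": instruction})

        reply = llm_chat(asst_sys, history)                   # AI assistant agent answers
        try:
            call = json.loads(reply)                          # tool call requested?
            result = TOOLS[call["tool"]](**call["args"])
            reply = f'Tool {call["tool"]} returned: {result}'
        except (json.JSONDecodeError, KeyError, TypeError):
            pass                                              # plain-text answer
        history.append({"role": "assistant", "content": reply})

        if "DONE" in reply:
            break
    return history[-1]["content"]

print(run_role_play("Find movies playing in nearby theaters"))
```

In a real system the user agent plans, the assistant agent executes, and each tool in the dictionary would be backed by a browser, search API, code sandbox, or PDF parser rather than a stub.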

For example, if you ask the agent to find movies playing in nearby theaters, it can open a browser, locate the city, and get information about current movies. Or if you ask the agent to research a code repository, it will browse the repository, identify tasks, and generate a report. We have made many updates recently: the system now supports Google search, processes video, images, and audio, uses Playwright to browse the web, parses PDF documents, executes code, and has a rich set of tools to enhance agent capabilities.

Users can choose different tools for different tasks, unlike Manus which can only use fixed tools. The advantage of open source is that you can customize your own tools and add unique tools to your own fields or application scenarios to improve efficiency and stability.


02 

Manus from the perspective of technical implementation

I like the Manus project very much. Although I haven't tried it yet, I think it is of great significance, and I have mentioned it in recent talks and on WeChat Moments. There is a saying that describes it aptly: Manus is like the spark that ignited the wave of AI agent technology.

We have been doing research on the underlying technology for two years, from the first basic framework until now, but have never received this much attention. It is precisely because of Manus that the public has seen the possibilities of AI technology, especially the practical applications of agents, such as doing research, writing code, and operating web pages. These techniques have long existed in the research field, but Manus is the first to present them to the public in an excellent product form, especially in its UI/UX, which has attracted the attention of many people who did not know the technology before, including engineers, researchers, and ordinary users, and has greatly promoted the development of AI technology. I think that is very significant.

Of course, beyond its significance in promoting the technology, the online reviews of Manus are, objectively speaking, polarized. Some people call it a national-level product, but I don't think it has reached that level; others call it a "wrapper" product, but doing a wrapper well takes skill. After all, at the bottom everything ultimately runs on NVIDIA GPUs, so in that sense calling it a wrapper is understandable.

From an engineering perspective, I can infer their approach through user cases and feel that there are two points worth learning.

  • First, (I am guessing) using the Ubuntu file system for context persistence and management is very flexible and efficient. Placing stored files in user folders makes them easy to read at any time, which is more flexible than traditional database semantic retrieval. Although we haven't made a strict comparison, it definitely has its advantages.

  • Second, use the terminal command line to its full potential. Anyone with a technical background knows the command line is extremely versatile and powerful: it can write code, browse the web, and more. If an AI agent can use the command line proficiently, it gains very general capabilities, and it can also install Python or system packages to greatly expand its functions. Learning to use the command line as a general tool to solve problems is much more efficient than building every tool yourself. (A rough sketch of both ideas follows this list.)
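Below is a minimal sketch of the two ideas above. It is purely illustrative (the workspace path, save_note, read_note, and run_shell are names I invented), not Manus' actual implementation: intermediate context is persisted as plain files in a per-task folder, and the shell is exposed as one general-purpose tool.

```python
# Illustrative sketch, not Manus' actual code:
# (1) persist intermediate context as plain files in a per-task workspace,
# (2) expose the terminal as a single general-purpose tool.
import subprocess
from pathlib import Path

WORKSPACE = Path("/tmp/agent_workspace")   # hypothetical per-task folder
WORKSPACE.mkdir(parents=True, exist_ok=True)

def save_note(name: str, text: str) -> Path:
    """Persist an intermediate result; later steps simply re-read the file."""
    path = WORKSPACE / name
    path.write_text(text, encoding="utf-8")
    return path

def read_note(name: str) -> str:
    return (WORKSPACE / name).read_text(encoding="utf-8")

def run_shell(command: str, timeout: int = 60) -> str:
    """Let the agent use the terminal as a general tool (pip install, curl, grep, ...).
    A real system would sandbox this; never run model-generated commands unsandboxed."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout, cwd=WORKSPACE)
    return proc.stdout + proc.stderr

if __name__ == "__main__":
    save_note("plan.md", "1. search showtimes\n2. summarize results")
    print(read_note("plan.md"))
    print(run_shell("ls -l"))
```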

Overseas, Manus became popular two days later than in China, and the reviews were equally polarized. Some people thought the product was great, as if the era of general AI were arriving; others, from a technical perspective, thought it was a simple "wrapper" product that anyone could make. Manus' chief scientist shared a lot on Twitter; they themselves said there were no technical secrets and discussed the technology very frankly, which sounded like a combination of mature techniques. For example, they mentioned that their agent builds on CodeAct, a research project from UIUC, using Claude 3.5 as the main agent model and a post-trained Qwen model for planning and similar steps.

Overall, I think Manus has a lot to learn from and is of great significance to the development of technology. It is not as bad as some people say.


03

The gap between agents may mainly come from the model gap

Q: How far are OWL and CAMEL from large-scale deployment? In actual tests, a single call consumed 240,000 tokens (about $36). For a commercial product, how would you build irreplaceable paid value? Is there any way to reduce consumption?

Li Guohao: Regarding the $36 cost, I'm not sure what the specific task was, proving Fermat's Last Theorem? For the simple tasks we run, like opening a web page to search for information or researching a piece of news or a technology, the cost generally does not exceed $1, usually just a few tenths of a dollar. A cost of $36 is indeed quite high, especially since the models used in our framework are relatively cheap.

In the framework we mainly use GPT-family models, with o3-mini for a small number of reasoning tasks, which is much cheaper than Claude 3.7. Of course, if you want better results, you can choose Claude 3.7. That said, it is possible for the agent to fail at a task and keep retrying with repeated calls; when a task cannot be completed, this can consume a large number of tokens and drive the cost up significantly. In that case we can set limits such as a maximum number of steps to keep the cost bounded.
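A minimal sketch of that kind of guard is shown below, assuming a generic agent loop: cap both the number of steps and the token budget so a stuck, retrying agent cannot run up an unbounded bill. The function names and prices are illustrative, not OWL's actual code.

```python
# Minimal sketch of a step/cost guard: cap steps and tokens so a stuck,
# retrying agent cannot run up an unbounded bill. Names and prices are
# illustrative, not OWL's actual code.
MAX_STEPS = 20
MAX_TOKENS = 200_000
PRICE_PER_1K_TOKENS = 0.005          # assumed blended price in USD, adjust per model

def run_with_budget(agent_step, task: str):
    """agent_step(state) -> (new_state, tokens_used_this_step, done)"""
    state, used_tokens = task, 0
    for step in range(MAX_STEPS):
        state, step_tokens, done = agent_step(state)
        used_tokens += step_tokens
        if done:
            break
        if used_tokens > MAX_TOKENS:
            state = f"[stopped: token budget exceeded after {step + 1} steps]"
            break
    cost = used_tokens / 1000 * PRICE_PER_1K_TOKENS
    return state, used_tokens, round(cost, 4)

# Toy run: each step burns 3,000 tokens and never finishes, so the step cap stops it.
print(run_with_budget(lambda s: (s, 3_000, False), "open a web page and summarize it"))
```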

In general, the cost of most tasks is not that high. Even though completing a task may cost a few tenths of a dollar or up to $1, I think for a commercial product, especially a consumer product, the most important thing right now is to reduce cost. Only when the user base is very large and the cost can be brought down can such a product really scale up. For example, OpenAI's Operator charges $200 per month, which many people find expensive. I think Manus may be in a similar position: they use an invitation-code mechanism to restrict usage, probably not as artificial scarcity marketing, but because server costs, compute consumption, and model API calls are quite expensive. Without good cost control, once it opens to all users, say 1 million of them, it could cost up to $10 million a day.

As for how to reduce cost, this involves many aspects. First, in terms of model capability: if the model can complete tasks more efficiently, understand instructions more accurately, and execute with better planning, costs naturally come down. Second, from the inference and hardware perspective: at the inference level, doing quantization, sparsification, caching, and similar techniques well reduces inference cost; at the hardware level, using dedicated inference chips cheaper than NVIDIA's can reduce cost further.

Q: What is the main reason for the gap between OWL and Manus on complex tasks? What are the directions for optimization?

Li Guohao: Comparing on the GAIA benchmark, we found that our performance at level 1 is similar to Manus', but at level 2 and level 3 our performance is much worse, about 20% lower. The main reasons are as follows:

  • First, we use different models. We tested with GPT-4o, while Manus uses Claude 3.5, which is much stronger for this purpose because Claude 3.5 has Computer Use capability. Recently OpenAI has also opened up a computer-use interface. If both our project and Manus used models that support computer use, the gap would narrow. (Level 1, 2, and 3 are difficulty tiers, with level 3 the hardest.) So the model gap is the key factor, and switching to a model that supports Computer Use will greatly improve performance.

  • Secondly, we are also optimizing some tools to narrow the gap with Manus in terms of tools. In fact, we have developed quite a few tools, and each of us has tools that the other does not have, and we intend to make up for the missing parts.

  • Third, in terms of engineering optimization, this requires more debugging and more experiments to make it perform better.

By the way, we have also integrated MCP, which lets us use tools developed by any developer, which is great.

I think MCP is the future. It allows all frameworks to access the same tools: Cursor and our project, for example, can both use tools that meet the MCP standard and improve the agent with the help of many open source tools. In short, you give the MCP toolkit manager the MCP server information, it connects over MCP to the corresponding app, and the agent can then discover and use all the MCP tools, the same way MCP is used in other scenarios.
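As a rough illustration of that flow, here is a sketch based on my understanding of the official MCP Python SDK (the "mcp" package): connect to any MCP-compliant server, discover its tools, then call one by name. The specific server command and tool name below are illustrative examples and may differ from what a given server actually exposes.

```python
# Hedged sketch of connecting an agent to an MCP server, based on my understanding
# of the official MCP Python SDK ("mcp" package). The server command and tool name
# are illustrative; any MCP-compliant server works, which is the point of the standard.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
)

async def main():
    async with stdio_client(server) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            listed = await session.list_tools()            # discover the server's tools
            print([tool.name for tool in listed.tools])    # hand these to the agent
            # The agent can then invoke any discovered tool by name:
            result = await session.call_tool("list_directory", {"path": "/tmp"})
            print(result)

asyncio.run(main())
```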

MCP introduction: Shixiang Technology explains MCP in 10,000 words: Agentic AI middle layer optimal solution and three opportunities for startups

Q: ChatGPT has been out for more than two years; why did Manus only appear now?

Li Guohao: ChatGPT was released at the end of 2022, and Manus only appeared now. I don't think its emergence was sudden; rather, it was a process of quantitative change.

In March 2023 we released the first multi-agent framework. At that time we used multi-agent systems to write games, code, stock-trading software, and so on. We did this very early, but it stayed in the research domain; we didn't make a polished product and didn't receive much attention. AutoGPT, a very popular project from the same period, could do search, code generation, and so on, though the results were not good. So the overall form has existed for a long time. Later, products like Kimi, Doubao, and Perplexity did search well, Deep Research optimized it further, and OpenAI's Operator could operate web pages. Manus is a quantitative change built on top of these, and perhaps also a qualitative one; it appeared after that accumulation of optimization.

So the emergence of Manus is not sudden. It is very similar to the Operator product, which itself has not been around for long. The industry says it is not difficult to reproduce Manus; I think the form is relatively easy to reproduce, but achieving the same quality still needs to be evaluated, so whether that claim is fair depends on how far the "reproduction" goes. From a technical perspective we understand the underlying technology well, and Manus itself said there are no secrets. The technology needed to replicate it is relatively simple; what matters more is product interaction and form. In addition, Manus has a first-mover advantage, so it will be harder for later products to replicate its success.

Q: What do you think of Manus using CodeAct to call tools? What is the difference between it and MCP?

Li Guohao: Manus calls tools by writing code, which does not conflict with making calls through MCP. MCP solves the problem of unifying the interface between agents and tools, and MCP also supports calls executed in code form, so there is no contradiction.

Q: From the perspective of OWL, how do you view the relationship between the MCP route and multi-agent?

Li Guohao: The server of MCP can be a simple tool or an agent. If both the server and the client are agents, communication between the two agents can be achieved. Moreover, the server and the client can also be multi-agent systems, so communication between multi-agents can be achieved.

In short, MCP unifies the communication between them. As for the entities involved in the communication, they can be either tools or agents, which are defined by the users. 


04

Vertical-field agents need to go deeper and be more professional

Q: Agentic AI currently has two seemingly opposite implementation paths: one relies on tool-use capabilities learned end-to-end by the underlying model, and the other relies on a base model plus an external engineering framework. How do you view the difference between the two?

Li Guohao: From the perspective of engineering methods, some of them may just be a transitional stage. Our framework basically did not take that route, because we do not think it is the future trend. In the past, a common practice was to prompt the AI to output JSON, which we see as a short-term measure. Of course, there are engineering methods that make JSON output more stable or force it, such as Outlines and XGrammar, which perform constrained sampling when decoding from the model so that tools can be called more reliably.

The two routes are actually complementary. A model's tool-use ability is ultimately probabilistic, so it can never fully guarantee that a tool call is exactly correct. On the external engineering side, producing stable JSON purely through prompt engineering still depends on model capability and is not the long-term first choice; but implementing tool calls through constrained sampling and similar methods is a good approach. The principle is to use a control mechanism to ensure that token sampling conforms to a given grammar so it fits the tool-call format.
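To illustrate the principle, here is a toy sketch of constrained (grammar-guided) sampling, the idea behind tools like Outlines and XGrammar. It is not any library's real API: at each decoding step, tokens that would break the target format are masked out before sampling, so the final output is syntactically valid by construction.

```python
# Toy illustration of constrained sampling: mask tokens that would violate the
# target grammar before sampling at each step. Not any library's real API.
import math
import random
import re

def constrained_decode(logits_fn, is_valid_prefix, max_len: int = 20) -> str:
    out = ""
    for _ in range(max_len):
        scores = logits_fn(out)
        # Keep only tokens whose addition can still lead to a valid output.
        allowed = {t: s for t, s in scores.items() if is_valid_prefix(out + t)}
        if not allowed:
            break                      # nothing legal left to emit: stop
        total = sum(math.exp(s) for s in allowed.values())
        r, acc = random.random() * total, 0.0
        for token, score in allowed.items():
            acc += math.exp(score)
            if acc >= r:
                out += token
                break
    return out

# Toy grammar: output must be a double-quoted string of digits, e.g. "2025"
vocab = list('0123456789"')
logits_fn = lambda prefix: {t: 0.0 for t in vocab}                # uniform toy "model"
is_valid_prefix = lambda s: re.fullmatch(r'"\d*"?', s) is not None
print(constrained_decode(logits_fn, is_valid_prefix))
```

Real implementations compile a JSON schema or grammar into a token-level mask applied to the model's logits, which is what makes the resulting tool calls parseable by construction.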

In short, the two are not in conflict. By clarifying what should be done at the engineering level and the model level, we can make progress at the same time and do things better.

Q: Do you think that general agent frameworks such as Manus have taken shape? If so, are vertical agent frameworks more worthy of development?

Different fields have different information-processing logic, required tools, data sources, and APIs, which makes it difficult for general agent frameworks to adapt to vertical scenarios. For example, the logic of comparing and forecasting the macro environment in 2025 versus 2022 is completely different from that of an automatic price-comparison ticket assistant. Given the above, where might the difficulty lie?

Li Guohao: I think vertical fields are more worth investing in. Using general frameworks or models to solve problems in a professional field is bound to be inefficient or to fall short in capability. For example, if you are a chemistry student who wants to run chemistry experiments, you can apply a framework built for chemistry and let the agent call the related tools; if you do macro-environment forecasting, you can likewise provide the agent with specific data sources, rather than relying on a general-purpose solution.

The most difficult part is identifying the problem, and the difficulties vary by field. Some stem from a lack of tools, which can be fixed by adding tools; some stem from insufficient reasoning ability, which means collecting data to optimize the model; and some tasks lack effective supervision signals for training because they are relatively open-ended, in which case methods such as preference learning based on the expected outcome are needed.

Q: Will the improvement of the capabilities of general agent products continue to squeeze the market space of vertical agent products? (like general search > vertical search?)

How can general agent applications solve the problem of output content personalization? (For example, in the travel guide scenario, without user preference data, it is difficult to generate results that meet the needs even if more web pages are crawled)

Li Guohao: I think the agent field is still somewhat different from the model field. Although general models may well solve many vertical-field problems in the future, efficiency is always an issue: when general agents tackle vertical-field problems, there will always be an efficiency shortfall.

Beyond efficiency, there is another question in the short term: will general agents keep squeezing the living space of vertical-field agents? If the work in a vertical field can easily be replaced by a general agent, it means that work is not "vertical" enough and has not solved the core pain points of that field.

Agents are very different from models. Agents require high-quality interactive interfaces and good UI/UX (user experience design). A model's output is usually text, while an agent's output takes many forms: controlling a browser needs a good UI/UX to display the control interface; controlling a machine cannot reuse the same product; professional fields may also need visualized results or specific control tools. So their UI/UX designs differ greatly, and so do the product forms.

Therefore, to avoid being squeezed out by general agents, vertical-field agents need to do their domain work more deeply and more professionally.


05 

Agents will bring new forms of human-computer interaction

Q: How does a general agent solve the problem of personalized content output?

Li Guohao: For personalization, the mainstream solution today is the memory module. This module can accumulate knowledge across different tasks, retrieve that knowledge before performing a task, recall the relevant content, and address things like user preferences at the memory level. However, this requires continuous interaction with the agent for personalization to build up. OpenAI's ChatGPT has similar functionality. To do better, you may need to provide more data or even train on it.
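As a minimal sketch of that memory-module idea (the file name and function names are illustrative, not any product's implementation): preferences observed in earlier interactions are written to a store, and retrieved and prepended to the prompt before each new task.

```python
# Minimal sketch of memory-based personalization: write observed preferences
# after a task, retrieve them before the next one. Names are illustrative.
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")   # hypothetical long-term memory store

def remember(note: str) -> None:
    notes = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    notes.append(note.lower())
    MEMORY_FILE.write_text(json.dumps(notes, ensure_ascii=False))

def recall(query: str, top_k: int = 3) -> list[str]:
    """Naive keyword retrieval; a real system would use embeddings or BM25."""
    if not MEMORY_FILE.exists():
        return []
    notes = json.loads(MEMORY_FILE.read_text())
    scored = sorted(notes, reverse=True,
                    key=lambda n: sum(w in n for w in query.lower().split()))
    return scored[:top_k]

if __name__ == "__main__":
    remember("user prefers budget travel and vegetarian food")
    prefs = recall("plan a travel itinerary for kyoto")
    prompt = "Known user preferences:\n" + "\n".join(prefs) + "\nTask: plan a 3-day Kyoto trip"
    print(prompt)   # prepend retrieved preferences before the agent runs the task
```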

Q: A general agent like Manus chains multiple models, which introduces hallucinations at multiple steps of the workflow, makes usability plummet, and makes commercial use difficult. How can this be optimized?

Li Guohao: I think the claim that "chaining multiple models necessarily causes a linear decline in performance, with hallucinations at every step" is not necessarily correct. It depends on whether the constructed system is convergent or divergent. With multiple agents, if each step tends toward convergence, fewer hallucinations are produced. For example, if an agent that is not prone to hallucination is used at each step, performance will not necessarily decline linearly.

This problem needs to be analyzed in combination with actual scenarios to clarify the reasons for the hallucinations at each step, whether it is a problem with the model or the tool, and then consider whether it can be solved by replacing better models or tools.

Q: Is it possible that end-to-end agentic model products like Deep Research will eat up shell products like Manus in the future?

Li Guohao: It is not clear whether Manus will be trained end-to-end in the future. As far as I know, they themselves said that the planning model is trained, and the execution model uses Claude. But now most models can be fine-tuned, OpenAI also provides a fine-tuning interface, and Manus can also be fine-tuned. It is hard to say whether it can be considered end-to-end after fine-tuning.

I think if Manus builds its "wrapper" well and keeps improving its own architecture, it will not necessarily be eliminated; that depends on its development path. They already have a lot of user data and are capable of end-to-end training. Open source models are getting stronger, and closed source models have opened fine-tuning interfaces, so everyone has a chance. If Manus can accumulate more data and develop better product ideas, it will not necessarily be eliminated. On the other hand, OpenAI also needs substantial product investment to do Deep Research well, so Manus' future direction is hard to determine and currently unpredictable.

Q: Do you have any experience to share about the interaction methods of agent products, and the differences between agent products and ordinary AI tools in terms of human-computer interaction? Will dynamically generated agents be a future direction?

Li Guohao: I think the difference between agent products and ordinary AI products in terms of human-computer interaction is very interesting. Many traditional AI tools require people to actively ask questions and assign tasks, and are more human-dominated. Agent products may be able to reduce human involvement and complete tasks more autonomously, requiring human confirmation only in special circumstances. If this can be achieved, the human-computer interaction method will be completely different.

In addition, it is not only human-computer interaction, but also the interaction between agents and machines, which is very interesting and different from traditional AI tools. For example, WeChat and Xiaohongshu are currently used by people. What will happen if they are used by agents in the future? There may be an interactive relationship between people, machines and agents. There are many areas worth exploring, such as whether the UI is different when used by agents and people.

Currently, many people are working on generative UI, which is also a way of human-computer interaction in the future. UI is not necessarily fixed. Dynamically generating agents is a development direction, and we are also working on related solutions.

Q: Does the agent system have the potential to become the technical foundation for task management in embodied robots? The current system still needs to wait for the user to input a single task before it activates; in the future, will it be able to monitor multiple tasks and execute them simultaneously?

Li Guohao: I think agent systems have great potential in the future, and this trend is already happening. We are also working on multi-agent systems that combine agent systems with machines. Many institutions are also doing similar explorations, using agent systems to call atomic skills to achieve the integration of AI agents and embodied scenarios. This is definitely the future direction.

As for the second question, from an engineering perspective it is feasible to let the agent run multiple inferences. For example, we can borrow from the MapReduce approach: assign multiple tasks and then integrate their memories. I think this is not a big problem and is feasible.
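A rough sketch of that MapReduce-style idea (illustrative only, with a stubbed worker): fan several sub-tasks out to agent workers concurrently, then merge their results or memories into one output.

```python
# Illustrative MapReduce-style fan-out for agent sub-tasks: run workers
# concurrently (map), then merge their outputs (reduce).
from concurrent.futures import ThreadPoolExecutor

def agent_worker(subtask: str) -> str:
    """Stand-in for one agent run; replace with a real agent call."""
    return f"findings for: {subtask}"

def map_reduce(subtasks: list[str]) -> str:
    with ThreadPoolExecutor(max_workers=4) as pool:
        partial_results = list(pool.map(agent_worker, subtasks))   # map step
    return "\n".join(partial_results)                              # reduce step

if __name__ == "__main__":
    print(map_reduce(["monitor inbox", "track build status", "summarize news"]))
```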


06

I am very optimistic about agents for AI for Science

Q: What dimensions or criteria can be used to judge the quality of an agent system?

Li Guohao: There are of course many dimensions for judging the quality of a system. One is performance: there are benchmarks such as the GAIA benchmark used by Manus this time, the widely used OSWorld benchmark developed at the University of Hong Kong, and the CRAB benchmark we are building for controlling phones and computers across platforms. Beyond that, efficiency also matters, such as how fast the system runs and how many resources it consumes; these are important dimensions for evaluating a system as well.

Q: If we want to use agents for some domestic business scenarios, such as telephone anti-fraud and risk control, which popular benchmarks do not cover, how should we build the corresponding benchmarks? Is there any work we can use as a reference?

Li Guohao: If you use an agent for anti-fraud risk control on phone calls, you must first label each case as fraudulent or not so you can do reinforcement learning or supervised learning. The key to building a dedicated benchmark is ensuring the collected data is diverse and sufficient, which is the most basic requirement. The traditional method is manual collection, but you must be careful to avoid data bias; for example, you cannot collect call data only from male fraudsters, and you need to understand the real-world data distribution to collect reasonably.

Another way is data synthesis: synthesize more data based on existing data, then annotate and filter it. Beyond data, it is very important to design reasonable evaluation metrics for the benchmark. Agent metrics differ from ordinary data metrics: in addition to whether the task is ultimately completed, you should also consider how far the task progressed, such as the percentage completed.
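A minimal sketch of such a progress-aware metric (the milestone names are made up for illustration): instead of only scoring final success, also score the fraction of required milestones the agent reached.

```python
# Illustrative progress-aware metric: score partial task completion, not just
# final success. Milestone names are hypothetical examples.
def progress_score(milestones_required: list[str], milestones_reached: set[str]) -> dict:
    done = sum(m in milestones_reached for m in milestones_required)
    frac = done / len(milestones_required) if milestones_required else 0.0
    return {"completed": frac == 1.0, "progress": round(frac, 2)}

# e.g. a phone anti-fraud task: identify caller intent, flag the risk, file a report
print(progress_score(
    ["identify_intent", "flag_risk", "file_report"],
    {"identify_intent", "flag_risk"},
))  # -> {'completed': False, 'progress': 0.67}
```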

Q: In the field of AI for Science, will the product form of agents differ greatly from that of general agents?

Li Guohao: I am very optimistic about using agents for AI for Science. Many tasks in AI for Science are repetitive and involve tool calls. However, AI for Science differs from traditional AI in that it is often slow and frequently needs to interact with the physical world, such as running physics, chemistry, or even biology experiments, where the feedback cycle may take days or even a year. The interaction forms, data, time scales, and required tools vary from experiment to experiment, so the product forms naturally differ.

We have previously done work on automated laboratories, such as automatically searching for new compounds. This requires the agent to control a robotic arm to select and dispense chemicals, observe and analyze the experiments, and even perform reinforcement learning; it is a very complex scenario.

Q: For academic research projects with extremely limited resources, when doing research based on OWL or CAMEL, which directions should be prioritized and avoided?

Li Guohao: If resources are extremely limited, I suggest choosing a research direction different from what large companies or large startups are doing, and focusing on areas they don't care about or haven't noticed yet. When I started the CAMEL-related research and some of the follow-up work, it was based on the same consideration.

With such limited resources, what did we do? We avoided what companies like OpenAI and DeepMind were doing and focused on areas they were not going to work on for the time being. These big companies have their own priorities, and things that are important but outside their current priorities are directions I think are worth focusing on.

For example, OpenAI's current priority may be optimizing the model and making good single agents, so we focus on multi-agent systems and on building larger-scale systems, because we think they will not enter this field in the short term; it is not their top priority. At the same time, this is a very important research direction. As is well known, there is a five-level definition of AI capability, and the fifth level is what can be done at the organizational level. I think only multi-agent systems can accomplish organization-level tasks, and multi-agent systems are undoubtedly an important future direction. Since large companies are unlikely to work on this now, it is a good entry point for teams with limited resources.