Practice and Insights Behind Building and Operating OpenManus and MCP Agents

An in-depth discussion of the practice and challenges of building operation and maintenance Agents with OpenManus and MCP. Core content: 1. MVP testing and implementation experience with the OpenManus project in operation and maintenance scenarios; 2. Analysis of the multi-agent system architecture and the function of each agent module; 3. Implementation of the OpenManus core mechanisms: a detailed introduction to planning, reflection, and memory.
Background
I previously wrote an article introducing the OpenManus project, which mainly ran the examples provided by OpenManus against a locally built open-source model service. After thinking more deeply about the engineering design of OpenManus and running a minimum viable product (MVP) test in our operation and maintenance scenarios, we found that a lot of work remains before it is usable in real scenarios. Having now implemented business operation and maintenance scenarios for some time, we take stock again and record the practice and thinking behind building operation and maintenance Agents with OpenManus and MCP.
Technical Analysis
OpenManus is a multi-agent system whose architecture contains multiple modular agents, such as planning, execution, and tool calling, each performing its own responsibilities and working together. It adopts a hierarchical architecture design, clearly divided into intelligent modules such as planning, execution, and tool calling:
- Planning Agent: responsible for task planning and process orchestration;
- Execution Agent: performs code generation and tool calls;
- ToolCallAgent: responsible for performing specific operations such as browsing web pages and managing files.
When talking about agents, it is worth taking a closer look at their basic elements. The commonly accepted definition of an AI Agent is the combination of "planning + memory + tool use + execution", with the LLM serving as the core controller of the system.
- Planning: decompose the task, generate subtasks, and self-reflect;
- Memory: manage short-term and long-term memory to provide context for the problem, which matters in complex task scenarios;
- Tool Usage: enhance the Agent's capabilities by calling external tools to supplement additional information;
- Executor: obtain each tool's return value by executing it, then feed the result back to the Agent for analysis or summarization.
1. Planning mechanism
The OpenManus planning process is: user inputs a task request -> create an initial plan -> decompose the task into executable steps -> produce a structured plan. The PlanningTool class creates and manages tasks, generates linear plans, and assigns tasks to the corresponding agents.
When the user enters a task request, the `_create_initial_plan` method is called, which invokes the LLM and the PlanningTool to generate an initial plan. The plan includes the task goal, step decomposition, and status tracking. PlanningTool breaks the task into executable linear steps and generates a structured plan (objectives, steps, states).
```bash
curl --location 'http://localhost:11434/api/chat' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen2.5:14b",
    "messages": [
        {
            "role": "system",
            "content": "You are a planning assistant. Your task is to create a detailed plan with clear steps."
        },
        {
            "role": "user",
            "content": "Create a detailed plan to accomplish this task: I want to rent an apartment near Xinjiangwan City, Yangpu District, Shanghai. The rent should be no more than 8,000 yuan, with 2 bedrooms and one living room, more than 90 square meters, and a good community environment. Please make recommendations."
        }
    ],
    "options": {
        "temperature": 0.0
    },
    "stream": false,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "planning",
                "description": "A planning tool that allows the agent to create and manage plans for solving complex tasks. The tool provides functionality for creating plans, updating plan steps, and tracking progress.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "command": {
                            "description": "The command to execute. Available commands: create, update, list, get, set_active, mark_step, delete.",
                            "enum": ["create", "update", "list", "get", "set_active", "mark_step", "delete"],
                            "type": "string"
                        },
                        "plan_id": {
                            "description": "Unique identifier for the plan. Required for create, update, set_active, and delete commands. Optional for get and mark_step (uses active plan if not specified).",
                            "type": "string"
                        },
                        "title": {
                            "description": "Title for the plan. Required for create command, optional for update command.",
                            "type": "string"
                        },
                        "steps": {
                            "description": "List of plan steps. Required for create command, optional for update command.",
                            "type": "array",
                            "items": {"type": "string"}
                        },
                        "step_index": {
                            "description": "Index of the step to update (0-based). Required for mark_step command.",
                            "type": "integer"
                        },
                        "step_status": {
                            "description": "Status to set for a step. Used with mark_step command.",
                            "enum": ["not_started", "in_progress", "completed", "blocked"],
                            "type": "string"
                        },
                        "step_notes": {
                            "description": "Additional notes for a step. Optional for mark_step command.",
                            "type": "string"
                        }
                    },
                    "required": ["command"],
                    "additionalProperties": false
                }
            }
        }
    ]
}'
```
The result returned by the interface is as follows:
```json
{
    "model": "qwen2.5:14b",
    "created_at": "2025-05-31T08:07:06.248964463Z",
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "function": {
                    "name": "planning",
                    "arguments": {
                        "description": "Determine the search scope and conditions, including rent limits, house type requirements, etc.",
                        "step_index": 0
                    }
                }
            },
            {
                "function": {
                    "name": "planning",
                    "arguments": {
                        "title": "On-site inspection of potential listings",
                        "step_index": 3
                    }
                }
            }
        ]
    },
    "done": false
}
{
    "model": "qwen2.5:14b",
    "created_at": "2025-05-31T08:07:06.359368765Z",
    "message": {
        "role": "assistant",
        "content": ""
    },
    "done_reason": "stop",
    "done": true,
    "total_duration": 11605808012,
    "load_duration": 27693026,
    "prompt_eval_count": 454,
    "prompt_eval_duration": 218000000,
    "eval_count": 208,
    "eval_duration": 11343000000
}
```
Scaling and Challenges of Planning
- Planning pattern: current planning is linear, but the MetaGPT team plans to introduce a DAG (directed acyclic graph) structure to support more complex task dependencies. For example, a Data Interpreter task may require multi-step conditional judgment and dynamic adjustment;
- Dynamic planning implementation: the system dynamically adjusts the plan based on tool execution results through the `update_plan_status` method. For example, if a step takes too long, the system may execute other subtasks in parallel to improve efficiency;
- User-feedback-driven optimization: if a task fails or is misunderstood, the user can trigger re-planning.
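As a rough illustration of the DAG direction, a plan whose steps carry explicit dependencies can be scheduled in topological batches, so that independent steps may run in parallel. This is a hypothetical sketch, not MetaGPT's actual design; `DagPlan` and its methods are made up for illustration.

```python
class DagPlan:
    """A plan whose steps declare prerequisites, executed in dependency order."""

    def __init__(self):
        self.steps = {}  # step_id -> description
        self.deps = {}   # step_id -> set of prerequisite step_ids

    def add_step(self, step_id, description, depends_on=()):
        self.steps[step_id] = description
        self.deps[step_id] = set(depends_on)

    def execution_order(self):
        """Kahn-style scheduling: yield batches of steps whose prerequisites
        are all complete; steps in the same batch are independent."""
        remaining = {s: set(d) for s, d in self.deps.items()}
        done, order = set(), []
        while remaining:
            ready = [s for s, d in remaining.items() if d <= done]
            if not ready:
                raise ValueError("cycle detected in plan dependencies")
            order.append(ready)
            for s in ready:
                done.add(s)
                del remaining[s]
        return order
```

For example, two independent data-collection steps would land in the first batch and a correlation step that depends on both would land in the second.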
2. Reflection mechanism
The reflection mechanism is mainly realized through a combination of ReAct mode and real-time feedback.
- ReAct mode: OpenManus makes dynamic decisions and iteratively optimizes through the ReAct loop during task execution. The core of the ReAct framework is embodied in the ReActAgent class, which defines two abstract methods, think and act: analyze the current state through the LLM, generate a thinking result, and select the tool for the next operation. In this process, the system verifies and repairs previous decisions and actions through self-reflection, improving the quality of task completion. For example, in the think method, the LLM generates a response with tool options based on historical messages and tool state, which is essentially a reflective adjustment of past behavior.
```python
class ReActAgent(BaseAgent, ABC):
    name: str
    description: Optional[str] = None

    system_prompt: Optional[str] = None
    next_step_prompt: Optional[str] = None

    llm: Optional[LLM] = Field(default_factory=LLM)
    memory: Memory = Field(default_factory=Memory)
    state: AgentState = AgentState.IDLE

    max_steps: int = 10
    current_step: int = 0

    @abstractmethod
    async def think(self) -> bool:
        """Process current state and decide next action"""

    @abstractmethod
    async def act(self) -> str:
        """Execute decided actions"""

    async def step(self) -> str:
        """Execute a single step: think and act."""
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()
```
- Real-time feedback and dynamic adjustment: OpenManus' real-time feedback mechanism not only shows the user the thinking process (such as task decomposition logic and tool call steps), but also lets the system dynamically update memory and context based on execution results. For example, when repeated steps are detected, the is_stuck method triggers termination to avoid falling into an infinite loop. This mechanism optimizes follow-up actions by reflecting on historical steps.
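A simplified sketch of such a loop detector, assuming messages are plain role/content dicts; the threshold and message shape are illustrative, not OpenManus' exact implementation:

```python
def is_stuck(messages, duplicate_threshold: int = 2) -> bool:
    """Return True if the latest assistant message repeats earlier ones,
    which suggests the agent is looping without making progress."""
    assistant_msgs = [m["content"] for m in messages
                     if m.get("role") == "assistant" and m.get("content")]
    if not assistant_msgs:
        return False
    last = assistant_msgs[-1]
    # Count how many earlier assistant replies are identical to the latest one
    duplicates = sum(1 for c in assistant_msgs[:-1] if c == last)
    return duplicates >= duplicate_threshold
```

When this returns True, the agent can abort the run instead of burning further LLM calls on an unproductive cycle.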
Compared with traditional end-to-end models, OpenManus' reflection mechanism lets it locate problems more accurately when a task fails, but much remains to be optimized. The current pain points are as follows:
- Fuzzy error attribution: OpenManus is currently a single-agent system, which blurs the scope of error attribution. When a tool-level error occurs in the task chain (such as a browser automation failure), the system can only locate the current execution node and cannot trace potential decision errors back to the upstream planning stage. Error-propagation tracking from multi-agent architectures is therefore needed to build a task dependency graph;
- Insufficient dynamic decision-making: the state evaluation thresholds of the ReAct loop are set too rigidly, limiting the ability to cope with complex scenarios.
3. Memory mechanism
The memory mechanism works through multiple rounds of dialogue and contextual association. For example, with dynamic context management, each interaction records the complete dialogue history (including user input, tool call results, LLM responses, etc.) through the update_memory method, forming a traceable context chain. When the ReActAgent class is initialized, two objects are instantiated: one implements short-term memory, maintaining a fixed-length context window through a MemoryBuffer to avoid performance degradation or errors caused by an overly long context; the other is long-term memory, which uses a vector database to store key information and supports recalling historical knowledge through semantic retrieval.
OpenManus stores and manages dialogue history through the `Memory` class. This component records interactions between users, the system, and tools as a message list and updates it dynamically via `update_memory`. Specific implementations include:
- Message type support: supports user input, system messages, assistant replies, and tool execution results;
- Dynamic update mechanism: automatically evaluates the importance of information after each interaction and decides whether to include it in the memory bank, avoiding redundant data accumulation.
```python
class Memory(BaseModel):
    messages: List[Message] = Field(default_factory=list)
    max_messages: int = Field(default=100)

    def add_message(self, message: Message) -> None:
        """Add a message to memory"""
        self.messages.append(message)
        # Optional: enforce the message limit
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def add_messages(self, messages: List[Message]) -> None:
        """Add multiple messages to memory"""
        self.messages.extend(messages)

    def clear(self) -> None:
        """Clear all messages"""
        self.messages.clear()

    def get_recent_messages(self, n: int) -> List[Message]:
        """Get n most recent messages"""
        return self.messages[-n:]

    def to_dict_list(self) -> List[dict]:
        """Convert messages to list of dicts"""
        return [msg.to_dict() for msg in self.messages]
```
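The importance evaluation mentioned above is not shown in the class itself; a toy filter could gate what gets added, along these lines (the heuristics are illustrative only, not OpenManus' actual logic):

```python
def should_remember(message: dict, existing: list) -> bool:
    """Toy importance filter: drop trivially short messages and exact
    duplicates so the memory bank does not accumulate redundant entries."""
    content = (message.get("content") or "").strip()
    if len(content) < 3:          # too short to carry useful context
        return False
    # Skip messages whose content already exists verbatim in memory
    return all(m.get("content") != content for m in existing)
```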
As for long-term memory, it is still being planned. The community has mentioned introducing long-term memory to store verified knowledge or stable state, improving the efficiency of repetitive tasks. The source code shows that memory is currently only held in process memory, but the architecture reserves room for extension through database or file persistence.
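A sketch of the kind of file-persistence extension the design leaves room for; `PersistentMemory` is a hypothetical class, not part of OpenManus:

```python
import json
from pathlib import Path

class PersistentMemory:
    """Message list that survives restarts by snapshotting to a JSON file."""

    def __init__(self, path: str = "memory.json"):
        self.path = Path(path)
        # Reload prior messages on startup if a snapshot exists
        self.messages = json.loads(self.path.read_text()) if self.path.exists() else []

    def add_message(self, message: dict) -> None:
        """Append a message and flush the whole list to disk."""
        self.messages.append(message)
        self.path.write_text(json.dumps(self.messages, ensure_ascii=False))
```

A vector database would replace the flat file for semantic retrieval, but the persistence hook would sit in the same place.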
Practical cases
Inspired by the OpenManus project, we have also piloted exploration in the operation and maintenance scenarios of the big data real-time engine, using the intelligent planning, tool call, execution, and reflection process to build self-updating, self-optimizing operation and maintenance tools.
1. Scenario
- Root cause analysis for Flink real-time job exceptions. The exception types mainly include: job failure, data backlog, stream interruption, and checkpoint failure.
2. Goal
- Solve the root cause analysis of job exceptions (job failure, data backlog, stream interruption, checkpoint failure) to improve automated operation and maintenance and diagnosis.
3. Execution process
The specific execution process is as follows:
User requests are often under-specified. To understand the problem more comprehensively, an additional knowledge base needs to be introduced to supplement the problem's context. We define this content supplementation as memory retrieval, consistent with the memory mechanism in the OpenManus project described earlier. The main implementation of memory retrieval is as follows:
```python
class Memory:
    """Memory system that stores the interaction history of an agent"""

    def __init__(self):
        self.messages: List[Message] = []      # message queue (short-term memory)
        self.knowledge_base = Elasticsearch()  # knowledge base (long-term memory)
        self.encoder = Encoder()               # embedding model (initialization elided in the original)

    def add_message(self, message: Union[Message, dict]):
        """Add a structured message to the memory stream"""
        if isinstance(message, dict):
            message = Message(**message)
        self.messages.append(message)
        self._update_base(message)  # trigger a knowledge-base update

    def _update_base(self, message):
        """Build knowledge nodes from the message content"""
        for entity in extract_entities(message.content):  # entity extraction elided in the original
            self.knowledge_base.index(
                properties={'name': entity['text'], 'context': message.context}
            )

    def get_relevant_memories(self, query: str, top_k=5) -> List[Message]:
        """Get related memories based on hybrid retrieval"""
        query_embed = self.encoder.encode(query)
        return sorted(self.messages,
                      key=lambda x: cosine_similarity(query_embed, x.embed),
                      reverse=True)[:top_k]
```
When using the LLM for planning, it needs to understand the user's problem and plan according to the provided context, so as to keep the LLM from improvising and to improve the rationality of the plan. The main optimization point in this process lies in the prompt. For the LLM planning process, we can borrow from the planning prompt in OpenManus. The planning prompt used in this case is as follows:
FLINK_DIAGNOSIS_PLAN_PROMPT = """
You are a planning agent expert, and your task is to solve complex problems by creating structured plans.
Your requirements are:
1. If the question contains a clear idea, build the plan according to that idea; if an idea is provided in the context, build the plan according to the context; otherwise, generate a structured plan from the content of the question.
Your job is:
1. Analyze the request to understand the scope of the task.
2. Use the "plan" tool to create a clear and feasible plan.
3. Perform steps using the available tools as needed.
4. If there are known error messages in the history, correct them in the next generated result.
5. Track progress and adjust the plan dynamically.
6. The `instructions` object in the output JSON is an array whose values are of type str; the `thoughts` object is of type str.
Output JSON in the following format:
```json
{{
    "plan":
    {{
        "thoughts": str,  # record the reasoning
        "instructions": [
            str  # record the steps
        ]
    }}
}}
```
"""
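On the consuming side, the model's reply then needs to be parsed back into the plan structure that format specifies. A minimal sketch, assuming the reply may wrap the JSON in a fenced block (`parse_plan` is a hypothetical helper, not from the project):

```python
import json
import re

def parse_plan(llm_output: str):
    """Extract the `thoughts` string and `instructions` list from an LLM
    reply whose JSON may be wrapped in markdown fences or extra prose."""
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)  # widest {...} span
    if match is None:
        raise ValueError("no JSON object found in LLM output")
    plan = json.loads(match.group(0))["plan"]
    return plan["thoughts"], plan["instructions"]
```

In practice the parser also needs to handle malformed JSON, which is one place the "known error messages in the history" requirement in the prompt pays off.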
Since MCP became popular in the AI Agent field, many applications and frameworks integrating MCP have been launched. The advantage of MCP is that it combines tool definition specifications, automatic registration, and ease of use. Therefore, to integrate all tools into the existing operation and maintenance service (Java), you only need to upgrade the service to support the MCP framework. The article "Interpretation and Practical Application of AI-based MCP Protocol" explains how to build a Java service that supports MCP.
After the standalone operation and maintenance service integrates the required tools, it only needs to expose an API interface to support calls. Internally, the service's LLM execution logic is: call a tool according to the problem, feed the tool's result back to the LLM, and continue until the LLM returns a result without a tool call, then summarize the content and output it. This feedback loop is encapsulated inside the service, and we only need to tune the system_prompt.
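The internal feedback loop described above can be sketched as follows (in Python for brevity, although the service itself is Java; `chat` and `execute_tool` are placeholder callables, not the service's actual API):

```python
def run_agent(question: str, chat, execute_tool, max_turns: int = 10) -> str:
    """Keep calling the tools the LLM requests until it answers without a
    tool call, then return the final summarized content."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = chat(messages)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]          # final summarized answer
        messages.append(reply)
        for call in tool_calls:
            result = execute_tool(call)      # run the requested tool
            messages.append({"role": "tool", "content": result})
    return "max turns exceeded"
```

The `max_turns` cap plays the same role as OpenManus' `max_steps`: it keeps a misbehaving model from looping forever.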
Current reflection mainly covers two links: inaccurate tool calls and unreasonable planning. For inaccurate tool calls, the main optimization point is the internal MCP logic, chiefly improved by precisely describing the tool definitions and the System Prompt. Unreasonable planning mainly depends on the content entered into the knowledge base, retrieval accuracy, the planning prompt's requirements on the model, and optimization of the planning process itself. Optimizing the planning process requires evaluating the result of the current plan to decide whether to plan further, forming a self-optimizing iteration. Overall, our current planning process uses linear iteration with bounded conditions; it is relatively simple, and planning iteration supports at most two additional rounds.
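The bounded re-planning just described can be outlined as a small loop (a sketch; `plan_fn`, `execute_fn`, and `evaluate_fn` stand in for the actual planning, execution, and checking logic):

```python
MAX_REPLANS = 2  # matches the at-most-two extra planning rounds noted above

def run_with_replanning(task, plan_fn, execute_fn, evaluate_fn):
    """Linear plan-execute loop with a bounded number of re-planning rounds."""
    plan = plan_fn(task, feedback=None)
    for attempt in range(MAX_REPLANS + 1):
        result = execute_fn(plan)
        ok, feedback = evaluate_fn(result)
        if ok or attempt == MAX_REPLANS:
            return result                           # success, or budget exhausted
        plan = plan_fn(task, feedback=feedback)     # re-plan using failure feedback
```

Feeding `feedback` back into `plan_fn` is what distinguishes this from simple retries: the next plan is conditioned on what went wrong.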
From the relevant literature, in complex scenarios the planning process of an AI Agent needs to break through the limitations of the traditional linear decision framework and build an intelligent system with dynamic perception, elastic adjustment, and autonomous evolution. The following iterative optimization directions and technical paths exist:
- Closed-loop-feedback-driven dynamic re-planning: the core logic is to build re-planning into a closed-loop "execution-feedback-correction" system to achieve real-time environmental response. Feedback divides into real-time operational feedback, semantic-layer feedback, and user feedback.
- Multi-granularity task decomposition and strategy recombination: the goal is to shift from fixed rules to elastic decomposition strategies based on scene dynamics. One core technical point is a dynamic complexity evaluation model to predict task difficulty, combined with a hybrid decision mechanism of rule engines (predefined industry templates) and neural networks (learning the optimal decomposition path).
- Multi-agent collaborative architecture: build an agent cluster with role division of labor and knowledge sharing, breaking through the cognitive limitations of a single agent. The core is selecting the appropriate agent for each task stage, along with a memory-sharing mechanism so agents can access historical information.
- Autonomous strategy optimization based on reinforcement learning: build a continuous "environment-action-reward" learning loop on top of a training framework.
In general, the planning and reflection process is a very important optimization link, and more practical and feasible solutions will emerge in the future. We need to keep paying attention and exploring.
4. Result display
After building the diagnostic agent capability, to let users experience consultation directly, the interface provided by Qiwei (WeCom) can be used to encapsulate it as a BOT or application service account with which users can converse directly. During Q&A, the BOT's answer includes two parts: a display of the planned content for the user's request, and a display of the execution process and results. The purpose is to let users see the agent's planning process, the specific tool call results, and the final conclusion. The general effect is as follows:
Thinking
1. The biggest shortcoming of the current practical project
Setting aside the core module functions already implemented in the practical case, one core piece is actually missing: the evaluation mechanism. At present, the correctness of results mainly relies on manual evaluation. An earlier article on Andrew Ng's view of the key skills required to build agents argued that the lack of an evaluation mechanism is the biggest "invisible problem" in current agent construction. He advocated quickly building an evaluation system, even a very rudimentary one, so that it can take over many repetitive judgment tasks. More importantly, the "tactile intuition" built from real data and real failure paths is the most valuable experience in system construction. Therefore, we will develop an evaluation mechanism in the future to better assist in optimizing agent capabilities.
2. Letting the model learn to be "lazy"
I previously read an article about the Thinkless framework, which lets the model learn to be "smartly lazy", finding a balance between efficiency and performance. Reasoning models tend to produce lengthy reasoning regardless of whether the problem is complex or simple, wasting a lot of computing resources. In an agent system, the requirements on the LLM are diverse: some scenarios need no reasoning while others do. For example, routing needs to be fast and accurate, whereas planning requires the model to think for itself rather than plan solely from the provided context, otherwise flexibility and intelligence suffer. From an application developer's perspective, the core point is to set the role and type of the LLM per scenario: let the LLM learn to be "lazy" while further improving efficiency.