Further Discussion on OpenManus and MCP: Practices and Considerations for Building an Operations and Maintenance Agent

Written by
Iris Vance
Updated on: June 9, 2025

An in-depth discussion on the practices and challenges of OpenManus and MCP in O&M Agent construction.

Core content:

  1. MVP testing and implementation insights of the OpenManus project in O&M scenarios
  2. Multi-agent system architecture and functional analysis of each agent module
  3. Detailed introduction to the implementation of OpenManus core mechanisms: planning, reflection, and memory mechanisms
 
Yang Fangxian
Founder of 53A / Tencent Cloud TVP (Most Valuable Expert)

Background of OpenManus

Previously, I wrote an article introducing the OpenManus project, which focused on running the examples provided by OpenManus against locally hosted open-source model services. After thinking more deeply about OpenManus's engineering design and running a minimum viable product (MVP) test in our operation and maintenance (O&M) scenarios, we found that a lot of work remains before it is usable in real-world settings. Having now applied it to our business O&M scenarios for some time, this article re-organizes our practice and thinking on building O&M Agents with OpenManus and MCP.

Technical Analysis

OpenManus is a multi-agent system. Its architecture contains a number of modular agents, each with its own responsibilities, that work together. A layered architecture design divides the system cleanly into agent modules for planning, execution, and tool invocation:

  • Planning agent (PlanningAgent):
    responsible for task decomposition and process orchestration;
  • Execution agent (SWEAgent):
    responsible for code generation and tool invocation;
  • Tool agent (ToolCallAgent):
    responsible for executing specific operations such as web browsing and file management.

 

When it comes to agents, we need to revisit their basic elements. The most familiar definition of an AI Agent is the combination of "planning + memory + tool use + execution", with an LLM as the core controller of the system.

  • Planning:
    Decomposing tasks, generating subtasks, and self-reflecting;
  • Memory:
    Short-term and long-term memory management, providing context for problems; usable in complex task scenarios;
  • Tool Usage:
    Enhancing the Agent's capabilities by invoking external tools to supplement additional information;
  • Execution (Executor):
    Executing tools, obtaining their returned results, and feeding them back to the Agent for analysis or summarization.
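
To tie these four elements together, here is a minimal, schematic agent loop in Python. This is our own illustration, not OpenManus code: llm.chat(), the decision.tool_call shape, and the tools registry are hypothetical placeholders for whatever model client and tools are actually used.

# Minimal, schematic agent loop combining planning, memory, tool usage, and
# execution, with the LLM as the core controller. All interfaces here are
# hypothetical placeholders, not OpenManus APIs.
def run_agent(task: str, llm, tools: dict, max_steps: int = 10) -> str:
    memory = [{"role": "user", "content": task}]   # Memory: traceable context chain
    plan = llm.chat(memory + [{"role": "system",
                               "content": "Decompose the task into steps."}])  # Planning
    memory.append({"role": "assistant", "content": plan.content})
    for _ in range(max_steps):
        decision = llm.chat(memory)                # LLM decides the next action
        if decision.tool_call is None:             # No tool needed: final answer
            return decision.content
        tool = tools[decision.tool_call.name]      # Tool usage
        result = tool(**decision.tool_call.arguments)  # Execution
        memory.append({"role": "tool", "content": str(result)})  # Feed result back
    return "Max steps reached without a final answer"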

 

Next, we briefly introduce the core implementation of OpenManus's planning, reflection, and memory mechanisms.

 

1. Planning mechanism

The OpenManus planning flow is: the user submits a task request -> an initial plan is created -> the task is decomposed into executable steps -> a structured plan is generated. The PlanningTool class creates and manages the task's plan, generates a linear plan, and assigns tasks to the corresponding Agent.

When the user enters a task request, the _create_initial_plan method is called, which invokes the LLM and PlanningTool to generate the initial plan containing the task goal, step decomposition, and status tracking. PlanningTool decomposes the task into executable linear steps and generates a structured plan (goal, steps, status). An excerpt of the call to the LLM and PlanningTool that generates the initial plan:

curl --location 'http://localhost:11434/api/chat' \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen2.5:14b",
  "messages": [
    {
      "role": "system",
      "content": "You are a planning assistant. Your task is to create a detailed plan with clear steps."
    },
    {
      "role": "user",
      "content": "Create a detailed plan to accomplish this task: I would like to rent an apartment near New Jiangwan City in Yangpu District, Shanghai, with a rent of no more than 8,000 RMB, 2 bedrooms and 1 living room, 90 square meters or more, and a good neighborhood environment. Please recommend some options."
    }
  ],
  "options": {
    "temperature": 0.0,
    "stream": false
  },
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "planning",
        "description": "\nA planning tool that allows the agent to create and manage plans for solving complex tasks.\nThe tool provides functionality for creating plans, updating plan steps, and tracking progress.\n",
        "parameters": {
          "type": "object",
          "properties": {
            "command": {
              "description": "The command to execute. Available commands: create, update, list, get, set_active, mark_step, delete.",
              "enum": [
                "create",
                "update",
                "list",
                "get",
                "set_active",
                "mark_step",
                "delete"
              ],
              "type": "string"
            },
            "plan_id": {
              "description": "Unique identifier for the plan. Required for create, update, set_active, and delete commands. Optional for get and mark_step (uses active plan if not specified).",
              "type": "string"
            },
            "title": {
              "description": "Title for the plan. Required for create command, optional for update command.",
              "type": "string"
            },
            "steps": {
              "description": "List of plan steps. Required for create command, optional for update command.",
              "type": "array",
              "items": {
                "type": "string"
              }
            },
            "step_index": {
              "description": "Index of the step to update (0-based). Required for mark_step command.",
              "type": "integer"
            },
            "step_status": {
              "description": "Status to set for a step. Used with mark_step command.",
              "enum": [
                "not_started",
                "in_progress",
                "completed",
                "blocked"
              ],
              "type": "string"
            },
            "step_notes": {
              "description": "Additional notes for a step. Optional for mark_step command.",
              "type": "string"
            }
          },
          "required": [
            "command"
          ],
          "additionalProperties": false
        }
      }
    }
  ],
  "tool_choice": "required"
}'

The interface returns the following results:


{
  "model": "qwen2.5:14b",
  "created_at": "2025-05-31T08:07:06.248964463Z",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "planning",
          "arguments": {
            "command": "create",
            "steps": [
              {
                "description": "Determine the scope and conditions of the search, including rent caps, house type requirements, etc.",
                "step_index": 0
              },
              {
                "description": "Use online platforms for initial screening, e.g. Lianjia, 58.com, etc.",
                "step_index": 1
              },
              {
                "description": "Contact a real estate agent to get more information and book a viewing appointment",
                "step_index": 2
              },
              {
                "description": "Site visits to potential listings",
                "step_index": 3
              }
            ],
            "title": "Rental plan: near New Jiangwan City, Yangpu District, Shanghai"
          }
        }
      }
    ]
  },
  "done": false
}
{
  "model": "qwen2.5:14b",
  "created_at": "2025-05-31T08:07:06.359368765Z",
  "message": {
    "role": "assistant",
    "content": ""
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 11605808012,
  "load_duration": 27693026,
  "prompt_eval_count": 454,
  "prompt_eval_duration": 218000000,
  "eval_count": 208,
  "eval_duration": 11343000000
}

Planning extensions and challenges

  • Planning model: The current implementation uses linear planning, but the MetaGPT team plans to introduce a DAG (Directed Acyclic Graph) structure to support more complex task dependencies. For example, a Data Interpreter task may require multi-step conditional judgment and dynamic adjustment; see the sketch after this list for what a DAG-structured plan could look like.
  • Implementation of dynamic planning: The system dynamically adjusts the plan based on tool execution results through the update_plan_status method. For example, if a step takes too long, the system may execute other subtasks in parallel to improve efficiency.
  • User-feedback-driven optimization: User-triggered replanning can be supported if a task fails or is misunderstood.
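
To make the DAG idea concrete, here is a small sketch of our own (not MetaGPT or OpenManus code): steps declare their dependencies, and steps whose dependencies are complete can be dispatched in parallel, unlike a strictly linear plan. The step names are illustrative only.

from collections import deque

# Hypothetical DAG-structured plan for a diagnosis task: each step lists the
# steps it depends on.
plan = {
    "fetch_job_logs":  [],
    "fetch_metrics":   [],
    "analyze_failure": ["fetch_job_logs"],
    "analyze_backlog": ["fetch_metrics"],
    "summarize":       ["analyze_failure", "analyze_backlog"],
}

def topological_batches(dag: dict) -> list:
    """Group steps into batches; steps within one batch can run in parallel."""
    indegree = {step: len(deps) for step, deps in dag.items()}
    ready = deque(step for step, deg in indegree.items() if deg == 0)
    batches = []
    while ready:
        batch = list(ready)
        ready.clear()
        batches.append(batch)
        for done in batch:                  # Completing a step unblocks its dependents
            for step, deps in dag.items():
                if done in deps:
                    indegree[step] -= 1
                    if indegree[step] == 0:
                        ready.append(step)
    return batches

print(topological_batches(plan))
# [['fetch_job_logs', 'fetch_metrics'], ['analyze_failure', 'analyze_backlog'], ['summarize']]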

2. Reflection mechanism

The reflection mechanism is mainly embodied in ReAct mode combined with real-time feedback.

  • ReAct mode: During task execution, OpenManus makes dynamic decisions and iterates through ReAct mode. The core of the ReAct framework is implemented in the ReActAgent class, which defines two abstract methods: think and act. The LLM analyzes the current state, generates the think result, and selects a tool for the next operation. In this process, the system verifies and repairs previous decisions and actions through self-reflection, improving the quality of task completion. For example, in the think method, the LLM generates a response with tool options based on historical messages and tool states; this process is essentially a reflective adjustment of past behavior.
from abc import ABC, abstractmethod
from typing import Optional

from pydantic import Field

# Excerpt from OpenManus; BaseAgent, LLM, Memory, and AgentState are defined
# elsewhere in the project.
class ReActAgent(BaseAgent, ABC):
    name: str
    description: Optional[str] = None

    system_prompt: Optional[str] = None
    next_step_prompt: Optional[str] = None

    llm: Optional[LLM] = Field(default_factory=LLM)
    memory: Memory = Field(default_factory=Memory)
    state: AgentState = AgentState.IDLE

    max_steps: int = 10
    current_step: int = 0

    @abstractmethod
    async def think(self) -> bool:
        """Process current state and decide next action"""

    @abstractmethod
    async def act(self) -> str:
        """Execute decided actions"""

    async def step(self) -> str:
        """Execute a single step: think and act."""
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()
  • Real-time feedback and dynamic adjustment: OpenManus's real-time feedback mechanism not only shows the user the thinking process (e.g., task decomposition logic, tool invocation steps) but also lets the system dynamically update memory and context based on execution results. For example, when a duplicate step is detected, the is_stuck method triggers termination to avoid falling into an invalid loop; a simplified sketch of this duplicate-detection idea follows this list. This mechanism optimizes subsequent actions by reflecting on historical steps.
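
The sketch below reflects our reading of the mechanism, not a verbatim OpenManus excerpt; the threshold value and message shape are assumptions.

DUPLICATE_THRESHOLD = 2  # Assumption: how many identical repeats count as "stuck"

def is_stuck(messages: list) -> bool:
    """Detect whether the agent keeps producing the same assistant content."""
    if len(messages) < 2:
        return False
    last = messages[-1]
    if last.get("role") != "assistant" or not last.get("content"):
        return False
    # Count earlier assistant messages with content identical to the latest one
    duplicates = sum(
        1 for msg in messages[:-1]
        if msg.get("role") == "assistant" and msg.get("content") == last["content"]
    )
    return duplicates >= DUPLICATE_THRESHOLD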

Compared to traditional end-to-end models, OpenManus's reflection mechanism makes it possible to pinpoint problems more accurately when a task fails, but much optimization remains. Some points that come to mind:

  • The OpenManus execution path is currently a single-agent system, which can blur the scope of error attribution. When a tool-level error occurs in the task chain (e.g., browser automation failure), the system can only locate the current execution node and cannot trace back to potential decision-making errors in the upstream planning stage. Error propagation tracking for multi-agent architectures is therefore needed to construct task dependency graphs.
  • Insufficient dynamic decision-making: the state evaluation thresholds of the ReAct loop are too rigid to cope with complex scenarios.

3. Memory mechanism

The memory mechanism works through multi-turn dialogue and context association. For example, in dynamic context management, each interaction records the complete dialogue history (including user input, tool call results, LLM responses, etc.) through the update_memory method, forming a traceable context chain. When the ReActAgent class is initialized, two objects are instantiated: one for short-term memory, which maintains a fixed-length context window through a MemoryBuffer to avoid performance degradation or errors caused by an overly long context; the other for long-term memory, which uses a vector database to store key information and supports recalling historical knowledge via semantic retrieval.

OpenManus implements the storage and management of conversation history through the Memory class. This component records interactions between the user, the system, and tools as message lists, dynamically updated through the update_memory method. The implementation includes:

  • Message type support: stores user input, system messages, assistant replies, and tool execution results by sub-type;
  • Dynamic update mechanism: automatically evaluates the importance of each message after every interaction and decides whether to include it in the memory bank, avoiding redundant data accumulation.

class Memory(BaseModel):
    messages: List[Message] = Field(default_factory=list)
    max_messages: int = Field(default=100)

    def add_message(self, message: Message) -> None:
        """Add a message to memory"""
        self.messages.append(message)
        # Optional: Implement message limit
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def add_messages(self, messages: List[Message]) -> None:
        """Add multiple messages to memory"""
        self.messages.extend(messages)

    def clear(self) -> None:
        """Clear all messages"""
        self.messages.clear()

    def get_recent_messages(self, n: int) -> List[Message]:
        """Get n most recent messages"""
        return self.messages[-n:]

    def to_dict_list(self) -> List[dict]:
        """Convert messages to list of dicts"""
        return [msg.to_dict() for msg in self.messages]

Long-term memory is still in the planning stage. The community has mentioned introducing long-term memory to store verified knowledge or stable states to improve the efficiency of repetitive tasks. The source code shows that current memories are stored only temporarily in memory, but the architectural design leaves room for extension via database or file persistence, as the sketch below illustrates.
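
As a simple illustration of that extension point, the Memory class above could be subclassed to persist messages to a JSON file. This is our own sketch, not part of OpenManus; the file path and the assumption that Message(**dict) round-trips are ours.

import json
from pathlib import Path

class PersistentMemory(Memory):
    """Sketch: extend the in-memory Memory with JSON-file persistence."""
    file_path: str = "memory.json"  # Hypothetical snapshot location

    def save(self) -> None:
        """Dump current messages to disk so they survive restarts."""
        Path(self.file_path).write_text(
            json.dumps(self.to_dict_list(), ensure_ascii=False, indent=2)
        )

    def load(self) -> None:
        """Restore messages from a snapshot if one exists."""
        path = Path(self.file_path)
        if path.exists():
            for item in json.loads(path.read_text()):
                self.add_message(Message(**item))  # Assumes Message(**dict) round-trips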

Practical Examples

Inspired by the OpenManus project, we piloted an exploration in big-data real-time engine O&M scenarios, using agent planning, tool invocation, execution, and reflection to build self-updating, self-optimizing O&M tools.

1. Scenario

  • We analyze the root causes of Flink real-time job anomalies, mainly: job failure, data backlog, stream interruption, and checkpoint failure.

2. Objective

  • Perform root cause analysis of job anomalies (job failure, data backlog, stream interruption, checkpoint failure) to improve the effectiveness of automated O&M and diagnosis.

3. Execution process

3.1 Description of key process nodes

3.1.1 Memory Retrieval Prioritization

User requests are often short on information. To understand the problem more comprehensively, an additional knowledge base must be introduced to supplement the problem's context and solution ideas; we define this kind of content supplementation as memory retrieval. Its core goal is the same as that of the memory mechanism in the OpenManus project described earlier. The main implementation of memory retrieval is as follows:

from typing import List, Union

class Memory:
    """Memory system for storing the agents' interaction history"""

    def __init__(self):
        self.messages: List[Message] = []      # Message queue (short-term memory)
        self.knowledge_base = Elasticsearch()  # Knowledge base (long-term memory)
        # self.ner_extractor (entity extraction) and self.encoder (text
        # embedding model) are assumed to be initialized elsewhere.

    def add_message(self, message: Union[Message, dict]):
        """Add a structured message to the memory stream"""
        if isinstance(message, dict):
            message = Message(**message)
        self.messages.append(message)
        self._update_base(message)  # Trigger knowledge base update

    def _update_base(self, message):
        """Build knowledge nodes from the message content"""
        entities = self.ner_extractor.extract(message.content)
        for entity in entities:
            self.knowledge_base.insert(
                label=entity['type'],
                properties={'name': entity['text'], 'context': message.context}
            )

    def get_relevant_memories(self, query: str, top_k=5) -> List[Message]:
        """Retrieve relevant memories via hybrid retrieval"""
        query_embed = self.encoder.encode(query)
        # Sort by similarity in descending order so the most relevant come first
        return sorted(self.messages,
                      key=lambda x: cosine_similarity(query_embed, x.embed),
                      reverse=True)[:top_k]

3.1.2 Planning based on problem and context

When planning with an LLM, it must understand the user's problem and plan according to the provided context, to keep the LLM from "going its own way" and to improve the quality of planning. The main optimization point in this process is the Prompt; the planning Prompt can borrow from OpenManus's planning Prompt. In our case, the planning Prompt is as follows:


FLINK_DIAGNOSIS_PLAN_PROMPT = """
You are a planning agent expert tasked with solving complex problems by creating structured plans.
Your requirements are:
1. If the problem has an explicit idea, construct the plan according to that idea; if the idea is provided in the context, construct the plan according to the contextual idea; if the problem has no explicit idea, generate a structured plan based on the content of the problem.
Your job is to:
1. Analyze the request to understand the scope of the task.
2. Use the Plan tool to create a clear, workable plan.
3. Use the available tools to execute steps as needed.
4. If there is a known error message in the history, correct it in the next result you generate.
5. Track progress and dynamically adjust the plan.
6. The `instructions` object in the output JSON is of type array and each value in the array is of type str. The `thoughts` object is of type str.

Output the JSON in the following format:

```json
{{
    "plan":
    {{
        "thoughts": str  # Record your analysis process and thoughts in detail, starting with "Okay, my thoughts are..."
        "instructions": [
            str  # Record the content of each step
        ]
    }}
}}
```

Break down the task into logical, consecutive steps.
"""

3.1.3 Tool Chaining Calls

Since MCP took off in the AI Agent space, a steady stream of applications and frameworks have integrated it. MCP's advantage lies in its unified tool definition specification, automatic registration, and ease of invocation, among other features. Integrating all the tools into our existing Ops service (Java) is therefore just a matter of upgrading the service to support the MCP framework. Refer to the article "AI-based MCP Protocol Interpretation and Practical Application" to learn exactly how to build MCP-enabled Java services.

With the required tools integrated into an independent O&M service, only an external API interface needs to be exposed to support invocation. The service's internal LLM execution logic is: call a tool based on the problem, feed the tool's results back to the LLM, and continue executing until the LLM returns a result that needs no tool, then summarize and output the content. Because this feedback loop is encapsulated inside the service, we only need to tune the system_prompt. A schematic version of this internal tool-call loop is shown below.
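
This is our own Python sketch of the loop just described (the actual service is Java; llm.chat, reply.tool_calls, and the mcp_tools registry are hypothetical stand-ins):

def diagnose(question: str, llm, mcp_tools: dict, system_prompt: str,
             max_rounds: int = 10) -> str:
    """Schematic tool-call loop: keep calling tools until the LLM answers directly."""
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": question}]
    for _ in range(max_rounds):
        reply = llm.chat(messages, tools=list(mcp_tools.values()))
        if not reply.tool_calls:              # No tool requested: final summary
            return reply.content
        messages.append(reply.as_message())   # Keep the assistant turn in context
        for call in reply.tool_calls:         # Execute each requested MCP tool
            result = mcp_tools[call.name].invoke(**call.arguments)
            messages.append({"role": "tool", "name": call.name,
                             "content": str(result)})
    return "Tool-call budget exhausted without a final answer"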

3.1.4 Reflection to strengthen the closed loop

At present, reflection covers two parts: inaccurate tool invocation and unreasonable planning. For inaccurate tool invocation, the main optimization points lie in MCP's internal logic, chiefly accurate tool descriptions and the System Prompt. Unreasonable planning mainly depends on the content entered into the knowledge base, the accuracy of retrieved content, the planning Prompt's requirements on the model, and optimization of the planning process itself. The focus of planning-process optimization is judging from the current plan whether further planning is needed, i.e., a self-optimizing iterative process. In general, our current planning process uses linear iteration with a fixed constraint set; it is relatively simple overall, and replanning supports at most two iterations, as sketched below.
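
A minimal sketch of that bounded replanning loop; create_plan, evaluate_plan, and replan are hypothetical helpers standing in for our LLM calls.

MAX_REPLAN_ITERATIONS = 2  # We currently cap replanning at two iterations

def plan_with_reflection(question: str, context: str, llm) -> dict:
    """Linear plan -> evaluate -> replan loop with a hard iteration cap."""
    plan = create_plan(question, context, llm)         # Initial structured plan
    for _ in range(MAX_REPLAN_ITERATIONS):
        verdict = evaluate_plan(plan, question, context, llm)  # Reflection step
        if verdict.reasonable:                         # Plan passes the check
            break
        plan = replan(plan, verdict.feedback, llm)     # Revise using the critique
    return plan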

From related research, we gather that in complex scenarios the planning process of an AI Agent needs to break through the limitations of the traditional linear decision-making framework and build an intelligent system with dynamic perception, elastic adjustment, and autonomous evolution. The following are iterative optimization directions and technical realization paths:

    • Closed-loop feedback-driven dynamic replanning: The core logic is to construct replanning as a closed-loop "execution-feedback-correction" system that responds to the environment in real time. Feedback is categorized into immediate operational feedback, semantic feedback, and user feedback.
    • Multi-granularity task decomposition and strategy reorganization: The goal is to shift from fixed-rule task decomposition to flexible decomposition strategies driven by scene dynamics. A core technical point is a dynamic complexity assessment model that predicts task difficulty, combined with a hybrid decision mechanism of a rule engine (predefined industry templates) and a neural network (learning the optimal decomposition path).
    • Multi-Agent collaborative architecture: Build an Agent cluster with role division of labor and knowledge sharing, breaking through the cognitive limits of a single Agent. The core is to select the appropriate Agent for each task stage, and a memory-sharing mechanism is needed so Agents can access historical information.
    • Autonomous strategy optimization based on reinforcement learning: On top of a training framework, build a continuous-learning closed loop of "environment-action-reward".

Overall, planning and reflection are crucial areas for optimization, and more practical, feasible solutions will emerge. We need to keep paying attention and exploring.

4. Results Presentation

After building the diagnostic Agent capability, to let users experience the consultation directly, the interfaces provided by WeCom (Enterprise WeChat) can be used to wrap it as a BOT or application service account that users talk to directly. During Q&A, the BOT's answer includes two parts: a display of the planned content for the user's request, and a display of the execution process and results. The purpose is to let users see the agent's planning process, the results of specific tool calls, and the final conclusion.

Thinking

1. The biggest shortcomings of the current practice project

Setting aside the fact that the practice cases implement the core modules the current Agent needs, one core piece is actually missing: the evaluation mechanism. In the current practice cases, the correctness of results relies mainly on manual evaluation. In an earlier article, Prof. Andrew Ng discussed the key skills needed to build an agent; he believes that the lack of an evaluation mechanism is the biggest "invisible problem" in current Agent construction. He advocates rapidly constructing an evaluation system, even a very elementary one, so that it can take over many repetitive judgment tasks. More importantly, "tactile intuition" built on real data and real failure paths is the most valuable experience in system construction. We will therefore develop an evaluation mechanism next to better help optimize the Agent's capability.
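
As a first step, even a tiny labeled regression set can take over some of that repetitive judgment. Here is a minimal harness sketch, assuming a hypothetical agent.diagnose(question) interface and crude string-match scoring; the case data is illustrative.

# Run the agent over labeled historical incidents and measure how often the
# diagnosed root cause matches the label. Case data and the agent interface
# are illustrative assumptions.
labeled_cases = [
    {"question": "Job 1024 keeps restarting", "root_cause": "checkpoint failure"},
    {"question": "Consumer lag keeps growing on job 2048", "root_cause": "data backlog"},
]

def evaluate(agent, cases: list) -> float:
    hits = 0
    for case in cases:
        answer = agent.diagnose(case["question"])
        if case["root_cause"] in answer.lower():  # Crude string-match scoring
            hits += 1
    return hits / len(cases)

# Usage (assuming `agent` exposes the hypothetical diagnose() method):
# print(f"root-cause accuracy: {evaluate(agent, labeled_cases):.0%}")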

2. How can LLMs gain autonomous decision-making and deeper reasoning?

We previously read an article about the Thinkless framework, which lets models learn to be "smart and lazy", finding a balance between efficiency and performance. Reasoning models often produce lengthy reasoning for both complex and simple problems, which wastes considerable computational resources. In Agent systems, the requirements on LLMs are diverse: some scenarios need no reasoning, others do. For example, routing needs to be fast and accurate, while planning requires the model to think for itself rather than merely plan from the provided context, otherwise flexibility and intelligence suffer. From an application developer's point of view, the core is to set the role and type of LLM model for each scenario: let the LLM learn to be "lazy" while further improving efficiency.
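
One concrete way to apply this is a per-scenario model routing table: fast non-reasoning models for routing and tool calling, a reasoning model for planning. A configuration sketch follows; the model names are illustrative examples, not recommendations.

# Pair each Agent stage with a model and a reasoning budget.
MODEL_ROUTING = {
    "intent_routing": {"model": "qwen2.5:14b",     "reasoning": False},  # Fast and accurate
    "planning":       {"model": "deepseek-r1:32b", "reasoning": True},   # Needs deliberation
    "tool_calling":   {"model": "qwen2.5:14b",     "reasoning": False},
    "summarization":  {"model": "qwen2.5:14b",     "reasoning": False},
}

def pick_model(stage: str) -> dict:
    """Select the model configuration for a given Agent stage."""
    return MODEL_ROUTING.get(stage, MODEL_ROUTING["summarization"])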