How to design an entry-level Multi-Agent System for vertical scenarios?

Written by Iris Vance

Updated on: June 13, 2025

There is a popular saying in the industry that "2025 is the year of AI Agents." In the first half of 2025, we have indeed seen a wide variety of "Agents" emerge. The industry has no consensus on a standard definition of an Agent (for example, on whether process-based workflows and SOPs count as Agents). Even so, I personally believe that a true Agent must have at least the following capabilities:

Self-planning: Given a goal, it can independently draw up a complete plan to achieve it.
Autonomous decision-making: It decides each step on its own, based on the plan and on real-time changes in its environment.
Tool use: It autonomously selects and invokes tools to carry out its decisions and complete the task.
Self-reflection: In complex scenarios, a task can rarely be completed in one pass by following the planned script. A good final result depends on continuous reflection during execution. The live environment and the results of tool calls must be constantly verified, and new paths explored, through an iterative "perception, decision-making, execution" loop. One reason agents struggle in complex, multi-scenario settings is that their reflection is weak, so errors keep accumulating.
Continuous iteration: Business processes are not static, and LLM capabilities keep improving. The ability to iterate continuously based on the agent's execution results, processes, and human feedback is therefore also crucial.

Since Sequoia put forward the idea of "Service-as-a-Software" last year, many traditional SaaS products have been rapidly evolving into AI Agents. Many new startups also deliver their products in an Agent-native form by default. For users like me, at least, Agent products are judged purely on their results.

Recently, "Agent" seems to have become a synonym for "universal software," but the current reality is this:

Even with the industry's most capable large models (such as GPT-4.1/o4, Gemini 2.5 Pro, or Claude 4), it is very difficult for a single agent to automate an end-to-end process when it faces complex tasks in vertical industry scenarios.

The root cause is that real business in vertical industries cannot be handled with a single round of Q&A or a simple two- or three-step task. It is a complex system made up of multiple roles, stages, tools, and processes.

From the standpoint of putting large models to practical use in vertical industries, a Multi-Agent System (MAS) is an unavoidable engineering practice for deploying large models in complex industry scenarios. The focus of large-model applications also needs to shift from "how to make a model smarter" to "how to build a reliable, efficient, and manageable intelligent system."

Designing a production-grade multi-agent system (MAS) is a very complex engineering problem. This article uses a simple, entry-level multi-agent system to show how to design a system that is small but complete.

1. The core value of MAS: from “single point” to “full process”

Vertical industries such as law, healthcare, finance, and high-end manufacturing are knowledge-intensive and process-heavy, and they demand strict compliance and accuracy. In these industries, a single-agent system often struggles to cope. The advantages of MAS stand out in complex business processes that involve many steps and interaction with numerous legacy systems.

Let’s first look at some typical vertical industry application scenarios:

Legal services: Automated large-scale contract review, legal research, evidence collection and organization, compliance checks, litigation support, and so on. An MAS can simulate the work of a team of lawyers. Different agents handle document screening, key-term extraction, risk identification, case retrieval, legal-opinion drafting, and other tasks.

Healthcare: Assisting clinical diagnosis and treatment planning, personalized drug development, clinical trial management, medical image analysis, and intelligent retrieval of medical literature and generation of literature reviews.

Financial services: Automated credit assessment and loan approval, fraud detection and risk management, personalized investment advisory services, execution of high-frequency trading strategies, and automatic generation of financial regulatory reports.

Customer support: Handling complex customer inquiries from multiple channels through multi-agent collaboration. This includes intelligent triage, answering questions, executing operations, and seamless handoff to human agents when necessary.

Enterprise knowledge management and R&D: Agents help employees conduct in-depth research, information discovery, and insight extraction across a vast enterprise knowledge base. This supports complex knowledge work such as product development and market analysis.

These application scenarios show that the core value of MAS lies mainly in the following aspects:

Task decomposition and specialized processing: Complex business processes in vertical industries can be decomposed into a series of smaller, more manageable subtasks. An MAS can assign each subtask to a dedicated agent with the specific knowledge, skills, and tool access it needs. For example, in legal technology, Harvey AI uses dedicated agents for different tasks such as contract analysis and legal research. This specialized division of labor can improve the quality and efficiency of each step. It also avoids forcing a "universal" agent to take on tasks it is not good at.
Reasoning and decision-making: Collaboration among multiple agents enables deeper reasoning. For example, one agent extracts preliminary evidence from a large number of documents. A second agent performs logical analysis and cross-validation of that evidence, and a third generates decision recommendations from the analysis. This kind of multi-agent collaboration, or pipelined reasoning, is more likely to reach a robust and comprehensive conclusion than a single agent "fighting alone."
Parallel processing and efficiency: Many complex business processes contain steps that can run in parallel. An MAS naturally supports assigning these steps to different agents for simultaneous execution, which can significantly shorten overall task completion time.
Flexible integration with existing systems: Vertical industries often rely on a large number of specialized tools, databases, and legacy systems. Each agent in an MAS can be designed to work with one specific tool or API. For example, Salesforce Agentforce agents can call MuleSoft APIs to integrate with external systems, and Google Vertex AI agents can connect to various databases and public APIs. This modular approach to tool integration is more efficient and maintainable than having a single agent master every interface.
Scalability and robustness: The MAS architecture is easier to extend if atomic, single-task agents and multi-agent combinations for multi-step tasks are mapped accurately onto the business scenario. When business needs grow or new task types emerge, they can be handled by adding new dedicated agents or extending existing ones. A large-scale rebuild of the whole system is not needed.
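To make the division of labor and the parallelism concrete, here is a minimal asyncio sketch of the legal-review example. It assumes each agent is an async function. The agent names are hypothetical, and their bodies are placeholders for real LLM or tool calls. Three independent agents run concurrently through `asyncio.gather`, and a drafting agent then consumes their combined results as a pipeline step.

```python
import asyncio

# Hypothetical specialist agents. In a real MAS each would wrap an LLM call
# with its own prompt, knowledge, and tools; here they just tag their input.
async def extract_terms(doc: str) -> str:
    await asyncio.sleep(0)  # stand-in for I/O such as an LLM or API call
    return f"terms({doc})"

async def identify_risks(doc: str) -> str:
    await asyncio.sleep(0)
    return f"risks({doc})"

async def retrieve_cases(doc: str) -> str:
    await asyncio.sleep(0)
    return f"cases({doc})"

async def draft_opinion(findings: list[str]) -> str:
    await asyncio.sleep(0)
    return "opinion[" + "; ".join(findings) + "]"

async def review_contract(doc: str) -> str:
    # Independent subtasks fan out and run concurrently...
    findings = await asyncio.gather(
        extract_terms(doc), identify_risks(doc), retrieve_cases(doc)
    )
    # ...then a downstream agent consumes all of their results (pipeline step).
    return await draft_opinion(list(findings))

if __name__ == "__main__":
    print(asyncio.run(review_contract("contract-42")))
```

`asyncio.gather` returns results in the order the coroutines were passed, so the drafting agent sees a stable ordering regardless of which specialist finishes first.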

2. Key Difficulties in Designing an Efficient MAS

Designing a production-grade MAS is a serious engineering challenge. Even an entry-level MAS must address the following key difficulties:

1. Orchestration: Complexity of state management and process control

When you move from a simple, linear sequential process to a complex dependency graph with parallelism, branching, loops, and retries, the complexity of the whole system rises dramatically.

How do you reliably manage the execution state of the entire graph? If a subtask fails, should it be retried or abandoned, or should it trigger an entirely new branch? Who maintains this "state machine"? If the system crashes, how does it recover from the last checkpoint?
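Here is a minimal sketch of such a "state machine" with checkpointing. It assumes task states fit in a local JSON file; a production system would more likely use a database or a workflow engine. The class name `TaskStateStore` and its transition table are hypothetical. The idea it shows is simple: every state change is validated and persisted, so a restarted coordinator can resume from the last checkpoint.

```python
import json
import os
from enum import Enum

class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

# Legal transitions of the per-task state machine; a retry is FAILED -> PENDING.
TRANSITIONS = {
    Status.PENDING: {Status.RUNNING},
    Status.RUNNING: {Status.DONE, Status.FAILED},
    Status.FAILED: {Status.PENDING},
    Status.DONE: set(),
}

class TaskStateStore:
    """Tracks per-task state and checkpoints it to a JSON file after every change."""

    def __init__(self, path: str, tasks: list[str]):
        self.path = path
        if os.path.exists(path):
            # Recover from the last checkpoint.
            with open(path) as f:
                self.state = {k: Status(v) for k, v in json.load(f).items()}
            # Tasks that were RUNNING when the process died are treated as interrupted.
            for task, status in self.state.items():
                if status is Status.RUNNING:
                    self.state[task] = Status.PENDING
        else:
            self.state = {t: Status.PENDING for t in tasks}
        self._checkpoint()

    def transition(self, task: str, new: Status) -> None:
        old = self.state[task]
        if new not in TRANSITIONS[old]:
            raise ValueError(f"illegal transition {old.value} -> {new.value} for {task}")
        self.state[task] = new
        self._checkpoint()

    def pending(self) -> list[str]:
        return [t for t, s in self.state.items() if s is Status.PENDING]

    def _checkpoint(self) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({k: v.value for k, v in self.state.items()}, f)
        # Rename over the old file (atomic on POSIX) so a crash never leaves it half-written.
        os.replace(tmp, self.path)
```

Rejecting illegal transitions, such as re-running a task that is already DONE, keeps retries and branch decisions explicit rather than accidental.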

On the collaboration side, every message and handoff between agents incurs the overhead of data serialization, network transmission, and context loading. As the number of agents grows, this collaboration overhead can quickly offset the gains from running agents in parallel. The key design question is how to strike the best balance between "task decomposition granularity" and "agent collaboration overhead."
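A toy cost model can make this trade-off concrete. Assume, purely for illustration, that total work W splits evenly across n agents and that each agent adds a fixed coordination cost c to the critical path. Completion time is then T(n) = W/n + c·n. Setting the derivative to zero gives an optimum near n* = sqrt(W/c). Real overheads are rarely this linear, so treat the sketch below as intuition rather than a sizing rule. The function names are hypothetical.

```python
def completion_time(n_agents: int, total_work: float, overhead_per_agent: float) -> float:
    """Toy model: work splits evenly across agents, but each agent adds a fixed
    coordination overhead (serialization, transport, context loading)."""
    return total_work / n_agents + overhead_per_agent * n_agents

def best_agent_count(total_work: float, overhead_per_agent: float, max_agents: int = 64) -> int:
    # Brute-force search over 1..max_agents; the continuous optimum of
    # W/n + c*n is sqrt(W / c).
    return min(
        range(1, max_agents + 1),
        key=lambda n: completion_time(n, total_work, overhead_per_agent),
    )

# With 100 units of work and 1 unit of overhead per agent, 10 agents is optimal:
# 100/10 + 1*10 = 20, while 5 agents (20 + 5 = 25) and 20 agents (5 + 20 = 25) are slower.
```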

2. Context: Balance between long context and accuracy

Context is the "fuel" of an LLM. In an MAS, however, longer and more complex context brings more interference from irrelevant information, and model accuracy can become a bottleneck.
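One common mitigation is to give each agent its own token budget and fill it greedily with the most relevant items, instead of passing along everything. The sketch below assumes each context item already carries a relevance score (for example from retrieval) and a token count from the model's tokenizer. The names are hypothetical, and real systems often also summarize the items that get dropped.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    relevance: float  # e.g. a retrieval score for the current subtask (assumed given)
    tokens: int       # token count from the model's tokenizer (assumed given)

def build_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the most relevant items that still fit in the agent's
    token budget, so each agent sees a short, focused context."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + item.tokens <= budget:
            chosen.append(item)
            used += item.tokens
    return chosen
```

Greedy selection is not optimal in the knapsack sense, but it is predictable and cheap, which matters when it runs before every agent call.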

When multiple agents operate in parallel on a shared knowledge base or external system, how can data consistency be guaranteed? Suppose one agent is updating customer information while another makes a decision based on the old information. That outcome is unacceptable in fields such as finance and healthcare.
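One standard technique for this problem is optimistic concurrency control. Every record carries a version number, and a write must state the version it was based on. If another agent has written in the meantime, the write is rejected. The in-memory `VersionedStore` below is a hypothetical sketch; a real deployment would rely on the database's own conditional-update or compare-and-set mechanism.

```python
import threading

class VersionConflict(Exception):
    pass

class VersionedStore:
    """Shared record store with optimistic concurrency control: every write must
    name the version it read, and is rejected if someone else wrote in between."""

    def __init__(self):
        self._data: dict[str, tuple[int, dict]] = {}
        self._lock = threading.Lock()

    def read(self, key: str) -> tuple[int, dict]:
        with self._lock:
            version, value = self._data.get(key, (0, {}))
            return version, dict(value)  # copy so callers cannot mutate shared state

    def write(self, key: str, expected_version: int, value: dict) -> int:
        with self._lock:
            current, _ = self._data.get(key, (0, {}))
            if current != expected_version:
                raise VersionConflict(f"{key}: expected v{expected_version}, found v{current}")
            self._data[key] = (current + 1, dict(value))
            return current + 1
```

An agent that receives `VersionConflict` re-reads the record and re-decides on fresh data. For decisions that do not write back to the record, the same idea applies: commit the decision together with the version it was based on, and reject it if that version is no longer current.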

3. Execution pain points: interfacing with a non-ideal world

Agents interact with the outside world by calling tools (tool use), but calling a company's existing APIs and systems raises a variety of problems.

How do you cope with "fragile" APIs? How do you handle ambiguous results returned by various systems and APIs? This is one of the most unstable links in the agent execution layer.
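A practical defense is to put a normalization layer between raw API responses and the agent, so every tool returns the same small result shape. The sketch below assumes JSON APIs that report success through a `status` or `code` field. The accepted success markers are illustrative assumptions; each real integration needs its own mapping.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ToolResult:
    ok: bool
    data: Any
    error: Optional[str] = None

# Hypothetical set of the different ways upstream systems report success.
SUCCESS_MARKERS = {"ok", "success", "succeeded", "true", "0", "200"}

def normalize_result(raw: str) -> ToolResult:
    """Map a raw API response body onto one uniform ToolResult, so the agent
    never has to guess what an ambiguous payload means."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # e.g. an HTML error page from a gateway instead of JSON
        return ToolResult(ok=False, data=None, error=f"non-JSON response: {raw[:80]!r}")
    if not isinstance(payload, dict):
        return ToolResult(ok=False, data=payload, error="unexpected top-level JSON type")
    status = str(payload.get("status", payload.get("code", ""))).strip().lower()
    if status in SUCCESS_MARKERS:
        return ToolResult(ok=True, data=payload.get("data"))
    message = payload.get("message") or f"unrecognized status {status!r}"
    return ToolResult(ok=False, data=payload.get("data"), error=message)
```

Treating any unrecognized status as a failure is deliberate. It is safer for the agent to escalate than to proceed on a result it cannot interpret.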

3. Key technical solutions for building vertical industry MAS

Given the difficulties above, let's explore a relatively pragmatic, entry-level technical architecture and implementation approach.

1. Laying the foundation: MAS overall structure and core roles

The architecture of a fully functional MAS should be modular and role-based. Separating different responsibilities into different agents or services is the key to ensuring the maintainability and scalability of the system.

Table 3.1: MAS core agent roles and their key responsibilities
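The contents of Table 3.1 are not reproduced here. As a sketch of the modular, role-based idea, all agents can share a small interface, and the coordinator can dispatch work through a registry keyed by role. The role names and the `EchoExtractor` agent are hypothetical placeholders for real agents that wrap prompts, tools, and an LLM.

```python
from typing import Protocol

class Agent(Protocol):
    role: str
    def handle(self, task: dict) -> dict: ...

class Registry:
    """Maps role names to agents so the coordinator depends only on roles,
    not on concrete agent implementations."""

    def __init__(self):
        self._agents: dict[str, Agent] = {}

    def register(self, agent: Agent) -> None:
        if agent.role in self._agents:
            raise ValueError(f"role already registered: {agent.role}")
        self._agents[agent.role] = agent

    def dispatch(self, role: str, task: dict) -> dict:
        if role not in self._agents:
            raise KeyError(f"no agent for role {role!r}")
        return self._agents[role].handle(task)

# Hypothetical worker agent used for illustration.
class EchoExtractor:
    role = "extractor"

    def handle(self, task: dict) -> dict:
        return {"role": self.role, "input": task["doc"]}
```

With this structure, adding a capability means registering a new agent under a new role, without changing the coordinator.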

2. Intelligent navigation: planning, decision-making and dynamic adjustment mechanism

Planning, reasoning, and dynamic adjustment are the core responsibilities of the coordinator and the embodiment of MAS intelligence.

Planning representation: A common representation is a DAG, which can clearly express complex dependencies between tasks, parallel paths, and conditional branches. Other representations include simple structured task chains, natural-language process descriptions interpreted by an LLM, behavior trees, and even pseudocode-like scripts.
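Here is a minimal sketch of a DAG plan using Python's standard-library `graphlib` module (Python 3.9+). The task names are hypothetical. `TopologicalSorter` groups the plan into "waves": every task in a wave has all of its dependencies satisfied by earlier waves, so the tasks within a wave can run in parallel. `prepare()` also rejects plans that contain a cycle.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
plan = {
    "screen_docs": set(),
    "extract_terms": {"screen_docs"},
    "identify_risks": {"screen_docs"},
    "retrieve_cases": {"screen_docs"},
    "draft_opinion": {"extract_terms", "identify_risks", "retrieve_cases"},
}

def execution_waves(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves whose members can run in parallel."""
    ts = TopologicalSorter(dag)
    ts.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        ts.done(*ready)  # in a real executor, call done() as each task finishes
    return waves
```

Here `execution_waves(plan)` yields three waves: `screen_docs`, then the three analysis tasks together, then `draft_opinion`.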

Reasoning and decision-making: A core principle is that reasoning strategies should be applied mainly in the decision-making of the coordinator/planner agent, or executed by a dedicated sub-agent within a strictly controlled scope. Common reasoning strategies include backtracking search, branch and bound, beam search, sampling, and the much-debated MCTS. The goal is to optimize the collaborative planning and execution efficiency of the MAS as a whole. Useful design principles include "centralized decision-making, decentralized execution," "clear evaluation criteria," "limited exploration scope," and "dynamic feedback injection."
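As an illustration of one of these strategies, here is a minimal beam search over partial plans. It assumes the planner supplies two functions: one that proposes next steps and one that scores a partial plan (in practice a heuristic or an LLM judge; here, toy numbers). The beam width and depth make the "limited exploration scope" principle explicit, and the score function plays the role of the "clear evaluation criteria." All names and values are hypothetical.

```python
import heapq
from typing import Callable

Plan = tuple[str, ...]

def beam_search(
    start: Plan,
    expand: Callable[[Plan], list[str]],
    score: Callable[[Plan], float],
    beam_width: int,
    depth: int,
) -> Plan:
    """Keep only the `beam_width` highest-scoring partial plans at each step,
    which bounds how much of the plan space the planner explores."""
    beam = [start]
    for _ in range(depth):
        candidates = [plan + (step,) for plan in beam for step in expand(plan)]
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)

# Toy example: each step has a fixed value, and a plan cannot repeat a step.
values = {"search": 3.0, "summarize": 2.0, "ask_human": -1.0}
best = beam_search(
    (),
    expand=lambda p: [s for s in values if s not in p],
    score=lambda p: sum(values[s] for s in p),
    beam_width=2,
    depth=2,
)
```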

Implementing dynamic adjustment: This requires a closed "perception-planning-action-evaluation-replanning" loop. The loop includes execution monitoring, deviation detection, and re-planning, which may take the form of local repair, plan adjustment, or even manual intervention.
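Here is a minimal sketch of that closed loop for a linear plan. It assumes three caller-supplied functions: `execute` performs a step, `evaluate` detects deviations in its output, and `repair` proposes a local variant of a failed step or returns `None`. The function names and the result format are hypothetical. When local repair is exhausted, the loop stops and hands the failed step back to the coordinator for re-planning or human review.

```python
from typing import Callable, Optional

def run_closed_loop(
    plan: list[str],
    execute: Callable[[str], str],
    evaluate: Callable[[str, str], bool],
    repair: Callable[[str], Optional[str]],
    max_repairs: int = 1,
) -> dict:
    """Run each plan step through act -> evaluate -> (local repair | escalate).

    Returns {"status": "done", ...}, or {"status": "escalate", "failed_step": ...}
    so the coordinator can re-plan or hand the step to a human."""
    results: dict[str, str] = {}
    for step in plan:
        attempt, repairs = step, 0
        while True:
            output = execute(attempt)                 # action
            if evaluate(step, output):                # evaluation / deviation detection
                results[step] = output
                break
            alternative = repair(attempt) if repairs < max_repairs else None
            if alternative is None:                   # local repair exhausted
                return {"status": "escalate", "failed_step": step, "results": results}
            attempt, repairs = alternative, repairs + 1  # local repair: try a variant
    return {"status": "done", "results": results}
```

The `max_repairs` bound matters. Without it, a step that keeps failing could loop forever instead of surfacing to the coordinator.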

Table 3.2: Comparison of different planning representation methods in MAS

3. Information transfer: communication and context sharing between agents

How information flows between agents, and how efficiently and reliably it does so, directly affects overall performance. For a production environment, I strongly recommend implementing an independent Context Management module.

Table 3.3: Key Context Information in MAS
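The contents of Table 3.3 are not reproduced here. The sketch below shows one possible shape for an independent context module; the scope naming convention and class names are hypothetical. Agents do not pass raw transcripts to each other. Instead, they write facts into named scopes (global, per-task, per-agent) and read a merged view of only the scopes they need. Every write is also kept in an append-only log for auditing and debugging.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class ContextEvent:
    scope: str
    key: str
    value: object
    author: str
    ts: float = field(default_factory=time.time)

class ContextManager:
    """Scoped shared context with an append-only event log."""

    def __init__(self):
        self._scopes: dict[str, dict[str, object]] = defaultdict(dict)
        self.log: list[ContextEvent] = []

    def put(self, scope: str, key: str, value: object, author: str) -> None:
        self._scopes[scope][key] = value
        self.log.append(ContextEvent(scope, key, value, author))

    def view(self, scopes: list[str]) -> dict[str, object]:
        # Later scopes override earlier ones, e.g. ["global", "task:42", "agent:risk"].
        merged: dict[str, object] = {}
        for s in scopes:
            merged.update(self._scopes.get(s, {}))
        return merged
```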

4. Reliable execution: tool calling, monitoring and human-machine collaboration

This is the last mile of delivering value with an MAS. However good a decision is, it is useless if it cannot be turned into a final result through action. Because enterprises run many different systems, making internal and external tool calls reliable is not easy. Key design considerations include:

Safe tool calls:

Tool description standardization: Use the OpenAPI specification to define API tools, and describe other tools with clear function signatures and docstrings. This is a prerequisite for agents to call tools reliably.
Security considerations: Strict authentication and authorization must be implemented at the API gateway or tool-call layer. For high-risk operations (such as transactions and data deletion), a manual approval process must be introduced.
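Here is a minimal sketch of both points in one place. Each tool carries a description derived from its signature, a set of roles allowed to call it, and a high-risk flag that requires a human approval callback. The `Tool` class, `call_tool`, and the example `delete_record` tool are hypothetical. `spec()` is a simplified stand-in for a full OpenAPI or JSON Schema description, not that format itself.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Tool:
    fn: Callable
    description: str
    allowed_roles: set[str]
    high_risk: bool = False

    def spec(self) -> dict:
        """Simplified tool description for the LLM: name, description, and
        parameter types read from the function signature."""
        params = {
            name: (p.annotation.__name__ if p.annotation is not inspect.Parameter.empty else "any")
            for name, p in inspect.signature(self.fn).parameters.items()
        }
        return {"name": self.fn.__name__, "description": self.description, "parameters": params}

def call_tool(tool: Tool, caller_role: str,
              approve: Optional[Callable[[str, dict], bool]] = None, **kwargs):
    name = tool.fn.__name__
    # Authorization is enforced at the call layer, not left to the agent's prompt.
    if caller_role not in tool.allowed_roles:
        raise PermissionError(f"role {caller_role!r} may not call {name}")
    # High-risk operations need an explicit human approval callback.
    if tool.high_risk and (approve is None or not approve(name, kwargs)):
        raise PermissionError(f"{name} requires human approval")
    return tool.fn(**kwargs)

# Hypothetical high-risk tool used for illustration.
def delete_record(record_id: str) -> str:
    return f"deleted {record_id}"

delete_tool = Tool(delete_record, "Delete a customer record.", {"ops_agent"}, high_risk=True)
```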

Fine-grained exception handling:

Local error handling: Support automatic retries (for example, with exponential backoff) and graceful degradation.
Global error handling: When local handling fails, detailed error information must be reported to the coordinator. The coordinator then decides the next course of action, such as re-planning or requesting human intervention.
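Here is a minimal sketch of the local and global layers together. The local layer retries with exponential backoff plus random jitter ("full jitter"), then falls back to a degraded result if one is available. If neither works, it raises an exception that carries the details the coordinator needs. `call_with_retry` and `EscalateToCoordinator` are hypothetical names. The `sleep` parameter is injectable so the logic can be tested without real delays.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class EscalateToCoordinator(Exception):
    """Raised when local handling is exhausted; carries details for the coordinator."""

    def __init__(self, tool: str, attempts: int, last_error: Exception):
        super().__init__(f"{tool} failed after {attempts} attempts: {last_error!r}")
        self.tool, self.attempts, self.last_error = tool, attempts, last_error

def call_with_retry(
    fn: Callable[[], T],
    name: str,
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    fallback: Optional[Callable[[], T]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:  # in practice, catch only errors known to be transient
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff with full jitter: wait a random time in
                # [0, base_delay_s * 2**attempt] before the next try.
                sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    if fallback is not None:
        return fallback()  # graceful degradation, e.g. cached or partial data
    assert last_error is not None
    raise EscalateToCoordinator(name, max_attempts, last_error)
```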

Effective human-in-the-loop (HITL):

Set "breakpoints" at key decision points: The system should pause at these points, clearly present the decision's context and recommendations to a human, and wait for confirmation or revision.
Provide an explainable "thinking process": The agent should give not only the result but also the evidence and reasoning chain behind it. This transparency builds trust. It also lets people quickly see how to help when something goes wrong.
Close the feedback loop: Every human intervention and correction should be collected as high-quality labeled data for future model fine-tuning and optimization.
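Here is a minimal sketch that ties the three points together. A breakpoint presents the proposed decision with its rationale and waits for a human review. The outcome, including any correction, is then recorded as a labeled example. The reviewer is modeled as a callback; in production it would be a UI or ticketing step. The class names and record fields are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Proposal:
    decision: str
    rationale: str  # the explainable "thinking process" shown to the reviewer
    context: dict

@dataclass
class Review:
    approved: bool
    corrected_decision: Optional[str] = None
    comment: str = ""

def breakpoint_gate(proposal: Proposal, reviewer: Callable[[Proposal], Review],
                    feedback_log: list[str]) -> Optional[str]:
    """Pause at a key decision, let a human approve or correct it, and record
    the outcome as a labeled example. Returns the final decision, or None if rejected."""
    review = reviewer(proposal)
    final = review.corrected_decision or (proposal.decision if review.approved else None)
    feedback_log.append(json.dumps({
        "input": proposal.context,
        "model_decision": proposal.decision,
        "model_rationale": proposal.rationale,
        "human_decision": final,
        "approved": review.approved,
        "comment": review.comment,
    }))
    return final
```

Each JSON line pairs what the model proposed with what the human decided. This is the shape of data that later fine-tuning or evaluation pipelines typically need.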

5. Continuous Evolution: Feedback Loop and Learning Mechanism

An MAS that cannot learn from experience is lifeless. Internal and external environments and requirements change constantly, and only an MAS that keeps iterating and evolving is a "living" system. The following points should be considered during design and implementation:

Multi-dimensional evaluation: Build a comprehensive evaluation system that combines automated metrics with human evaluation.

Automated evaluation: Monitor quantitative metrics such as task success rate, end-to-end latency, and tool-call error rate. "LLM-as-a-Judge" can be used to assess the quality of output content.
Human evaluation: For complex and highly subjective tasks, evaluation by domain experts remains the gold standard.
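Here is a minimal sketch of the automated side. It computes the three quantitative metrics named above from per-run records. The `RunRecord` fields are assumptions about what the execution layer logs, and p95 latency uses the simple nearest-rank method. Quality scoring by an LLM judge would be a separate step and is not shown.

```python
import math
from dataclasses import dataclass

@dataclass
class RunRecord:
    succeeded: bool
    latency_s: float
    tool_calls: int
    tool_errors: int

def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate success rate, p95 end-to-end latency, and tool-call error rate.
    Assumes at least one run."""
    latencies = sorted(r.latency_s for r in runs)
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)  # nearest-rank percentile
    total_calls = sum(r.tool_calls for r in runs)
    return {
        "success_rate": sum(r.succeeded for r in runs) / len(runs),
        "p95_latency_s": latencies[p95_index],
        "tool_error_rate": sum(r.tool_errors for r in runs) / total_calls if total_calls else 0.0,
    }
```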

Feedback-based learning:

Planning strategy optimization: Analyzing historical task data can reveal bottlenecks in the planning model and guide optimization of the MAS's planning strategy.
Agent model fine-tuning: The collected high-quality human-computer interaction data (especially human corrections) is a valuable domain-specific asset for model fine-tuning or reinforcement learning (RLVR).
Dynamic knowledge base updates: Establish an online process that adds new knowledge generated during tasks back into the domain knowledge base after verification. The process should also regularly clean out outdated and invalid knowledge to keep the knowledge base "fresh."
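Here is a minimal sketch of the knowledge-base update process. Candidate knowledge from tasks is admitted only if a verification function accepts it, and entries older than a time-to-live are purged on a schedule. The verifier and the TTL policy are assumptions. Real systems might use expert review, consistency checks against sources, or per-domain expiry rules. All names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class KnowledgeEntry:
    key: str
    content: str
    source_task: str
    added_at: float = field(default_factory=time.time)

class KnowledgeBase:
    """Admits task-generated knowledge only after verification and purges
    entries older than `ttl_s`."""

    def __init__(self, verify: Callable[[KnowledgeEntry], bool], ttl_s: float):
        self.verify, self.ttl_s = verify, ttl_s
        self.entries: dict[str, KnowledgeEntry] = {}

    def propose(self, entry: KnowledgeEntry) -> bool:
        if not self.verify(entry):  # e.g. expert review or a consistency check (assumed)
            return False
        self.entries[entry.key] = entry  # newer knowledge replaces older under the same key
        return True

    def purge_stale(self, now: Optional[float] = None) -> list[str]:
        now = time.time() if now is None else now
        stale = [k for k, e in self.entries.items() if now - e.added_at > self.ttl_s]
        for k in stale:
            del self.entries[k]
        return stale
```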

4. Key Takeaways

Finally, I would like to share four simple lessons I have learned in practice.

The core of MAS is "system engineering" rather than "algorithm engineering." This is a typical, complex distributed-system design problem. The engineering challenge lies mostly in designing the system's topology, data flow, state management, fault tolerance, authentication, context management, agent communication, and so on.
The essence of agent orchestration is a "persistent state machine." The soul of the main planning/orchestration agent is not the LLM used for "reflection." It is the reliable state machine responsible for tracking, managing, and restoring the entire task process. The LLM provides the intelligence; the state machine provides the skeleton.
The highest value of human-machine collaboration is as a "generator of high-quality data." Every effective human intervention produces valuable, high-quality labeled data. This data is the "fuel" for fine-tuning models and, in the future, for the system's autonomous evolution. Human-machine collaboration is therefore not only a cost but also an invisible moat built up over time.
Non-functional requirements cannot be ignored. In vertical industries, whether an MAS can launch and generate value is often determined not by its "intelligence" but by its "non-functional" characteristics. Does its end-to-end latency meet business requirements? Can its security and compliance pass audits? Is its observability good enough to support rapid troubleshooting? Is its operating cost acceptable? These seemingly "boring" engineering issues are the real "lifeline."

A multi-agent system (MAS) is the inevitable choice for applying large models at scale in vertical industry scenarios. There is no shortcut on this road: solving the business and technical challenges described above requires rigorous systems-engineering thinking. MAS is also the foundation of the "silicon-based employees" of future intelligent organizations. It will be not only the main vehicle for delivering business value but also a driving force for organizational change.