Guide to Building Efficient Agents

A practical guide to building intelligent agents, providing professional insights and practical skills for AI projects.
Core content:
1. The difference between intelligent agents and traditional software and their application scenarios
2. Intelligent agent design principles and security strategies
3. Analysis of high-potential application scenarios and sharing of intelligent agent deployment experience
I have been doing Agent-related work this year and have gradually formed my own set of AI project experience. Still, the most dangerous thing in AI is ignorance, so I read various reports every day. OpenAI recently released a report called "A Practical Guide to Building Agents", which I found very good, so I recommend it to everyone.
The report has 32 pages in total, and its directory structure is as follows:
- What is an agent? (p. 4)
- When should you build an agent? (p. 5)
- Fundamentals of agent design (p. 7)
- Security (p. 24)
- Conclusion (p. 32)
Introduction
Large language models (LLMs) are rapidly improving in capability and can now handle complex multi-step tasks. Breakthroughs in reasoning, multimodality, and tool use have given rise to a new class of LLM-driven systems: agents.
This guide is written for product and engineering teams who are trying to build an intelligent agent for the first time. It collects the key points of experience from many customer deployments and condenses them into practical best practices. It covers:
- A screening framework for high-potential application scenarios
- A clear paradigm for designing agent logic and orchestration
- Key practices for ensuring agents run safely, predictably, and efficiently
After reading this guide, you will have mastered the core knowledge required to build your first intelligent agent and embark on the practical journey with ease.
What is an Agent?
Traditional software helps users streamline and automate their workflow.
Agents can autonomously perform the same process for users.
An intelligent agent is a system that completes tasks on behalf of its users with a high degree of autonomy.
A workflow is a series of steps that must be performed in sequence to achieve a user goal, such as resolving a customer service issue, making a restaurant reservation, submitting a code change, or generating a data report.
Non-agent scenarios: integrating an LLM into an application without letting it control process execution (simple chatbots, single-turn Q&A, sentiment classifiers, etc.). These are not agents.
Therefore, to develop an intelligent agent, you first need a clear definition of what makes an agent:
First, LLM-driven process control and decision-making:
- uses an LLM to make decisions and control workflow execution;
- independently judges whether the task is complete;
- supports self-correction when errors occur;
- can abort the process and hand control back to the user when it fails.
Second, multi-tool calls governed by safety policies:
- accesses multiple tools to interact with external systems (obtain information / perform operations);
- dynamically selects tools based on workflow state;
- always operates within preset safety boundaries.
To sum up, the core of today's agents lies in two points:
- whether the model itself can reliably orchestrate the workflow;
- whether the agent itself can call the right tools to execute each step.
The reason models can now be trusted to orchestrate their own workflows is the significant improvement in their base capabilities.
When should you build an Agent?
Building intelligent agents means rethinking how systems handle decision making and complexity.
Unlike traditional automation, intelligent agents are particularly well suited to workflows where deterministic, rules-driven approaches fall short.
Take payment fraud analysis as an example:
Traditional rule engines act like a checklist, marking transactions according to preset conditions.
LLM agents act more like experienced investigators, integrating context, picking up subtle patterns, and identifying suspicious behavior even when no explicit rules are triggered.
This sophisticated reasoning ability is the key to enabling intelligent agents to perform well in complex and ambiguous scenarios.
PS: I have to point out that in real practice, a rule engine is more efficient and accurate. The "subtle patterns" the agent supposedly catches are really just rules the rule engine missed, which logically should be added back to the rule engine.
Real applications tend to follow a fast/slow-system split: the rule engine handles the first round, and the model serves as backup.
So when should we consider Agent?
When evaluating where an agent adds value, prioritize processes that traditional automation has always struggled to fully cover, especially scenarios where rule-based methods have pain points:

| # | Criterion | Typical situation |
|---|---|---|
| 01 | Complex decision-making | Workflows involving nuanced judgment, exceptions, or context-sensitive decisions, e.g., deciding whether to approve a refund |
| 02 | Rules that are difficult to maintain | Rulesets so large and intricate that updates are unwieldy and error-prone, e.g., vendor security reviews |
| 03 | High reliance on unstructured data | Interpreting natural language or extracting meaning from documents, e.g., processing a home insurance claim |
Before you start building an agent, make sure your use case meets the above criteria. If the process can be solved with a simple, reliable, deterministic solution, there is no need to force an agent.
PS: In fact, the biggest problem is reliability. The model must be trusted not to make mistakes, or at least to keep accuracy above a certain threshold; otherwise the agent will struggle to gain trust.
Fundamentals of Agent Design
In its most basic form, an agent consists of three core components:

| Component | Role |
|---|---|
| Model | The LLM that powers the agent's reasoning and decision-making |
| Tools | External functions or APIs the agent can call to retrieve information and take action |
| Instructions | Explicit guidelines and guardrails that define how the agent behaves |
from agents import Agent, function_tool  # OpenAI Agents SDK

@function_tool
def get_weather(city: str) -> str:
    # Stub tool: return the current weather for a city
    return f"The weather in {city} is sunny."

weather_agent = Agent(
    name="Weather agent",
    instructions="You are a helpful agent who can talk to users about the weather.",
    tools=[get_weather],
    model="gpt-4",  # specify the LLM model to be used
)
1. Model Selection Strategy
Models differ along three dimensions:
- Task complexity: harder reasoning and judgment steps benefit from more capable models
- Latency: smaller models respond faster, which matters for interactive steps
- Cost: larger models cost more per call, so reserve them for where they add value
As discussed in the next section, "Orchestration," you will often need to mix and match models by task type within the same workflow.
Not all steps require the strongest model
Simple retrieval or intent classification can be accomplished with a small and fast model. More difficult decisions, such as whether to approve a refund, may require a more powerful model.
An effective approach is to first run every step with the most capable model to establish a performance baseline, then try swapping smaller models into individual steps and check whether the results are still acceptable.
This way you neither cap the agent's capabilities too early, nor lose sight of exactly where small models succeed and where they fail.
PS: In fact, with large-model costs as low as they are now, you can often just use the strongest model throughout.
The only catch is that many private-deployment scenarios still have to rely on small models, so this strategy remains applicable.
The selection principle comes down to three points:
1. Establish evals: first run the entire process with the best model to form a performance benchmark.
2. Ensure accuracy first: consider optimization only after the target accuracy is met.
3. Optimize cost and latency: replace large models with smaller ones wherever performance is not affected.
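As a hedged illustration of this strategy, here is a minimal sketch of mixing model sizes by task type, reusing the Agent class from the earlier example; the model names are placeholders, not recommendations from the report:

from agents import Agent

# Small, fast model for a simple classification step
intent_classifier = Agent(
    name="intent_classifier",
    instructions="Classify the user's request as 'refund', 'order_status', or 'other'.",
    model="gpt-4o-mini",  # placeholder: a small, fast model
)

# Stronger model reserved for the harder judgment call
refund_decider = Agent(
    name="refund_decider",
    instructions="Decide whether a refund request should be approved, following policy.",
    model="gpt-4",  # placeholder: a more capable model
)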
2. Defining Tools
Tools extend an agent's capabilities by calling the APIs of underlying applications or systems.
For legacy systems without APIs, agents can use computer-use models to operate web pages or desktop interfaces directly, the way a human would.
Each tool should adopt a standardized definition to enable flexible reuse among multiple agents and form a many-to-many relationship.
Well-documented, well-tested, reusable tools improve discoverability, simplify version management, and avoid reinventing the wheel.
PS: So-called computer use is far less mature than people think, and there is still plenty of room for optimization. For now, RPA remains the more controllable option for repetitive tasks.
Three types of tools commonly used by intelligent agents
| Type | Description | Examples |
|---|---|---|
| Data | Retrieve the context and information the workflow needs | Query databases or CRMs, read documents, search the web |
| Action | Interact with external systems to take actions | Send emails, update records, hand a ticket off to a human |
| Orchestration | Agents themselves can serve as tools for other agents | A manager agent invoking a translation or research agent |
The following demonstrates how to use the OpenAI Agents SDK to give an agent a set of tools (web search + result storage):
from agents import Agent, WebSearchTool, function_tool
import datetime
import db  # assume a database access module already exists

@function_tool
def save_results(output: str) -> str:
    # Write the search results to the database
    db.insert({"output": output, "timestamp": datetime.datetime.now()})
    return "File saved"

search_agent = Agent(
    name="Search agent",
    instructions="Help the user search the internet and save results if asked.",
    tools=[WebSearchTool(), save_results],
)
As the number of tools required increases, it is recommended to split the task among multiple agents to work together (see the Orchestration section for details).
3. Instruction Configuration
High-quality instructions are crucial for any LLM application, and even more so for intelligent agents.
The clearer the instructions, the less ambiguity there is, and the more reliable the agent’s decisions are—making the entire workflow run more smoothly and with fewer errors.
Best Practices for Agent Instructions
| Practice | Description |
|---|---|
| Leverage existing documents | Convert existing operating procedures, support scripts, or policy documents into LLM-friendly instructions |
| Prompt the agent to break down tasks | Smaller, clearer steps reduce ambiguity and help the model follow along |
| Define clear actions | Make sure every step corresponds to a concrete action or output, leaving no room for interpretation |
| Cover edge cases | Anticipate incomplete input and unexpected questions, and include branches for handling them |
Automatically generate instructions with high-performance models
You can have high-performance models such as o1 or o3‑mini generate standard-form instructions directly from existing documents.
The following prompt example demonstrates the idea:
You are an expert at writing instructions for LLM agents.
Please convert the following Help Center document into a clear instruction list, using a numbered list format.
This document is a policy for LLMs to follow.
Make sure there is no ambiguity and write it in a way that the agent can directly execute the instructions.
The help center documents to be converted are as follows: {{help_center_doc}}
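To make the idea concrete, here is a minimal sketch assuming the standard OpenAI Python SDK; the model name and the helper function are placeholders, not something prescribed by the report:

from openai import OpenAI

client = OpenAI()

META_PROMPT = (
    "You are an expert at writing instructions for LLM agents. "
    "Please convert the following Help Center document into a clear instruction list, "
    "using a numbered list format. This document is a policy for LLMs to follow. "
    "Make sure there is no ambiguity and write it in a way that the agent can "
    "directly execute the instructions.\n\n"
    "The help center documents to be converted are as follows: {help_center_doc}"
)

def generate_instructions(help_center_doc: str) -> str:
    # Ask a high-performance model to turn a policy document into agent instructions
    response = client.chat.completions.create(
        model="o3-mini",  # placeholder for a high-performance reasoning model
        messages=[{
            "role": "user",
            "content": META_PROMPT.format(help_center_doc=help_center_doc),
        }],
    )
    return response.choices[0].message.content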
4. Orchestration
Once the basic components are in place, you can choose the appropriate orchestration mode to enable the agent to efficiently execute the workflow.
While it is tempting to jump right in and develop a complex, fully autonomous agent, practice shows that a step-by-step, iterative approach is often more likely to be successful.
There are two main categories of orchestration patterns:
- Single-agent system: a model equipped with the necessary tools and instructions executes the entire workflow in a loop.
- Multi-agent system: the workflow is split across multiple cooperating agents, each with its own responsibility.
Next, we will expand on these two modes one by one.
Single-agent system
Initially, a single agent only needs the most basic model and one or two tools to run; as needs increase, new tools are gradually "equipped" for it.
Doing so lets functionality grow naturally as the project iterates, without the extra orchestration cost of splitting into multiple agents prematurely.
Any orchestration scheme relies on the concept of a "run": usually a loop that keeps the agent working until an exit condition is met. Common exit conditions include:
- all required tool calls are completed;
- a specified structured output is produced;
- an error occurs;
- the maximum number of turns is reached.
For example, in the Agents SDK, an agent is started via Runner.run(), which loops and calls the LLM until one of the following occurs:
- a final-output tool, defined by a specific output type, is called;
- the model returns a response without any tool calls (such as a direct message to the user).
Example usage:
Runner.run(agent, [UserMessage("What's the capital of the USA?")])
This while-loop concept is the core of the agent's operating mechanism.
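To illustrate, here is a framework-free sketch of that while-loop; call_llm and execute_tool are hypothetical helpers, not SDK functions:

MAX_TURNS = 10  # exit condition: maximum number of turns

def run_loop(agent, messages):
    for _ in range(MAX_TURNS):
        response = call_llm(agent, messages)      # hypothetical model call
        if not response.tool_calls:
            return response                       # exit: plain message, no tool calls
        for call in response.tool_calls:
            messages.append(execute_tool(call))   # hypothetical tool execution
    raise RuntimeError("Maximum number of turns reached")  # exit: error condition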
In multi-agent systems (as we will see later), a series of tool calls and handoffs between agents can occur, while still allowing the model to execute multiple steps in succession before an exit condition is met.
An effective strategy for managing complexity without switching to a multi-agent framework is to use prompt templates.
Rather than maintaining a large number of separate prompts for different use cases, use a flexible base prompt and inject policy variables.
This template approach can be easily adapted to various scenarios, greatly simplifying maintenance and evaluation. When new use cases emerge, only the variables need to be updated without rewriting the entire workflow:
You are a call center agent.
You are communicating with {{user_first_name}}, who has been a member for {{user_tenure}}.
The user's most common complaint categories are {{user_complaint_categories}}.
Greet the user, thank them for their continued loyal support, and answer any questions they may have!
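A small sketch of how such a template might be filled at runtime, using plain standard-library string formatting; the variable values here are invented for illustration:

BASE_PROMPT = (
    "You are a call center agent. You are communicating with {user_first_name}, "
    "who has been a member for {user_tenure}. The user's most common complaint "
    "categories are {user_complaint_categories}. Greet the user, thank them for "
    "their continued loyal support, and answer any questions they may have!"
)

# Inject per-user policy variables into the shared base prompt
prompt = BASE_PROMPT.format(
    user_first_name="Ada",
    user_tenure="3 years",
    user_complaint_categories="billing, delivery delays",
)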
So, here comes the question: when should we consider creating multiple Agents?
Our overall recommendation is to prioritize fully exploiting the capabilities of a single agent.
Multiple agents allow for a conceptually intuitive division of labor, but they also introduce additional complexity and overhead; in many scenarios, a single agent with the right tools is sufficient.
For complex workflows, splitting prompts and tools across multiple agents often improves performance and scalability.
If your agents have difficulty following complex instructions or frequently choose the wrong tools, you may need to further subdivide your system into more independent agents.
A practical guide to splitting agents

| Signal | When to split |
|---|---|
| Complex logic | When prompts contain many conditional branches (nested if-then-else) and the prompt template becomes hard to extend, consider assigning each logical fragment to a separate agent |
| Tool overload | The issue is not just the number of tools but their similarity and overlap; some setups handle 15+ well-defined, distinct tools while others struggle with fewer than 10 overlapping ones |
Next, we introduce the multi-agent system.
Multi-Agent System
Although multi-agent systems can be designed in many forms depending on the workflow and requirements, our customer practice shows two universal patterns:
First, manager mode (Manager, agents as tools)
A centralized “manager” agent coordinates multiple specialized agents through tool calls, each of which is responsible for a specific task or domain.
Second, decentralized model (Decentralized, agents handing off to agents)
Multiple agents run as peers, handing over tasks to each other based on their respective expertise.
A multi-agent system can be abstracted as a graph: nodes represent agents, and edges represent tool calls (manager pattern) or handoffs (decentralized pattern).
In the manager model , a centralized "manager" agent coordinates multiple specialized agents through tool calls; each agent is only responsible for the tasks or areas in which it is good . In the decentralized model , multiple agents collaborate as peers and hand off tasks to the most suitable agent for further processing based on their respective expertise .
Regardless of the orchestration pattern, the core principles stay the same: keep components flexible and composable, and drive them with clear, structured prompts.
1. Manager Mode
The so-called manager pattern is loosely reminiscent of DeepSeek's MoE architecture: it puts a centralized large language model (LLM) in the role of "manager", letting it seamlessly orchestrate a network of specialized agents through tool calls.
Rather than losing context or control of the process, the manager intelligently dispatches tasks to the right agent at the right time and integrates each agent's output into one coherent interaction.
This gives users a smooth, unified experience, with specialized capabilities available on demand at any time.
The applicable scenario: when you want a single agent to control the execution of the entire workflow and to interact directly with the user, the manager pattern is the ideal choice.
For example, to implement the Manager pattern in the Agents SDK:
from agents import Agent, Runner  # OpenAI Agents SDK
import asyncio

# -------- Define three dedicated translation agents --------
spanish_agent = Agent(
    name="translate_to_spanish",
    instructions="Translate the user's message to Spanish",
)

french_agent = Agent(
    name="translate_to_french",
    instructions="Translate the user's message to French",
)

italian_agent = Agent(
    name="translate_to_italian",
    instructions="Translate the user's message to Italian",
)

# -------- Define the manager agent --------
manager_agent = Agent(
    name="manager_agent",
    instructions=(
        "You are a translation agent. You use the tools given to you to translate. "
        "If asked for multiple translations, you call the relevant tools."
    ),
    tools=[
        spanish_agent.as_tool(
            tool_name="translate_to_spanish",
            tool_description="Translate the user's message to Spanish",
        ),
        french_agent.as_tool(
            tool_name="translate_to_french",
            tool_description="Translate the user's message to French",
        ),
        italian_agent.as_tool(
            tool_name="translate_to_italian",
            tool_description="Translate the user's message to Italian",
        ),
    ],
)

# -------- Run example --------
async def main():
    msg = input("Please enter the text to be translated: ")
    orchestrator_output = await Runner.run(manager_agent, msg)
    print("Translation steps:")
    for message in orchestrator_output.new_messages:
        print(f" - {message.content}")

# Calling example:
# Input: Translate 'hello' to Spanish, French and Italian for me!
if __name__ == "__main__":
    asyncio.run(main())
Declarative vs. non-declarative graphs
Declarative frameworks. Some frameworks require developers to explicitly define every branch, loop, and condition of the workflow up front as a graph (nodes = agents; edges = deterministic or dynamic connections).
Advantages: clear visualization. Disadvantages: once the workflow becomes more dynamic and complex, this approach quickly turns cumbersome and may even require learning a specialized domain-specific language (DSL).
A non-declarative, code-first approach allows developers to express workflow logic directly using familiar programming structures without having to draw a complete diagram in advance.
Advantages: More flexible and adaptable, agents can be dynamically orchestrated based on runtime requirements.
Many readers may not be familiar with this distinction, so here is a brief explanation. A declarative structure is like drawing a flowchart: all steps and routes must be defined in advance, as in a bank's account-opening automation process.
The advantage is clear: the process is stable. The disadvantage is equally obvious: adjusting the logic is painful, since it can mean modifying the entire flowchart or redefining all the connection relationships.
Non-declarative, i.e. code-first, means that in the same situation you just change a few lines of code.
To put it plainly: declarative style is dragging and dropping in tools like Coze and Dify; code-first style is having an engineering team write the code.
| | Declarative | Code-first |
|---|---|---|
| How the next step is decided | You must tell the system every node and edge in advance | Ordinary if / for / await decides the next step on the spot |
| Common forms | Drag-and-drop flowchart builders, workflow DSLs | Plain code in a general-purpose language |
| Advantages | Clear visualization; stable, auditable processes | Flexible and adaptable; agents can be orchestrated dynamically at runtime |
| Disadvantages | Cumbersome for dynamic workflows; may require a DSL | Less visual; requires engineering effort |
| Typical scenarios | Fixed, compliance-heavy processes | Dynamic, complex, fast-changing workflows |
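As a toy illustration of the code-first side, ordinary Python control flow can pick the next agent at runtime instead of following a pre-drawn graph; classify() and the three agents below are hypothetical stand-ins, assumed to be built like the earlier examples:

from agents import Runner

async def handle(query: str):
    intent = await classify(query)  # hypothetical router returning an intent label
    if intent == "refund":
        return await Runner.run(refund_agent, query)   # hypothetical agent
    elif intent == "order_status":
        return await Runner.run(order_agent, query)    # hypothetical agent
    return await Runner.run(general_agent, query)      # hypothetical fallback agent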
2. Decentralized Model
In a decentralized model, agents can "handoff" workflow execution rights to each other.
Handover is a one-way transfer mechanism that allows one agent to delegate a task to another.
In the Agents SDK, handover is designed as a tool or function type. When an agent calls the handover function, the system immediately starts the execution process of the target agent and synchronously transfers the latest session state.
Its core features are:
- Equal collaboration: multiple agents work together as peers.
- Direct control transfer: one agent can hand control of the workflow directly to another.
- No central dispatcher: suited to scenarios where no single agent needs to maintain centralized control or handle everything.
- Dynamic interaction: each agent can take over the execution flow and interact directly with the user as needed.
To sum up: this pattern performs best when the workflow needs no central controller for global coordination and is better handled in stages by different autonomous agents.
The following shows how to use the Agents SDK to implement a decentralized workflow that handles both sales and after-sales support.
The core idea is that a Triage Agent first routes the conversation, then hands it off to the most suitable specialized agent:
from agents import Agent, Runner, function_tool
import asyncio

# -------- Minimal tool stubs so the example runs (assumed implementations) --------
@function_tool
def search_knowledge_base(query: str) -> str:
    return "Top knowledge base articles for: " + query

@function_tool
def initiate_purchase_order(item: str) -> str:
    return f"Purchase order created for {item}"

@function_tool
def track_order_status(order_id: str) -> str:
    return f"Order {order_id} is out for delivery"

@function_tool
def initiate_refund_process(order_id: str) -> str:
    return f"Refund initiated for order {order_id}"

# ────────────────────── Specialized agents ──────────────────────
technical_support_agent = Agent(
    name="Technical Support Agent",
    instructions=(
        "You provide expert assistance with resolving technical issues, "
        "system outages, or product troubleshooting."
    ),
    tools=[search_knowledge_base],  # search the knowledge base
)

sales_assistant_agent = Agent(
    name="Sales Assistant Agent",
    instructions=(
        "You help enterprise clients browse the product catalog, "
        "recommend suitable solutions, and facilitate purchase transactions."
    ),
    tools=[initiate_purchase_order],  # generate a purchase order
)

order_management_agent = Agent(
    name="Order Management Agent",
    instructions=(
        "You assist clients with inquiries regarding order tracking, "
        "delivery schedules, and processing refunds."
    ),
    tools=[track_order_status,        # track order status
           initiate_refund_process],  # initiate refund process
)

# ────────────────────── Triage agent ────────────────────────
triage_agent = Agent(
    name="Triage Agent",
    instructions=(
        "You act as the first point of contact, assessing customer "
        "queries and directing them promptly to the correct specialized agent."
    ),
    handoffs=[technical_support_agent,
              sales_assistant_agent,
              order_management_agent],  # agents this one can hand off to
)

# ──────────────────────── Run example ──────────────────────
asyncio.run(Runner.run(
    triage_agent,
    ["Could you please provide an update on the delivery timeline "
     "for our recent purchase?"],
))
Process description:
1. Initial message → Triage Agent: the user's query first goes to triage_agent.
2. Intelligent handoff: triage_agent recognizes that the question concerns order delivery time, so it calls the handoff and transfers control and session state to order_management_agent.
3. Takeover: order_management_agent uses its own tools (such as track_order_status) to look up and report the latest logistics status.
4. Optional further handoff: if the flow needs to return to the main process after the task completes, order_management_agent can trigger another handoff back to triage_agent or another agent, closing the loop.
Decentralized division of labor allows each agent to focus on its own field, reducing the pressure on the master controller and improving professionalism, which is particularly suitable for conversation diversion scenarios.
Questions and Answers
Many students may not understand this, so I will give a brief explanation here:
The decentralized model is like a group of colleagues of the same level working at an open workstation - whoever is best at the job can step in first, and after finishing the work, they can directly hand the documents on the table to the next more suitable colleague to continue.
There is no "team leader" keeping an eye on things, nor is there a fixed flow chart. Everyone just "passes" the work to the most suitable person as they go along.
What is the essential difference from the "manager mode"?
Manager mode is an all-round assistant: the user always faces the same virtual customer-service persona. The operating logic is as follows:
User → Manager Agent → Call tools → Specialized agents → Return results → Manager Agent integrates → Reply to user
Example: the user asks, "Please check the logistics for order 1234 and recommend similar products."
1. The Manager Agent receives the request.
2. In the background it calls two tools at once:
   - Tool A: Order Query Agent → fetches logistics information
   - Tool B: Product Recommendation Agent → generates a recommendation list
The advantages here are clear:
- Unified experience: users feel they are always talking to the same person.
- Hidden collaboration: users need not be aware of the multiple agents working in the background.
- Strong controllability: suitable for scenarios that require auditing or filtering sensitive information (such as financial consulting).
The decentralized model is more like a department relay, and users will notice the switch between service providers:
User → Triage Agent → Transfer → After-sales Agent → Transfer → Sales Agent → ... → Final closed loop
Example: the user asks, "How do I get warranty service for a broken phone screen? And show me the new models."
1. The triage agent identifies the dual demand and triggers the handoff rule.
2. First leg: the repair support agent takes over the conversation: "Please provide the device IMEI, and I will create a repair ticket for you..."
3. Once the repair issue is resolved, a follow-up is triggered automatically: "We noticed you are interested in new products; transferring you to a product consultant..."
4. Second leg: the sales agent presents new models and guides the purchase.
The product experience here will be different:
- In-depth service: the most specialized agent delivers the best service at each step.
- Flexible transfer: similar to the hospital experience of "triage desk → specialist → testing department".
- Reduced complexity: each agent only needs to master a specific area (e.g., a repair agent does not need to understand sales strategy).
The logic here is very similar to my earlier training slides:
within a single domain, the manager pattern works better; but when jumping across domains, say from legal to medical, decentralization is more appropriate.
Agent Security
Well-designed safeguards can help you manage data privacy risks (e.g., preventing system prompts from being leaked) and reputation risks (e.g., ensuring that model behavior is consistent with brand tone):
- Deploy in layers: set up protections for identified risks first, then layer on additional protections as new vulnerabilities are discovered.
- Pair with security infrastructure: guardrails are a critical component of any LLM-based deployment, but they must be combined with strong authentication and authorization, strict access controls, and other standard software security mechanisms.
- Treat guardrails as layered defense: a single line of defense is rarely enough; a combination of multiple specialized defenses makes the agent far more resilient.
The following diagram (omitted here) demonstrates how to combine LLM-level protection measures, rule-based protection measures (such as regex), and the OpenAI Moderation API to perform multiple checks on user input:
Types of guardrails

| Guardrail | Purpose |
|---|---|
| Relevance classifier | Flags off-topic queries to keep responses within the intended scope |
| Safety classifier | Detects unsafe inputs such as jailbreak attempts or prompt injections |
| PII filter | Prevents unnecessary exposure of personally identifiable information |
| Moderation (content review) | Flags harmful or inappropriate input (hate speech, harassment, violence) |
| Tool safeguards | Rate each tool's risk (read-only vs. write access, reversibility, financial impact) and gate high-risk actions |
| Rules-based protections | Simple deterministic measures (blocklists, input length limits, regex filters) that block known threats, such as suspicious input containing "DROP TABLE" |
| Output validation | Checks responses via prompt engineering and content review to keep them aligned with brand values |
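As one hedged sketch of the rules-based row above, a guardrail can be nothing more than deterministic checks run before input ever reaches the model; the patterns and limits below are illustrative, not a complete defense:

import re

BLOCKLIST = {"drop table", "truncate table"}  # illustrative prohibited phrases
MAX_INPUT_CHARS = 4000                        # illustrative input length limit

def passes_rules_based_guardrail(user_input: str) -> bool:
    text = user_input.lower()
    if len(text) > MAX_INPUT_CHARS:
        return False  # overlong input
    if any(term in text for term in BLOCKLIST):
        return False  # known SQL-injection phrasing
    if re.search(r";\s*--", text):
        return False  # crude injection pattern
    return True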
A three-step heuristic for building guardrails:
1. Focus on data privacy and content safety: address the most important privacy and security risks first.
2. Iterate on real edge cases: as new issues surface in actual use, add the corresponding layers of protection.
3. Balance safety and experience: keep fine-tuning guardrails as the agent evolves, ensuring both safety and a smooth user experience.
Specifically, Guardrails can be implemented as a function or agent to enforce the following policies:
| Policy | What it enforces |
|---|---|
| Jailbreak protection | Reject attempts to override the system prompt or bypass restrictions |
| Relevance validation | Confirm the input relates to the agent's intended domain |
| Keyword filtering | Block inputs containing prohibited terms |
| Safety classification | Score inputs for harmful or policy-violating content |
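For the guardrail-as-agent variant, here is a minimal sketch, reusing the Agent and Runner classes from the earlier examples; the tripwire logic, model name, and output check are illustrative only:

from agents import Agent, Runner

relevance_guardrail = Agent(
    name="relevance_guardrail",
    instructions=(
        "Decide whether the user's message is about customer-support topics. "
        "Answer only 'relevant' or 'off_topic'."
    ),
    model="gpt-4o-mini",  # placeholder: a small, fast model
)

async def guarded_run(main_agent, user_input: str):
    # Run the guardrail first; only invoke the main agent if the input passes
    verdict = await Runner.run(relevance_guardrail, user_input)
    if "off_topic" in str(verdict.final_output):
        return "Sorry, I can only help with customer-support questions."
    return await Runner.run(main_agent, user_input)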
Human Intervention
Human involvement is a critical safety net that improves agent performance in real-world environments without sacrificing user experience.
This is especially important early in a deployment to help identify failures, uncover edge cases, and establish a robust evaluation loop.
Implement human-intervention mechanisms so that control is handed over gracefully when the agent cannot complete the task:
- Customer service scenario: escalate the issue to a human agent.
- Coding scenario: return control to the user.
Typical trigger conditions:

Exceeding failure thresholds. Set limits on the number of retries or actions an agent may take; if exceeded (e.g., repeated failure to understand customer intent), escalate to human processing.

High-risk actions. For sensitive, irreversible, or high-value operations, introduce human oversight until the agent's reliability is fully trusted. Examples: canceling a user order, approving a large refund, executing a payment.
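A small sketch of the failure-threshold trigger: after a set number of failed attempts, the task escalates to a human. try_resolve and escalate_to_human are hypothetical helpers, not SDK functions:

MAX_ATTEMPTS = 3  # retry limit before escalation

async def resolve_with_fallback(agent, query: str):
    for _ in range(MAX_ATTEMPTS):
        result = await try_resolve(agent, query)  # hypothetical resolution attempt
        if result.success:
            return result
    return escalate_to_human(query)  # exceeded the failure threshold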
Conclusion
Agents are ushering in a new era of workflow automation—systems that can reason in uncertain scenarios, perform operations across tools, and handle multi-step tasks with a high degree of autonomy.
Unlike simpler LLM applications, intelligent agents can execute complete processes end-to-end, making them particularly suitable for scenarios involving complex decisions, unstructured data, or brittle rule-based systems.
PS: "End-to-end" means that within one system or process, every step from the initial input (one end) to the final usable result or action (the other end) is completed automatically in a single pass, with no need to hand the task off to other independent systems or human relays in between.
The foundation for building a reliable agent
Powerful model × Well-defined tools × Clear, structured instructions
Choose an orchestration mode that matches your complexity:
- Start with a single agent
- Evolve to a multi-agent system only when necessary
Add guardrails at every stage :
- Input filtering
- Tool-usage restrictions
- Human-in-the-loop
This ensures that the agent operates securely and predictably in production environments.
Gradually implement and continuously iterate
- Start with a minimum viable product (MVP) and validate it with real users.
- Expand steadily: keep improving in practice and gradually extend capabilities.
With a solid foundation and iterative approach, intelligent agents can not only automate individual tasks, but also drive entire workflows with intelligence and adaptability to create real value for the business.
That concludes the walkthrough of the report. It is fairly information-dense and may be a bit difficult for readers unfamiliar with agent development, but it is still well worth reading.