A Step-by-Step Guide to Building a General-Purpose LLM Agent

Written by
Jasper Cole
Updated on: July 9, 2025
Recommendation

A detailed guide to building a general large model agent, taking you from scratch to build an efficient intelligent system.

Core content:
1. The concept and importance of general LLM Agent
2. The development path from single LLM to Agentic system
3. Steps and considerations for building a general LLM Agent


High-level overview of an LLM Agent:

[Figure: high-level overview of an LLM Agent]
Why build a generic Agent? Because it is an excellent tool for prototyping your use cases and laying the foundation for designing your own custom Agent architecture.


Before we get into the details, let me briefly introduce LLM Agents. Feel free to skip this part if you're already familiar with the basics.



What is an LLM Agent?




An LLM Agent is a program whose execution logic is controlled by its underlying model.


From a standalone LLM to an agentic system:



An LLM Agent differs from approaches such as few-shot prompting or fixed workflows in that it can autonomously define and adjust the steps required to handle a user query.


Given a set of tools (such as code execution or web search), the agent can decide which tool to use, how to use it, and iterate based on the output.

This adaptability enables the system to handle a wide variety of use cases with minimal configuration.



Agentic architectures exist on a continuous spectrum from the reliability of fixed workflows to the flexibility of autonomous agents.

For example, a fixed workflow like RAG (Retrieval-Augmented Generation) can be enhanced with a self-reflection loop, enabling the program to iterate when the initial response is insufficient. On the other hand, a ReAct Agent can use fixed workflows as tools, providing an approach that is both flexible and structured. Ultimately, the choice of architecture depends on the specific use case and the trade-off between reliability and flexibility.
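To make the self-reflection idea concrete, here is a minimal sketch of wrapping a fixed RAG pipeline in a reflection loop. The functions retrieve, generate, and critique are hypothetical placeholders for your retrieval step, answer generation, and an LLM-based self-check; they are not from any specific framework.

# A minimal sketch of a RAG workflow with a self-reflection loop.
# retrieve(), generate(), and critique() are hypothetical placeholders.
def rag_with_reflection(query: str, max_rounds: int = 3) -> str:
    context = retrieve(query)                       # fixed retrieval step
    answer = generate(query, context)               # initial response
    for _ in range(max_rounds):
        feedback = critique(query, answer)          # LLM reviews its own answer
        if feedback == "OK":                        # good enough: stop iterating
            break
        context = retrieve(query + " " + feedback)  # refine retrieval with feedback
        answer = generate(query, context)
    return answer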


Build a generic LLM Agent from scratch!  



Step 1: Choose the right LLM


Choosing the right model is crucial to achieving the expected performance. You need to consider multiple factors, such as licensing, cost, and language support.

For an LLM Agent, the most important consideration is the model's performance on key tasks such as code generation, tool invocation, and reasoning. Useful evaluation benchmarks include:

  • MMLU (Massive Multitask Language Understanding)
    (for assessing reasoning ability)
  • Berkeley Function Calling Leaderboard
    (for assessing tool selection and tool calling)
  • HumanEval and BigCodeBench
    (for assessing coding ability)


Another key factor is the model's context window size. Agentic workflows can consume a large number of tokens, sometimes more than 100K, so a larger context window is very helpful.


Models worth considering (as of March 1, 2025):

  • Closed-source models: GPT-4.5, Claude 3.7
  • Open-source models: Qwen 2.5, DeepSeek R1, Llama 3.2


Generally speaking, larger models tend to perform better, but a small model that can run locally is still a viable choice. With a small model, the agent is best limited to simpler scenarios and one or two basic tools.


Step 2: Define the Agent's control logic (i.e., communication structure)


The main difference between a plain LLM and an Agent lies in the system prompt. In the context of an LLM, the system prompt is a set of instructions and contextual information provided to the model before it processes a user query.



The expected behavior of an agent can be encoded in the system prompt, thus defining its agentic behavior pattern. These patterns can be customized to specific needs. Common agentic patterns are:

  • Tool Use
    The agent decides when to route a query to the appropriate tool and when to answer directly from its own knowledge.
  • Reflection
    The agent checks and corrects its own answer before responding to the user. Most LLM systems can add a reflection step.
  • Reason-then-Act (ReAct)
    The agent reasons step by step about how to solve the query, performs an action, observes the result, and decides whether to take another action or give an answer.
  • Plan-then-Execute
    The agent first breaks the task down into sub-steps (if necessary) and then executes them one by one.


Among them, ReAct and Plan-then-Execute are the most common starting points for building a general-purpose single agent.



To implement these behaviors effectively, you need to do some prompt engineering, and you may also need structured generation. The core idea of structured generation is to guide the LLM's output to conform to a specific format or schema, ensuring that the Agent's responses consistently follow the intended communication structure.
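As an illustration, one common way to apply structured generation is to describe the expected step format as a schema and constrain decoding to it. The sketch below uses Pydantic to define a hypothetical ReAct-style step; the class and field names are illustrative and simply mirror the communication structure shown next.

from typing import Optional
from pydantic import BaseModel, Field

# Hypothetical schema for one ReAct-style step; field names are illustrative.
class ReActStep(BaseModel):
    thought: str = Field(description="Single-line plan for the next action")
    function_name: Optional[str] = None    # set when a tool call is needed
    function_input: Optional[dict] = None  # parameters for the tool call
    final_answer: Optional[str] = None     # set when the agent is done

# Many inference APIs accept a JSON schema to constrain the model's output:
schema = ReActStep.model_json_schema()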


Example: a system prompt snippet for a ReAct-style Agent from the Bee Agent Framework:

# Communication structure
You communicate only in instruction lines. The format is: "Instruction: expected output". You must only use these instruction lines and must not enter empty lines or anything else between instruction lines. You must skip the instruction lines Function Name, Function Input and Function Output if no function calling is required.

Message: User's message. You never use this instruction line.
Thought: A single-line plan of how to answer the user's message. It must be immediately followed by Final Answer.
Thought: A single-line step-by-step plan of how to answer the user's message. You can use the available functions defined above. This instruction line must be immediately followed by Function Name if one of the available functions defined above needs to be called, or by Final Answer. Do not provide the answer here.
Function Name: Name of the function. This instruction line must be immediately followed by Function Input.
Function Input: Function parameters. Empty object is a valid parameter.
Function Output: Output of the function in JSON format.
Thought: Continue your thinking process.
Final Answer: Answer the user or ask for more information or clarification. It must always be preceded by Thought.

## Examples
Message: Can you translate "How are you" into French?
Thought: The user wants to translate a text into French. I can do that.
Final Answer: Comment vas-tu?



Step 3: Define the Agent's core instructions


LLMs come with many behaviors out of the box, but some of them may not match your needs. To get the desired behavior from your Agent, you need to specify clearly in the system prompt which capabilities should be enabled and which should be disabled.

Directives that may need to be defined include:

  • Agent Name and Role
    The name of the Agent and its responsibilities.
  • Tone and Conciseness
    Should the agent communicate formally or casually? Should it keep answers brief or provide detailed information?
  • When to use a tool
    When should the agent rely on external tools, and when should it answer directly from the LLM's own knowledge?
  • Error handling
    If a tool call fails, how should the Agent respond?


Example: some instructions from the Bee Agent Framework:

# Instructions
User can only see the Final Answer, all answers must be provided there.
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single line immediately followed by Final Answer.
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single line immediately followed by either Function Name or Final Answer.
Functions must be used to retrieve factual or historical information to answer the message.
If the user suggests using a function that is not available, answer that the function is not available. You can suggest alternatives if appropriate.
When the message is unclear or you need more information from the user, ask in Final Answer.

# Your capabilities
Prefer to use these capabilities over functions.
- You understand these languages: English, Spanish, French.
- You can translate and summarize, even long documents.

# Notes
- If you don't know the answer, say that you don't know.
- The current time and date in ISO format can be found in the last message.
  - When answering the user, use friendly formats for time and date.
- Use markdown syntax for formatting code snippets, links, JSON, tables, images, files.
- Sometimes, things don't go as planned. Functions may not provide useful information on the first few tries. You should always try a few different approaches before declaring the problem unsolvable.
- When the function doesn't give you what you were asking for, you must either use another function or a different function input.
  - When using search engines, you try different formulations of the query, possibly even in a different language.
- You cannot do complex calculations, computations, or data manipulations without using functions.



Step 4: Define and optimize core tools


Tools give Agents powerful capabilities. With a set of carefully designed tools, you can cover a wide range of functionality. Key tools include:

✅ Code execution
✅ Web search
✅ File reading
✅ Data analysis




Each tool should include the following definitions as part of the system prompt:

  • Tool Name
    A clear, descriptive name for what the tool does.
  • Tool Description
    Explain the purpose of the tool and when to use it, to help the agent choose the appropriate tool.
  • Tool Input Schema
    Define the input parameters, including required and optional items, types, and constraints.
  • Tool execution mode
    How the tool runs and how the Agent should call it.


Example: the LangChain community's Arxiv tool. Below is part of its implementation; the tool retrieves papers in fields such as physics, mathematics, and computer science:

from typing import Optional, Type

from langchain_community.utilities.arxiv import ArxivAPIWrapper
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field


class ArxivInput(BaseModel):
    """Input for the Arxiv tool."""

    query: str = Field(description="search query to look up")


class ArxivQueryRun(BaseTool):  # type: ignore[override, override]
    """Tool that searches the Arxiv API."""

    name: str = "arxiv"
    description: str = (
        "A wrapper around Arxiv.org "
        "Useful for when you need to answer questions about Physics, Mathematics, "
        "Computer Science, Quantitative Biology, Quantitative Finance, Statistics, "
        "Electrical Engineering, and Economics "
        "from scientific articles on arxiv.org. "
        "Input should be a search query."
    )
    api_wrapper: ArxivAPIWrapper = Field(default_factory=ArxivAPIWrapper)  # type: ignore[arg-type]
    args_schema: Type[BaseModel] = ArxivInput

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the Arxiv tool."""
        return self.api_wrapper.run(query)


In some cases, you may need to optimize a tool to improve performance, for example:

  • Use prompt engineering to adjust the tool's name or description so the model selects it more reliably.
  • Set up advanced configuration and handle common errors.
  • Filter the tool's output to keep results within expectations (a minimal sketch of this follows below).
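As one example of output filtering, here is a minimal sketch of a wrapper that truncates overly long tool results before they enter the agent's working memory. with_output_limit and MAX_CHARS are illustrative names, not part of any framework.

# A minimal sketch of tool-output filtering; names are illustrative.
MAX_CHARS = 4000  # rough budget for a single tool result

def with_output_limit(tool_fn):
    """Wrap any tool function so oversized results are truncated."""
    def wrapped(*args, **kwargs):
        result = str(tool_fn(*args, **kwargs))
        if len(result) > MAX_CHARS:
            result = result[:MAX_CHARS] + "\n[output truncated]"
        return result
    return wrapped

# Usage, e.g. with the Arxiv tool above: search = with_output_limit(ArxivQueryRun().run)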


Step 5: Develop a memory management strategy


An LLM's context window is limited, which caps how much the model can "remember". Multi-turn conversations, long tool outputs, and additional context can quickly fill it up, so a sound memory management strategy is crucial.


In the context of agents, memory refers to the ability of a system to store, recall, and utilize information from past interactions. This enables agents to maintain context over time, improve their responses based on previous exchanges, and provide a more personalized experience.


Common memory management strategies

1️⃣ Sliding Memory: keep the most recent k conversation turns and discard older content (see the sketch below).
2️⃣ Token Memory: keep only the most recent n tokens and discard the rest.
3️⃣ Summarized Memory: after each turn, use the LLM to generate a summary, then discard the detailed conversation content.
4️⃣ Key Moment Storage: let the LLM identify key facts and store them in long-term memory, so the agent "remembers" important information and offers a more personalized experience.
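As a concrete example, here is a minimal sketch of strategy 1 (sliding memory); the class and parameter names are illustrative.

from collections import deque

# A minimal sketch of sliding-window memory; names are illustrative.
class SlidingMemory:
    def __init__(self, k: int = 10):
        self.turns = deque(maxlen=k)  # older turns fall off automatically

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def as_messages(self) -> list:
        """Return the retained turns in chat-message format."""
        return list(self.turns)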



So far, we have covered the five core steps of building an Agent:


✅ Step 1: Choose the right LLM
✅ Step 2: Define the control logic and behavior pattern (ReAct, Plan-then-Execute)
✅ Step 3: Write the core instructions (role, tone, tool-use policy, etc.)
✅ Step 4: Define and optimize the core tools (web search, database queries, etc.)
✅ Step 5: Develop a memory management strategy (prevent context overflow)



So, what happens if we now let the LLM handle a user query directly?



For example, this might happen:

User Message: Extract key insights from this dataset
Files: bill-of-materials.csv
Thought: First, I need to inspect the columns of the dataset and provide basic data statistics.
Function Name: Python
Function Input: {"language":"python","code":"import pandas as pd\n\ndataset = pd.read_csv('bill-of-materials.csv')\n\nprint(dataset.columns)\nprint(dataset.describe())","inputFiles":["bill-of-materials.csv"]}
Function Output:

At this point, the Agent generates raw text output. So, how do we make it perform the next step? This requires parsing and orchestration.


Step 6: Parse the Agent's raw output


A parser is a function that converts raw data into a format understandable to an application, such as an object with properties.


For the Agent we are building, the parser needs to recognize the communication structure defined in Step 2 and return a structured output (such as JSON). This makes it easier for the application to process and execute the Agent's next step.


Note: Some model providers (such as OpenAI) support parsable, structured output by default. For other models (especially open-source ones), you may need to implement this yourself.
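For models without built-in structured output, a minimal hand-rolled parser might look like the sketch below. It assumes the instruction-line format from Step 2 and returns the dict shape that the orchestrator in Step 7 expects; parse_agent_output is an illustrative name.

import json
import re

# A minimal sketch of a parser for the Step 2 instruction-line format.
def parse_agent_output(raw: str) -> dict:
    name = re.search(r"^Function Name:\s*(.+)$", raw, re.MULTILINE)
    if name:  # the agent wants a tool call
        params = re.search(r"^Function Input:\s*(\{.*\})\s*$", raw, re.MULTILINE)
        return {
            "action": "tool_call",
            "tool_name": name.group(1).strip(),
            "tool_params": json.loads(params.group(1)) if params else {},
        }
    # otherwise, treat the output as a final answer
    answer = re.search(r"^Final Answer:\s*(.+)", raw, re.MULTILINE | re.DOTALL)
    return {"action": "return_answer", "answer": answer.group(1).strip() if answer else raw}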


Step 7: Orchestrate the Agent's next actions


The final step is to set up the orchestration logic that determines what happens after the LLM generates a result. Depending on the output, you will either:

  1. Execute a tool call (such as running Python code or calling an API).

  2. Return an answer: either a final response to the user, or a request for additional information needed to complete the task.



If a tool call is triggered, the output of the tool is sent back to the LLM (as part of its working memory). The LLM then determines what to do with this new information: perform another tool call or return an answer to the user.


Here’s what this orchestration logic looks like in code:

def orchestrator(llm_agent, llm_output, tools, user_query):
    """
    Orchestrates the response based on LLM output and iterates if necessary.

    Parameters:
    - llm_agent (callable): The LLM agent function for processing tool outputs.
    - llm_output (dict): Initial output from the LLM, specifying the next action.
    - tools (dict): Dictionary of available tools with their execution methods.
    - user_query (str): The original user query.

    Returns:
    - str: The final response to the user.
    """
    while True:
        action = llm_output.get("action")

        if action == "tool_call":
            # Extract tool name and parameters
            tool_name = llm_output.get("tool_name")
            tool_params = llm_output.get("tool_params", {})

            if tool_name in tools:
                try:
                    # Execute the tool
                    tool_result = tools[tool_name](**tool_params)
                    # Send tool output back to the LLM agent for further processing
                    llm_output = llm_agent({"tool_output": tool_result})
                except Exception as e:
                    return f"Error executing tool '{tool_name}': {str(e)}"
            else:
                return f"Error: Tool '{tool_name}' not found."

        elif action == "return_answer":
            # Return the final answer to the user
            return llm_output.get("answer", "No answer provided.")

        else:
            return "Error: Unrecognized action type from LLM output."
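For completeness, here is one hypothetical way to wire the pieces together; call_llm_agent (your model call plus the Step 6 parser) and run_python_code (a code-execution tool) are illustrative stand-ins, not real APIs.

# Hypothetical wiring; call_llm_agent and run_python_code are stand-ins.
tools = {"Python": run_python_code}

query = "Extract key insights from this dataset"
first_output = call_llm_agent({"user_query": query})  # parsed dict, per Step 6
print(orchestrator(call_llm_agent, first_output, tools, query))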


That's it! You've now built a system that can handle a variety of scenarios, from competitive analysis and deep research to automating complex workflows.



What is the role of Multi-Agent systems?



As powerful as the current generation of LLMs is, they still suffer from a core limitation: they struggle with information overload.


If there is too much context information, or the tools used are too complex, the model may be overloaded and performance may degrade. A single general agent will sooner or later encounter this bottleneck, especially when it consumes a large number of tokens.


For some application scenarios, a multi-agent solution may be more sensible. By splitting a task among multiple agents, you reduce the context any single LLM has to handle, improving overall efficiency.


However, a single agent remains a great starting point, especially in the prototyping stage. It helps you quickly test your use case and identify system bottlenecks. In the process, you can:

  • Understand which parts of the task the Agent actually needs to perform.
  • Identify subtasks that can be broken down into independent processes to build larger workflows.


By starting with a single agent, you gather valuable insights step by step and lay the foundation for scaling to more complex systems later.