How to build Agentic AI that keeps pace with the future: A comprehensive guide from basics to advanced applications

Explore the evolution and application of AI Agents and master the core knowledge of building Agentic AI.
Core content:
1. The definition and development history of AI Agents
2. The role and function of AI Agents in the context of LLM
3. Milestones and future trends of AI Agent technology
If you work in the AI field, you have probably heard the term AI Agent quite frequently recently. In this article, we will take a deeper look at what we mean when we refer to an Agent in the context of Large Language Models (LLMs) and Artificial Intelligence (AI).
Before we dive in, one thing to remember is that the word Agent existed long before today's high-performance LLMs. AI Agents have been around for a while, just not with today's generative LLMs at their core. What has changed is that they have become far more powerful and complex. So, in short, you hear more discussion of Agents now not because they are brand-new technology, but because they have become very, very interesting.
What is an AI Agent?
At a basic level, today's AI Agents are semi-autonomous or fully autonomous systems that use LLMs as their "brains" to make key decisions and solve complex tasks. You can think of them as automated decision engines: as a user, you only need to state your query. The agent then operates in an environment of available tools, using whichever ones it needs to complete the task for you, while you sit back, relax, and wait for it to handle the problem.
The agent will autonomously direct its own processes and execution, choosing which tools to use based on the current task. These tools can include web search engines, databases, APIs, etc., allowing the agent to interact with the real world.
A brief history of AI Agents
AI agents have actually been around for a long time. You can even see in Microsoft’s recent article on AI agents [1] that the authors mentioned that they started working on AI agents as early as 2005. However, in recent years, especially thanks to the latest LLM capabilities, the form and functionality of our AI agents have changed significantly. Now, we are able to use LLM as a core component for planning, reasoning, and execution.
That being the case, I would like to highlight a few milestones in AI Agents in recent years, and you can assume that from here on we are only talking about the AI Agents of today (2025). Of course, this is my personal view looking back over the past few years. Let's turn the clock back to just before the release of ChatGPT. In 2022, two papers were published that can be regarded as the beginning of modern AI Agents that use an LLM as a core decision-making component:
MRKL Systems: Pronounced "miracle systems", this paper focuses on the limitations of language models and why we get so many hallucinated responses. In short, it points out something we now fully understand: language models don't know everything; they are designed to generate language. In other words, we can't expect someone to know our birthday unless we tell them. The paper proposes giving language models access to an external knowledge base that they can query to extract relevant information. [2]
ReAct: Published shortly after the MRKL paper, this work introduced a component that is essential to today's Agents. ReAct stands for "Reason and Act", and it proposes a clever prompting structure that lets the LLM think about a problem, reason about a solution, choose the right tools, and put the plan into action. In simple terms: instead of just asking a question, you tell the model what resources are available to it and ask it to develop a plan to solve the query. In short, this paper introduced a prompting approach that makes the LLM's reasoning and action processes more reliable. [3]
Note: The actual ReAct prompts recommended in the paper are more complex, and include instructions on how to generate thoughts, how to reason, and so on.
In my opinion, these two papers highlight two very important discoveries and features that have led to today’s AI agents: good instructions and external tools. Coupled with thousands of people starting to experiment with these LLMs, we are now entering a world where we are starting to build more and more sophisticated AI agents (that use more than just the ReAct prompting method).
Next, let’s take a look at the core components that make up today’s AI Agents.
Core Components of AI Agent
While not every AI agent must contain all of these components, when we build an agent it will include at least the following components and processes: an LLM, access to tools (through function calling), and some degree of memory and reasoning.
Let’s take a deeper look at what each of them does:
LLM: Think of the LLM as the "brain" of the entire operation. While it is not responsible for every step, when we talk about Agents in 2025, the generative model plays an important coordinating role. To put it simply: it is the LLM that decides, for example, to look up the user's calendar first and then check the weather.
Tools : An important feature of agents is that they interact with their environment through different tools. These tools can be thought of as "add-ons" that make agents more efficient. These tools allow agents to go beyond the fixed training knowledge of LLMs and broaden their scope of application by providing highly relevant real-time data (such as personal databases) and capabilities (such as sending emails). Through function calls, LLMs can directly interact with a predefined set of tools, thereby expanding the scope of the agent's operation and efficiency.
Memory: Agents typically have some form of memory (both short-term and long-term) that allows them to store logs of reasoning, conversation histories, or information collected at different execution steps. We need memory to support ongoing conversations with our agents, as well as conversations we want to come back to later. Memory can also be used to personalize the experience or plan future decisions.
Observation and Reasoning: The LLM is a core component for problem solving, task decomposition, planning, and path selection. It allows the agent to reason about a problem, break it down into smaller steps (if necessary), and decide how and when to use available resources/tools to provide the best solution. However, not every agent is built the same way; sometimes we explicitly include reasoning as a distinct step when constructing an agent.
An important takeaway is that there are multiple design patterns for AI agents, and these components can be used to varying degrees. The agents we see today are on a continuum, and their autonomy or “agent behavior” depends largely on how much decision-making is delegated to the LLM. In simple terms: some agents are designed to be more independent, while others rely on more external input and control.
How does an AI Agent work?
Most AI agents we see today use an LLM as the core decision maker/coordinator of actions. The degree of autonomy of an LLM varies of course, and we’ll discuss this further in the “Future Outlook” section of this article. But first, let’s start with the basics and discuss how an AI agent that relies heavily on an LLM for most of its decisions works.
I've noticed that lately when people talk about LLMs and Agents, there seems to be a lot of "magic" happening behind the scenes. So, here I'll try to explain what's really going on behind the scenes of an AI Agent that has access to certain tools.
Defining Prompts
At the heart of any system that uses an LLM is an instruction (prompt) that sets the core purpose of the LLM. The ReAct paper also clearly shows how to define an agent that reasons, generates thoughts, and observes by highlighting a complex prompt. For example, we can give the LLM the instruction: "You are a helpful assistant who can access my database in order to answer my queries."
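As a concrete illustration, here is a minimal sketch of how such an instruction might be set up in an OpenAI-style chat message list. The wording of the prompt is illustrative, not a prompt taken from the ReAct paper.

```python
# A minimal sketch of an agent's core instruction (system prompt).
# The prompt wording here is illustrative, not from any specific paper.
SYSTEM_PROMPT = (
    "You are a helpful assistant who can access my database "
    "in order to answer my queries. "
    "Think step by step, decide whether a tool is needed, "
    "and only answer from retrieved information."
)

# In an OpenAI-style chat API, this prompt becomes the first message:
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How do I use Ollama with DeepSeek R1?"},
]
```

Everything the agent later does is conditioned on this first system message.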
Provide tools
Next, we need to provide the LLM with a list of tools. This is one of the most common ways to create AI agents today, although it is not always necessary and we can still create systems with agent functionality without tools and function calls. Most model providers today support "function calls", which allow us to set up interactions with the LLM, listing the tools it may use at any time in order to solve a query.
When we provide a tool to the LLM, we need to tell the LLM some information. The LLM uses this information to decide when to use the tool:
Name : For example, a tool might be called technical_documentation_search.
Description : This is the most important information that the model uses at inference time to decide which tool to use. For example, for the technical_documentation_search tool, we could provide the description: "Useful when you need to search internal technical documentation for answers."
Expected input: Remember, tools are external to the LLM. The LLM knows their names and their descriptions, but ultimately, the job of a generative language model is to generate language. So what can it do? Exactly what it is good at: it generates a structured message containing the name of the function (the tool) it wants to run and the inputs that function expects. So when we provide a list of tools, we also need to provide this input schema. For example, for our technical_documentation_search tool, we can tell the LLM that it expects query: str as input.
If you’re interested in seeing how this works in practice, you can check out OpenAI’s function definition documentation:
https://platform.openai.com/docs/guides/function-calling
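Putting the three pieces above together, the technical_documentation_search tool from the text could be declared roughly as follows in OpenAI's function-calling format (a sketch based on that documentation, not a complete application):

```python
# Declaring the technical_documentation_search tool in OpenAI's
# function-calling format: name, description, and expected input schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "technical_documentation_search",
            "description": (
                "Useful when you need to search internal technical "
                "documentation for answers."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query.",
                    }
                },
                "required": ["query"],
            },
        },
    }
]
```

This list is what you would pass alongside the chat messages so the model knows, at inference time, which tools it may call and with what inputs.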
Use the tools
So, we have an LLM, and the LLM knows that it has access to some tools, how to run them, and what they are for. However, the LLM has no inherent ability to do things like run a Python script... or search your documentation. What it can do is produce a message explaining which tool it intends to run, and the inputs it wants to run that tool with.
Let's take the following scenario as an example:
We have an AI Agent that uses LLM.
We provide a technical_documentation_search tool that expects input as query: str and has a description of "useful when you need to search internal technical documentation for answers".
User asked: "Hey, how do I use Ollama with DeepSeek R1?"
In this scenario, what actually happens is:
The LLM generates a response that, simplified, says: "Run the tool technical_documentation_search with query = 'Using Ollama and DeepSeek R1'."
In effect, the LLM steps outside its own "world": it instructs the surrounding application to reference an external resource.
Observe tool response
If all goes well, at this point your AI Agent has run a tool. Remember, this tool can be anything. For example, our technical_documentation_search tool might itself be a RAG (Retrieval-Augmented Generation) application that uses another LLM to generate responses to queries. The point is that in the end we might have run the tool with the query "Using Ollama and DeepSeek R1" and received the response "You can pull the DeepSeek R1 model with ollama pull deepseek-r1, and run it with ollama run deepseek-r1", or something like that. But this is not the end, because the original LLM at the core of our AI Agent has not yet seen this response.
When a tool is run, its results are returned to the Agent's LLM. Typically, this is provided as a chat message with the role set to "tool" (or "function", depending on the provider). That way, the LLM knows that the response it sees is not from the user, but is the result of the tool it decided to run. The LLM then observes the results of the tool (or tools) and provides the final answer to the user.
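The whole round trip can be sketched in a few lines. This is a simulation with the LLM calls mocked out so the flow itself is visible; the function and variable names are illustrative, not a specific framework's API.

```python
import json

# Stand-in for a real search tool (e.g. a RAG pipeline over internal docs).
def technical_documentation_search(query: str) -> str:
    return (
        "You can pull the DeepSeek R1 model with `ollama pull deepseek-r1` "
        "and run it with `ollama run deepseek-r1`."
    )

TOOLS = {"technical_documentation_search": technical_documentation_search}

# 1. The LLM decides to call a tool and emits a structured message.
#    (Hard-coded here; a real model provider would return this.)
tool_call = {
    "name": "technical_documentation_search",
    "arguments": json.dumps({"query": "Using Ollama and DeepSeek R1"}),
}

# 2. The application (not the LLM) actually runs the tool.
args = json.loads(tool_call["arguments"])
result = TOOLS[tool_call["name"]](**args)

# 3. The result goes back to the LLM as a tool-role message, so the model
#    knows this text came from the tool it requested, not from the user.
messages = [
    {"role": "user", "content": "Hey, how do I use Ollama with DeepSeek R1?"},
    {"role": "tool", "name": tool_call["name"], "content": result},
]
```

Step 2 is the key point of this section: the tool runs in your application code, outside the model, and only its textual result flows back in.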
Congratulations!
At this point, you have learned the basics of what constitutes an AI Agent, especially one that relies on tools and function calling. I like to compare it to this: the LLM, as the core orchestrator of the AI Agent, is like a wizard with a spellbook but no wand. It knows what can be done and how to do it, but all it can do is say the spell; the tools still have to run outside of the LLM.
What is Agentic AI?
First of all, Agentic is an adjective.
There are a lot of new terms to get used to, which can be confusing. But we can make things easier for ourselves when talking about Agentic AI and AI Agents. An AI Agent is itself a kind of Agentic AI, but AI Agent usually refers to an end application designed for a specific task. For example, an AI Agent might be a document search assistant, or a personal assistant with access to your email and WeChat.
However, when we say agentic AI, we usually mean a system that has agentic components in its design such as decision-making LLMs, reasoning steps, possibly some tools, self-reflection, etc. In order to be considered agentic, it does not need to have all of these components, but usually exhibits characteristics of some of them.
Tools for Building AI Agents
Building an AI agent requires integrating multiple components and tools, especially to create a system capable of autonomous or semi-autonomous decision making, interaction, and task execution. Although advanced agents can be very complex, even the simplest agents require some basic elements. Here are some resources that can help you get started building an AI agent:
1. Language model provider
The foundation of an AI Agent is the LLM, which provides the Agent's reasoning capability. The LLM enables the Agent to understand different inputs and plan its actions effectively. It is also very important to choose an LLM that supports built-in function calling so that we can connect it to external tools and APIs. Common LLM providers include:
OpenAI: GPT-4, o3-mini
Alibaba: Qwen-2.5
DeepSeek: DeepSeek-R1
Meta: Llama 3
Google: Gemini 2.0 Pro, Gemini 2.0 Flash
Mistral: Mistral Large, Mistral Small 3
Open-source models served via Hugging Face or Ollama
2. Memory and Storage
Agents need some form of persistent memory to save context. Memory can be divided into two types:
Short-term memory : Used to keep track of current conversations or ongoing tasks.
Long-term memory : Used to remember past conversations, personalized information, and experiences.
Currently, there are many different implementations of both short-term and long-term memory in agents, and we may see more variations as the technology advances. For example, for short-term memory, we can help the LLM manage context-length limits by providing it with a "conversation summary". For long-term memory, we may choose to use a database to back up conversations. This may expand the role of vector databases like Weaviate, from which an AI agent can retrieve the most relevant past conversation content to use as long-term memory.
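The "conversation summary" idea for short-term memory can be sketched as follows. This is a minimal illustration under stated assumptions: summarize() is a placeholder that in practice would call an LLM, and the message shapes follow the OpenAI-style role/content convention.

```python
# Short-term memory via summarization: keep the last few messages verbatim
# and fold older ones into a single summary message.

def summarize(messages):
    # Placeholder: a real implementation would ask an LLM for a summary.
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return "Earlier, the user asked about: " + "; ".join(user_turns)

def compact_history(messages, max_messages=4):
    """Keep the most recent messages; replace the rest with a summary."""
    if len(messages) <= max_messages:
        return messages
    old, recent = messages[:-max_messages], messages[-max_messages:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent

# Example: a six-message history is compacted to a summary plus four turns.
history = [
    {"role": "user", "content": "What is Weaviate?"},
    {"role": "assistant", "content": "A vector database."},
    {"role": "user", "content": "Does it support hybrid search?"},
    {"role": "assistant", "content": "Yes."},
    {"role": "user", "content": "How do I enable it?"},
    {"role": "assistant", "content": "Use the hybrid parameter."},
]
compacted = compact_history(history)
```

The same pattern generalizes: long-term memory would persist the summarized or raw turns to a database instead of discarding them.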
3. AI Agent Orchestration Framework
Orchestration frameworks are like intelligent commanders, coordinating all components of an AI agent, and even managing multiple agents in a multi-agent setting . They abstract away most of the complexity, handle errors/retry loops, and ensure that language models, external tools/APIs, and memory systems work together smoothly.
There are currently several frameworks that can simplify the development of AI agents:
LangGraph: Provides a structured framework for defining, coordinating, and executing multiple agents.
LlamaIndex: Makes it possible to create agent systems of varying complexity.
CrewAI : A multi-agent framework for orchestrating autonomous AI agents with specific roles, tools, and goals.
Hugging Face smolagents : A library that enables you to run powerful Agents in a few lines of code.
Haystack: An end-to-end framework for building LLM-powered applications, including agents.
OpenAI Swarm : An educational framework for exploring ergonomic and lightweight multi-agent orchestration.
4. Tools and APIs
The capabilities of an agent depend on the tools it can access. By connecting to various APIs and tools, an agent can interact with the environment and perform tasks such as web browsing, data retrieval, database query, data extraction and analysis, and code execution.
Frameworks like LlamaIndex provide ready-made tool integrations, such as data loaders for PDFs, websites, and databases, as well as application integrations like Slack and Google Drive. Similarly, Langchain provides a wide range of tools for agents to use. In addition, developers can build custom tools as needed and introduce entirely new functionality by wrapping the API. Recent research, such as "Querying Databases via Function Calls" [4], even hints at the potential of function calls for database querying.
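Building a custom tool usually just means wrapping a function with a name and a description the LLM can read. Here is one hedged way to sketch that as a registry; the decorator pattern is illustrative, and frameworks like LlamaIndex and LangChain provide their own tool abstractions.

```python
# A simple registry of custom tools: each entry pairs a callable with the
# description the LLM will use to decide when to invoke it.
TOOL_REGISTRY = {}

def tool(description):
    """Decorator that registers a function as an agent tool."""
    def register(fn):
        TOOL_REGISTRY[fn.__name__] = {"fn": fn, "description": description}
        return fn
    return register

@tool("Convert a temperature from Celsius to Fahrenheit.")
def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32

@tool("Count the words in a piece of text.")
def word_count(text: str) -> int:
    return len(text.split())
```

From the registry, the names and descriptions can be serialized into whatever tool-listing format the chosen model provider expects.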
In general, building an AI agent is like assembling a puzzle. You start with a good language model, add the right tools and APIs, and then add memory so that the agent remembers the important things. You can use an orchestration framework to simplify the process and tie the pieces together to ensure that each piece works perfectly together.
The Future of AI Agents: Challenges and Opportunities
One of the great things about AI Agents and Agentic AI is that the field is still evolving. While we haven't discussed all the challenges here, or other core components involved in building production AI Agents, such as observability, a few things are worth highlighting, especially about the future of AI Agents.
For example, you may have noticed that unless we take the time to deliberately design our Agentic applications, we may end up relying a lot (perhaps too much) on the LLM to make the right decisions. If the Agent only has access to a search tool or knowledge base, that's probably fine. But what if a tool has access to your bank account, and the Agent can now buy you a very expensive one-way ticket to Hawaii?
One debate I’ve been hearing recently is whether AI agents are used more as “research assistants” or as “executors of our intent.” This is a simple but important question, and perhaps as LLMs continue to improve and we establish better norms and limits in the AI field, our views will change.
Levels of autonomy and human involvement
Now you understand how a basic AI agent operates. But it is not necessary (or recommended) to make the LLM the coordinator of all operations. We have begun to see more and more agents delegating processes to simpler, more deterministic systems. In some cases, these processes are even delegated to humans. For example, we may see more and more scenarios that require human approval before an action can occur.
We’ve even seen tools like Gorilla implement agents with “undo” functionality, allowing humans to decide whether to roll back an action, thus adding human involvement to the process. [5]
Multimodal AI Agent
Multimodality refers to the ability to use more than one modality, that is, beyond language (text) to include images, video, audio, etc. The technology is largely in place for this. As a result, we will likely see more and more AI agents that are able to interact with a variety of mediums, either as part of their tools or natively if they use multimodal LLMs. Imagine an AI agent that you can ask to "Create a cute puppy video and send it to my email!"
The role of vector database
Another interesting topic is to what extent the role of vector databases in AI might expand. Currently, we mainly see vector databases as a source of knowledge that AI agents can access. However, it is easy to imagine a future where we use vector databases, as well as other types of databases, as memory resources for agent interactions.
AI Agent Examples and Application Scenarios
AI Agents are reshaping the way we work, and this change is already visible in multiple industries. AI Agents shine best when we need a perfect combination of conversation and action. By automating repetitive tasks, they not only increase work efficiency but also improve the overall user experience. Here are some examples of AI Agents in action:
1. AI Research Assistant
AI research assistants can simplify the process of analyzing large amounts of data, identifying trends, and generating hypotheses. Today, we have seen people in academia or the workplace using ChatGPT as an assistant to help them collect information, build ideas, and provide the first step in many tasks. It can be said that ChatGPT itself is a research assistant agent. These types of agents are sometimes also called Agentic RAGs , that is, AI agents can access multiple RAG tools, each of which accesses a different knowledge base.
2. AI Customer Service
AI customer service agents provide 24/7 support, handling inquiries, troubleshooting, and providing personalized interactions. They reduce wait times and allow humans to handle more complex tasks. They can serve as research assistants for customers, providing them with answers quickly, and also complete tasks for customers.
3. Marketing and Sales Agent
These agents optimize marketing campaigns and sales processes by analyzing customer data, personalizing outreach, and automating repetitive tasks such as lead qualification and email follow-ups.
4. Code Assistant Agent
These agents help developers by suggesting code, debugging errors, resolving tickets/issues, and even building new features. This allows developers to save time and focus on creative problem solving. Tools like Cursor and Copilot are examples of this.