Let’s clear the fog of MCP and talk about the essence of LLM tool calling (I): Function Calling

Demystifying MCP and exploring the technical essence of LLM tool calling.
Core content:
1. The evolution of tool calling in large language models (LLMs)
2. A detailed explanation of Function Calling and its impact on the industry
3. How the MCP protocol simplifies tool calling and drives the development of LLM applications
Origin
If we compare the large language model to a super scholar with a vast reserve of knowledge, it knows everything from astronomy to geography. It can tell you the detailed steps for making Buddha Jumps Over the Wall, down to the precise proportion of spices and the fire control. But this "scholar" cannot turn on the gas stove himself, nor can he reach out, take down the casserole, and cook that bowl of Buddha Jumps Over the Wall for you. This was the deep impression the large language model (LLM) left when it first appeared: it could only answer questions, with no way to act on the real world.
Before 2023, tool calling for large models was still in a "manual era". Developers had to teach the model to use each tool through complex rules and code, like teaching a child to use chopsticks. It was not until OpenAI launched Function Calling that the industrial era of tool calling began. The MCP protocol, released in November 2024, made the process as simple as plugging a device into a USB port.
The emergence of MCP has made the tool-calling ecosystem extremely prosperous, and individual users, developers, and enterprises can all enjoy the convenience it brings. In this movement-like technology boom, however, the layers of encapsulation on top of LLMs are like a fog that keeps thickening and will not dissipate. It is easy to get lost in the surface of the encapsulation and overlook the essence of tool use in large language models.
We tend to care about and argue over questions like "Does MCP use Function Calling behind the scenes?", "Does Manus use MCP?", or "What is the difference between MCP and Function Calling?". Countless articles online cover those topics, but few discuss "What factors affect the accuracy of MCP tool calls?" or "How do tool definitions affect LLM reasoning?". This article attempts to discuss the technical nature of tool calling from the perspective of the four implementation paradigms of tool calling in large language models, as well as the relationship between building an MCP server and how tools are actually used.
At the bottom layer of the LLM (the MaaS API layer, excluding the encapsulation added by various Agent-building platforms and Agent frameworks), the implementations of tool calling can be roughly grouped into four categories:
- Function Calling
- Prompt structured output
- API structured output
- Intent recognition + predefined functions
The essence and principle of Function Calling
Overview: Flowers in the Smoke
On June 13, 2023, OpenAI officially released the first epoch-making tool API feature for large language models: Function Calling. From then on, applications built on OpenAI's LLM API could connect to external tools. Once launched, Function Calling became the de facto industry standard: the tool-calling capabilities of other large language models, from API parameter design to model training methods, almost all borrow from and stay compatible with OpenAI's specification, making it the undisputed standard for tool calling.
Soon after, OpenAI launched its Agent platform product, GPTs, and integrated Function Calling together with OpenAPI-based tool execution by default into the interface for creating custom Agents. This was the industry's first exploration of a complete LLM + tool-calling implementation: ordinary non-technical users could complete the whole tool-calling flow through simple interface configuration. The concept of Function Calling thus spread from API developers into the view of non-technical audiences, and over time the term was naturally generalized and stretched.
Implementation principle: the art of grafting
[Image: Huangshan Welcoming Pine]
Since the advent of the large language model, no matter how the modalities and API encapsulations change, the mode of human-model interaction has never changed: it has always been natural-language interaction, and the technical term for that natural language is the "Prompt". Understanding this is very important, because the various layers of encapsulation on top of the model (API design, product interaction, etc.) are all, in essence, spliced into a "Prompt" that conforms to some text format specification.
Function Calling is an API layer interface encapsulation on top of the LLM reasoning engine. It solves three core problems:
1. Telling the LLM what tools are available: corresponds to the tools parameter of the Chat Completions API.
2. Defining each tool: what it is, what problem it solves, and what input parameters it requires; corresponds to each specific tool definition inside the tools parameter.
3. Specifying how the model should select tools (automatic selection, forced selection, or other): corresponds to the tool_choice parameter.
For example, let's look at the mapping of a weather query example from the API to the underlying LLM reasoning prompt:
Function Calling API
curl -X POST https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "system",
      "content": "You are an assistant who knows how to use tools. Please answer user questions based on the provided tool list."
    },{
      "role": "user",
      "content": "What will the weather be like in Beijing tomorrow?"
    }],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_chinese_weather",
          "description": "Query the weather forecast for all cities in China",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "City or province, such as Shanghai"
              }
            },
            "required": ["city"]
          }
        }
      },
      {...other tool definitions}
    ],
    "tool_choice": "auto"
  }'
The final underlying Transformer reasoning prompt
(the actual splicing format depends on each model's data specification):
<system>
You are an assistant who knows how to use tools. Please answer user questions based on the provided tool list.
The provided tools are as follows:
<tool-list>
<tool>
- Tool name: get_chinese_weather
- Tool description: Query the weather forecast for all cities in China
- Parameters: {"city": {"type": "string", "description": "City or province, such as Shanghai", "required": true}}
</tool>
</tool-list>
</system>
<user>
What will the weather be like in Beijing tomorrow?
</user>
The model's inference response is as follows:
{
"role": "assistant",
"content": null ,
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_chinese_weather",
"arguments": "{\"city\": \"Beijing\"}"
}
}
]
}
At this point, one complete Function Calling API round is finished. From the example above, one thing should be clear:
Function Calling only selects the tool; it does not execute it. The final output is just a function-call description in JSON format (the tool_calls field in the response above). To actually invoke the get_chinese_weather function, developers must wire it up in their own applications.
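To make this division of labor concrete, here is a minimal end-to-end sketch in Python, assuming the official openai SDK; the local get_chinese_weather function is a stub standing in for a real weather service:

import json
from openai import OpenAI

client = OpenAI()

def get_chinese_weather(city: str) -> dict:
    # Stub standing in for a real weather service.
    return {"city": city, "forecast": "sunny, 24°C"}

messages = [
    {"role": "system", "content": "You are an assistant who knows how to use tools."},
    {"role": "user", "content": "What will the weather be like in Beijing tomorrow?"},
]
tools = [{
    "type": "function",
    "function": {
        "name": "get_chinese_weather",
        "description": "Query the weather forecast for all cities in China",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City or province, such as Shanghai"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = resp.choices[0].message

if msg.tool_calls:  # the model only selected the tool; we execute it ourselves
    messages.append(msg)  # keep the assistant turn that contains tool_calls
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_chinese_weather(**args)
        messages.append({  # feed the execution result back as a "tool" message
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result, ensure_ascii=False),
        })
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(final.choices[0].message.content)
else:
    print(msg.content)  # no tool was needed

Everything between receiving tool_calls and sending the "tool" message back is the developer's responsibility; the API never runs the function.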
The two layers of work behind implementing Function Calling
Algorithm
Humans entered a period of rapid development once they learned to use tools, and so it is with models. But first the model must learn to use tools: given a list of tools, it must decide, based on the user's question, whether to use a tool and which tool to use. This is where LLM training comes in. Spliced inference prompts like the one shown above must be specially constructed and fed to the model as training data. This process involves various design considerations, for example:
- Can more than one tool call be output at a time?
- Should a description of the intended call be output first, or the call emitted directly?
- Should explanations of tool calls be printed alongside the calls themselves?
These considerations will not be discussed in detail in this article. Readers only need to know that function-calling capability must be trained separately, which is why many otherwise very powerful models do not support function calling, such as the famous DeepSeek R1. OpenAI's o-series reasoning models likewise only gained function-calling support more than half a year after their release. A hypothetical training sample might look like the sketch below.
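Purely as an illustration (real chat templates and special tokens vary by vendor and are not public), a single supervised sample for tool-call training might pair a spliced prompt with a structured target completion:

# Hypothetical training sample for tool-call learning; illustrative only.
training_sample = {
    "prompt": (
        "<system>You can use tools. Available tools:\n"
        "- get_chinese_weather(city: string): Query the weather forecast for all cities in China\n"
        "</system>\n"
        "<user>What will the weather be like in Beijing tomorrow?</user>"
    ),
    # The target teaches the model to emit a structured call instead of prose.
    "completion": '{"tool_calls": [{"name": "get_chinese_weather", "arguments": {"city": "Beijing"}}]}',
}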
Engineering
After the model layer supports tool calling, the engineering layer needs to design a standard Function Calling API to give developers a unified development experience. This is equivalent to writing a standardized operating manual for the model's tool-calling capability.
- The design of the tools parameter follows the JSON Schema specification, defining a unified way for developers to describe tools; the model provider implements a set of prompt-splicing rules behind the scenes, as in the weather example above.
- The tool_choice parameter lets developers customize how the model outputs tool calls to suit their scenario (a code sketch of the four modes follows this list), for example:
- Auto mode: the model autonomously determines whether a tool needs to be called. Intent match is judged by comparing the user input against the functional descriptions of the registered tools. When the model believes an external tool can solve the problem more accurately or efficiently (for example, when real-time data or an action is required), it generates a tool-call request; otherwise it outputs a natural-language response directly.
- Required mode: forces the model to call at least one tool. By modifying the model's output probability distribution and suppressing the weight of natural-language generation, tool calls become the only valid output path. Concretely, the underlying inference infrastructure masks non-tool-call token sequences during decoding and retains only output that conforms to the tool_calls structure.
- Function mode: limits the tool selection space to a single predefined function. The model only parses the information in the user input relevant to that function's parameters and ignores the existence of other tools.
- None mode: completely disables tool calling; the model relies only on internal knowledge to generate responses. This can be achieved by removing the tools parameter from the request or explicitly setting tool_choice="none", and by filtering tool-related prompt instructions during preprocessing so the model never receives tool definitions.
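A minimal sketch of the four modes, assuming the openai Python SDK and reusing the messages and tools lists built in the round-trip sketch earlier:

from openai import OpenAI

client = OpenAI()

def ask(tool_choice):
    # messages and tools are the same lists defined in the earlier sketch
    return client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools, tool_choice=tool_choice
    )

ask("auto")      # the model decides whether to call a tool
ask("required")  # at least one tool call is forced
ask({"type": "function", "function": {"name": "get_chinese_weather"}})  # force this one function
ask("none")      # tool calling disabled; plain-text answer only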
Real implementations must handle far more detail than the above. Since this article focuses on principles, those details are not repeated here; interested readers can study them on their own.
The relationship between MCP and Function Calling: a complex issue
Ever since MCP (Model Context Protocol), launched by Anthropic, became popular, discussion and debate about the relationship between MCP and Function Calling has never stopped on the Internet. In summary, there are two main views:
1. MCP uses Function Calling behind the scenes.
2. MCP will replace Function Calling.
Let me state the conclusion first: strictly speaking, both views are incorrect. Let me start with the first one.
Viewpoint A: Function Calling is used behind MCP
As mentioned above, there are four main ways to implement LLM tool calling, and Function Calling is only one of them. It is the most influential implementation, with OpenAI's version serving as the industry standard, but it is not the only one. MCP is an Agent context protocol that supports tool calling. It only defines the interface specification for how an MCP Server provides its list of supported tools to an MCP Client, as shown below:
// Tool list query request
{
  "jsonrpc": "2.0",
  "id": "unique_request_id",
  "method": "tools/list",
  "params": {} // Usually an empty object, kept for extensibility
}
// Response to the tool list query
{
  "jsonrpc": "2.0",
  "id": "original_request_id",
  "result": {
    "tools": [
      {
        "name": "get_weather",
        "description": "Query the real-time weather of the specified location",
        "inputSchema": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name, such as 'Beijing'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["location"]
        }
      },
      {
        "name": "search_web",
        "description": "Internet search engine",
        "inputSchema": {
          "type": "object",
          "properties": {
            "query": {"type": "string"},
            "max_results": {"type": "integer", "minimum": 1}
          },
          "required": ["query"]
        }
      }
    ]
  }
}
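For intuition only, here is a hypothetical sketch of sending that tools/list request to a stdio-based MCP server ("my-mcp-server" is a placeholder command, not a real binary). A real client would use an MCP SDK, which also performs the initialize handshake the protocol requires before tools/list:

import json
import subprocess

# Spawn the server as a child process speaking newline-delimited JSON-RPC
# over stdin/stdout, as the MCP stdio transport does.
proc = subprocess.Popen(
    ["my-mcp-server"],  # placeholder command
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
request = {"jsonrpc": "2.0", "id": "1", "method": "tools/list", "params": {}}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()
print(proc.stdout.readline())  # a response shaped like the JSON above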
The MCP protocol does not require the MCP Host to use Function Calling to implement tool calling. No protocol designer would hard-bind a protocol to one specific feature's implementation path; such a design would be flawed in itself. So why do so many people online hold this view? The reason is simple: the name Function Calling is too deeply rooted. For many who have not studied the details, Function Calling is simply equated with tool calling, and much of the media coverage only reports the "what" without the "why", so the misconception spreads naturally.
For an MCP application (an MCP Host, such as Cursor or Cline) to implement MCP-based tool calling, many trade-offs must be made when choosing the technical path for tool calls. This is not a simple multiple-choice question but a set of engineering strategies. An example makes this clear:
As an open-source IDE plugin, Cline lets developers configure APIs for various models. The problem is that not all models support Function Calling, especially reasoning models like DeepSeek R1. So the engineering implementation may adopt a strategy like the following (sketched in code after this list):
- Maintain a Model Provider configuration list that records whether Function Calling is supported.
- Based on the model the user selects, check whether Function Calling is supported. If so, use Function Calling as the tool-calling technique; if not, choose another path.
- Does the selected model support enforced structured output? If so, use Prompt + enforced structured output; if not, fall back to pure prompt constraints.
- Use pure prompt-constrained output + regex matching as a safety net for malformed model output.
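A hypothetical sketch of this capability-based routing (loosely modeled on the strategy above, not Cline's actual source code):

# Hypothetical capability table; real products load this from configuration.
MODEL_CAPS = {
    "gpt-4o": {"function_calling": True, "structured_output": True},
    "deepseek-r1": {"function_calling": False, "structured_output": False},
}

def pick_tool_calling_path(model: str) -> str:
    caps = MODEL_CAPS.get(model, {})
    if caps.get("function_calling"):
        return "function_calling"  # native Function Calling API
    if caps.get("structured_output"):
        return "prompt_plus_structured_output"  # prompt + enforced JSON output
    return "prompt_plus_regex_fallback"  # prompt constraints + regex repair

print(pick_tool_calling_path("deepseek-r1"))  # -> prompt_plus_regex_fallback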
In summary, readers should be able to judge for themselves why this view is incorrect.
Viewpoint B: MCP will replace Function Calling
With the above understanding, it is easy to see why this view is wrong: MCP is a much larger concept than Function Calling. Besides tool querying and calling, it also defines abstractions and communication standards for everything that can affect the model's context, such as Prompts and Resources. Function Calling is just one technique an MCP application may use when it needs tools selected. More importantly, Function Calling does not cover tool execution at all; it only provides the selection and description of tool calls, as the earlier example vividly illustrates.
So why do many people have this perception? The main reason is that the definition of Function Calling has become blurred. When OpenAI launched the Function Calling API, it shipped no end-to-end application of it. Only with the launch of GPTs was Function Calling integrated as the platform's default tool-calling technique, where it was paired with a tool-execution link designed around the OpenAPI specification.
Many non-technical readers, however, are fuzzy about the conceptual boundary here. For convenience, they bundle Function Calling + OpenAPI-based tool execution together and call the whole thing Function Calling. As technology media spread the term, people gradually lost track of its boundaries. A similar case is the relationship between MCP Host and MCP Client: the MCP protocol clearly defines what an MCP Client is and what an MCP Host is, but as MCP becomes more familiar, distinguishing the two turns out to be troublesome. The client-server (C/S) architecture of the traditional Internet is deeply ingrained and convenient for analogy, so the two concepts get conflated.
Advantages and disadvantages of Function Calling in application
Function Calling, as an LLM capability, also has a notion of usability (accuracy). Give the same task and tools to a smart person and a fool, and their judgments about whether to use a tool, which tool to use, and how to use it will differ completely. This depends largely on the base capability of the LLM and whether the capability has been specifically trained. We can therefore summarize the advantages and disadvantages of Function Calling as an implementation technique, so readers can make better choices in development:
Advantages
The advantages of Function Calling are mainly reflected in the following aspects:
- High degree of standardization: most models in the industry that support Function Calling follow OpenAI's interface specification, and for those that do not, many open-source model-proxy libraries provide compatibility layers.
- Accuracy is assured: vendors who dare to expose a Function Calling API have trained for it carefully; if accuracy were too low, they would be embarrassed to announce the feature at all. So if a vendor deliberately emphasizes the accuracy of its Function Calling capability, it is worth trying and comparing.
- Mainstream cloud vendors all provide support: because of standardization, mainstream cloud vendors support Function Calling per OpenAI's interface standard. Some also paper over implementation differences between vendors or provide graceful degradation.
- The development experience is relatively friendly.
Disadvantages
The main disadvantages of Function Calling are as follows:
- Special training required: as discussed above, this will not be repeated here. Because training is required, not all models support it and effectiveness varies, which adds selection and comparison costs when integrating different models.
- Parameter-generation errors need extra handling: when the model outputs a tool call, it may generate invalid arguments (such as missing fields or wrong types), so developers must add their own validation.
- Tool descriptions are long: tool definitions use the JSON Schema format, and as the example above shows, one function's description runs to hundreds of tokens. With dozens of connected tools, that entire description set must be passed to the model on every inference; thirty tools at roughly 150 tokens each already costs about 4,500 tokens per request.
- Blurred responsibility: because of the encapsulation, developers subconsciously assume that a bad tool choice or bad parameters is the fault of the model's capability. In truth, the name, description, and parameters in the tool definitions, as well as the composition of the tool list, all affect the final result. Developers using Function Calling must clearly realize that these inconspicuous tool definitions are the prompt itself: as mentioned earlier, every tool definition is ultimately spliced into the prompt the model reasons over.