The relationship and difference between MCP and Function Calling

Written by
Iris Vance
Updated on: June 28, 2025

Explore the differences and connections between MCP and Function Calling in large models, and understand how MCP optimizes intelligent systems.

Core content:
1. The context-explosion risk of Function Calling and how to mitigate it
2. What MCP adds to Function Calling in large models and its advantages
3. The MCP initialization process and its on-demand loading mechanism


Question: Will too many functions defined in Function Calling cause model context explosion?

    The function definitions themselves count toward the model context. OpenAI's documentation states this explicitly: functions are injected into the system message, so they occupy context and are billed as input tokens; if you hit the context limit, reduce the number of functions or shorten their parameter descriptions.

    As long as the sum of all input tokens (system prompt + function list + conversation history + user question) stays within the selected model's maximum context window, the context_length_exceeded error is not triggered; once that window is exceeded, the request fails with that error (a rough token-estimate sketch follows the list below).

What actually blows up the context is:

(1) Extremely long function descriptions (e.g. an entire OpenAPI schema pasted in field by field);
(2) A very long conversation history that is never truncated;
(3) The full function list being handed at once to a model with a small or medium context window.
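As a rough illustration, the sketch below estimates how many input tokens a tool list alone consumes before any conversation happens. It assumes the OpenAI-style "tools" format and uses tiktoken only as an approximation; the provider's real server-side accounting may differ.

```python
# A rough sketch: estimate how many input tokens the tool definitions alone consume.
# Assumes the OpenAI-style "tools" format; tiktoken is only an approximation of
# the provider's real accounting.
import json
import tiktoken

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    # ...imagine dozens more tools here: every schema is injected into the prompt
]

enc = tiktoken.get_encoding("cl100k_base")
tool_tokens = len(enc.encode(json.dumps(tools)))
print(f"~{tool_tokens} input tokens spent on tool definitions alone")
```

Multiply that figure by the number of tools and add the conversation history, and a small-window model runs out of room quickly.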
Question: Different models implement Function Calling differently, so why is cross-model compatibility hard?
(1) Models handle function calling through different mechanisms: some wire tool use into a dedicated decoding branch (GPT), others rely on prompt conventions (Claude);
(2) Each model has its own system prompt instructions and format standards.
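To make the incompatibility concrete, here is a hedged sketch of the kind of adapter an AI application otherwise ends up maintaining: the same logical tool has to be re-serialized into each provider's expected shape. The field names follow the OpenAI "tools" and Anthropic "input_schema" conventions as commonly documented; verify against the current provider docs.

```python
# A minimal sketch of per-provider tool-format adapters an application must
# otherwise maintain. Field names follow OpenAI / Anthropic conventions as of
# this writing; check the current docs before relying on them.
def to_openai_tool(name: str, description: str, schema: dict) -> dict:
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }

def to_anthropic_tool(name: str, description: str, schema: dict) -> dict:
    return {"name": name, "description": description, "input_schema": schema}

weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
openai_tool = to_openai_tool("get_weather", "Get the current weather.", weather_schema)
anthropic_tool = to_anthropic_tool("get_weather", "Get the current weather.", weather_schema)
```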

The significance of MCP for large model function calling

First, the conclusions:
(1) Traditional function calling cannot support “large-scale, multi-task, cross-modal, multi-tool, constantly changing” agent systems;
(2) MCP does not directly increase the model's context length;
(3) MCP not only covers the function calling capability, but also provides an “on-demand loading + hierarchical calling + local caching” mechanism, unifies access to and invocation of multiple resources, solves the cross-model compatibility problem, and provides infrastructure for making agent systems intelligent;
(4) The model still needs to “see” the tool interface: without any description at all, the LLM cannot generate correct parameters.
MCP initialization process
When the MCP client and server establish a connection, the server returns the capabilities it supports, and the client caches them locally. When the server's capabilities change, the client's local cache is updated dynamically.
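A minimal sketch of that handshake, assuming the official MCP Python SDK (the `mcp` package) and a hypothetical stdio server command; exact class and method names may differ between SDK versions.

```python
# A minimal sketch of MCP initialization and local capability caching,
# assuming the official Python SDK ("mcp" package); names may differ by version.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # "my-mcp-server" is a hypothetical server executable
    server = StdioServerParameters(command="my-mcp-server")
    async with stdio_client(server) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            init_result = await session.initialize()   # server advertises its capabilities
            tool_list = await session.list_tools()     # tools/list: fetch tool metadata once
            # cache the capabilities locally; refresh when the server signals a change
            local_tool_cache = {tool.name: tool for tool in tool_list.tools}
            print(init_result.capabilities, list(local_tool_cache))

asyncio.run(main())
```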
User interaction sequence diagram
Compared with Function Calling, where the AI application feeds every function description to the LLM as part of the prompt in the first round of dialogue, MCP moves the detailed function descriptions and parameter definitions out of the first-round prompt. This effectively reduces the LLM's context token consumption and speeds up processing.
Load on demand
So when does the client hand the tool-specific JSON Schema to the LLM?

    In the first round of interaction with the LLM, the client passes in only "user query + system message (default prompt, history context, and capabilities)", and the LLM decides whether it needs to call a tool or access a resource. Only when a tool call or resource access is needed does the client pass the specific tool or resource JSON Schema to the LLM, so that the LLM can generate the tool or resource parameters.
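Below is a hedged sketch of what enters the context in each round under this scheme. The tool registry stands in for the client's local MCP capability cache; tool names and the query are made up, and no real LLM call is made, since the point is only that the full JSON Schema is injected in the second round, not the first.

```python
# A sketch of what enters the LLM context in each round under on-demand loading.
# "tool_registry" stands in for the client's locally cached tools/list result.
import json

tool_registry = {  # hypothetical cached metadata from tools/list
    "get_weather": {
        "description": "Get the current weather for a city.",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Round 1: only the query plus a lightweight capability hint — no schemas yet.
round_one = [
    {"role": "system", "content": "Available tools: " + ", ".join(tool_registry)},
    {"role": "user", "content": "What's the weather in Berlin?"},
]

# Round 2: only after the model signals it needs a tool does the full schema
# for that single tool get injected; the other schemas never cost tokens.
requested = "get_weather"
round_two = round_one + [
    {"role": "system",
     "content": "Generate arguments for this tool:\n"
                + json.dumps({requested: tool_registry[requested]}, indent=2)},
]
print(len(json.dumps(round_one)), "vs", len(json.dumps(round_two)), "chars of context")
```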

Hierarchical calls

By separating responsibilities between the client and the server, MCP provides a unified JSON-RPC specification for the "execution layer", supporting dynamic enumeration and invocation of tools (tools/list → tools/call), so that tool metadata can be maintained “outside the model” rather than being stuffed into the context on every turn of the conversation.
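The JSON-RPC messages behind that two-step sequence look roughly like the following. The method names tools/list and tools/call come from the MCP specification; the ids, tool name, and arguments are illustrative.

```python
# Illustrative JSON-RPC 2.0 messages for the tools/list -> tools/call sequence.
# Method names follow the MCP spec; ids, names and arguments are made up.
import json

list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

list_response = {  # server enumerates tool metadata, kept outside the model
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }
        ]
    },
}

call_request = {  # issued only once the LLM has produced arguments for this tool
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Berlin"}},
}

print(json.dumps(call_request, indent=2))
```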