Uncovering the mystery of MCP and discussing the essence of LLM tool calls (Part 2): Prompt and API constraints

Written by
Silas Grey
Updated on: June 27, 2025
Recommendation

In-depth exploration of the flexibility and efficiency of LLM tool calls, revealing practical techniques of the Prompt and API constraint methods.

Core content:
1. The working principle and advantages of Prompt structured output
2. How to extract structured information from model output with regular expressions
3. Application examples of the Prompt and API constraint methods in real tool calls


Preface


As mentioned in the previous article, at the bottom layer of LLM usage (the MaaS API layer, excluding the encapsulation of the various Agent-building platforms and Agent frameworks), the implementations of tool calling can be roughly grouped into four categories:

  1. Function Calling
  2. Prompt structured output
  3. API structured output
  4. Intent recognition + predefined functions
The first article focused on the principles and essence of tool calling implemented via Function Calling. If you have not read it yet, start there: Lifting the fog of MCP, let's talk about the essence of LLM tool calling (I): Function Calling. This article focuses on the remaining three methods.

Method 2: Prompt structured output


The second common way to implement tool calls, and also the most flexible one, is to guide the model through the prompt to output text in a specific format (such as JSON), and then extract structured information from the output text (for example with regular expressions) for the subsequent tool call. See the following example:


You are a flight inquiry assistant. Based on the user's flight inquiry needs expressed in natural language, you call tools on demand to complete the inquiry and respond.

【Tool Definition】
Tool name: flight_search
Function description: Query real-time flight information according to user needs
Input parameters:
- departure_city (required, departure city)
- arrival_city (required, arrival city)
- date (required, date format YYYY-MM-DD)
- passengers (optional, default is 1 adult)

Please return the following information in JSON format:
{
  "toolname": "flight_search",
  "params": {
    "departure_city": "Departure City",
    "arrival_city": "Arrival City",
    "date": "YYYY-MM-DD"
  }
}

User question: Please help me check the flights from Shanghai to Guangzhou the day after tomorrow
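
For the user question above, an instruction-following model would be expected to reply with something like the JSON below (the concrete date is only an illustration; in practice the model fills it in from the conversation context):

{
  "toolname": "flight_search",
  "params": {
    "departure_city": "Shanghai",
    "arrival_city": "Guangzhou",
    "date": "2025-06-29"
  }
}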


With this method, the tool list, the tool-call response format, and the related behaviors are all defined in the prompt, so it does not depend on native Function Calling support; any model that can generate text is applicable, which makes it the most flexible approach. After receiving the model's response, the JSON text can be extracted from the output either with a third-party open-source LLM output parser or with regular expressions. Pseudo code using a parser is shown below, followed by a regex-based sketch:


import llm_output_parser from 'xxxx';
import { flight_search } from './tools';

// Parse the tool call out of the model's output text
async function parseOutput(responseText) {
    try {
        const toolCall = llm_output_parser.parse(responseText);
        return toolCall?.toolname ? toolCall : null;
    } catch (e) {
        // Parsing failed: the model did not return valid structured output
        return null;
    }
}

// Execute the corresponding function according to the tool call in the response
async function invokeFunction(responseText) {
    const toolCall = await parseOutput(responseText);
    if (toolCall) {
        if (toolCall.toolname === 'flight_search') {
            const params = toolCall.params;
            if (isValidFlightSearchParams(params)) {
                return await flight_search(params);
            } else {
                // Return a validation error describing the unqualified parameters
            }
        } else {
            // Handle other tools ...
        }
    }
}

function isValidFlightSearchParams(params) {
    // Check whether params meets the parameter format and required-field requirements
    // ...
}
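
If no third-party parser is available, a regular expression can do the same job, as mentioned above. The sketch below is a minimal illustration (it assumes the model's reply contains exactly one JSON object and does not handle malformed or multiple objects):

// Minimal sketch: extract the first {...} block from the model output and parse it
function extractToolCall(responseText) {
    const match = responseText.match(/\{[\s\S]*\}/);  // from the first "{" to the last "}"
    if (!match) return null;
    try {
        const toolCall = JSON.parse(match[0]);
        return toolCall?.toolname ? toolCall : null;
    } catch (e) {
        return null;  // The matched text was not valid JSON
    }
}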


Advantages:

  • No dependence on a specific model version; compatible with any model that supports text generation
  • Flexible formats: XML/JSON/YAML and other formats are all possible, the tool definition format can be fully customized and tuned, and the tool list can be adjusted dynamically at any time
  • Flexible token usage: a more compact tool definition style can be chosen as needed, which greatly reduces inference token consumption when there are many tools

Disadvantages:

  • Output stability depends entirely on prompt quality and model capability: instruction-following ability varies across models. Large flagship models can usually achieve 100% format compliance, but smaller models may occasionally produce format errors that cannot be parsed unless the prompt is carefully tuned, so accuracy cannot be reliably guaranteed.
  • Additional validation logic must be developed: as the code example above shows, parsing the model's output and handling tool-call results requires extra parsing, exception handling, and other development work, which is relatively cumbersome.
  • There is no unified tool definition standard, and maintainability is poor.

Method 3: API parameter constraint method



Similar to the Prompt structured output path, there is another variant: using the response_format (forced structured output) capability supported by some models. This is an API feature launched by OpenAI in August 2024 that guarantees the model's output matches a preset JSON format 100% of the time, giving developers determinism. We call this the API parameter constraint method; it is like putting a "format-tightening ring" on the model.


response_format design background

  1. Prompt-based structured output of LLMs is unstable: developers previously needed elaborate prompt design or multiple requests to obtain structured data, and the model could still generate invalid JSON (missing required fields, wrong formats, etc.), requiring extra validation and retries.
  2. Structured output is a hard requirement for Agent applications: structured data is a core requirement for API integration, data entry, and multi-step agent workflows (such as database queries and dynamic UI generation). For example, an e-commerce order query must strictly match the database field format, and unstructured output increases parsing cost.
  3. Technical iteration: OpenAI launched JSON mode in 2023, but it only guarantees valid JSON, not conformance to a schema. This update achieves 100% schema reliability through algorithmic optimization and engineering constraints.


response_format Function Introduction

The response_format function of the OpenAI API enables precise control over the output structure through strict mode. Developers can enable this function in two ways:


  1. Function call mode: add the "strict": true parameter to the tool definition, and explicitly specify the required fields and additionalProperties: false in the JSON Schema, forcing the model to output only the predefined fields. For example, an e-commerce order query tool can restrict the columns field to enumeration values such as id and status, avoiding invalid fields.
  2. Direct use of the response_format parameter: specify response_format with type json_schema in the API request and define the complete Schema structure. For example, in a math problem-solving scenario, the output can be required to contain a steps array (the solution steps) and a final_answer field, with additional properties prohibited, so that downstream programs can parse it directly.

This feature also introduces dynamic recursion support, such as handling nested anyOf types or recursive data structures (such as tree directories), and implements dynamic constraints for complex patterns through a context-free grammar (CFG). Developers can even set independent validation rules for each object type. For example, in a database query, each element of the conditions array can be required to contain exactly the three fields column, operator, and value, where value allows a flexible combination of strings, numbers, or sub-objects, as sketched below.
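
A hypothetical JSON Schema fragment for that conditions example might look like the following (the operator enumeration and the nesting rules are illustrative assumptions, not taken from the original example):

// Hypothetical schema fragment: every element of "conditions" must have exactly
// column, operator, and value; value may be a string, a number, or a sub-object
const conditionsSchema = {
  type: "array",
  items: {
    type: "object",
    properties: {
      column: { type: "string" },
      operator: { type: "string", enum: ["=", ">", "<", "LIKE"] },  // illustrative operators
      value: {
        anyOf: [
          { type: "string" },
          { type: "number" },
          { type: "object" }  // e.g. a nested sub-condition
        ]
      }
    },
    required: ["column", "operator", "value"],
    additionalProperties: false
  }
};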


In addition, SDK integration optimization significantly simplifies the development process:

  • Python developers can use Pydantic models to generate JSON Schemas automatically; the SDK deserializes responses into typed objects, reducing manual parsing errors.
  • Node.js developers can define schemas with the Zod library, and API responses are converted directly into strongly typed data structures (a brief sketch follows this list);
  • A new refusal indicator allows programmatic detection of cases where the model declines to respond (such as queries involving sensitive information), so developers can quickly identify and handle these exceptions.
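
As a minimal sketch of the Node.js path, assuming the official openai SDK with its Zod helper (the schema itself is illustrative and mirrors the flight_search parameters used earlier):

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const openai = new OpenAI();

// Illustrative Zod schema mirroring the flight_search parameters used earlier
const FlightSearch = z.object({
  departure_city: z.string(),
  arrival_city: z.string(),
  date: z.string()
});

const completion = await openai.beta.chat.completions.parse({
  model: "gpt-4o-2024-08-06",
  messages: [{ role: "user", content: "Flights from Shanghai to Guangzhou the day after tomorrow" }],
  response_format: zodResponseFormat(FlightSearch, "flight_search")
});

// .parsed is already a typed object matching the Zod schema (or null if the model refused)
console.log(completion.choices[0].message.parsed);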

Tool call example


import OpenAI from "openai";
const openai = new OpenAI();

// 1. Define the tool function (simulated weather query)
async function getCurrentWeather({ location }) {
  return {
    temperature: Math.random() * 30 + 10,  // simulated temperature data
    condition: ["Sunny", "Cloudy", "Light Rain"][Math.floor(Math.random() * 3)]  // simulated weather condition
  };
}

// 2. Configure the tool call parameters
const tools = [{
  type: "function",
  function: {
    name: "getCurrentWeather",
    description: "Get the weather information of the specified city",
    parameters: {
      type: "object",
      properties: { location: { type: "string" } },
      required: ["location"],
      additionalProperties: false
    },
    // Enable structured output
    strict: true  // Force the arguments to match the parameter schema exactly
  }
}];

// 3. Execute the tool call
async function queryWeather(message) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",  // A model that supports structured output must be used
    messages: [{ role: "user", content: message }],
    tools,
    response_format: {  // Force the final textual answer (if any) into a fixed structure
      type: "json_schema",
      json_schema: {
        name: "weather_report",  // arbitrary schema name required by the API
        strict: true,
        schema: {
          type: "object",
          properties: {
            temperature: { type: "number" },
            condition: { type: "string" }
          },
          required: ["temperature", "condition"],
          additionalProperties: false
        }
      }
    }
  });

  // 4. Parse and execute the tool call
  const toolCall = completion.choices[0].message.tool_calls[0];
  return await getCurrentWeather(JSON.parse(toolCall.function.arguments));
}

// Example call
queryWeather("What's the weather like in Beijing tomorrow").then(console.log);
// Example output: { temperature: 25.3, condition: "Cloudy" }

 Implementation principle of response_format

The algorithm level is mainly achieved by combining the following two methods:

  • Model SFT training optimization: OpenAI's models (such as gpt-4o-2024-08-06) have enhanced their ability to understand complex JSON schemas through SFT fine-tuning. Official data shows the schema-matching rate in benchmark tests rose from around 40% for older models to 100%.
  • Constrained decoding: the JSON Schema is converted into a context-free grammar (CFG) that dynamically restricts the valid tokens at each generation step. For example, after generating {"val, the model is only allowed to choose follow-up tokens that conform to the Schema (such as numbers or strings), not arbitrary characters. This amounts to intervening in the decoding stage so that the model's output can only use Schema-conforming tokens; a toy sketch of the idea follows.
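
Purely as a conceptual illustration (not OpenAI's actual implementation), constrained decoding can be pictured as masking the candidate tokens at every step:

// Toy illustration of constrained decoding: keep only the candidate tokens whose
// addition still allows the partial output to be completed into schema-valid JSON.
function allowedTokens(partialOutput, candidateTokens, canStillMatchSchema) {
  return candidateTokens.filter(token => canStillMatchSchema(partialOutput + token));
}

// In a real system, canStillMatchSchema would be derived by compiling the JSON Schema
// into a context-free grammar and tracking the grammar state, and the filtering would
// be applied to the model's entire vocabulary before sampling the next token.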


Pros and Cons

Advantages

  • 100% schema reliability: Enforce output constraints through JSON Schema to avoid format errors (such as missing required fields) in traditional LLM.
  • High development efficiency: Directly define the Schema structure without complex prompts or multiple requests.
  • Cost optimization: The new model (such as gpt-4o-2024-08-06 ) reduces token costs by 50% and reduces duplicate requests caused by errors.

Disadvantages

  • Schema complexity restrictions: Only supports a subset of JSON Schema (such as nesting depth ≤ 5, and an upper limit of 100 properties), and cannot handle recursive or complex union types.
  • Poor model compatibility: Very few models support this feature, and OpenAI only supports it in the new GPT-4o series. Old models need to be downgraded to JSON mode (reduced reliability).
  • First-request latency: the first call with a new schema requires additional compilation time (roughly 10-20% extra latency), which is mitigated by caching on subsequent calls.


Method 4: Intent Recognition + Predefined Functions



The technical path of implementing tool calls via intent recognition was already widely used before large language models appeared. Almost all traditional AI assistants, such as smart speakers and in-car voice assistants, were built on this approach. For example, early versions of Siri worked roughly like this:

The natural language understanding (NLU) module maps the user's utterance to a predefined intent label and then calls a fixed API. For example, when a user says "play Jay Chou's songs", the system parses the intent as MusicPlay and triggers the music player. This process requires manually writing a large number of intent-action mapping rules; a minimal sketch of such a mapping follows.
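
A minimal sketch of such a hand-written mapping table (the intent names and handlers are hypothetical):

// Hypothetical intent-action mapping for a traditional voice assistant
const intentHandlers = {
  MusicPlay: (slots) => `Playing songs by ${slots.artist}`,       // would call the music player
  WeatherQuery: (slots) => `Querying weather for ${slots.city}`   // would call a weather API
};

// The NLU module produces (intent, slots); dispatch simply looks up the fixed handler
function dispatch(intent, slots) {
  const handler = intentHandlers[intent];
  return handler ? handler(slots) : "Sorry, I can't do that yet.";
}

// "Play Jay Chou's songs"  ->  intent: MusicPlay, slots: { artist: "Jay Chou" }
console.log(dispatch("MusicPlay", { artist: "Jay Chou" }));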


After the emergence of large language models, this technical path has also had many variations:

  1. Moving closer to the Prompt structured output method: inject the intent list and slot list into the prompt, and instruct the model to output the best-matching intent and slot values in JSON format (a sketch of this variant follows this list).
  2. Routing based on an independent small model: this solution is usually used when several different expert models or deterministic workflows sit behind the application. After careful SFT, a small model can achieve very high reliability in specific scenarios. For example, general large-model assistant products often run a small LLM (2B parameters or even smaller) for intent classification after the user asks a question, to identify whether it is a time-sensitive knowledge question, an encyclopedia question, information processing, logical reasoning, and so on. Once classified, the request is essentially routed to a deterministic process. Each of these processing branches can also be abstracted as a function call, though most of these functions do not require many parameters.
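
A minimal sketch of the first variant, reusing the flight-assistant scenario from earlier (the intent names, slots, and stub workflows are illustrative assumptions):

// Illustrative prompt that asks the model to pick one predefined intent and fill its slots
const userQuestion = "Please help me check the flights from Shanghai to Guangzhou the day after tomorrow";
const classifierPrompt = `
You are an intent classifier. Choose exactly one intent from the list below and fill in its slots.
Intents:
- flight_search (slots: departure_city, arrival_city, date)
- weather_query (slots: city, date)
- chitchat (no slots)
Return JSON only, in the form {"intent": "...", "slots": {...}}.
User question: ${userQuestion}
`;

// Stub workflows standing in for the deterministic branches behind the router
const workflows = {
  flight_search: (slots) => `Running the flight search workflow with ${JSON.stringify(slots)}`,
  weather_query: (slots) => `Running the weather workflow with ${JSON.stringify(slots)}`,
  chitchat: () => "Running the general chat workflow"
};

// Once the model returns {"intent", "slots"}, routing is an ordinary lookup
function route(result) {
  const workflow = workflows[result.intent] ?? workflows.chitchat;
  return workflow(result.slots);
}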


Advantages

  1. High reliability
  • Traditional approaches rely on hand-written rules, so the mapping from intent to function is explicit and the execution results are stable and controllable.
  • Combined with small-model routing (for example, using a 2B model to classify user question types), nearly 100% accuracy can be achieved in specific scenarios.
  2. Low resource consumption
  • The inference cost of small models or rule engines is far lower than that of large models, making them suitable for edge computing or high-concurrency scenarios.

Disadvantages

  1. Poor flexibility
  • All intents and mapping rules must be defined in advance; complex requests that are not covered (such as multi-step reasoning tasks) cannot be handled.
  • Maintenance cost is high: adding new functionality requires redesigning the intent classification rules, or retraining and re-tuning the intent recognition model or its instructions.


Conclusion

There is a saying in the field of software development: "There is no silver bullet." It means that no single technology or management method can fully solve the fundamental, complex problems of software development. That saying applies here as well.


Every method of implementing tool calls for a large language model has its own particular advantages and problems that cannot be ignored. When developing an agent and choosing a technical solution for tool calling, you must not only understand the pros and cons of each approach, but also judge against the characteristics of your own scenario. More importantly, good software gives users certainty: when one method fails or misbehaves, is there a fallback to degrade to, instead of directly reporting an error or refusing service?


As MCP works to unify the standard for LLM tool calling, we as Agent developers must understand clearly that MCP does not dictate which technical solution an MCP Host uses to implement tool calls; that freedom and choice belong entirely to the developer. MCP Tools are just a layer of protocol encapsulation. The fog this layer brings is only on the surface; the real essence underneath is prompts and structured output.