How to Write Function Calling Prompts: OpenAI's Newly Released Function Calling Guide (10,000 Words with Examples)

Written by
Jasper Cole
Updated on: June 13, 2025
Recommendation

OpenAI's latest hardcore Function Calling guide, a 10,000-word in-depth analysis, a must-read for developers.

Core content:
1. Overview and practical application of the OpenAI Function Calling guide
2. Conversion from system prompts to developer messages and instruction hierarchy
3. The importance of role settings and official example analysis
4. Guiding principles and best practices for function calling order


Shortly before the Dragon Boat Festival, OpenAI released its Function Calling guide for the o3/o4-mini models. It is arguably the most hardcore and authoritative practical manual for large-model function calling available online.

If you have an agent that talks to a large model through the OpenAI interface, this article is well worth studying. Even if you don't, the core ideas and best practices apply to other reasoning models that support Function Calling (such as DeepSeek R1).

Developer Prompts: From System Prompts to Developer Messages

Understanding the new messaging system

In the o-series models, OpenAI introduced the concept of "developer messages" to make explicit that instructions come from the developer. Any system message you provide is automatically converted to a developer message internally, so in practice the developer prompt can be treated as the traditional system prompt. This seemingly minor change gives the model a clearer instruction hierarchy, letting it better distinguish information from different sources.

Mapping to other models:  Note that not all models have a "developer message" concept. Models such as DeepSeek R1 still use the traditional system message for the same purpose. If you are using another model, simply place the "developer prompts" discussed in this article in the system message; the effect is the same.
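As a minimal illustration (our sketch, not from the guide), here is how a developer message can be passed through the Responses API; the retail-agent wording and the user question are placeholders:

from openai import OpenAI

client = OpenAI()

# A system message would be converted to this "developer" role internally,
# so writing the instructions as a developer message directly is equivalent.
response = client.responses.create(
    model="o4-mini",
    input=[
        {"role": "developer", "content": "You are an AI retail agent. Help users manage their orders."},
        {"role": "user", "content": "I want to cancel my pending order."}
    ]
)
print(response.output_text)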

Role setting: giving the model a clear identity

Role prompts become even more important in Function Calling because they not only set the tone of the AI's behavior but also delimit the scope of operations it may perform. OpenAI provides a standard role-setting example in the guide that every developer can learn from. A good role setting is like putting a professional "working hat" on the AI, letting it perform at its best in a specific field.

Official role setting example:

You are an AI retail agent.

As a retail agent, you can help users cancel or modify pending orders, return or exchange delivered orders, modify their default user address, or provide information about their own profile, orders, and related products.

This example clearly lists the retail agent's specific capabilities: order management, returns and exchanges, address modification, information queries, and so on. Such explicit capability boundaries let the model know exactly what it can do, avoiding out-of-bounds operations or attempts at tasks beyond its abilities.

Function call order: Avoid the mistake of "building the house before laying the foundation"

Although o3/o4-mini are very smart, they can still get the order of tool calls wrong, much like a novice programmer who tries to create a file in a directory that does not yet exist. OpenAI offers two solutions, from simple to complex, so developers can choose the level of guidance that fits their needs.

Sequential instructions for simple scenarios:

check to see if directories exist before making files

This concise instruction is suitable for most basic scenarios and can avoid common sequence errors in one sentence.

Detailed steps of complex business processes:

To process a refund for a delivered order, follow the following steps:
1. Confirm the order was delivered. Use: `order_status_check`
2. Check the refund eligibility policy. Use: `refund_policy_check`
3. Create the refund request. Use: `refund_create`
4. Notify the user of refund status. Use: `user_notify`

This detailed sequence of steps is particularly suitable for complex business processes. Each step clearly specifies the tool to be used, allowing AI to follow the process as strictly as an experienced employee.

The boundaries of tool use: when to use and when to stop

Clearly defining when to use tools, and when not to, is key to keeping the AI from being either "over-eager" or "too passive". OpenAI provides a complete boundary-setting example that shows how to comprehensively define the rules of tool use.

Complete tool usage boundary example:

Be proactive in using tools to accomplish the user's goal. If a task cannot be completed with a single step, keep going and use multiple tools as needed until the task is completed. Do not stop at the first failure. Try alternative steps or tool combinations until you succeed.

- Use tools when:
  - The user wants to cancel or modify an order.
  - The user wants to return or exchange a delivered product.
  - The user wants to update their address or contact details.
  - The user asks for current or personalized order or profile info.

- Do not use tools when:
  - The user asks a general question like "What's your return policy?"
  - The user asks something outside your retail role (e.g., "Write a poem").

If a task is not possible due to real constraints (for example, trying to cancel an already delivered order), explain why clearly and do not call tools blindly.

The value of this example is that it not only defines specific scenarios for using and not using tools, but also emphasizes initiative and persistence: the AI should not give up at the first failure, but try alternative approaches. This "keep going until the goal is achieved" setting makes the AI behave more like a responsible assistant.

Avoid CoT prompts

Why no additional inference prompts are needed

As a reasoning model, o3/o4-mini already has full chain-of-thought capability and does not need developers to elicit its reasoning process. In fact, asking a reasoning model to do extra planning and reasoning can be counterproductive, like asking someone who is already good at thinking to relearn "how to think". The model automatically emits reasoning tokens before calling a tool; this happens naturally and needs no human intervention.

Applicable to other reasoning models:  This principle also applies elsewhere. DeepSeek R1, for example, automatically outputs reasoning_content (its chain-of-thought content), and Claude has its thinking mode. None of these reasoning models need extra CoT prompts such as "Please think carefully" or "Analyze step by step".

Conditions for generating reasoning summaries

It is important to note that the model's reasoning process is not always accompanied by a reasoning summary: a summary is only generated once a certain amount of substantive reasoning tokens has accumulated. For simple tool calls you may therefore see no detailed reasoning process, but that does not mean the model is not thinking. Understanding this helps developers interpret the model's output correctly instead of concluding the model is not thinking hard enough just because no summary appears.
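A hedged sketch of explicitly requesting reasoning summaries through the Responses API's reasoning parameter (the task text is illustrative); even with summaries enabled, a short tool call may not produce one:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3",
    input="Plan how to process a refund for order #1234.",  # illustrative task
    reasoning={"effort": "medium", "summary": "auto"}  # ask for summaries when available
)

# Reasoning summaries appear only when enough substantive reasoning
# tokens were generated to warrant one.
for item in response.output:
    if item.type == "reasoning":
        for part in item.summary:
            print(part.text)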

Function descriptions: the tool's instruction manual

The dual mission of function descriptions

Function descriptions have a dual mission: telling the model when to call the function and how to construct its arguments. OpenAI provides a standard function-definition example in the guide that shows the basic structure and elements of a function description. A good function description should read like a precise contract, clearly specifying the interface between the reasoning model and the tool's API.

Example of a standard function definition:

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for provided coordinates in celsius.",
    "parameters": {
        "type": "object",
        "properties": {
            "latitude": {"type": "number"},
            "longitude": {"type": "number"}
        },
        "required": ["latitude", "longitude"],
        "additionalProperties": False
    },
    "strict": True
}]

This example shows the standard format of a function definition, where the description field "Get current temperature for provided coordinates in celsius." is the function description part we should focus on.

Usage criteria: embedding intelligence into descriptions

Embedding usage criteria in function descriptions is an advanced technique. OpenAI provides a detailed file_create example showing how to build intelligent judgment logic into the description. This approach moves per-tool control logic out of the developer prompt, giving each tool its own usage wisdom.

Example of a function description with usage criteria:

Creates a new file with the specified name and contents in a target directory. This function should be used when persistent storage is needed and the file does not already exist.
- Only call this function if the target directory exists. Check first using the `directory_check` tool.  
- Do not use for temporary or one-off content—prefer direct responses for those cases.  
- Do not overwrite existing files. Always ensure the file name is unique.
  If replacement is intended and confirmed, use `file_delete` followed by `file_create`, or use `file_update` instead.

The beauty of this example is that it not only explains the function's basic purpose but also spells out usage conditions, restrictions, and alternatives, as if each tool came with a small assistant that knows when it should be used.

Few-shot prompts: let examples do the talking

Although reasoning models do not depend on few-shot prompts as strongly as traditional models do, examples are still valuable for constructing function arguments. OpenAI provides a complete regular-expression tool example showing how a lookup table can guide argument construction.

Few-shot prompt example for the regular expression tool:

Use this tool to run fast, exact regex searches over text files using the `ripgrep` engine.
- Always escape special regex characters: ( ) [ ] { } + * ? ^ $ | . \\
- Use `\\` to escape any of these characters when they appear in your search string.
- Do NOT perform fuzzy or semantic matches.
- Return only a valid regex pattern string.

Examples:
Literal -> Regex Pattern         
function( -> function\\(           
value[index] -> value\\[index\\]      
file.txt -> file\\.txt            
user|admin -> user\\|admin          
path\to\file -> path\\\\to\\\\file    

This intuitive mapping table lets the model grasp the conversion rules at a glance, skipping trial and error; it is particularly useful for complex format-conversion requirements.
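As a quick sanity check (our addition, not from the guide), Python's re.escape applies the same escaping rules the table illustrates:

import re

# re.escape performs exactly the literal-to-pattern conversion the
# lookup table above teaches the model to do.
for literal in ["function(", "value[index]", "file.txt", "user|admin"]:
    print(literal, "->", re.escape(literal))
# function( -> function\(
# value[index] -> value\[index\]
# file.txt -> file\.txt
# user|admin -> user\|admin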

Key rules first: say the important things up front

OpenAI particularly emphasizes putting key rules first and gives a counterexample to illustrate the problem. In a function description, the most important rules should come first so the model sees the key information immediately.

Counterexample (long-winded, with the key information at the end):

Performs a fast regex-based text search that looks for exact pattern matches within files or entire directories, leveraging the ripgrep tool for high-speed scanning.
Output follows ripgrep formatting and can optionally display line numbers and matched lines.
To manage verbosity, results are limited to a maximum of 50 hits.
You can fine-tune the search by specifying inclusion or exclusion rules based on file types or path patterns.
This method is ideal when searching for literal text snippets or specific regular expressions.
It offers more accuracy than semantic methods when the goal is to locate a known string or structure.
It's generally recommended over semantic search when you're looking for a specific identifier—such as a function name, variable, or keyword—within a defined set of directories or file types.

OpenAI's tests showed that prompts putting the key rule up front were 6% more accurate than such lengthy descriptions, a difference that is significant in a production environment.

Preventing function call hallucinations

Manifestations of hallucinations

The o3 model may exhibit function-call hallucinations in some cases, such as promising to call a tool in the background without actually executing it, or promising to call a tool in a future conversation turn. This is like an employee who says "I'll take care of it" but does nothing, or says "I'll contact the customer later" and never follows up. Such hallucinations may look harmless, but they seriously hurt user experience and system reliability.

Clear instructions: Don’t make empty promises

The first way to address hallucinations is to explicitly prohibit the behavior in the prompt; OpenAI provides a specific instruction template. This straightforward instruction sets a "do what you say" rule for the AI, making clear that commitment and execution must happen together.

Explicit instructions to guard against hallucinations:

Do NOT promise to call a function later. If a function call is required, emit it now; otherwise respond normally.

Parameter validation instructions:

Validate arguments against the format before sending the call; if you are unsure, ask for clarification instead of guessing.

These instructions are concise and clear, telling the AI directly what not to do and what to do instead, avoiding the misunderstandings that vague phrasing can cause.

Strict mode: using technical means to ensure accuracy

Setting the strict parameter in the function definition to true is a technical safeguard against hallucinations: it ensures the function call strictly follows the defined schema. For parameters with complex format requirements (such as valid Python code), OpenAI recommends adding extra validation instructions. This double-insurance mechanism makes the AI pass format checks before calling functions, much as code review ensures quality.
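On the validation side, here is a hedged sketch using the third-party jsonschema package; dispatch is a hypothetical mapping from tool names to Python functions:

import json
from jsonschema import validate, ValidationError  # assumes `pip install jsonschema`

def safe_execute(tool_call, tools, dispatch):
    # Validate model-produced arguments against the declared schema before
    # executing: a second line of defense on top of strict mode.
    schema = next(t["parameters"] for t in tools if t["name"] == tool_call.name)
    args = json.loads(tool_call.arguments)
    try:
        validate(instance=args, schema=schema)
    except ValidationError as err:
        return f"Invalid arguments: {err.message}"  # surface the error instead of guessing
    return dispatch[tool_call.name](**args)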

Keeping reasoning state: the Responses API

The lasting value of reasoning items

The core advantage of the Responses API is that it can persist reasoning items across multiple tool calls within a single conversation. This lets the AI "remember" its earlier thinking while handling a complex task instead of starting from scratch each time. The o3/o4-mini models were trained with this kind of persistent reasoning, so keeping reasoning items around during inference can significantly improve the model's intelligence and its tool-call decisions.

Managing state with encrypted content

If you don't want OpenAI to handle state management, you can use the encrypted reasoning content to manage the state yourself. This lets you enjoy the benefits of reasoning persistence while keeping full control of your data. The implementation is simple: pass `include=["reasoning.encrypted_content"]` in the API call, then add the returned encrypted reasoning content to the context of the next request. Reasoning continuity is preserved without exposing the reasoning details.

A complete code example

OpenAI provides a complete code example showing how to use the Responses API to maintain continuity of the reasoning state. It covers the full flow from tool definition to state management and is an important reference for real development.

Complete Responses API usage example:

from openai import OpenAI
import requests
import json

client = OpenAI()

def get_weather(latitude, longitude):
    response = requests.get(
        "https://api.open-meteo.com/v1/forecast"
        f"?latitude={latitude}&longitude={longitude}"
        "&current=temperature_2m,wind_speed_10m"
        "&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m"
    )
    data = response.json()
    return data["current"]["temperature_2m"]

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for provided coordinates in celsius.",
    "parameters": {
        "type": "object",
        "properties": {
            "latitude": {"type": "number"},
            "longitude": {"type": "number"}
        },
        "required": ["latitude", "longitude"],
        "additionalProperties": False
    },
    "strict": True
}]

context = [{"role": "user", "content": "What's the weather like in Paris today?"}]

response = client.responses.create(
    model="o3",
    input=context,
    tools=tools,
    store=False,
    include=["reasoning.encrypted_content"]  # encrypted chain of thought is passed back in the response
)

context += response.output  # add the response to the context (including the encrypted chain of thought)
tool_call = response.output[1]  # output[0] is the reasoning item; output[1] is the function call
args = json.loads(tool_call.arguments)

result = get_weather(args["latitude"], args["longitude"])

context.append({
    "type": "function_call_output",
    "call_id": tool_call.call_id,
    "output": str(result)
})

response_2 = client.responses.create(
    model="o3",
    input=context,
    tools=tools,
    store=False,
    include=["reasoning.encrypted_content"]
)

print(response_2.output_text)
# Output: The current temperature in Paris is about 18.8 °C.

This example demonstrates the complete hand-off of reasoning state, letting the model make smarter decisions based on its previous thinking and ultimately delivering a more accurate and consistent tool-calling experience.

Hosted tools: mixing hosted and custom tools

Clear definition of tool boundaries

When you use hosted tools and custom tools together, clearly defining each tool's boundaries becomes critical. OpenAI gives an example with a Python tool and a calculator tool, showing how to state the division of labor in the developer prompt. This clear division avoids ambiguity in tool selection and improves call accuracy.

Example of tool boundary definition:

You are a helpful research assistant with access to the following tools:
- python tool: for any computation involving math, statistics, or code execution
- calculator: for basic arithmetic or unit conversions when speed is preferred

Always use the python tool for anything involving logic, scripts, or multistep math. Use the calculator tool only for simple 1-step math problems.
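In code, the two tools sit side by side in the same tools array. This is a hedged sketch: the hosted code_interpreter entry follows the Responses API shape, and the calculator function is hypothetical:

tools = [
    # Hosted tool: OpenAI runs this Python sandbox for you.
    {"type": "code_interpreter", "container": {"type": "auto"}},
    # Custom function: you execute it yourself and return the result.
    {
        "type": "function",
        "name": "calculator",
        "description": "Perform a basic one-step arithmetic operation or unit conversion.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
            "additionalProperties": False
        },
        "strict": True
    }
]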

Prioritize tools over internal knowledge

Even though the o3/o4-mini models can often solve tasks on their own, tools usually provide more reliable answers. OpenAI gives a code_interpreter-first example showing how to guide the model to prefer tools over its internal knowledge.

Examples of guidance for prioritizing tool use:

You have access to a `code_interpreter`. Always prefer using `code_interpreter` when a user asks a question involving:
- math problems
- data analysis
- generating or executing code
- formatting or transforming structured text

Avoid doing these directly in your own response. Always use the tool instead.

This guidance helps the AI understand when to "rely on outside help" rather than "fight alone", ensuring accurate and reliable results.

Decision Boundaries and Fallback Mechanisms

For tools whose functions may overlap, OpenAI provides a detailed example of decision boundaries and fallback mechanisms, showing how to handle the overlap.

Example of decision boundaries and a fallback mechanism:

Use `python` for general math, data parsing, unit conversion, or logic tasks that can be solved without external lookup—for example, computing the total cost from a list of prices.

Use `calculate_shipping_cost` when the user asks for shipping estimates, as it applies business-specific logic and access to live rate tables. Do not attempt to estimate these using the `python` tool.

When both could be used (e.g., calculating a delivery fee), prefer `calculate_shipping_cost` for accuracy and policy compliance. Fall back to `python` only if the custom tool is unavailable or fails.

This layered decision-making mechanism ensures the best tool is used: it preserves accuracy while providing a reliable fallback.

MCP Tools: Model Context Protocol in Practice

Tool filtering: Avoiding load bloat

When using MCP (Model Context Protocol), filtering tools using the allowed_tools parameter is an important optimization strategy. OpenAI provides a configuration example of a Git MCP server that shows how to select only necessary tools.

MCP tool filter configuration example:

"tools" : [ 
    {
        "type" : "mcp" , 
        "server_label" : "gitmcp" , 
        "server_url" : "https://gitmcp.io/openai/tiktoken" , 
        "allowed_tools" : [ "search_tiktoken_documentation" , "fetch_tiktoken_documentation" ] ,  
        "require_approval" : "never" 
    }
]

This example selects only the two tools `search_tiktoken_documentation` and `fetch_tiktoken_documentation` from the Git MCP server and ignores the other, irrelevant functions, saving context and improving efficiency.
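A minimal sketch (our addition) of passing this filtered MCP configuration to the Responses API; the question is illustrative:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3",
    tools=[{
        "type": "mcp",
        "server_label": "gitmcp",
        "server_url": "https://gitmcp.io/openai/tiktoken",
        "allowed_tools": ["search_tiktoken_documentation", "fetch_tiktoken_documentation"],
        "require_approval": "never"
    }],
    input="How does tiktoken encode special tokens?"  # illustrative question
)
print(response.output_text)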

Caching and latency optimization

To reduce latency, make sure to pass the mcp_list_tools item back in the context, or include previous_response_id, so the API does not need to re-import the tool list on every call. This caching mechanism is especially useful in high-frequency scenarios and can noticeably improve response speed. It is also advisable to reserve reasoning models for high-complexity tasks and to consider lighter solutions for simple tool calls, balancing performance and cost.
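Continuing the MCP sketch above (and assuming the earlier response was stored, which is the API default), a follow-up turn can reuse the previous state via previous_response_id:

# Chaining with previous_response_id lets the API reuse state from the
# prior turn, including the imported MCP tool list.
follow_up = client.responses.create(
    model="o3",
    previous_response_id=response.id,  # `response` from the previous call
    input="And how do I register custom special tokens?"  # illustrative follow-up
)
print(follow_up.output_text)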

FAQ

Practical Limits on the Number of Tools

While there is no hard upper limit on the number of tools for the o3 and o4-mini models, a practical guideline is that configurations with fewer than 100 tools and fewer than 20 parameters per tool are within the training distribution and should perform within the expected reliability range. Even within that range, more tools can introduce ambiguity or confusion: when multiple tools have overlapping uses or vague descriptions, the model may call the wrong tool or hesitate.

Design Choices for Parameter Structure

Regarding whether parameter structures should be deeply nested or flat, OpenAI recommends leaning toward a flat design when in doubt. Flat structures are generally easier for models to reason about: parameter fields are top-level and immediately visible, which reduces internal parsing and structuring and helps prevent problems such as partially populated nested objects or invalid field combinations. For domains that naturally involve structured input (configuration payloads, rich search filters, form submissions), nesting helps organize related parameters, but clear field descriptions, anyOf logic, or strict mode must be used to guard against invalid combinations.
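An illustrative contrast (our addition): the same hypothetical search_orders tool with a flat schema versus a nested one:

# Flat: every field is top-level and immediately visible to the model.
flat_schema = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "status": {"type": "string", "enum": ["pending", "delivered"]},
        "max_results": {"type": "number"}
    },
    "required": ["customer_id", "status", "max_results"],
    "additionalProperties": False
}

# Nested: groups related filters, but needs strict mode or clear field
# descriptions to guard against partially populated objects.
nested_schema = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "filters": {
            "type": "object",
            "properties": {
                "status": {"type": "string", "enum": ["pending", "delivered"]},
                "max_results": {"type": "number"}
            },
            "required": ["status", "max_results"],
            "additionalProperties": False
        }
    },
    "required": ["customer_id", "filters"],
    "additionalProperties": False
}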

Suitability of custom tool formats

The recommendations in this guide assume you pass function schemas through the standard tools parameter, as shown in OpenAI's general function-calling guide. If you instead define tools in natural language inside developer-written prompts (for example, inline in developer or user messages), this guide may not fully apply. In that case the model cannot rely on its internal prior knowledge of tool schemas, and you may need more explicit few-shot examples, output formats, and tool-selection criteria. Argument-construction reliability may also drop without schema-level anchoring.