OpenAI's open-source tutorial on Agent prompting best practices

Written by Silas Grey
Updated on June 28, 2025

The new features of the OpenAI GPT-4.1 model, together with the agent-building guide, will help you build a more obedient intelligent assistant.

Core content:
1. New features of the GPT-4.1 model and why prompts need re-optimizing
2. The three core elements for building an agent: persistence, tool calling, and planning
3. Best-practice prompt examples that improve agent performance and controllability


Two days ago, OpenAI released GPT-4.1 along with a prompting guide. The guide focuses on the characteristics of the new GPT-4.1 model, showing how to optimize your prompts and thus build a better Agent.

Although GPT-4.1 significantly improves on its predecessor GPT-4o in coding, instruction following, and long-context capabilities, it also has a bit of "personality": it has become more "obedient", following your instructions strictly and even literally, and it no longer "guesses your intent" as readily as before. This means prompts that used to work smoothly may need adjusting, or their effectiveness may suffer.

Why a new prompting guide?

In short, GPT-4.1 has two major changes:

  • Stronger capabilities: coding, instruction following, and long-context processing have all been taken to a higher level.
  • Stricter instruction following: it won't guess what you want to do; it does exactly what you say. The upside is high controllability; the downside is that vague instructions may lead to strange results.

Precisely because of this strictness, many past prompt "best practices" may need to be updated.

How do you squeeze the full potential out of GPT-4.1?

Three elements to build a stronger agent

The guide starts by emphasizing that GPT-4.1 is particularly well suited to building agentic workflows. OpenAI's internal agent harness solved 55% of the problems on SWE-bench Verified, which is not bad! To fully exploit its agentic capabilities, three core elements should be added to the system prompt; together they improved the SWE-bench score by close to 20%.

Persistence

Tell the model that this is a multi-round task and that it needs to keep working until the problem is solved, not just give a single answer.

Official example prompt:

You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved.


Tool-calling

Encourage the model to proactively use tools to gather information when uncertain, rather than guessing.

Official example prompt:

If you are not sure about file content or codebase structure pertaining to the user's request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.


Planning

Guide the model to plan before each tool call and reflect after each call. This makes the model "think out loud" and improves its problem-solving ability (GPT-4.1 is not a reasoning model, so explicit planning must be prompted). In OpenAI's tests this improved SWE-bench performance by 4%.

Official example prompt:

You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.

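Putting the three elements together: below is a minimal sketch, using the official `openai` Python SDK, of how the three snippets might be combined into a single system prompt for an agent call. The model name and the user task are placeholders, not part of the guide.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The guide's three agentic reminders, merged into one system prompt.
SYSTEM_PROMPT = """
You are an agent - please keep going until the user's query is completely
resolved, before ending your turn and yielding back to the user.

If you are not sure about file content or codebase structure pertaining to
the user's request, use your tools to read files and gather the relevant
information: do NOT guess or make up an answer.

You MUST plan extensively before each function call, and reflect
extensively on the outcomes of the previous function calls.
""".strip()

response = client.chat.completions.create(
    model="gpt-4.1",  # placeholder; use whichever GPT-4.1 variant you have access to
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Fix the failing test in utils/dates.py"},  # hypothetical task
    ],
)
print(response.choices[0].message.content)
```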

The right way to call tools

In addition to the system prompt, the guide emphasizes the right way to make tool calls:

  • Use the API's tools field: stop manually inserting tool descriptions into the prompt. Official tests show that passing tool definitions through the API reduces errors and makes the model perform better (a 2% gain on SWE-bench); see the sketch after this list.
  • Give each tool a good name and clearly state its purpose: tool names, descriptions, and parameters should all be clear and concise, which helps the model use the tool correctly.
  • Demonstrate complex tools with examples: if a tool is complex, open an "# Examples" section in the system prompt for sample usage, keeping the description itself concise.
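For instance, here is a sketch of defining a hypothetical `read_file` tool through the Chat Completions `tools` field rather than describing it in prose. The tool name and schema are invented for illustration.

```python
from openai import OpenAI

client = OpenAI()

# A hypothetical tool, defined via the API's `tools` field with a clear
# name, a one-line description, and well-described parameters.
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the repository and return its contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Repository-relative path of the file to read.",
                    }
                },
                "required": ["path"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4.1",  # placeholder model name
    messages=[{"role": "user", "content": "What does config/settings.py contain?"}],
    tools=tools,
)
# If the model chose to use the tool, the structured call shows up here.
print(response.choices[0].message.tool_calls)
```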

The Art of Prompt Writing

As mentioned earlier, GPT-4.1 interprets instructions very literally. This means your instructions must be extremely clear, specific, and unambiguous; instructions that are vague or rely on the model to "fill in the blanks" will be less effective.

Recommended instruction writing process

  • Give the overall requirements first: list the basic rules under a heading such as "Instructions" or "Response Rules"
  • Then the details, point by point: use subheadings to spell out specific behaviors
  • Make the order of steps explicit: if a specific process must be followed, mark it with an ordered list
  • Debugging and optimization:
    • Check for contradictions between instructions (GPT-4.1 tends to follow the later one)
    • Provide clear examples that demonstrate the desired results
    • Be careful with capital letters, exclamation marks, and other emphasis; they may cause the model to over-focus on those points

Common pitfalls and solutions:

  • The "must do X" trap: a hard rule like "the tool must be called before every reply" can lead the model to call tools blindly, even with made-up inputs. Fix: add "ask the user for clarification first when information is insufficient."
  • The "copy the examples" problem: the model may reproduce your sample outputs verbatim. Fix: state explicitly "treat these examples as references, not limits, and adapt them to the situation."
  • The "too much talk" problem: the model sometimes outputs excessive explanation or unwanted formatting. Fix: explicitly request brevity and the exact format in the instructions.

The complex customer service agent example in the guide embodies these principles well: the rules are detailed, organized in layers, and supported by examples.

Getting the most out of long context and chain of thought

Long context handling

  • GPT-4.1 supports contexts of up to 1M tokens and is good at processing long documents
  • Capability boundary: although the basic capability is strong, performance may decline when many items must be retrieved from a huge context, or when complex reasoning over the entire context is required
  • Best practice: place the instructions at both the beginning and the end of the context, i.e., repeat them once (see the sketch after this list)
  • Explicitly tell the model whether it may answer only from the provided context or may also draw on its own knowledge
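A minimal sketch of the "instructions at both ends" layout; the instruction wording and documents are placeholders:

```python
INSTRUCTIONS = (
    "Answer only using the documents provided below. "
    "If the answer is not in the documents, reply that you don't know."
)

def build_long_context_prompt(documents: list[str], question: str) -> str:
    docs = "\n\n".join(documents)
    # Repeat the instructions before and after the long document dump so
    # they are not lost in the middle of a very large context.
    return f"{INSTRUCTIONS}\n\n{docs}\n\n{INSTRUCTIONS}\n\nQuestion: {question}"

prompt = build_long_context_prompt(
    ["<document 1 text>", "<document 2 text>"],  # placeholder documents
    "Who signed the contract, and on what date?",
)
```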

Chain of thought

Guide the model to "think" like a human by breaking complex problems into small steps.

A simple chain-of-thought instruction:

        ...First, think carefully step by step about what documents are needed to answer the query. Then, print out the TITLE and ID of each document. Then, format the IDs into a list.


Advanced CoT: if you find the model's thinking process going off track, use more specific instructions to standardize its reasoning strategy. For example, the guide gives an example that has the model do query analysis first, then context analysis, and finally synthesis; a sketch of such a strategy follows.
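Here is what such a prescriptive strategy could look like as a prompt fragment. The wording below is illustrative, written in the spirit of the guide's example rather than quoting it:

```python
# An illustrative reasoning-strategy prompt, appended to the CoT starter above.
REASONING_STRATEGY = """
# Reasoning Strategy
1. Query Analysis: restate what the user is actually asking for.
2. Context Analysis: inspect each provided document and rate its relevance.
3. Synthesis: answer using only the documents rated relevant.

First, think carefully step by step following the Reasoning Strategy above.
Then, print out the TITLE and ID of each document. Then, format the IDs into a list.
""".strip()
```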

Other useful suggestions

Recommended Prompt Structure Template

# Role and Objective
# Instructions
## Sub-categories for more detailed instructions
# Reasoning Steps (eg, Chain of Thought instructions)
# Output Format
# Examples
## Example 1
# Context (if any)
# Final instructions and prompt to think step by step (eg, the CoT starter)

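As an illustration, here is the template instantiated for a made-up bookstore support agent; every rule, name, and tool below is invented for the example:

```python
# A hypothetical system prompt following the recommended template.
SYSTEM_PROMPT = """
# Role and Objective
You are a support agent for an online bookstore. Your goal is to resolve order issues.

# Instructions
- Greet the user and confirm their order number before taking any action.
- Use the `lookup_order` tool when order details are needed; never guess.

# Output Format
Reply in at most three short sentences, without markdown formatting.

# Final instructions
First, think step by step about which instruction applies, then respond.
""".strip()
```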
  

Choice of delimiter

  • Prefer Markdown: headings, lists, and code blocks are clear and intuitive
  • XML also works well: good for precisely wrapping content and easy to nest
  • JSON is comparatively clumsy: strongly structured, but may need escaping inside prompts
  • Long-document scenarios: XML (<doc id=1 title="...">...</doc>) and table-like formats (ID: 1 | TITLE: ... | CONTENT: ...) work well, while JSON performs poorly (see the sketch below)
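To make the two long-document formats concrete, a small sketch using a toy document:

```python
doc = {
    "id": 1,
    "title": "The Fox",
    "content": "The quick brown fox jumps over the lazy dog",
}

# XML-style packaging: unambiguous boundaries, easy to nest.
xml_format = f'<doc id="{doc["id"]}" title="{doc["title"]}">{doc["content"]}</doc>'

# Table-like packaging: compact, and also handled well in OpenAI's long-context tests.
table_format = f'ID: {doc["id"]} | TITLE: {doc["title"]} | CONTENT: {doc["content"]}'
```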

Original guide address: https://cookbook.openai.com/examples/gpt4-1_prompting_guide