Prompt Engineering Guide: From Basics to Practice

Master prompt engineering and unlock the infinite possibilities of AI.
Core content:
1. The paradigm shift from "pre-training + fine-tuning" to "prompt engineering" in the field of NLP
2. The three major advantages of prompt engineering: portability, a lower development threshold, and rapid iteration
3. How to use progressive prompts in daily conversations to communicate effectively with AI
The development history of prompts
Before large language models became mainstream, the NLP field mainly followed the "pre-training + fine-tuning" paradigm: first train a general language model on a massive corpus, then fine-tune it for specific tasks (such as text classification or sentiment analysis). At that stage, models mainly learned to predict masked words or model contextual relationships, rather than to understand and execute human instructions.
These early models were unable to "understand" natural language instructions, mainly because:
• They were not tuned for instructions: they were trained to predict text snippets or analyze context, not to understand and perform open-ended tasks
• They lacked few-shot learning capabilities: they required large amounts of labeled data and task-specific fine-tuning, and could not complete tasks from prompts alone
The turning point came in 2020, when OpenAI released GPT-3. This behemoth with 175 billion parameters demonstrated an astonishing capability: without fine-tuning, it could achieve good results on a variety of NLP tasks given only a natural language description and a handful of examples, marking the beginning of the era of prompt engineering.
Subsequently, through instruction fine-tuning (such as OpenAI's InstructGPT) and RLHF (reinforcement learning from human feedback), large language models became much better at understanding and following human instructions. The later popularity of ChatGPT then took prompt engineering out of technical circles and made it a mainstream topic.
This is a fundamental paradigm shift: from adapting the model to the task, to adapting the task to the model (expressing the NLP task as a prompt, then letting the LLM complete it). In the past, each NLP task might require a separately trained or fine-tuned model; now the same large language model can complete all kinds of tasks through different prompts.
Think of it as programming versus asking questions: we used to "program" models with data and optimization to perform specific tasks; now we mostly "ask or instruct" the model with prompts, letting it apply the knowledge it has already learned.
This change brings three major advantages:
1. Better portability: one LLM can cover the functions of dozens of previous models
2. Lower development threshold: no deep model-training knowledge is required; you can put AI to work just by writing prompts (should many so-called AI product managers and AI development engineers therefore be renamed "large-model API product managers" and "large-model API development engineers"?)
3. Faster iteration: modifying a prompt is much faster than retraining a model
With the emergence of more capable models such as Claude 3.5 Sonnet and GPT-4o, prompting techniques keep evolving as well: chain-of-thought prompts, role-playing, and multi-turn dialogue strategies are all widely used.
How to write a good prompt
There is no universal formula for writing prompts. The most effective approach depends on your usage scenario: choose a prompting strategy that fits your actual needs and context to get the best results.
Daily conversation
When you are just having a casual chat with AI, progressive prompting is usually more natural and effective. Start with a simple question, then go deeper based on the AI's response, refining your requirements over multiple turns. This method requires no carefully designed, complex prompt; it guides the AI toward your goal through natural conversation.
For example, if you want a report, you can simply ask: "Can you write a report on climate change for me?" Then, based on the reply, add: "I want to focus on data trends from the past five years, about 1,500 words, aimed at college students." This kind of exchange is closer to natural human conversation and gives the AI more chances to understand your needs.
There is no need to over-optimize prompts in everyday scenarios; just keep the conversation natural and smooth. That said, many people run into a common problem when talking to AI: they simply don't communicate well.
The first mistake is assuming the AI knows your context, when it actually lacks the background information we consider "obvious." For example, asking "How do I fix this?" without explaining what "this" refers to, or saying "Write me an email" without giving its content or purpose. Remember: the AI knows nothing about your situation beyond what you explicitly tell it.
The second is failing to hold an effective conversation: not expressing needs clearly or presenting information in a logical order. Many people are either too brief and vague ("write a story") or disorganized, jumping between several topics in one paragraph. A more effective approach is to present information in a sensible order, stating the goal first and then the relevant details, just as you would when explaining something to a person. And that's with an AI; a human listener would simply tune you out.
The third problem is failing to express ideas precisely. Many people output only emotion ("I need a super marketing plan!") instead of specifics ("My product is XXX, its features are XXXX. I need a social media marketing plan for users aged 18-25, emphasizing the product's XXX"). AI needs specific, clear guidance, not vague expectations; give it enough detail to understand the task's context and constraints.
If you are starting to learn how to communicate with AI, first learn to express your thoughts clearly. This means:
1. Clarify your goal: before starting a conversation, think about what you want from the AI: information, ideas, analysis, or suggestions?
2. Provide enough context: give a concise but complete account of the relevant background, as if explaining your situation to someone smart who knows nothing about it.
3. Refine requirements gradually: start with the basics, then add details and adjust direction based on the AI's responses. You don't have to state everything at once.
4. Give feedback: when an answer falls short, point out what is wrong rather than just saying "no" or "rewrite." For example: "This analysis doesn't consider Chinese female white-collar users. Please add that part."
When an answer is not ideal, tell the AI directly what needs to improve; in most cases it can adjust based on that feedback. This interactive process is usually simpler and more effective than trying to write the perfect prompt from the start.
AI Application Development
In AI application development, prompts are no longer just a tool for interacting with a model; they are one of the core components of the entire application. This setting calls for a structured approach to prompt engineering, so that prompts can be effectively iterated, maintained, and shared, which is crucial for product stability and team collaboration.
Clear task instructions
In many cases, AI applications do not take the form of a multi-turn dialogue; a single accurate response is required. This calls for a more rigorous "one-shot" prompt, like handing a senior expert a clear task brief. At minimum, such a prompt should (a sketch follows the list):
• Explain the purpose of the task
• Provide the necessary contextual information
• Demonstrate the expected output through examples
• Predefine how exceptions should be handled
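For example, a minimal sketch of such a prompt (the task, labels, and wording here are purely illustrative):

You are helping triage customer support emails.
Task: classify the email below as "billing", "technical", or "other".
Context: we are a SaaS company; "billing" covers invoices, refunds, and plan changes.
Example: "My invoice shows the wrong amount" -> billing
Exception: if the email is empty or unintelligible, return "other" and explain why in one sentence.
Email: {email_text}

Each element of the list maps onto a line: purpose, context, an example of the expected output, and an explicit rule for exceptional input.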
Structured design
In application development, prompts can be designed in a modular way: break a complex prompt into functional modules such as role, context, constraints, and examples. Different team members can then quickly understand what each module does, an iteration no longer requires modifying the entire prompt, and future troubleshooting becomes easier.
Likewise, when contexts or constraints are similar across scenarios, reuse the module instead of rewriting it.
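A minimal sketch of this modular idea in Python (the module names and contents are hypothetical): each piece lives under its own name, so a teammate can adjust the constraints without touching the role or the examples, and a shared module can simply be imported wherever it is reused.

# Hypothetical prompt modules; in practice these might live in separate files.
ROLE = "You are a customer support assistant for a SaaS product."
CONTEXT = "The user is writing from the in-app help widget."
CONSTRAINTS = (
    "Answer in under 100 words. "
    "Never promise refunds; route billing disputes to a human."
)
EXAMPLES = 'User: "I was double charged." -> Apologize and link the billing FAQ.'

def build_prompt(question: str) -> str:
    # Fixed assembly order keeps diffs readable when one module changes.
    return "\n\n".join([ROLE, CONTEXT, CONSTRAINTS, EXAMPLES, f"User question: {question}"])

print(build_prompt("How do I export my data?"))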
Version Management
Like code, prompts need version control. Use tools such as Git to track changes, and add a meaningful commit message for each update.
For major updates, establish a release process that includes testing, review, and deployment. Make sure each version has a clear identifier, so you can roll back quickly when problems occur and compare the behavior of different prompt versions. A good practice is to use semantic version numbers (such as v1.2.3) or date-based versions (2025-03-04).
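As an illustration (the file layout and names are hypothetical), a versioned prompt library can be as simple as:

prompts/
  summarizer/
    prompt.txt      # current production prompt
    CHANGELOG.md    # what changed in each version, and why

Tagging each release (for example, git tag summarizer-v1.2.3) makes rollback a one-line operation and lets git diff show exactly how a prompt changed between two versions.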
Teamwork and Communication
Once you reach internal consensus, consider using specialized tools or platforms to manage your prompt library, such as open-source or hosted LLM-development monitoring products, or internal tools. You can even run A/B tests on prompts through these platforms.
These prompts should also be open to viewing and discussion within the team; they are rarely truly confidential. Like code, they should be available at any time to product, development, and other relevant team members. In an AI application, prompts lose much of their meaning when separated from the engineering behind them.
Common prompt patterns and their advantages and disadvantages
Here are some common prompt patterns that most people have probably heard of and used:
Role-playing prompts
Role-playing prompts steer the answer by assigning the AI a specific role. For example: "You are an experienced financial analyst. Please analyze the following quarterly report data and point out the key trends." This approach can push the model toward a particular professional perspective, add professionalism and depth to the answer, and help maintain a consistent tone and style.
However, over-reliance on role-playing can limit the model's full capabilities, sometimes producing stereotyped responses rather than a genuine dig into the problem. Anthropic's Amanda Askell, for example, says: "I almost never use this role-playing prompting technique—even with worse models." She believes that stating your needs directly and clearly is often more effective.
Chain-of-thought prompts
Chain-of-Thought (CoT) prompting encourages the model to show its reasoning step by step. The typical format is "Please think about this question step by step... [question]" or "Let's analyze it step by step...". Essentially, the model explicitly outputs intermediate reasoning steps before generating the final answer, much like the scratch work humans do when solving problems (more recently there is also CoD, Chain of Draft).
This approach is particularly effective when solving complex reasoning problems. It makes the thinking process transparent, facilitates the detection of potential errors, and improves accuracy on math and logic problems in some tests.
One key point when using CoT: the AI must output the complete thought process rather than jumping straight to the answer. If you merely add "Let's think step by step" without requiring the model to show its steps, the effect is essentially the same as an ordinary prompt.
For user-facing applications, if you want to hide the thinking part but still need an accurate answer, you can request JSON output with two separate fields for "thinking process" and "result".
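For instance, a hedged sketch of such a request (the field names are illustrative):

Think through the question below step by step, but reply with JSON only, in this exact shape:
{"thinking": "<your step-by-step reasoning>", "result": "<the final answer>"}
Question: A train travels 120 km in 1.5 hours. What is its average speed?

The application then parses the JSON, logs or discards "thinking", and shows the user only "result".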
Example-driven prompts
Few-shot prompts provide a handful of input-output example pairs before posing the new question.
This method intuitively demonstrates the expected output through examples, reduces the reliance on complex instructions, and can effectively guide the model to understand the specific task format.
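A minimal few-shot sketch (the task and examples are illustrative):

Classify the sentiment of each review as positive or negative.
Review: "Arrived quickly and works great." -> positive
Review: "Broke after two days." -> negative
Review: "Exactly what I ordered, very happy." -> positive
Review: "The battery barely lasts an hour." ->

The three solved pairs establish both the task and the exact output format; the model completes the last line in kind.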
It also has downsides: it consumes more tokens, and the choice of examples can inadvertently introduce bias (especially with less capable models). The model sometimes over-fits to surface features of the examples and misses the broader pattern, limiting generalization.
Structured output prompts
Structured output prompts require the model to return results in a specific structure (such as JSON, XML, or a specific Markdown layout). For example: "Analyze the sentiment of the following text and return the result as JSON with two fields: 'sentiment' and 'confidence'."
The output generated by this method is easy for programs to process and parse, standardizes the answer format, and improves the consistency of results. It is particularly suitable for scenarios that require automated processing of results.
In the past, demanding a complex structure could increase the error rate, and the model would sometimes deviate from the requested format. Since 2024, however, the APIs of most large-model providers offer native structured output. See OpenAI's official Structured Outputs documentation [1], which is very detailed.
Prompt engineering with structured output also looks a little different: you usually need to define a JSON Schema as well, so the prompt is no longer just natural language:
{
  "name": "get_weather",
  "description": "Fetches the weather in the given location",
  "strict": true,
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The location to get the weather for"
      },
      "unit": {
        "type": "string",
        "description": "The unit to return the temperature in",
        "enum": ["F", "C"]
      }
    },
    "additionalProperties": false,
    "required": ["location", "unit"]
  }
}
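As a minimal usage sketch (assuming the OpenAI Python SDK; the model name and question are illustrative), the schema above can be attached as a function tool, and with "strict": true the returned arguments are guaranteed to match it:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The get_weather schema from above, wrapped as a function tool.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Fetches the weather in the given location",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the weather for",
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to return the temperature in",
                    "enum": ["F", "C"],
                },
            },
            "additionalProperties": False,
            "required": ["location", "unit"],
        },
    },
}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris, in Celsius?"}],
    tools=[weather_tool],
)

# The structured arguments arrive as a JSON string conforming to the schema.
print(response.choices[0].message.tool_calls[0].function.arguments)
# e.g. {"location": "Paris", "unit": "C"}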
Beyond these, there are many more ways to write prompts, but knowing how many techniques exist is not the point. What matters more is experimenting widely and evaluating rigorously.
How to evaluate AI output
If you're going to apply a prompt to 400 different cases, people will often think about the most common cases, see that the prompt gets the right answer in those cases, and then stop thinking about it. But you actually need to find the uncommon cases and think about the ambiguities in the prompt in those cases. For example, if you have a prompt that says, "I'm going to send you some data, and I want you to extract all the rows where the name starts with G." Then you need to think about, "What if I send you this data set and there are no names that start with G? What if I don't send a data set, but an empty string?" These are the cases you have to test, because only then can you give the model more instructions and let it know what to do in these special cases. ——— Amanda Askell (Anthropic Model Alignment Expert)
Many people judge prompts by subjective feel: "this answer looks good." In a professional setting, however, evaluation should not rest on subjective impressions alone; you need a systematic evaluation process. Only objective data can tell you whether a prompt has truly achieved the intended effect.
For example, the evaluation metrics for an e-commerce customer service AI might include: answer accuracy, resolution speed, satisfaction scores, and the proportion of conversations requiring human intervention. Objective metrics like these are far more convincing than "it feels better."
Establishing an evaluation framework
First, understand that the object of evaluation should not be the prompts alone. If you are the product planner of an AI product, then to guarantee the end-user experience you need to evaluate the model, the prompts, the process, and more.
Evaluation methods should therefore be diverse, combining manual review with automated metrics. Manual evaluation is time-consuming but captures subtle differences, which matters especially for highly subjective tasks.
Automated metrics provide quantifiable performance data. For a Chat PDF product, for example, you can measure response relevance and completeness. For strongly formatted output, say, you want the AI to produce 20-40 words with 40% of them in English, you can write scripts to evaluate and score automatically, as in the sketch below. Objective metrics like these are more convincing than gut feeling and make internal discussion and decision-making easier.
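For instance, a minimal Python scoring sketch for exactly that spec (20-40 words, at least 40% English; treating space-separated runs of Latin letters as English words is a simplification, since Chinese text is not space-delimited):

import re

def score_format(output: str) -> dict:
    # Rough automated check: split on whitespace, count Latin-letter tokens.
    words = output.split()
    english = [w for w in words if re.fullmatch(r"[A-Za-z'-]+", w.strip(".,!?;:"))]
    ratio = len(english) / len(words) if words else 0.0
    return {
        "word_count": len(words),
        "word_count_ok": 20 <= len(words) <= 40,
        "english_ratio": round(ratio, 2),
        "english_ratio_ok": ratio >= 0.4,
    }

print(score_format("产品 很棒 great product easy to use 推荐"))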
Specialized tools such as Promptfoo [2] can help with batch evaluation. Such tools let you define multiple test cases and evaluation metrics and automatically compare the performance of different prompt versions:
1. Define a test case set: create multiple representative inputs, including common queries and edge cases
2. Set evaluation metrics: define how success is measured, such as accuracy, relevance, or completeness
3. Configure variants: set up several versions to compare (models, prompts, or processes)
4. Run automated tests: the tool sends every test case through every variant to the AI
5. Analyze the results: compare the performance of the different models, prompts, or processes and pick the best fit
For example, a simple configuration using Promptfoo might look like this:
prompts:
  - file://prompt1.txt
  - file://prompt2.txt
providers:
  - openai:gpt-4o-mini
  - vertex:gemini-2.0-flash-exp
tests:
  - vars:
      language: French
      input: Hello world
    assert:
      - type: contains-json
      - type: javascript
        value: output.toLowerCase().includes('bonjour')
  - vars:
      language: German
      input: How's it going?
    assert:
      - type: similar
        value: was geht
        threshold: 0.6
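Running npx promptfoo eval against a config like this sends every test case through both prompts on both providers and renders a pass/fail matrix, so you can see at a glance which prompt-model combination holds up.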
Tools like this for evaluating AI-generated output do have a learning curve, but they should be required learning for anyone doing AI product planning.
In practice, you can establish a business-oriented scoring scheme to quantitatively evaluate models, prompts, processes, and so on across several dimensions:
• Content quality
  • Accuracy: how well the output matches the facts
  • Relevance: how relevant the answer is to the user's question
  • Completeness: whether all aspects of the question are covered
• Output format
  • Word count
  • Output language
  • Markdown, JSON, XML, or HTML
• Speed (see the sketch after this list)
  • Time To First Token (TTFT): the delay from sending the input to receiving the first output token
  • Time Per Output Token (TPOT): the average generation time per token
• Cost: in commercial applications, cost control matters as much as quality. Common optimizations include:
  • Deleting redundant instructions and decorative language to reduce token consumption
  • Matching the model to the task's complexity: handle basic tasks with a lightweight model first, and call a more capable model only when necessary
  • Caching highly repetitive queries to avoid redundant API calls
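To illustrate the two speed metrics, here is a minimal sketch that measures them over a streaming response (assuming the OpenAI Python SDK; the model name and question are illustrative, and chunk count is used as a rough proxy for token count):

import time
from openai import OpenAI

client = OpenAI()

start = time.monotonic()
first_token_at = None
chunks = 0  # rough proxy for tokens

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain TTFT in one sentence."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.monotonic()  # first content arrives: TTFT
        chunks += 1

ttft = first_token_at - start
tpot = (time.monotonic() - first_token_at) / max(chunks - 1, 1)
print(f"TTFT: {ttft:.2f}s  TPOT: {tpot * 1000:.0f}ms")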
With these quantitative metrics, you can compare prompts far more objectively and guide subsequent optimization. How do you score the content-quality dimensions above? A combined human + LLM evaluation is recommended, and many evaluation tools now support it: let a strong model score first, then have humans spot-check, in the spirit of the RLHF (Reinforcement Learning from Human Feedback) used in model training.
With a scoring scheme like this, you can evaluate any part of an AI application.
Continuous improvement mechanism
Optimizing an AI application should be an ongoing process, not a one-off task, and that goes doubly for prompts.
First establish an initial performance baseline, then regularly evaluate the results, identify areas of underperformance, make targeted improvements, and finally evaluate again to verify the improvements.
It is therefore important to collect and analyze user feedback and to provide convenient feedback channels, such as satisfaction ratings or a problem-report button. Real user experience and problem reports reveal issues that a test environment can miss. Digging into negative feedback exposes a prompt's weaknesses and uncovers scenarios it fails to cover, bringing the product closer to real needs.
Returning prompts to their place as tools
I think making prompts too abstract is a way of overcomplicating things. In reality, a lot of the time you just need to write a very clear description of the task rather than building elaborate abstractions. That said, you do often need to compile a series of instructions into a concrete result, so precision, along with engineering practices like version control and managing the history of experiments, matters just as much. ——— Zack Witten (Anthropic prompt engineer)
The real role of prompts
As prompt engineering has grown popular, a misconception has spread along with it: that mastering a few "magic prompts" will make AI produce amazing results (frankly, I find this ridiculous). This view over-mythologizes prompts.
In reality, in AI product development, prompts are just one part of the overall pipeline, not the decisive factor. Take a PDF-summary or chat-with-PDF product: prompts matter, but differences in user experience come far more from differences in engineering implementation:
• Using more advanced document parsing to accurately preserve tables, charts, and formatting
• Implementing smarter chunking strategies that optimize retrieval efficiency while preserving semantic integrity
• Managing context more efficiently to fit the maximum amount of relevant information within token limits
• Providing a comprehensive fallback mechanism for when recall fails
• Possibly adopting newer technology, e.g. voyage-multimodal-3, a multimodal embedding model that is more effective for retrieving PDFs composed mainly of images
Common Misunderstandings and the Correct Mindset
In actual development, common misunderstandings around prompt engineering include:
• The over-optimization trap: spending weeks fine-tuning prompt wording for only marginal improvements
• The one-size-fits-all fantasy: searching for a "perfect prompt" that solves every problem, instead of tailoring prompts to specific scenarios
• Prompt secrecy: treating prompts as core trade secrets and over-protecting them instead of sharing and improving them within the team
• Ignoring evaluation: skipping systematic evaluation across different data and scenarios, so prompts underperform in real applications
• Ignoring user feedback: over-relying on internal evaluations instead of real user experience
A healthier mindset: prompts are just tools, not secret weapons. They are important but not the core, and "good enough" is the right bar. The key is to make them work seamlessly with the rest of the system and to establish an objective evaluation mechanism.
The real core competitiveness of AI applications lies in the overall product: data processing, model integration, context management, error handling, user interface and user experience design.
Conclusion
Prompt engineering is an important part of AI application development, but it is not the whole of it. It is a tool that helps us communicate effectively with AI systems, not mysterious magic.
In the future, writing prompts will be a basic skill for every team member, but real competitiveness will still rest on the traditional capabilities: identifying scenario requirements, designing end-to-end solutions, and so on.