Advanced Prompt Engineering

Master advanced prompt engineering and optimize the output quality of AI models.
Core content:
1. Definition and importance of prompt engineering
2. Key factors affecting model output: output length, sampling control
3. Combining temperature, Top-K, and Top-P in practice
Generally speaking, prompts typed directly into a chatbot are not advanced prompts. Exercising more direct, fine-grained control over model output through APIs and similar means is what advanced prompt engineering is about.
1. What is prompt engineering?
Prompt engineering can be broken down into two words, "prompt" and "engineering". A prompt guides the model to output the correct token sequence; engineering covers the process of design, evaluation, optimization, and debugging.
Prompt engineering is the process of designing high-quality prompts that guide the model to produce accurate output; this process involves repeated debugging to find the best prompts.
Different models have different capabilities, so prompts must be designed with the target model in mind to get the best results.
2. Beyond the prompt itself, LLM output configuration also affects model output
2.1 Output length
The output length is a limit on the number of tokens an LLM response may generate. Note that the limit merely stops the model from predicting further tokens once it is reached; it does not make the output more concise. If you want a more concise output, you still need to ask for it in the prompt itself.
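As an illustration, here is a minimal sketch of capping output length via an API, using the OpenAI Python SDK as one example (the parameter name varies by provider, e.g. max_output_tokens in Google's Gemini API, and the model name is an arbitrary choice):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    max_tokens=60,  # hard cap: generation simply stops here, possibly mid-sentence
)
print(response.choices[0].message.content)
```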
2.2 Sampling controls
At its core, a large model is a prediction engine, but it does not simply output the next token directly; it assigns a probability to every token in its vocabulary. Sampling settings (temperature, Top-K, and Top-P) determine how these predicted probabilities are processed to select a single output token.
2.2.1 Temperature
Temperature controls the randomness of output tokens, providing a balance between predictability/factual accuracy and creativity/diversity.
• Higher temperature values make the output more random and creative, suiting tasks such as copywriting.
• Lower temperature values make the output more deterministic, suiting factual scenarios.
• A temperature of zero means the output is fully deterministic (greedy decoding), but even then you will not always get the same output, because several tokens can tie for the highest probability.
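To make this concrete, here is a minimal, self-contained sketch of how temperature reshapes the model's next-token distribution (the logit values are made up for illustration):

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Turn raw logits into a sampling distribution, scaled by temperature."""
    scaled = logits / max(temperature, 1e-8)  # temperature -> 0 approaches greedy decoding
    exps = np.exp(scaled - scaled.max())      # subtract the max for numerical stability
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])       # hypothetical scores for 4 candidate tokens
for t in (0.1, 0.7, 1.5):
    print(t, softmax_with_temperature(logits, t).round(3))
# Low t concentrates probability mass on the top token; high t flattens the distribution.
```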
2.2.2 Top-K and Top-P
In addition to controlling the randomness of the output through temperature, you can also use Top-K and Top-P, both of which restrict sampling to the most probable predicted tokens.
• Top-K means sampling from the K highest-probability tokens in the model's predicted distribution.
• Higher Top-K values make the output more creative; lower values make it more deterministic.
• A Top-K of 1 is equivalent to a temperature of zero, i.e. greedy decoding.
• Top-P (nucleus sampling) means sampling from the smallest set of highest-probability tokens whose cumulative probability reaches P.
• Top-P values range from 0 (greedy decoding) to 1 (all tokens in the LLM vocabulary).
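Both filters are easy to state precisely in code. Below is a minimal sketch of each, applied to a made-up five-token distribution:

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k most probable tokens, then renormalize."""
    cutoff = np.sort(probs)[-k]                 # k-th largest probability
    kept = np.where(probs >= cutoff, probs, 0.0)
    return kept / kept.sum()

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = np.argsort(probs)[::-1]             # indices from most to least probable
    cumulative = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cumulative, p)) + 1]
    kept = np.zeros_like(probs)
    kept[keep] = probs[keep]
    return kept / kept.sum()

probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])  # hypothetical next-token probabilities
print(top_k_filter(probs, 2))    # only the two most likely tokens survive
print(top_p_filter(probs, 0.8))  # tokens kept until cumulative mass reaches 0.8
```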
2.3 Putting it all together
The choice between Top-K, Top-P, temperature, and the number of tokens to generate depends on the specific application and the desired results, and these settings affect each other.
• Want a particularly creative result? Start with temperature 0.9, Top-P 0.99, and Top-K 40.
• Want a less creative result? Start with temperature 0.1, Top-P 0.9, and Top-K 20.
• Does the task always have one correct answer (e.g., answering a math problem)? Start at temperature 0.
In general, the following settings give relatively consistent results that are creative but not excessive:
Temperature = 0.2
Top-P = 0.95
Top-K = 30
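As a sketch, these starting values map directly onto most provider SDKs; here using the google-generativeai package (the model name is an arbitrary choice, and other providers expose the same three knobs under similar names):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes a Gemini API key
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

response = model.generate_content(
    "Write a short product description for a hiking backpack.",
    generation_config=genai.GenerationConfig(
        temperature=0.2,  # consistent but not rigid
        top_p=0.95,
        top_k=30,
    ),
)
print(response.text)
```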
3. Tips and tricks
3.1 General prompting / zero shot
Zero-shot is the simplest prompting technique: it requires no examples, and the temperature should be set low since no creativity is required.
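A minimal zero-shot example: the prompt contains only the task description and the input, with no demonstrations (the review text is invented):

```python
# Zero-shot: task description plus input, no examples.
prompt = """Classify the sentiment of the movie review below as POSITIVE, NEUTRAL, or NEGATIVE.

Review: The plot dragged, but the performances were extraordinary and the ending moved me to tears.
Sentiment:"""
print(prompt)  # send this to any LLM with a low temperature
```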
3.2 One-shot & few-shot
One-shot & few-shot means giving one or more examples. The number of examples needed depends on several factors, including the complexity of the task, the quality of the examples, and the capabilities of the generative AI (gen AI) model used.
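A few-shot sketch: the examples teach both the task and the exact output format (the extraction task here is made up for illustration):

```python
# Few-shot: the examples demonstrate the task and pin down the output format.
prompt = """Extract the city and country from each sentence as JSON.

Sentence: I spent last summer in Lisbon, Portugal.
JSON: {"city": "Lisbon", "country": "Portugal"}

Sentence: The conference will be held in Kyoto, Japan.
JSON: {"city": "Kyoto", "country": "Japan"}

Sentence: She just moved to Nairobi, Kenya for a new job.
JSON:"""
print(prompt)
```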
3.3 System, contextual and role prompting
• System prompting sets the overall context and purpose for the language model. It defines the "big picture" of what the model should do, such as translating languages or classifying reviews. Requiring output in JSON format, for example, constrains the model's output structure and to some extent limits hallucinations.
• Contextual prompting provides specific details or background information related to the current conversation or task. It helps the model understand the nuances of the questions asked and adjust the response accordingly.
• Role prompting assigns a specific role or identity for the language model to adopt. This helps the model generate responses that are consistent with the assigned role and its associated knowledge and behavior.
These three types may seem to overlap; for example, a prompt containing a role may still contain a description of the context. But they are distinct and represent different levels or dimensions of guiding the LLM: system prompts set the stage, context prompts provide immediate scene details, and role prompts define the image of the "actor". They can be used alone or in combination for fine-grained control.
For example, a system prompt might define the task as "translate text" as a hard requirement, a context prompt might provide the source text and target language, and a role prompt might specify "play the role of a professional translator specializing in legal documents."
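In a chat-style API, the three prompt types map naturally onto a message list. A minimal sketch of that translation example (roles and wording are illustrative):

```python
# System + role prompting live in the system message; context arrives with the user turn.
messages = [
    {
        "role": "system",
        "content": (
            "You are a professional translator specializing in legal documents. "  # role
            "Translate the user's text into the target language. "                 # task (mandatory)
            "Respond with the translation only."                                   # output constraint
        ),
    },
    {
        "role": "user",
        "content": (
            "Target language: German.\n\n"  # context for this specific request
            "Text: The parties agree to resolve disputes through binding arbitration."
        ),
    },
]
```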
3.4 Step-back prompting
Step-back prompting means that instead of describing the specific problem right away, the LLM is first prompted with a more general question related to the task; its answer is then used as context in the subsequent, task-specific prompt.
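A minimal sketch of the two-stage flow; `llm` is a stand-in for any completion call, and the wording of the general question is just an example:

```python
from typing import Callable

def step_back(llm: Callable[[str], str], general_question: str, task: str) -> str:
    """Ask a broader question first, then reuse its answer as context for the task."""
    background = llm(general_question)                 # step 1: the general question
    return llm(f"Background:\n{background}\n\n"
               f"Using the background above, {task}")  # step 2: the specific task

# Example usage (with a real `llm` implementation):
# step_back(llm,
#           "What are five key ingredients of a compelling stealth game level?",
#           "write a one-paragraph level description for a stealth game set in a shipyard.")
```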
3.5 Chain of Thought (CoT)
Chain-of-thought prompting improves the model's output by having it generate intermediate reasoning steps before the final answer.
Tips:
1. Add "Let's think step by step" to the prompt.
2. Chain of thought becomes even more powerful when combined with one-shot/few-shot examples.
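A minimal chain-of-thought prompt (the age puzzle is a common illustration of where direct answers often fail but step-by-step reasoning succeeds):

```python
# Chain of thought: the final line elicits intermediate reasoning steps.
prompt = """When I was 3 years old, my partner was 3 times my age.
Now I am 20 years old. How old is my partner?

Let's think step by step."""
# Expected reasoning: partner was 9 when I was 3, so 6 years older -> now 26.
```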
3.6 Self-consistency
Chain-of-thought prompting improves output accuracy through intermediate reasoning steps, but there is often more than one good reasoning path. Self-consistency runs the chain-of-thought prompt several times and selects the most common answer among the results, making the model's output more robust.
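A minimal sketch of self-consistency; `llm` stands in for any sampling completion call, and the last-line answer extraction is a deliberately crude placeholder:

```python
from collections import Counter
from typing import Callable

def self_consistency(llm: Callable[[str], str], prompt: str, samples: int = 5) -> str:
    """Sample several chain-of-thought completions and return the majority answer."""
    finals = []
    for _ in range(samples):
        reasoning = llm(prompt + "\n\nLet's think step by step.")  # needs temperature > 0
        finals.append(reasoning.strip().splitlines()[-1])  # crude: last line as the answer
    return Counter(finals).most_common(1)[0][0]            # most common answer wins
```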
3.7 Tree of Thoughts (ToT)
For complex tasks, the reasoning path may not be a simple linear one. Tree of Thoughts shifts from linear or independent reasoning paths to a more structured exploration strategy, allowing the model to consider alternatives at each step, backtrack when needed, and evaluate different branches, mimicking a more deliberate human problem-solving approach. Like a tree, it explores different branches to find the best reasoning path.
For more details, see the paper "Large Language Model Guided Tree-of-Thought".
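As a rough sketch only (the paper's method is richer, with explicit backtracking), the core loop can be framed as a small beam search over partial reasoning paths; `llm` proposes next steps and `score` rates partial solutions, e.g. via another LLM call:

```python
from typing import Callable

def tree_of_thoughts(llm: Callable[[str], str], score: Callable[[str], float],
                     problem: str, branches: int = 3, depth: int = 3, beam: int = 2) -> str:
    """Tiny beam search over reasoning paths: branch, score, prune, repeat."""
    paths = [""]
    for _ in range(depth):
        candidates = []
        for path in paths:
            for _ in range(branches):  # sampling yields different candidate thoughts
                thought = llm(f"Problem: {problem}\nSteps so far:\n{path}\nNext step:")
                candidates.append(path + thought + "\n")
        paths = sorted(candidates, key=score, reverse=True)[:beam]  # keep the best branches
    return paths[0]
```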
3.8 ReAct (reason & act)
Large models can generate text, but many application scenarios require more than text generation. Combining the model with external tools (search, a code interpreter, etc.) extends its capabilities, and this is also the first step toward intelligent agents.
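A minimal sketch of the ReAct loop with a single hypothetical `search` tool; the `Thought:`/`Action:`/`Observation:` labels follow the common ReAct transcript convention, and the parsing is deliberately simplistic:

```python
from typing import Callable

def react(llm: Callable[[str], str], search: Callable[[str], str],
          question: str, max_steps: int = 5) -> str:
    """Alternate model reasoning with tool calls until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")  # model continues with thought/action/answer
        transcript += "Thought:" + step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action: search[" in step:        # e.g. "Action: search[capital of Kenya]"
            query = step.split("Action: search[")[-1].split("]")[0]
            transcript += f"Observation: {search(query)}\n"  # feed the tool result back
    return transcript  # give up after max_steps and return the trace
```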
3.9 Automatic Prompt Engineering (APE)
That is, using AI to generate prompts: you write a prompt whose job is to generate prompts. At its core, candidate instructions are evaluated with metrics such as BLEU (Bilingual Evaluation Understudy) or ROUGE (Recall-Oriented Understudy for Gisting Evaluation).
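A minimal APE sketch: one model writes candidate prompts, an evaluation function scores them on held-out examples (using a metric such as accuracy, BLEU, or ROUGE), and the best candidate wins; `llm` and `evaluate` are stand-ins:

```python
from typing import Callable

def ape(llm: Callable[[str], str], evaluate: Callable[[str], float],
        task_description: str, n_candidates: int = 10) -> str:
    """Automatic Prompt Engineering: let the model write prompts, keep the best one."""
    meta_prompt = ("Write one instruction prompt that would make a language model "
                   f"perform this task well: {task_description}")
    candidates = [llm(meta_prompt) for _ in range(n_candidates)]  # needs temperature > 0
    return max(candidates, key=evaluate)  # evaluate scores each prompt on held-out data
```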
3.10 Code prompting
• Prompts for writing code: have the large model write code.
• Prompts for explaining code: have the large model explain code.
• Prompts for translating code: help translate code from one language to another.
• Prompts for debugging and reviewing code: help fix bugs by providing the error message and the code as context to the large model (see the sketch below).
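As a sketch of the last point, a debugging prompt works best when it bundles the error message and the offending code together as context (the bug here is a made-up example):

```python
# Debugging prompt: pair the error message with the code that produced it.
buggy_code = '''import datetime
print("Current time: " + datetime.datetime.now())'''

error = 'TypeError: can only concatenate str (not "datetime.datetime") to str'

prompt = f"""The following Python code raises an error.
Explain the cause and provide a fixed version.

Error:
{error}

Code:
{buggy_code}"""
print(prompt)
```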
4. Best Practices
• Provide examples that show the large model the expected output or similar responses.
• Keep the design simple: make prompts concise, clear, and easy to understand.
• Be specific about output requirements.
• Say what to do rather than what not to do; large models generally respond better to positive instructions.
• Control length, e.g. "in 3 sentences".
• For non-creative tasks such as extracting, selecting, parsing, sorting, ranking, or classifying data, have the output returned in a structured format such as JSON or XML, which forces the model to create structure and limits hallucinations (see the sketch below).
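For the structured-output practice, a minimal sketch of a prompt that pins down the schema explicitly (the product data is invented):

```python
# Structured output: state the schema and forbid anything outside it.
prompt = """Extract all product names and prices from the text below.
Return ONLY valid JSON matching this schema:
{"items": [{"name": "<string>", "price_usd": <number>}]}

Text: The store sells the TrailMax backpack for $89.99 and the AquaFlow bottle for $14.50."""
print(prompt)
```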
Reference source: Google's official Prompt Engineering white paper