A 25,000-word deep dive into prompt engineering: the 2025 Google white paper on optimizing AI model output through prompts

Written by
Jasper Cole
Updated on: June 23, 2025
Recommendation

Master prompt engineering and unlock the full potential of AI models.

Core content:
1. The importance of prompt engineering and its application to large language models
2. Optimization techniques and best practices from the Google white paper
3. Case studies showing how these techniques improve model output

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)

Large language models (LLMs) are seeing ever wider use, and writing high-quality prompts is crucial to fully exploiting their power. The commands that ordinary users type into a chat interface are prompts too, but engineered prompts are more precise and complex. Developers of LLM applications in particular must set more configuration options when calling model APIs and must design and optimize their prompts, which requires mastering prompt engineering.

The "Prompt Engineering" white paper jointly launched by Google team members uses the Gemini large model as an example to provide us with valuable guidance on how to optimize prompt words and improve LLM output effects.

By Lee Boonstra, Google

Translator: Financial IT Things (translated and proofread with the assistance of a large model)

Table of contents


Introduction
Prompt engineering
Large language model output configuration
  Output length
  Sampling control
    Temperature
    Top-K and Top-P
    Putting it all together
Prompting techniques
  General prompting / zero-shot
  One-shot vs. few-shot
  System, context and role prompts
    System prompts
    Role prompts
    Contextual prompts
  Step-back prompting
  Chain of Thought (CoT)
  Self-consistency
  Tree of Thoughts (ToT)
  ReAct (reasoning and action)
  Automatic Prompt Engineering
Code prompting
  Prompts for writing code
  Prompts for explaining code
  Prompts for translating code
  Prompts for debugging and reviewing code
Multimodal prompting
Best Practices
  Provide examples
  Design with simplicity
  Be specific about the output
  Prefer directives over constraints
  Control the maximum token length
  Use variables in prompts
  Experiment with input formats and writing styles
  Mix up the classes in few-shot classification tasks
  Adapt to model updates
  Experiment with output formats
  JSON repair
  Working with schemas
  Experiment together with other prompt engineers
  Chain-of-thought best practices
  Document the various prompt attempts
Summary
Endnotes



Main text


You don't need to be a data scientist or a machine learning engineer; everyone can write a prompt.


Introduction


When thinking about the inputs and outputs of a large language model, a text prompt (sometimes accompanied by other modalities, such as image prompts) is the input the model uses to predict a specific output. You don't need to be a data scientist or a machine learning engineer; everyone can write prompts. However, crafting the most effective prompt can be complex. Many aspects of a prompt affect its efficacy: the model used, the model's training data, the model configuration, word choice, style and tone, structure, and context all matter. Therefore, prompt engineering is an iterative process. Inadequate prompts can lead to ambiguous, inaccurate responses and can hinder the model's ability to provide meaningful output.


When you chat with the Gemini chatbot, you are essentially writing prompts. This white paper, however, focuses on writing prompts for the Gemini model in Vertex AI or through the API, because prompting the model directly gives you access to configuration settings such as temperature.


This white paper discusses prompt engineering in detail. We'll explore various prompting techniques to get you started, and share tips and best practices for becoming a prompting expert. We'll also discuss some of the challenges you may face when writing prompts.


Prompt engineering


Remember how a large language model works: it is a prediction engine. The model takes sequential text as input and predicts what the next token should be based on its training data. The model does this over and over, appending each predicted token to the end of the sequence. Each next-token prediction is based on the relationship between the previous tokens and what the model saw during training.


When you write a prompt, you are attempting to set up the model to predict the right sequence of tokens. Prompt engineering is the process of designing high-quality prompts that guide the model to produce accurate output. It involves tinkering to find the best prompt, optimizing prompt length, and evaluating how a prompt's writing style and structure relate to the task. In the context of natural language processing and large language models, a prompt is the input provided to the model to generate a response or prediction.


Prompts can be used for a variety of understanding and generation tasks, such as text summarization, information extraction, question answering, text classification, language or code translation, code generation, and code documentation or reasoning.


Feel free to refer to Google's prompting guides [2, 3], which contain examples of simple and effective prompts.


When engineering prompts, you start by choosing a model. Whether you use Gemini from Vertex AI, GPT, Claude, or an open model like Gemma or LLaMA, prompts may need to be optimized for your specific model.


Beyond the prompt itself, you will also need to tune the various configuration options of the large language model.


Large language model output configuration


After choosing a model, you need to determine its configuration. Most LLMs offer a variety of configuration options that control their output, and effective prompt engineering requires setting these optimally for your task.


Output length


An important configuration setting is the number of tokens generated in the response . Generating more tokens requires more computation for large language models, resulting in higher energy consumption, potentially slower response times, and higher costs.


Reducing the output length does not make the model write in a more terse or stylish way; it simply causes the model to stop predicting tokens once the limit is reached. If you need a short output, you may also have to design your prompt to accommodate that.


Output length limits are particularly important for some prompting techniques (such as ReAct), where the model may continue emitting useless tokens after the response you actually want.




Sampling control


A large language model does not formally predict a single token. Rather, it predicts probabilities for what the next token could be, assigning a probability to every token in its vocabulary. Those probabilities are then sampled to determine the next token to produce. Temperature, Top-K, and Top-P are the most common configuration settings that determine how the predicted probabilities are processed to choose a single output token.


Temperature


Temperature controls the degree of randomness in token selection. Lower temperatures suit prompts that expect a more deterministic response, while higher temperatures can lead to more diverse or unexpected results. A temperature of 0 (greedy decoding) is deterministic: the highest-probability token is always chosen (though note that if two tokens tie for the highest predicted probability, you may not always get the same output at temperature 0, depending on how ties are broken).


Temperatures closer to the maximum tend to create more random outputs. As the temperature gets higher, all tokens become equally likely to be the next predicted token .


Gemini's temperature control can be understood much like the softmax function used in machine learning. A low temperature setting mirrors a low softmax temperature (T), emphasizing a single preferred token with high certainty. A higher Gemini temperature is like a high softmax temperature, widening the range of tokens considered acceptable. This increased uncertainty accommodates scenarios where strict precision is not required, for example when experimenting with creative output.
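As a minimal illustration of that analogy (a toy sketch, not Gemini's actual implementation), the softmax with temperature T over raw logits l_i is P(token_i) = exp(l_i / T) / sum_j exp(l_j / T):

import math

def softmax_with_temperature(logits, T):
    # Low T sharpens the distribution toward the top token (greedy-like);
    # high T flattens it toward uniform.
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_with_temperature([2.0, 1.0, 0.1], T=0.1))   # nearly one-hot
print(softmax_with_temperature([2.0, 1.0, 0.1], T=10.0))  # nearly uniform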


Top-K and Top-P


Top-K and Top-P (also known as nucleus sampling) are two sampling settings used in LLMs to restrict the predicted next token to tokens with the highest predicted probabilities. Like temperature, these settings control the randomness and diversity of the generated text.


Top-K sampling selects the top K most likely tokens from the model's predicted distribution . The higher the Top-K , the more creative and diverse the model's output is; the lower the Top-K , the more restricted and factual the model's output is. A Top-K of 1 is equivalent to greedy decoding.


Top-P sampling selects the top tokens whose cumulative probability does not exceed a certain value ( P ) . The value of P ranges from 0 (greedy decoding) to 1 (all tokens in the vocabulary of a large language model ).


The best way to choose between Top-K and Top-P is to try both approaches (or both together) to see which produces the results you are looking for.
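To make the two filters concrete, here is a toy, self-contained Python sketch of top-K and top-P filtering followed by temperature sampling. The vocabulary and probabilities are invented for illustration; real implementations operate on logits over the full vocabulary.

import random

# Toy next-token distribution (illustrative values, not from a real model).
probs = {"the": 0.40, "a": 0.25, "cat": 0.15, "dog": 0.10, "pizza": 0.06, "quantum": 0.04}

def top_k_filter(probs, k):
    # Keep only the k most probable tokens.
    return dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])

def top_p_filter(probs, p):
    # Keep the smallest set of most-probable tokens whose cumulative probability reaches p.
    kept, cum = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        cum += pr
        if cum >= p:
            break
    return kept

def sample(probs, temperature=1.0):
    # Raising probabilities to the power 1/T and renormalizing is equivalent
    # to applying temperature to the logits before the softmax.
    weights = {t: pr ** (1.0 / temperature) for t, pr in probs.items()}
    total = sum(weights.values())
    return random.choices(list(weights), [w / total for w in weights.values()])[0]

# Filters are applied first, then temperature sampling over the survivors.
candidates = top_p_filter(top_k_filter(probs, k=3), p=0.9)
print(sample(candidates, temperature=0.7))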


Putting it all together


The choice between Top-K , Top-P , temperature, and the number of tokens to generate depends on the specific application and desired results, and these settings all affect each other. It is also important to make sure you understand how the selected model combines different sampling settings together.


If temperature, Top-K, and Top-P are all available (as in Vertex Studio), tokens that meet both the Top-K and Top-P criteria become candidates for the next predicted token, and temperature is then applied to sample from those candidates. If only Top-K or only Top-P is available, the behavior is the same, but only that one setting is used.


If the temperature is not available, a random selection of tokens that satisfy the Top-K and / or Top-P criteria is performed to generate a single next prediction token .


At extreme settings of a sampling configuration value, the sampling setting either cancels out other configuration settings or becomes insignificant.


(1) If the temperature is set to 0, Top-K and Top-P become irrelevant: the most likely token becomes the next predicted token. If the temperature is set extremely high (above 1, typically above 10), temperature becomes irrelevant, and tokens that pass the Top-K and/or Top-P criteria are sampled essentially at random to select the next predicted token.


(2) If Top-K is set to 1, temperature and Top-P become irrelevant: only one token passes the Top-K criterion, and that token is the next predicted token. If Top-K is set extremely high, such as to the size of the model's vocabulary, any token with a non-zero probability of being next will satisfy the Top-K criterion, and nothing is filtered out.


(3) If Top-P is set to 0 (or a very small value), most sampling implementations will consider only the single most likely token, making temperature and Top-K irrelevant. If Top-P is set to 1, any token with a non-zero probability will satisfy the Top-P criterion, and nothing is filtered out.


As a general starting point, a temperature of 0.2, Top-P of 0.95, and Top-K of 30 will give you relatively coherent results that can be creative but not excessively so. If you want especially creative results, try starting with a temperature of 0.9, Top-P of 0.99, and Top-K of 40. For less creative results, try a temperature of 0.1, Top-P of 0.9, and Top-K of 20. Finally, if your task always has a single correct answer (such as solving a math problem), start with a temperature of 0.
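As a sketch of how these starting points translate into an API call, here is an example using the google-generativeai Python SDK; the field names follow that SDK's GenerationConfig, and if you use Vertex AI or another SDK the parameter names may differ:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use your own key
model = genai.GenerativeModel("gemini-pro")

# The "relatively coherent" starting point suggested above.
config = genai.types.GenerationConfig(
    temperature=0.2,
    top_p=0.95,
    top_k=30,
    max_output_tokens=1024,
)

response = model.generate_content(
    "Suggest three names for a retro arcade blog.",
    generation_config=config,
)
print(response.text)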


Note: The higher the degrees of freedom (higher temperature, Top-K , Top-P , and output tokens ), the less relevant the text generated by the large language model may be.


Warning: Have you ever seen a response end with a pile of filler words? Known as the "repetition loop bug," this is a common problem in large language models where the model gets stuck in a cycle, repeatedly generating the same (filler) words, phrases, or sentence structures, often exacerbated by inappropriate temperature and Top-K/Top-P settings. It can happen at both low and high temperatures, though for different reasons. At low temperatures, the model becomes overly deterministic, sticking rigidly to the highest-probability path, which can cause a loop if that path revisits previously generated text. At high temperatures, the output becomes excessively random, raising the chance that a randomly chosen word or phrase will accidentally lead back to an earlier state, creating a loop because of the large number of available options. In both cases, the sampling process gets "stuck," producing monotonous, useless output until the output window fills up. Fixing this usually requires careful tuning of temperature and Top-K/Top-P to find the right balance between determinism and randomness.


Prompting techniques


Large language models are tuned to follow instructions and are trained on vast amounts of data, so they can understand a prompt and generate an answer. But they are not perfect: the clearer your prompt text, the better the model can predict the next likely text. Specific techniques that take advantage of how LLMs are trained and how they work will also help you get more relevant results.


Now that we understand what prompt engineering is and what it involves, let's dive into some of the most important prompting techniques.


General prompting / zero-shot


A zero-shot prompt is the simplest type of prompt. It provides only a description of the task and some text for the model to start with. That input can be anything: a question, the beginning of a story, or an instruction. The name "zero-shot" stands for "no examples."


Let's use Vertex AI Studio (for Language) in Vertex AI [6], which provides a playground for testing prompts. In Table 1, you'll see an example of a zero-shot prompt for classifying movie reviews.


A good way to record your prompts is a table format like the one below. Your prompts will likely go through many iterations before they make it into a code base, so it's important to track your prompt engineering work in a disciplined, structured way. For more on this table format, the importance of tracking prompt engineering work, and the prompt development process, see the Best Practices section later in this chapter ("Document the various prompt attempts").


The model temperature should be set to a low number since no creativity is needed, and we use the default Top-K and Top-P values of gemini-pro, which effectively disable both settings (see "Large language model output configuration" above). Note the generated output: the words "disturbing" and "masterpiece" should make the prediction slightly more complicated, since both appear in the same sentence.


Name: 1_1_movie_classification
Goal: Classify movie reviews as positive, neutral, or negative.
Model: gemini-pro
Temperature: 0.1
Token limit: 5
Top-K: N/A
Top-P: 1

Prompt:
Classify movie reviews as positive, neutral, or negative.
Review: "Her" is a disturbing study of where humanity is headed if artificial intelligence is allowed to continue to develop without control. I wish there were more masterpieces like this one.
Sentiment:

Output:
POSITIVE

Table 1. An example of a zero-shot prompt


When zero-shot doesn't work, you can provide demonstrations or examples in the prompt, which leads to one-shot and few-shot prompting.


One-shot vs. few-shot


When creating prompts for your AI model, it is very helpful to provide examples. These examples can help the model understand what you are asking of it. Examples are particularly useful when you want to guide the model to a specific output structure or pattern.


A one-shot prompt provides a single example, hence the name. The idea is that the model has one example it can imitate to best complete the task.


A few-shot prompt provides multiple examples to the model. This approach shows the model a pattern it should follow. The idea is the same as one-shot, but multiple examples of the desired pattern increase the chance that the model follows it.


The number of examples you need for a few-shot prompt depends on several factors, including the complexity of the task, the quality of the examples, and the capabilities of the generative AI ( gen AI ) model you are using. As a general rule of thumb, you should use at least three to five examples for a few-shot prompt. However, you may need to use more examples for more complex tasks, or you may need to use fewer examples due to input length limitations of your model.


Table 2 shows an example of a few-shot prompt. Let's use the same gemini-pro configuration settings as before, except for increasing the token limit to accommodate the longer response.


Goal: Parse pizza orders into JSON
Model: gemini-pro
Temperature: 0.1
Token limit: 250
Top-K: N/A
Top-P: 1

Prompt:
Parse a customer's pizza order into valid JSON:

EXAMPLE: I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON Response:
{ "size": "small", "type": "normal", "ingredients": [["cheese", "tomato sauce", "pepperoni"]] }

EXAMPLE: Can I have a large pizza with tomato sauce, basil, and mozzarella?
JSON Response:
{ "size": "large", "type": "normal", "ingredients": [["tomato sauce", "basil", "mozzarella"]] }

Now, I want a large pizza with cheese and mozzarella on the first half, and tomato sauce, ham, and pineapple on the other half.
JSON Response:

Output:
{ "size": "large", "type": "half-half", "ingredients": [["cheese", "mozzarella"], ["tomato sauce", "ham", "pineapple"]] }

Table 2. An example of a few-shot prompt


When choosing examples for your prompt, use examples relevant to the task you want to perform. The examples should be diverse, high quality, and well written. One small mistake can confuse the model and lead to unwanted output.


If you are trying to generate output that is robust to a wide range of inputs, it is important to include edge cases in your examples. Edge cases are inputs that are unusual or unexpected, but that the model should still be able to handle.
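As an illustration of how such a few-shot prompt might be assembled and its result parsed in code, here is a sketch; the helper name and examples are mine, not from the white paper:

import json

EXAMPLES = [
    ("I want a small pizza with cheese, tomato sauce, and pepperoni.",
     {"size": "small", "type": "normal",
      "ingredients": [["cheese", "tomato sauce", "pepperoni"]]}),
    ("Can I have a large pizza with tomato sauce, basil, and mozzarella?",
     {"size": "large", "type": "normal",
      "ingredients": [["tomato sauce", "basil", "mozzarella"]]}),
]

def build_few_shot_prompt(order):
    # Task description first, then worked examples, then the new input.
    parts = ["Parse a customer's pizza order into valid JSON:"]
    for text, parsed in EXAMPLES:
        parts.append("EXAMPLE: " + text + "\nJSON Response:\n" + json.dumps(parsed))
    parts.append("Now: " + order + "\nJSON Response:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    "I want a large pizza, first half cheese and mozzarella, "
    "second half tomato sauce, ham and pineapple."
)
# response = model.generate_content(prompt)  # `model` from the earlier sketch
# order_json = json.loads(response.text)     # parse the structured output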


System, context and role prompts


System, contextual, and role prompting are all techniques for guiding a large language model's text generation, but they focus on different aspects:


● System prompts set the overall context and purpose for the language model. They define the big picture of what the model should be doing, such as translating a language or classifying reviews.


● Contextual prompts provide specific details or background information relevant to the current conversation or task. They help the model understand the nuances of what's being asked and tailor the response accordingly.


● Role prompts assign a specific character or identity to the language model. This helps the model generate responses consistent with the assigned role and its associated knowledge and behavior.


There can be considerable overlap between system, contextual, and role prompting. For example, a prompt that assigns a role to the system can also carry context.


However, each type of prompt serves a slightly different primary purpose:


●  System prompts: Define the basic capabilities and overall purpose of the model.


● Contextual prompts: Provide immediate, task-specific information to guide the response. They are highly specific to the current task or input and are dynamic.


● Role prompts: Frame the model's output style and voice. They add a layer of specificity and personality.


Distinguishing between system, contextual, and role prompts provides a framework for designing prompts with clear intent, allows flexible combinations, and makes it easier to analyze how each prompt type affects the language model's output.


Let's take a closer look at these three kinds of prompts.


System prompts


Table 3 contains a system prompt where I specify additional information about how the output should be returned. I increased the temperature for a higher level of creativity and specified a higher token limit. However, because of the clear instruction on how to return the output, the model did not return extra text.


Goal: Classify movie reviews as positive, neutral, or negative.
Model: gemini-pro
Temperature: 1
Token limit: 5
Top-K: 40
Top-P: 0.8

Prompt:
Classify movie reviews as positive, neutral, or negative. Return the label in uppercase only.
Review: "Her" is a disturbing study of where humanity is headed if artificial intelligence is allowed to continue to develop without control. It was too disturbing for me to watch.
Sentiment:

Output:
NEGATIVE

Table 3. An example of a system prompt


System prompts are very useful for generating output that meets specific requirements. The name "system prompt" actually stands for "providing an additional task to the system." For example, you can use a system prompt to generate a code snippet compatible with a specific programming language, or to return a certain structure. See Table 4, where I return the output in JSON format.


Goal: Classify movie reviews as positive, neutral, or negative, returning JSON.
Model: gemini-pro
Temperature: 1
Token limit: 1024
Top-K: 40
Top-P: 0.8

Prompt:
Classify movie reviews as positive, neutral, or negative. Return valid JSON:
Review: "Her" is a disturbing study of where humanity is headed if artificial intelligence is allowed to continue to develop without control. It was too disturbing for me to watch.
Schema:
MOVIE: { "sentiment": String "POSITIVE" | "NEGATIVE" | "NEUTRAL", "name": String }
MOVIE REVIEWS: { "movie_reviews": [MOVIE] }
JSON Response:

Output:
{ "movie_reviews": [ { "sentiment": "NEGATIVE", "name": "Her" } ] }

Table 4. An example of a system prompt returning JSON


There are some benefits to returning JSON objects from prompts that extract data. In a real-world application I don't need to create this JSON format manually, the data can already come back in a sorted order (very handy when working with datetime objects), but most importantly, prompting for a JSON format forces the model to create a structure and limits hallucinations.
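Because models occasionally wrap JSON in markdown fences or stray prose, it is worth validating the response before using it. A small defensive helper might look like this (a sketch, not part of the white paper):

import json

def parse_model_json(text):
    # Strip common wrappers such as ```json ... ``` before parsing.
    cleaned = (text.strip()
               .removeprefix("```json").removeprefix("```")
               .removesuffix("```").strip())
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None  # caller can re-prompt or run a JSON-repair pass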


System prompts are also useful for safety and toxicity control. To control the output, simply add a line to your prompt, such as: "You should be respectful in your answer."


Role prompts


Role prompting is a prompt-engineering technique that involves assigning a specific role to the generative AI model. This can help the model produce more relevant and informative output, because the model can tailor its responses to the role it has been given.


For example, you could assign the model the role of a book editor, a kindergarten teacher, or a motivational speaker. Once the model has a role, you can give it prompts specific to that role. For instance, with the role of a teacher you could prompt it to create a lesson plan for you to review. See Table 5, where the model plays the role of a travel guide.


Goal: Act as a travel guide and provide 3 travel suggestions
Model: gemini-pro
Temperature: 1
Token limit: 1024
Top-K: 40
Top-P: 0.8

Prompt:
I want you to act as a travel guide. I will write to you about my location and you will suggest 3 places nearby that I can visit. In some cases, I will also tell you the type of places I want to visit. My suggestion: "I'm in Amsterdam and I just want to visit museums." Travel suggestions:

Output:
1. Rijksmuseum: A world-renowned museum known for Dutch masterworks such as Rembrandt's The Night Watch. Don't miss the incredible collection of paintings and artifacts showcasing Dutch art and history.
2. Van Gogh Museum: Immerse yourself in the fascinating works of Vincent van Gogh. The museum houses the largest collection of his paintings and drawings, including The Starry Night and Sunflowers.
3. Stedelijk Museum Amsterdam: Discover modern and contemporary art from around the world. The museum's collection includes works by Picasso, Kandinsky, and Rietveld, housed in a striking modern building.

Table 5. An example of a role prompt


The example above shows the model playing the role of a travel guide. If you changed the role to a geography teacher, you would notice that you receive a different response.


Defining a role perspective for your AI model gives it a blueprint for the tone, style, and focused expertise you're looking for, improving the quality, relevance, and effectiveness of your output.


Here are some styles that I have found to be effective:


Confrontational, Descriptive, Direct, Formal, Humorous, Influential, Informal, Inspiring, Persuasive


Let's change our prompt in Table 6 to include a humorous and inspirational style.


Prompt:
I want you to act as a travel guide. I will write to you about my location and you will suggest 3 nearby places to visit, in a humorous style. My suggestion: "I'm in Manhattan." Travel suggestions:

Output:
1. Embrace the Empire State of Mind: Ascend the dizzying heights of the Empire State Building and bask in the glory of the Manhattan skyline. Be prepared to feel like King Kong atop the "Big Apple," minus the giant ape-sized banana.
2. Get Artsy at MoMA: Release your inner art connoisseur at the Museum of Modern Art. Gaze upon masterpieces that will confuse you and make you question whether your stick-figure drawings have any artistic value.
3. Shop Till You Drop on Fifth Avenue: Indulge in some retail therapy on iconic Fifth Avenue. Be prepared for price shock that will make your wallet weep as you window shop (or actually shop) at designer boutiques. But hey, you're in Manhattan, so you might as well embrace the awesomeness!

Table 6. An example of a role prompt with a humorous tone and style


Contextual prompts


By providing contextual prompts, you can help ensure your AI interactions are as seamless and efficient as possible. The model will understand your request more quickly and generate more accurate, relevant responses, as the example in Table 7 shows.


Goal: Suggest articles for a blog about retro games
Model: gemini-pro
Temperature: 1
Token limit: 1024
Top-K: 40
Top-P: 0.8

Prompt:
Context: You are writing for a blog about retro 80s arcade games. Suggest 3 possible topics for articles, along with a brief description of what each article should cover.

Output:
1. The Evolution of Arcade Cabinet Design: This post will explore the evolution of arcade cabinet design, from the early wood-and-metal cases of the 1970s to the sleek neon designs of the 1980s and beyond.
2. A Look Back at Iconic Arcade Games of the 80s: This post will cover the history, influence, and legacy of some of the most iconic and influential 80s arcade games, focusing on their innovations, popular mechanics, and enduring appeal.
3. The Rise and Retro Revival of Pixel Art: This post will delve into the evolution of pixel art as a signature visual style of early arcade games, tracing its roots, exploring its techniques, and discussing its resurgence in modern games and digital art.

Table 7. An example of a contextual prompt


Step-back prompting


Step-back prompting is a technique that improves performance by first asking the LLM a general question related to the specific task at hand, and then feeding the answer to that general question into a subsequent prompt for the specific task. This "step back" allows the LLM to activate relevant background knowledge and reasoning processes before attempting to solve the specific problem.


By considering broader, more fundamental principles, the LLM can generate more accurate and insightful responses. Step-back prompting encourages the model to think critically and apply its knowledge in new and creative ways. It changes the final prompt that performs the task by drawing on more of the knowledge in the LLM's parameters than would come into play if the model were prompted directly.


This approach helps mitigate bias in responses of large language models by focusing on general principles rather than specific details.


Let's see how step-back prompting can improve results with the following examples. First we look at a traditional prompt (Table 8), and then compare it with a step-back prompt (Table 9):


Goal: Write a storyline for a level of a first-person shooter video game
Model: gemini-pro
Temperature: 1
Token limit: 1024
Top-K: 40
Top-P: 0.8

Prompt:
Write a challenging and engaging storyline for a new first-person shooter level.

Output:
At the start of the level, the player's squad is ambushed by a ruthless enemy faction in a dense urban area. The player must fight through a maze of back alleys and dilapidated buildings, using stealth and precision shooting to eliminate enemies and avoid detection. The player must use the environment to their advantage: setting traps, using cover, and making precise moves to survive. Along the way, they discover intelligence that the enemy is planning a devastating attack, and the player must race against time to collect evidence and transmit it back to headquarters before the plan is carried out.

Table 8. A traditional prompt, before we compare it with a step-back prompt


With the temperature set to 1 , you might get all sorts of creative storyline writing, but it's also pretty random and generic. So let's take a step back and think about this:


Prompt:
Based on popular first-person shooter action games, what are 5 fictional key settings that contribute to a challenging and engaging level storyline in a first-person shooter video game?

Output:
1. Abandoned Military Base: A massive post-apocalyptic military facility crawling with mutated soldiers and rogue robots, perfect for challenging firefights.
2. Cyberpunk City: A neon-lit futuristic urban environment with towering skyscrapers and dense alleys, featuring cybernetically enhanced enemies and hacking mechanics.
3. Alien Spaceship: A huge alien vessel stranded on Earth, with eerie corridors, zero-gravity areas, and alien creatures to face.
4. Zombie-Infested Town: A desolate town overrun by hordes of aggressive zombies, featuring intense close-quarters combat and puzzle-solving to find safe passage.
5. Underwater Research Facility: A flooded deep-sea laboratory filled with mutated aquatic life, requiring stealth and underwater exploration skills to survive.

Table 9. An example of a step-back prompt


These themes seem like a perfect fit for a first-person video game. Let's go back to the original prompt, but this time include the answer to the step-back question as context, and see what the model returns.


Prompt:
Context: 5 engaging themes for first-person shooter levels: 1. Abandoned Military Base: A massive post-apocalyptic military facility crawling with mutated soldiers and rogue robots, perfect for challenging firefights. 2. Cyberpunk City: A neon-lit futuristic urban environment with towering skyscrapers and dense alleys, featuring cybernetically enhanced enemies and hacking mechanics. 3. Alien Spaceship: A huge alien vessel stranded on Earth, with eerie corridors, zero-gravity areas, and alien creatures to face. 4. Zombie-Infested Town: A desolate town overrun by hordes of aggressive zombies, featuring intense close-quarters combat and puzzle-solving to find safe passage. 5. Underwater Research Facility: A flooded deep-sea laboratory filled with mutated aquatic life, requiring stealth and underwater exploration skills to survive. Choose one of these themes and write a challenging and engaging storyline for a new first-person shooter level.

Output:
At the heart of the shadowy abyss stands a dilapidated underwater research facility, a witness to human ambition and its disastrous consequences. Shrouded in darkness, with the hum of malfunctioning machinery and the echoes of drowned souls, the facility has become a breeding ground for the nightmares of the deep. The player is an elite marine equipped with advanced diving gear and experimental weaponry, tasked with entering this terrifying underwater realm. With resilience and wit, they must navigate treacherous corridors, solve mysterious puzzles, and face the horrific sea monsters lurking in the depths. From coping with the immense pressure of the deep sea to outwitting cunning aquatic predators, every moment in this uncharted underworld tests the limits of human endurance and courage.

Table 10. An example of a prompt that reuses the step-back answer as context


This looks like a fun video game! By using the step-back prompting technique, you can improve the accuracy of your prompts.
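In code, step-back prompting is simply two chained calls, where the answer to the general question becomes context for the specific task. Here is a sketch reusing the `model` object from the earlier configuration example:

# Step 1: the general "step back" question activates relevant background knowledge.
themes = model.generate_content(
    "Based on popular first-person shooter action games, what are 5 fictional "
    "key settings that contribute to a challenging and engaging level storyline?"
).text

# Step 2: feed that answer back in as context for the original, specific task.
storyline = model.generate_content(
    "Context: 5 engaging themes for first-person shooter levels:\n" + themes
    + "\n\nChoose one of these themes and write a challenging and engaging "
    "storyline for a new first-person shooter level."
).text
print(storyline)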


Chain of Thought (CoT)


Chain of Thought (CoT) [9] prompting is a technique for improving the reasoning capabilities of LLMs by generating intermediate reasoning steps. This helps the model generate more accurate answers. You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before answering, something a zero-shot chain of thought struggles with.


CoT has many advantages. First, it's low-effort and very effective with off-the-shelf LLMs (no fine-tuning required). CoT prompts also give you interpretability: you can learn from the model's responses and see the reasoning steps it followed, and if there is a failure, you can identify it. Chain of thought also appears to improve robustness when moving between LLM versions, meaning your prompt's performance should drift less across models than it would without reasoning chains. There are, of course, disadvantages, and they are fairly intuitive.


An LLM response that includes chain-of-thought reasoning means more output tokens, which means predictions cost more and take longer.


To motivate the example in Table 11, let's first try a prompt without CoT to demonstrate a shortcoming of large language models.


Prompt:
When I was 3 years old, my partner was 3 times my age. Now I am 20 years old. How old is my partner?

Output:
63 years old

Table 11. An example of a prompt attempting to solve a math problem


Oops. That's clearly the wrong answer. In fact, large language models often perform poorly on math tasks, and can provide incorrect answers even for tasks as simple as multiplying two numbers. This is because they are trained on large amounts of text, and math may require a different approach. So let's see if an intermediate inference step improves the output.


Prompt:
When I was 3 years old, my partner was 3 times my age. Now I am 20 years old. How old is my partner? Let's think step by step.

Output:
1. Assume that my current age is "x" years.
2. When I was 3 years old, my age was 3 years.
3. My partner's age at that time was 3 times my age, which means my partner was 3 * 3 = 9 years old.
4. Now I am 20 years old, which means my age has increased by 20 - 3 = 17 years since I was 3.
5. Since my partner was 9 when I was 3, and my age has increased by 17 years, my partner's age must also have increased by 17 years.
6. Therefore, my partner's current age is 9 + 17 = 26 years.
So, my partner is 26 years old.

Table 12. An example of a chain-of-thought prompt


OK, now the final answer is correct. This is because we explicitly instructed the model to explain each step instead of just returning an answer. Interestingly, the model adds on the additional 17 years. In my head, I would compute the age difference between me and my partner and add it: 20 + (9 - 3). Let's help the model think a bit more like me.


Table 12 is an example of "zero-shot" chain of thought. Chain-of-thought prompting is very powerful when combined with a single shot or a few shots, as shown in Table 13:


Prompt:
Q: When my brother was 2 years old, I was twice his age. Now I am 40 years old. How old is my brother? Let's think step by step.
A: When my brother was 2 years old, I was 2 * 2 = 4 years old. That means the age difference is 2 years, and I am older. Now I am 40 years old, so my brother is 40 - 2 = 38 years old. The answer is 38.
Q: When I was 3 years old, my partner was 3 times my age. Now I am 20 years old. How old is my partner? Let's think step by step.
A:

Output:
When I was 3 years old, my partner was 3 * 3 = 9 years old. That means the age difference is 6 years, and my partner is older. Now I am 20 years old, so my partner is 20 + 6 = 26 years old. The answer is 26.

Table 13. An example of a chain-of-thought prompt with a single shot


Chain of thought can be useful for a variety of use cases. Think of code generation: breaking a request down into steps and mapping those steps to specific lines of code. Or creating synthetic data when you have some kind of seed, such as "the product is called XYZ; guide the model to make assumptions based on the product's title." In general, any task that can be solved by "talking it through step by step" is a good candidate for chain of thought. If you can explain the steps for solving a problem, try chain of thought.


Refer to the notebook [10] hosted in the GoogleCloudPlatform GitHub repository, which explains CoT prompting in more detail.


In the Best Practices section of this chapter, we'll learn some best practices specific to chain-of-thought prompting.


Self-consistency


While large language models have shown impressive success on various NLP tasks, their reasoning ability is often seen as a limitation that cannot be overcome merely by increasing model size. As we saw in the previous section on chain-of-thought prompting, a model can be prompted to generate reasoning steps the way a human solving a problem would. However, CoT uses a simple "greedy decoding" strategy, which limits its effectiveness. Self-consistency [11] combines sampling and majority voting to generate diverse reasoning paths and select the most consistent answer. This improves the accuracy and coherence of the responses an LLM generates.


Self-consistency gives a pseudo-probability that the answer is correct, but obviously at a high cost.


It follows these steps:


  • Generate diverse reasoning paths: Provide the same prompt to the LLM multiple times. A high temperature setting encourages the model to generate different reasoning paths and perspectives on the problem.


  • Extract the answer from each generated response.


  • Select the most common answer.


Let's look at an example: an email classification system that classifies emails as "IMPORTANT" or "NOT IMPORTANT." A zero-shot chain-of-thought prompt is sent to the LLM multiple times to see whether the responses differ between submissions. Note the friendly tone, the word choice, and the sarcasm used in the email; all of these can fool the model.


Prompt:
EMAIL:
Hi,
I see you are using WordPress for your website. A great open source content management system. I have used it in the past as well. It comes with many great user plugins. And it is fairly simple to set up.
I did notice a bug in the contact form that occurs when you select the name field. See the screenshot of me entering text in the name field. Notice the JavaScript alert box I triggered.
But other than that, it's a great site. I enjoy reading it. Feel free to keep the bug in the site, as it gives me more interesting content to read.
Best wishes,
Hacker Harry.

Classify the above email as "IMPORTANT" or "NOT IMPORTANT". Let's think step by step and explain why.

Output, attempt 1:
Step 1: Determine the purpose of the email. The purpose is to inform the recipient of a bug in their website's contact form. The sender also mentions that they enjoy reading the website and that they are leaving the bug alone for now.
Step 2: Assess the potential impact of the bug. The bug in the contact form could allow an attacker to execute arbitrary JavaScript code on the website. This could let the attacker take control of the website, steal sensitive information, or deface the website.
Step 3: Consider the sender's credibility. The sender claims to be a hacker named Harry, but their identity cannot be verified. They could be a legitimate hacker trying to help, but could also be a malicious actor trying to trick the recipient into clicking a malicious link or downloading a malicious file.
Conclusion: IMPORTANT. Based on the potential impact of the bug and the sender's credibility, this email should be classified as IMPORTANT. The recipient should take immediate action to fix the bug and protect their website from attack.

Output, attempt 2:
Explanation:
1. Lack of urgency: The email does not convey any urgency or time sensitivity. It begins with a casual greeting and presents the information in a matter-of-fact manner.
2. Non-critical bug report: The bug reported in the contact form is described as interesting but not critical. The sender even suggests that the website owner can leave the bug in place.
3. Lack of personal impact: The email shows no direct or personal impact on the website owner. It simply mentions that the sender enjoys reading the website.
4. Lack of a request for action: The email does not explicitly ask the website owner to take any action. It does not request a bug fix or offer any specific suggestions for improvement.
5. Sender's intent: The sender's intent seems focused on sharing observations and expressing interest in the site rather than raising serious concerns or requesting help.
Conclusion: NOT IMPORTANT

Output, attempt 3:
Step 1: Determine the purpose of the email. The purpose is to inform the recipient of a bug in their website's contact form. The sender also mentions that they enjoy reading the website and that they are leaving the bug alone for now.
Step 2: Assess the potential impact of the bug. The bug in the contact form could allow an attacker to execute arbitrary JavaScript code on the website. This could let the attacker take control of the website, steal sensitive information, or deface the website.
Step 3: Consider the sender's credibility. The sender claims to be a hacker named Harry, but their identity cannot be verified. They could be a legitimate hacker trying to help, but could also be a malicious actor trying to exploit the bug for personal gain.
Conclusion: IMPORTANT. Based on the above factors, this email should be classified as IMPORTANT. The bug in the contact form is a serious security risk, and the sender's credibility is unknown. The recipient should take immediate action to fix the bug and investigate the sender's claims.

Table 14. An example of a self-consistency prompt


You can use the prompt above and see whether it returns a consistent classification. Depending on the model and the temperature configuration you use, it may return "IMPORTANT" or "NOT IMPORTANT."


By generating many chains of thought and taking the most common answer ("IMPORTANT"), we can get a more consistently correct answer from the LLM.


This example shows how self-consistency prompting can improve the accuracy of an LLM's responses by considering multiple viewpoints and selecting the most consistent answer.
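A minimal sketch of that procedure, reusing the google-generativeai setup from earlier; the answer-extraction heuristic here is deliberately crude and illustrative:

from collections import Counter

def classify_email(email_text, n_samples=5):
    prompt = (email_text + '\n\nClassify the above email as "IMPORTANT" or '
              '"NOT IMPORTANT". Let\'s think step by step and explain why.')
    answers = []
    for _ in range(n_samples):
        # Temperature must be > 0 so the sampled reasoning paths differ.
        text = model.generate_content(
            prompt,
            generation_config=genai.types.GenerationConfig(temperature=1.0),
        ).text
        answers.append("NOT IMPORTANT" if "NOT IMPORTANT" in text.upper()
                       else "IMPORTANT")
    # Majority vote across the sampled chains of thought.
    return Counter(answers).most_common(1)[0][0]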


Tree of Thoughts (ToT)


Now that we're familiar with chain-of-thought and self-consistency prompting, let's look at Tree of Thoughts (ToT) [12]. It generalizes the concept of CoT prompting by allowing an LLM to explore multiple different reasoning paths simultaneously, rather than following a single linear chain of thought, as shown in Figure 1.



Figure 1. A visualization of chain-of-thought prompting (left) versus tree-of-thoughts prompting (right)


This approach makes ToT particularly suitable for complex tasks that require exploration. It works by maintaining a tree of thoughts, where each thought represents a coherent sequence of language as an intermediate step in solving the problem. The model can then explore different reasoning paths by branching from different nodes in the tree.


For a deeper treatment of this approach, see the paper "Large Language Model Guided Tree-of-Thought."

ReAct (reasoning and action)


Reason-and-act (ReAct) prompting is a paradigm that enables LLMs to solve complex tasks by combining natural-language reasoning with external tools (search, code interpreters, and so on), allowing the model to perform actions such as querying external APIs to retrieve information, which is a first step toward modeling agents.


ReAct mimics how humans operate in the real world: we reason verbally and take actions to gain information. ReAct performs well relative to other prompt-engineering approaches across a variety of domains.


ReAct prompting works by combining reasoning and acting into a thought-action loop. The LLM first reasons about the problem and generates a plan of action. It then performs the actions in the plan and observes the results. The LLM uses the observations to update its reasoning and generate a new plan of action, and this continues until it reaches a solution.


To see this in action, you need to write some code. In code snippet 1, I use the LangChain framework for Python, together with the Vertex AI (google-cloud-aiplatform) and google-search-results pip packages.


To run this sample, you must create a (free) SerpAPI key at https://serpapi.com/manage-api-key and set the environment variable SERPAPI_API_KEY.


Next, let's write some Python code asking the LLM to figure out how many children the members of the band Metallica have.


from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import VertexAI

prompt = "How many children do the members of Metallica have?"

llm = VertexAI(temperature=0.1)
tools = load_tools(["serpapi"], llm=llm)

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(prompt)

Snippet 1. Creating a ReAct agent with LangChain and Vertex AI


Code snippet 2 shows the result. Notice that ReAct makes a chain of five searches. In effect, the LLM is scraping Google search results to figure out the band's members. It then lists each result as an observation and chains the thought onward for the next search.

> Entering new AgentExecutor chain...
Metallica has 4 members.
Action: Search
Action Input: How many kids does James Hetfield have?
Observation: three children
Thought: 1/4 Metallica band members have 3 children
Action: Search
Action Input: How many kids does Lars Ulrich have?
Observation: 3
Thought: 2/4 Metallica band members have 6 children
Action: Search
Action Input: How many kids does Kirk Hammett have?
Observation: Hammett has been married to his second wife Lani since 1998. They have two sons, Angel (b. September 29, 2006) and Vincenzo (b. June 28, 2008).
Thought: 3/4 Metallica band members have 8 children
Action: Search
Action Input: How many kids does Robert Trujillo have?
Observation: 2
Thought: 4/4 Metallica band members have 10 children
Final Answer: 10

Snippet 2.  Executing the ReAct agent in the terminal


Code snippet 2 works out that the band Metallica has four members. It then searches for each band member's number of children and adds them up. Finally, it returns the total number of children as the final answer.


Using ReAct prompts in practice requires understanding that you have to continually resend the previous prompts/responses (and prune the extra generated content), as well as set up the model with appropriate examples and instructions. Please refer to the notebook[14] hosted in the GoogleCloudPlatform GitHub repository, which shows the actual LLM inputs and outputs in more detail and provides more detailed examples.
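
For intuition, the loop that produces a transcript like snippet 2 can be sketched by hand. The `call_llm` and `run_search` callables below are hypothetical placeholders; the point is that the entire growing transcript (thoughts, actions, observations) is re-sent to the model on every step.

```python
import re


def react_loop(question, call_llm, run_search, max_steps=10):
    # The transcript grows each step and is re-sent in full every time.
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Ask the model for the next Thought/Action, stopping before it
        # invents its own Observation.
        step = call_llm(transcript, stop=["Observation:"])
        transcript += step
        match = re.search(r"Action Input:\s*(.+)", step)
        if "Final Answer:" in step or match is None:
            break
        # Execute the planned action and append the real observation.
        observation = run_search(match.group(1).strip())
        transcript += f"Observation: {observation}\n"
    return transcript
```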


Automatic Prompt Engineering


At this point you may realize that writing prompts can get complicated. Wouldn't it be nice if we could automate this process (write prompts to write prompts)? Well, there is a way: Automatic Prompt Engineering (APE). This approach[15] not only alleviates the need for human input, it can also enhance model performance across a variety of tasks.


You prompt a model to generate more prompts, evaluate them, possibly modify the good ones, and repeat.


For example, you could use automatic prompt engineering to help train a chatbot for a merchandise t-shirt online store. We want to find all the different ways customers might phrase an order for a band merchandise t-shirt.


1. Write the prompt that will generate the output variants. In this example, I use gemini-pro to generate 10 instructions. See Table 15:


Prompt word


We have a band merchandise t-shirt online store, and to train a chatbot we need various ways to order: "One Metallica t-shirt, size S". Generate 10 variants that keep the same semantics and the same meaning.


Output


  1. I would like to purchase a Metallica T-shirt in a smaller size.


  2. Can I order a smaller Metallica T-shirt?


  3. I'd like to buy a small Metallica T-shirt.


  4. A Metallica T-shirt, size small, thank you.


  5. I want a small Metallica T-shirt.


  6. I'd like to order a small Metallica T-shirt.


  7. Small Metallica T-shirt, one, thank you.


  8. I'm looking for a small Metallica T-shirt.


  9. Can you give me a small Metallica T-shirt?


  10. A small Metallica T-shirt, thanks.


Table 15.  Generating prompt words using LLM


2. Evaluate all instruction candidates by scoring them according to a metric of your choice, for example BLEU (Bilingual Evaluation Understudy) or ROUGE (Recall-Oriented Understudy for Gisting Evaluation).


3. Select the instruction candidate with the highest evaluation score. This candidate becomes the final prompt you can use in your software application or chatbot. You can also adjust the selected prompt and evaluate it again, as in the sketch below.
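
As a sketch of step 2, the snippet below ranks candidate instructions against a reference phrasing with ROUGE-L, using the `rouge-score` package from PyPI (its `RougeScorer` API is assumed here); any metric, including human rating, could stand in.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

reference = "One Metallica t-shirt, size S"
candidates = [
    "I would like to purchase a Metallica T-shirt in size small.",
    "Small Metallica T-shirt, one, thank you.",
]

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
ranked = sorted(
    candidates,
    key=lambda c: scorer.score(reference, c)["rougeL"].fmeasure,
    reverse=True,
)
print(ranked[0])  # the highest-scoring candidate becomes the final prompt
```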


Code prompts


Gemini focuses primarily on text-based prompts, which also includes writing prompts that return code. Let's jump into Vertex AI Studio to test these prompts and look at some code examples.


Tips for writing code


Gemini can also act as a developer and help you write code in any programming language, which can speed up your code-writing process.


Let's say you have a folder on your machine with hundreds of files that need to be renamed. Renaming the files one by one would take a lot of time. You know some Bash scripting and could write a script to automate this, but that might also take some time. So let's write a prompt. You can write the prompt in the public Gemini chatbot, or, if you're more concerned about confidentiality, you can write these prompts in Vertex AI Studio inside your Google Cloud account. The advantage of Vertex AI Studio is that you can configure parameters such as temperature.


| Field | Value |
| --- | --- |
| Name | 1_bash_rename_files_in_folder |
| Goal | Write a prompt to generate a Bash script that renames files in a folder. |
| Model | gemini-pro |
| Temperature | 0.1 |
| Top-K | N/A |
| Token limit | 1024 |
| Top-P | 1 |


Prompt word


Write a Bash snippet that asks for a folder name, then gets the contents of that folder and renames all the files by adding "draft" in front of the file name.


Output


```bash
#!/bin/bash

# Ask for the folder name
echo "Enter the folder name: "
read folder_name

# Check if the folder exists
if [ ! -d "$folder_name" ]; then
  echo "Folder does not exist."
  exit 1
fi

# Get the list of files in the folder
files=("$folder_name"/*)

# Rename each file by adding "draft_" to the front of the file name
for file in "${files[@]}"; do
  new_file_name="draft_$(basename "$file")"
  mv "$file" "$new_file_name"
done

echo "Files renamed successfully."
```


Table 16.  Writing code using Gemini


This code looks good; it's even commented! However, since large language models cannot truly reason and may repeat training data, be sure to read and test your code first.


The moment we've all been waiting for: Does it actually work?  


Let's try this out first with a test folder that contains just a few files that need to be renamed from `filename.txt`  to  `draft_filename.txt` .  


1.  Copy the output of Table 16 (excluding the code-fence markers) and paste it into a new file named `rename_files.sh`.


2.  Open a terminal window and run `./rename_files.sh` (you may need to make the script executable first with `chmod +x rename_files.sh`). It will ask you to enter a folder name, such as `test`; then press Enter.


3.  The script appears to have run successfully. You will see the message: `Files renamed successfully.` 


When you look in the  `test`  folder, you will find that all the files have been successfully renamed to  `draft_filename.txt` .  


Success! 


Hints to explain the code


As a developer working in a team, you often need to read other people's code. Gemini can help with this task too. Let's take the code output from Table 16, remove the comments, and ask the large language model to explain what it does. See Table 17:


| Field | Value |
| --- | --- |
| Name | 1_bash_explain_the_code |
| Goal | Write a prompt that explains Bash code. |
| Model | gemini-pro |
| Temperature | 0.1 |
| Top-K | N/A |
| Token limit | 1024 |
| Top-P | 1 |


Prompt word


Explain the following Bash code to me:


```bash
#!/bin/bash

echo "Enter the folder name: "
read folder_name

if [ ! -d "$folder_name" ]; then
  echo "Folder does not exist."
  exit 1
fi

files=("$folder_name"/*)

for file in "${files[@]}"; do
  new_file_name="draft_$(basename "$file")"
  mv "$file" "$new_file_name"
done

echo "Files renamed successfully."
```


Output


This Bash script performs the following tasks:


1.  User input:  


    •  Use the  `echo`  and  `read`  commands to prompt the user to enter a folder name. The entered folder name is stored in the  `folder_name`  variable.  



2.  Folder existence check:  


    •  Use the  `[ ! -d "$folder_name" ]`  condition to check if a folder with the specified name exists. If the folder does not exist, an error message ( “Folder does not exist.” ) is displayed and the script terminates with exit code  1  .  



3.  Get the file list:  


    •  Use the command  `files=( "$folder_name"/* )`  to list all the files in a specified folder. This will create an array called  `files`  containing the paths to all the files in the folder.  



4.  File rename:  


    •  Use a  `for`  loop to iterate over each file in the    `files`  array.   



    •  For each file, a new file name is generated using the command    `new_file_name="draft_$(basename "$file")"`  . The new file name is created by adding the string  "draft_"  to the front of the original file name .  



    •  Then rename the file using the  `mv "$file" "$new_file_name"`  command to move the file to the new file name.  



5.  Success message:  


    •  After all files are renamed, a success message (“Files renamed successfully.”) is displayed, indicating that the renaming operation completed successfully.


Table 17.  Tips for explaining the code


Translation code hints  


The Bash code in Table 16 works well. However, this script would be more reusable if it prompted for file names, and ideally it would run as a standalone application with a user interface. As a starting point, Python is better suited than Bash for (web) applications. Large language models can help translate code from one language to another. See the example in Table 18:


| Field | Value |
| --- | --- |
| Name | 1_python_rename_files |
| Goal | Write a prompt to translate Bash code into Python code. |
| Model | gemini-pro |
| Temperature | 0.1 |
| Top-K | N/A |
| Token limit | 1024 |
| Top-P | 1 |


Prompt word


Translate the following Bash code into a Python code snippet:


```bash
#!/bin/bash

echo "Enter the folder name: "
read folder_name

if [ ! -d "$folder_name" ]; then
  echo "Folder does not exist."
  exit 1
fi

files=("$folder_name"/*)

for file in "${files[@]}"; do
  new_file_name="draft_$(basename "$file")"
  mv "$file" "$new_file_name"
done

echo "Files renamed successfully."
```


Output


```python
import os
import shutil

# Get the folder name from the user
folder_name = input("Enter the folder name: ")

# Check if the folder exists
if not os.path.isdir(folder_name):
    print("Folder does not exist.")
    exit(1)

# Get a list of all files in the folder
files = os.listdir(folder_name)

# Iterate over the files
for file in files:
    # Create a new file name with the prefix "draft_"
    new_file_name = f"draft_{file}"
    # Move the file to the new name
    shutil.move(os.path.join(folder_name, file),
                os.path.join(folder_name, new_file_name))

# Print success message
print("Files renamed successfully.")
```


Table 18. Tips for translating Bash code into Python code

Read and review the code. Copy and paste the output from the prompt into a new file: `file_renamer.py` . Test the code by opening a terminal window and executing the following command: `python file_renamer.py` .  


Note: When prompting for Python code in the Language Studio of Vertex AI, you will need to click the "Markdown" button. Otherwise, you will receive plain text, lacking the proper indentation that Python code requires.


Tips for debugging and reviewing code  


Let's manually edit the code in Table 18 so it prompts the user for a file name prefix and converts the prefix to uppercase characters. See code snippet 3 for the edited code; unfortunately, it now returns a Python error!


```python
import os
import shutil

folder_name = input("Enter the folder name: ")
prefix = input("Enter the string to prepend to the filename: ")
text = toUpperCase(prefix)

if not os.path.isdir(folder_name):
    print("Folder does not exist.")
    exit(1)

files = os.listdir(folder_name)

for file in files:
    new_filename = f"{text}_{file}"
    shutil.move(os.path.join(folder_name, file),
                os.path.joi(folder_name, new_file_name))

print("Files renamed successfully.")
```


Code Snippet  3.  A corrupted Python  script


Oops! It looks like there was an error:


```
Traceback (most recent call last):
  File "/Users/leeboonstra/Documents/test_folder/rename_files.py", line 7, in <module>
    text = toUpperCase(prefix)
NameError: name 'toUpperCase' is not defined
```


Code Snippet  4.  I broke the  Python code


Let's see if we can ask the large language model to debug and review the code. See Table 19:


| Field | Value |
| --- | --- |
| Name | 1_python_debug_code |
| Goal | Write a prompt to debug and review Python code. |
| Model | gemini-pro |
| Temperature | 0.1 |
| Token limit | 1024 |
| Top-K | N/A |
| Top-P | 1 |


Prompt word


The following  Python  code returns an error:


```
Traceback (most recent call last):
  File "/Users/leeboonstra/Documents/test_folder/rename_files.py", line 7, in <module>
    text = toUpperCase(prefix)
NameError: name 'toUpperCase' is not defined
```


Please debug the problem and explain how to improve the code.


```python
import os
import shutil

# Get the folder name from the user
folder_name = input("Enter the folder name: ")
prefix = input("Enter the string to prepend to the filename: ")
text = toUpperCase(prefix)

# Check if the folder exists
if not os.path.isdir(folder_name):
    print("Folder does not exist.")
    exit(1)

# Get a list of all files in the folder
files = os.listdir(folder_name)

# Iterate over the files
for file in files:
    # Create a new file name with the prefix
    new_filename = f"{text}_{file}"
    # Move the file to the new name
    shutil.move(os.path.join(folder_name, file),
                os.path.join(folder_name, new_file_name))

# Print success message
print("Files renamed successfully.")
```


Output


The code calls the `toUpperCase` function to convert `prefix` to uppercase, but this function is not defined. To solve this problem, you can use the `upper()` method of the string class to convert the given string to uppercase. The modified code is as follows:


```python
import os
import shutil

# Get the folder name from the user
folder_name = input("Enter the folder name: ")
prefix = input("Enter the string to prepend to the filename: ")
text = prefix.upper()

# Check if the folder exists
if not os.path.isdir(folder_name):
    print("Folder does not exist.")
    exit(1)

# Get a list of all files in the folder
files = os.listdir(folder_name)

# Iterate over the files
for file in files:
    # Create a new file name with the prefix
    new_filename = f"{text}_{file}"
    # Move the file to the new name
    shutil.move(os.path.join(folder_name, file),
                os.path.join(folder_name, new_filename))

# Print success message
print("Files renamed successfully.")
```


In addition, the code can be improved as follows:


  1. The extension of the new file name is not preserved. It is better to extract the extension from the original file name and append it to the new file name.


  2. The code assumes that there are no spaces in the folder name. It is better to handle spaces gracefully, for example by replacing them with underscores or hyphens.


  3. String interpolation using f-string syntax is more readable and concise than the traditional `+` operator.


  4. The code does not handle errors that may occur during the renaming process. It is best to wrap the `shutil.move` call in a `try...except` block to catch potential errors.


Here is the improved code incorporating these suggestions:


```python
import os
import shutil

# Get the folder name from the user
folder_name = input("Enter the folder name: ")
prefix = input("Enter the string to prepend to the filename: ")

# Check if the folder exists
if not os.path.isdir(folder_name):
    print("Folder does not exist.")
    exit(1)

# Get a list of all files in the folder
files = os.listdir(folder_name)

# Iterate over the files
for file in files:
    # Get the file extension
    ext = os.path.splitext(file)[1]
    # Create a new file name with the prefix and file extension
    new_filename = f"{prefix}_{os.path.splitext(file)[0]}{ext}"
    # Build the full paths of the original file and the new file
    old_path = os.path.join(folder_name, file)
    new_path = os.path.join(folder_name, new_filename)
    # Try to move the file
    try:
        shutil.move(old_path, new_path)
    except Exception as e:
        print(f"An error occurred while moving the file {file}: {e}")

# Print success message
print("Files renamed successfully.")
```


(The response was truncated because the token limit was reached. If a longer response is required, try increasing the token limit.)


Table  19.  Tips for debugging and reviewing  Python  code


Awesome! Not only did it tell me how to fix the problem, it also found more errors in my code and provided solutions. The last part of the response gave general suggestions for improving the code.


About multimodal prompts


Code prompts still use the same regular large language model. Multimodal prompting is a separate concern: it refers to techniques that use multiple input formats to instruct a large language model, rather than relying on text alone. This can include combinations of text, images, audio, code, or other formats, depending on the model's capabilities and the task at hand.


Best Practices


Finding the right prompt can take some trial and error. Language Studio in Vertex AI is a great place to test your prompts, with the ability to test them against various models.


Use the following best practices to become an expert in prompt engineering.


Provide examples


The most important best practice is to provide (one-shot / few-shot) examples in the prompt. This works very well because an example is a powerful teaching tool. Examples demonstrate the desired output or similar responses, allowing the model to learn from them and adjust its generation accordingly. It's like giving the model a reference point or target, improving the accuracy, style, and tone of its responses to better match your expectations.


Simple design


Prompts should be concise, clear, and easy for both you and the model to understand. As a rule of thumb, if a prompt is confusing to you, it is likely to be confusing to the model as well. Try to avoid complex language and don't provide unnecessary information.


Example:


Before: " I'm currently visiting New York and would like to learn more about great places to go. I have two  3-  year-olds with me. Where should we go on vacation? "


Revised: "A travel guide for tourists, describing great places to visit in Manhattan, New York with a 3-year-old."


Try to use verbs that describe actions. Here are some examples: Act, Analyze, Categorize, Classify, Contrast, Compare, Create, Describe, Define, Evaluate, Extract, Find, Generate, Identify, List, Measure, Organize, Parse, Pick, Predict, Provide, Rank, Recommend, Return, Retrieve, Rewrite, Select, Show, Sort, Summarize, Translate, Write.


Clarify output requirements


Be specific about the desired output. Concise instructions may not be enough to guide a large language model, or may be too general. Providing specific details in the prompts, either through system or contextual prompts, can help the model focus on relevant content and improve overall accuracy.


Example:


Do: Generate a 3-paragraph blog post about the top 5 gaming consoles. The blog post should be informative and engaging, and written in a conversational style.


Don't: Create a blog post about gaming consoles.


Prefer directives over constraints


Instructions and constraints are used to guide the output of a large language model.


•  Instructions specify the expected format, style, or content of the response. They direct the model toward what it should do or generate.


•  Constraints are limits or bounds on the response. They define what the model should not do or should avoid.


A growing body of research suggests that focusing on positive instructions in prompts is more effective than relying on a large number of constraints. This approach matches the human preference for positive instructions over lists of what not to do.


Instructions directly communicate the desired outcome, whereas constraints may leave the model guessing about what is allowed. Instructions offer flexibility and encourage creativity within defined boundaries, whereas constraints may limit the model's potential.


Constraints can still be valuable in some cases, such as preventing a model from generating harmful or biased content, or when a strict output format or style is required.


If possible, use positive instructions: instead of telling the model what not to do, tell it what it should do. This avoids confusion and improves the accuracy of the output.


Do: Create a blog post about the top 5 gaming consoles. Discuss only the console name, the company that made it, the year, and total sales.


Don't: Create a blog post about the top 5 gaming consoles. Don't list the games.


As a best practice, prioritize using directives first to clearly state what you want the model to do, and use constraints only when necessary for safety, clarity, or to meet specific requirements. Experiment and iterate to test different combinations of directives and constraints to find what works best for your specific task, and document those.


Controlling the Maximum Token Length


To control the length of a generated large language model response, you can set a maximum token limit in the configuration, or explicitly request a specific length in the prompt. For example: "Explain quantum physics in a tweet-length message."
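
As a sketch, using the same LangChain VertexAI wrapper as in snippet 1 (the `max_output_tokens` parameter name is an assumption about that wrapper), both approaches look like this:

```python
from langchain.llms import VertexAI

# Cap the response length in the model configuration...
llm = VertexAI(temperature=0.1, max_output_tokens=64)

# ...and/or request a specific length directly in the prompt.
print(llm("Explain quantum physics in a tweet-length message."))
```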


Using variables in prompt words


To reuse prompts and make them more dynamic, use variables in the prompt that can be changed for different inputs. For example, as shown in Table 20, a prompt that provides a fact about a city: instead of hard-coding the city name in the prompt, use a variable. Variables save time and effort by avoiding repetition. If you need to use the same information in multiple prompts, you can store it in a variable and reference that variable in each prompt. This makes particular sense when integrating prompts into your own applications.


Prompt word


Variable: {city} = "Amsterdam"


Prompt: You are a travel guide. Tell me a fact about the city: {city}



Output


Amsterdam is a beautiful city filled with canals, bridges and narrow streets. Known for its rich history, culture and nightlife, it is a great tourist destination.


Table  20.  Using variables in prompt words
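
In application code this is just string templating; a minimal sketch:

```python
# One prompt template, reused for many inputs.
template = "You are a travel guide. Tell me a fact about the city: {city}"

for city in ["Amsterdam", "Tokyo", "Buenos Aires"]:
    prompt = template.format(city=city)
    print(prompt)  # send `prompt` to the model of your choice here
```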


Try different input formats and writing styles


Different models, model configurations, prompt formats, word choices, and submission methods can produce different results. Therefore, it is important to experiment with prompt attributes such as style, wording, and prompt type (zero-shot, few-shot, system prompt).


For example, a prompt whose goal is to generate text about the revolutionary gaming console Sega Dreamcast could be formulated as a question, a statement, or an instruction, each producing different outputs:


•  Question: "What is the Sega Dreamcast and why was it such a revolutionary console?"


•  Statement: "The Sega Dreamcast is a sixth-generation video game console released by Sega in 1999. It..."


•  Instruction: "Write a paragraph describing the Sega Dreamcast console and explaining why it was revolutionary."


Mixed categories in few-shot classification tasks


In general, the order of the few-shot examples should not affect the results. However, when performing classification tasks, make sure to mix the possible response categories in your few-shot examples. This is because otherwise you risk overfitting to the specific order of the examples. By mixing the possible response categories, you ensure that the model learns to identify the key features of each category, rather than simply memorizing the order of the examples. This will make the model more robust and generalizable on unseen data.


A good rule of thumb is to start with  6  few-shot examples and test the accuracy from there.
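
As an illustration, here is a sketch of a sentiment-classification prompt whose six example labels are interleaved rather than grouped (the reviews are invented):

```python
prompt = """Classify the sentiment of each review as POSITIVE or NEGATIVE.

Review: "The battery lasts all day." Sentiment: POSITIVE
Review: "It stopped working after a week." Sentiment: NEGATIVE
Review: "Setup took five minutes, flawless." Sentiment: POSITIVE
Review: "The manual is useless." Sentiment: NEGATIVE
Review: "Great sound for the price." Sentiment: POSITIVE
Review: "Customer support never answered." Sentiment: NEGATIVE

Review: "The screen scratches far too easily." Sentiment:"""
```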


Adapt to model updates


It’s important for you to keep up with changes to model architectures, new data, and features. Try newer model versions and adjust your prompts to better take advantage of new model features. Tools like  Vertex AI Studio  are great for storing, testing, and documenting different versions of your prompts.


Try different output formats


In addition to the prompt's input format, you can also experiment with the output format. For non-creative tasks (such as extracting, selecting, parsing, sorting, ranking, or classifying data), try having the output returned in a structured format such as JSON or XML.


There are benefits to returning JSON objects from data-extraction prompts. In practice, I don't need to manually create the JSON format, the data can already be returned in sorted order (handy when dealing with datetime objects), and, most importantly, prompting for JSON format forces the model to create structure and limits hallucinations.


To summarize the advantages of JSON output:


•  Always returns the same style


•  Focus on the data you want


•  Reduce the possibility of hallucinations


•  Make it relationship-aware


•  You get the data type


•  You can sort them


 Table 4  in the Few-shot Prompt Words section shows an example of how structured output is returned.


JSON repair


While returning data in JSON format has many advantages, it is not without its disadvantages. The structured nature of JSON, while beneficial for parsing and use in applications, requires more tokens than plain text, resulting in increased processing time and higher costs. Additionally, the verbosity of JSON can easily exhaust the entire output window, which is particularly problematic when generation is abruptly truncated due to token limits. Such truncation often results in invalid JSON, missing the crucial closing braces or brackets, rendering the output unusable. Fortunately, tools like the json-repair library on PyPI can be very helpful here. The library intelligently attempts to automatically repair incomplete or malformed JSON objects, making it a critical tool when processing JSON generated by large language models, especially when dealing with potential truncation issues.
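
A minimal sketch of that repair step, assuming the `repair_json` helper exposed by the json-repair package:

```python
# pip install json-repair
from json_repair import repair_json

# A response cut off by the token limit, missing its closing brackets:
truncated = '{"consoles": [{"name": "Dreamcast", "year": 1999}, {"name": "PS2"'

print(repair_json(truncated))
# The missing braces/brackets are restored, yielding parseable JSON.
```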


Using schemas


Using structured JSON as output is a great solution, as we've seen several times in this article. But what about the input? While JSON is great for structuring the output a large language model generates, it can also be very useful for structuring the input you provide. This is where JSON schemas come in. A JSON schema defines the expected structure and data types of JSON input. By providing a schema, you give the large language model a clear blueprint of your data, helping it focus on the relevant information and reducing the risk of misinterpreting the input. Additionally, a schema can help establish relationships between different pieces of data, or even make the large language model time-aware by including date or timestamp fields in a specific format.


Here is a simple example:


Let's say you want to use a large language model to generate descriptions for products in an e-commerce catalog. Instead of providing only a free-text description of the product, you can define the product's properties with a JSON schema:


```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "Product name" },
    "category": { "type": "string", "description": "Product category" },
    "price": { "type": "number", "format": "float", "description": "Product price" },
    "features": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Key features of the product"
    },
    "release_date": { "type": "string", "format": "date", "description": "Date the product was released" }
  }
}
```


You can then provide actual product data that conforms to this schema as a  JSON  object:


```json
{
  "name": "Wireless Headphones",
  "category": "Electronics",
  "price": 99.99,
  "features": ["Noise cancellation", "Bluetooth 5.0", "20-hour battery life"],
  "release_date": "2023-10-27"
}
```


By preprocessing your data and providing only the schema and the data instead of full documents, you give the large language model a clear understanding of the product's attributes, including its release date, making it far more likely to generate an accurate and relevant description. This structured input approach guides the model's attention to the relevant fields and is especially valuable when processing large amounts of data or integrating large language models into complex applications.
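
A sketch of how the schema and the product data above might be combined into one prompt (the instruction wording is illustrative):

```python
import json

# Abbreviated versions of the schema and data shown above.
schema = {"type": "object", "properties": {"name": {"type": "string", "description": "Product name"}}}
product = {"name": "Wireless Headphones", "category": "Electronics", "price": 99.99}

prompt = (
    "Generate an accurate and engaging product description.\n\n"
    f"JSON schema:\n{json.dumps(schema, indent=2)}\n\n"
    f"Product data:\n{json.dumps(product, indent=2)}"
)
```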


Collaborate with other prompt engineers to experiment


If you are in a situation where you need to come up with a good prompt, you may want to have multiple people make an attempt. When everyone follows the best practices listed in this chapter, you will see variance in performance among the different prompt attempts.


Chain of Thought best practices


For chain-of-thought prompting, the answer must come after the reasoning, because generating the reasoning changes the tokens the model sees when it predicts the final answer.


When using chain of thought with self-consistency, you need to be able to extract the final answer from the response and separate it from the reasoning.


Chain-of-thought prompting is based on greedy decoding, which predicts the next token with the highest probability assigned by the language model. In general, when reasoning is used to arrive at the final answer, there is likely to be a single correct answer, so the temperature should always be set to 0.
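
A minimal sketch of separating the answer from the reasoning, assuming the prompt instructs the model to end with a line of the form "The answer is <number>." (the sample responses are invented):

```python
import re
from collections import Counter


def extract_final_answer(response: str):
    # Assumes the prompt told the model to finish with "The answer is <number>."
    match = re.search(r"The answer is\s+(-?\d+)", response)
    return int(match.group(1)) if match else None


# With self-consistency, sample several reasoning paths (temperature > 0),
# extract each final answer, and take a majority vote.
responses = [
    "She was 3 when I was 6, a 3-year gap, so she is 17 now. The answer is 17.",
    "Half of 6 is 3; 20 minus the 3-year gap is 17. The answer is 17.",
    "I was 6, she was 3; now I am 20, so she is 17. The answer is 17.",
]
answers = [a for a in map(extract_final_answer, responses) if a is not None]
print(Counter(answers).most_common(1)[0][0])  # 17
```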


Record various prompt word attempts


This last tip was mentioned earlier in this chapter, but we can't overemphasize its importance: keep detailed records of your prompt attempts so you can learn over time what works and what doesn't.


Prompt output can vary between models, sampling settings, and even different versions of the same model. Moreover, even for identical prompts to the same model, minor differences in output sentence format and wording can occur. (For example, as mentioned earlier, if two tokens have the same predicted probability, ties may be broken randomly, which can then affect subsequent predicted tokens.)


We recommend creating a Google Sheet using Table 21 as a template. The advantage of this approach is that you have a complete record when you inevitably need to revisit your prompting work, whether to pick it up again in the future (you'd be surprised how much you can forget after a short break), to test prompt performance on different model versions, or to help debug future errors.


In addition to the fields in the table, it is helpful to record the version (iteration) of the prompt, a field to capture whether the result was "OK/NOT OK/SOMETIMES OK" , and a field to capture feedback. If you are lucky enough to be using  Vertex AI Studio , save your prompt (using the same name and version listed in the documentation) and follow the hyperlink to the saved prompt in the table. This way, you can re-run the prompt with a single click.


When working on a retrieval-augmented generation (RAG) system, you should also capture the specific aspects of the RAG system that affect what content is inserted into the prompt, including the query, chunk settings, chunk output, and other information.


Once you feel your prompt is close to perfect, bring it into your project codebase. In the codebase, keep the prompt in a separate file from the code to make it easier to maintain. Finally, ideally, your prompt is part of an operationalized system, and as a prompt engineer, you should rely on automated testing and evaluation procedures to understand how well the prompt generalizes to the task.
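
One way to keep prompts maintainable is to store each prompt in its own file and load it at runtime; a minimal sketch (the file name is illustrative):

```python
from pathlib import Path

# prompts/travel_guide_v3.txt contains the prompt text, including any
# {placeholders}, versioned alongside the code but kept out of it.
template = Path("prompts/travel_guide_v3.txt").read_text()
prompt = template.format(city="Amsterdam")
```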


Prompt engineering is an iterative process. Design and test different prompts, analyze and document the results, and refine your prompt based on the model's performance. Keep experimenting until you get the output you expect, and when you change models or model configurations, go back and re-test the prompts you used previously.


| Field | Content |
| --- | --- |
| Name | [Name and version of the prompt] |
| Goal | [One-sentence explanation of the goal of this attempt] |
| Model | [Name and version of the model used] |
| Temperature | [Value between 0 and 1] |
| Token limit | [Number] |
| Top-K | [Number] |
| Top-P | [Number] |
| Prompt | [Write out the full prompt] |
| Output | [Write out the output, or multiple outputs] |


Table 21. Template for recording prompt attempts


Summary


This white paper discussed prompt engineering. We learned various prompting techniques, such as:


•  Zero-shot prompting


•  Few-shot prompting


•  System prompting


•  Role prompting


•  Contextual prompting


•  Step-back prompting


•  Chain of Thought (CoT)


•  Self-consistency


•  Tree of Thoughts (ToT)


•  ReAct


We even looked at ways to automate prompt writing (automatic prompt engineering).


The white paper then discussed the challenges of generative AI, such as the problems that can arise when prompts fall short. Finally, we shared best practices on how to become a better prompt engineer.