Prompts for reasoning models

Written by Silas Grey
Updated on: July 17, 2025
Recommendation

New challenges for reasoning models: how prompts change and how to optimize them.

Core content:
1. How reasoning models differ from traditional SFT models, and what that means for prompt writing
2. How the reasoning model's training method creates new requirements for prompt design
3. Starting point and end point: how to optimize prompts to improve reasoning-model performance

Yang Fangxian
Founder of 53AI, Tencent Cloud Most Valuable Expert (TVP)


Our earlier prompts were all written for SFT models. We stated our intent clearly and added few-shot examples to demonstrate the desired "flavor", all to guide the large model to "get our idea" and produce output that meets our needs.

OpenAI o1, DeepSeek-R1, and other RL-trained models have arrived. Starting from an SFT model, they add reinforcement learning: the model is given question-answer pairs (Q & A) and left to figure out on its own how to get from Q to A. In the process the SFT model's parameters shift, and "reasoning ability" emerges.

When the model parameters change, the "model universe" changes with them, and the way we write prompts must change too. The old chain-of-thought and few-shot techniques tend to perform worse on these RL-trained models: the hand-written thinking steps that once helped now conflict with the reasoning ability the model has learned. Our guidance is no longer a help to it, but a hindrance.

After reading the papers and studying prompt examples online, the common advice is that "it is enough to state clearly and directly what you want." Throwing a single sentence at a reasoning model already gets a very good response, so is there a way to tweak the prompt "slightly" to get an even better one?

How? Let's look back at how the RL model is trained. What do the Q&A pairs fed in during training represent? My understanding: Q is the "starting point", the background information of the current task; A is the "end point", the ideal response we expect. When we use a reasoning model, we should provide information in the same shape it was trained on: only the starting point and the end point, no process.

The defining trait of a reasoning model is that it can reason. We should not intervene in that process; we follow its nature and give it only Q and A.
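
To make the contrast concrete, here is a minimal sketch in Python; the prompt wording and variable names are mine, not from the article. The old SFT-style prompt prescribes the thinking steps and adds a few-shot example, while the reasoning-model prompt states only the starting point and the end point.

```python
# Two ways of asking for the same diagnosis, side by side.

# Old style, written for SFT models: prescribe the thinking process and add a few-shot example.
sft_style_prompt = """You are a medical assistant. Think step by step:
1. List the symptoms. 2. Map each symptom to possible causes. 3. Rank the causes.

Example:
Q: I have a sore throat and a fever.
A: 1) Symptoms: sore throat, fever ... 3) Most likely: viral pharyngitis.

Now answer:
Q: I have had headaches recently, especially in the afternoon, and I am sensitive to light."""

# New style, written for reasoning models: starting point (Q) + end point (A), no process.
reasoning_style_prompt = (
    # starting point: the background information of the task
    "I have had headaches recently, especially in the afternoon, and I am sensitive to light. "
    # end point: the ideal response we expect
    "List the three most likely diagnoses, sort them by probability, "
    "and indicate which additional examinations are needed."
)
```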

Starting point (Q)

The starting point is our task information. Different tasks come with information of different richness and clarity. If you draw an axis with "implicit" at the left end and "explicit" at the right, then the clearer and richer the task description, the closer it sits to "explicit", and vice versa.

End point (A)

The end point is the result we expect. Whatever the task, we want a high-quality response, so how do we describe this "high quality"? If we draw an axis that covers as many common task scenarios as possible, I think we can borrow the ladder of abstraction: "abstract" at the top, "concrete" at the bottom. We express our expectation of the result at the appropriate level of abstraction.
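
As a rough illustration (the wording below is mine, not from the article), the same end point can be written at different rungs of the ladder:

```python
# One end point, phrased at three levels of abstraction (top = abstract, bottom = concrete).
end_point_ladder = [
    "Give me a useful overview of the topic.",              # abstract: quality left to the model
    "Summarize the topic for a non-expert reader.",         # middle: audience pinned down
    "Write a 200-word summary with three bullet-point "
    "takeaways and one concrete example.",                  # concrete: format and length pinned down
]
```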

Examples (generated by DeepSeek)

  • I have had headaches recently, especially in the afternoon, and I am sensitive to light. List the three most likely diagnoses, sort them by probability, and indicate which additional examinations are needed.

  • I need to replace the front tire of my 2023 Tesla Model Y, which has 19-inch wheels and is currently broken down on a suburban highway. Provide a 10-step operation guide with a tool list and safety precautions, with "->" between steps.

  • Using 'quantum entanglement' and 'Zen koans', construct a cognitive model that explains interpersonal relationships in the digital age. It must meet two requirements: a graphical representation and compatibility with Habermas's theory of communicative action.

  • Please explain what "presence" is.

  • ...
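
As a usage sketch, any of the prompts above can be sent as-is, with no system scaffolding around it. The snippet below assumes the OpenAI-compatible Python SDK together with DeepSeek's documented endpoint and model name (`https://api.deepseek.com`, `deepseek-reasoner`); adjust both for your provider.

```python
import os

from openai import OpenAI  # DeepSeek exposes an OpenAI-compatible API, so the same client works

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # assumption: DeepSeek's OpenAI-compatible endpoint
)

# Starting point + end point only: no step-by-step instructions, no few-shot examples.
prompt = (
    "I need to replace the front tire of my 2023 Tesla Model Y, which has 19-inch wheels "
    "and is currently broken down on a suburban highway. Provide a 10-step operation guide "
    "with a tool list and safety precautions, with '->' between steps."
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumption: DeepSeek's reasoning-model identifier
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```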

This is just a preliminary thinking framework; it still needs to be refined through further testing and comparison. Feedback and suggestions are welcome.