Microsoft: Prompts That Evolve Autonomously, Saying Goodbye to Hand-Written Prompts

Explore Microsoft's new fully automatic discrete prompt optimization framework and what it unlocks for LLM applications.
Core content:
1. PromptWizard's self-evolution mechanism and feedback-driven critique-and-synthesis process
2. How it removes the time-consuming, domain-specific burden of manual prompt engineering
3. How it generates efficient task-specific prompts that improve model performance and interpretability
Introduction to PromptWizard
PromptWizard is a new fully automatic discrete prompt optimization framework from Microsoft, built around a self-evolving, self-adapting mechanism.
Through a feedback-driven critique-and-synthesis process, it strikes an effective balance between exploration and exploitation, iteratively refining both prompt instructions and in-context examples to produce human-readable, task-specific prompts.
The framework performs well on 45 tasks and achieves superior performance even with limited training data, small LLMs, and different LLM architectures.
Problems that PromptWizard solves
Limitations of manual prompt engineering
Solution: PromptWizard automates prompt optimization, eliminating the time-consuming, domain-specific work of manual prompt engineering. Its self-evolution mechanism lets the LLM generate, critique, and refine its own prompts and examples, improving them continuously through iterative feedback and synthesis.
The shortcomings of existing optimization strategies
Solution: Existing continuous and discrete prompt optimization methods either require training additional neural networks or explore the prompt space without any feedback mechanism. PromptWizard overcomes the randomness and inefficiency of these methods by introducing a feedback-driven critique-and-synthesis process.
Generation of task-specific prompts
Solution: PromptWizard iteratively refines prompt instructions and in-context examples to generate prompts tailored to the task, improving both model performance and interpretability.
PromptWizard Architecture Flow
Problem description and initial prompt instructions
PromptWizard first receives a problem description and an initial prompt instruction. For example, in a math problem-solving task, the initial prompt might be: "Let's think step by step to find a solution to this math problem."
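For concreteness, the running example below pictures these two inputs as plain values. This is a hypothetical setup, not PromptWizard's actual configuration format; the later sketches reuse these names.

```python
# Hypothetical inputs for the running example; not PromptWizard's
# actual configuration format.
problem_description = "Solve grade-school math word problems."
initial_instruction = (
    "Let's think step by step to find a solution to this math problem."
)
```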
Generate instruction variants
Based on the problem description and the initial instruction, PromptWizard generates prompt variants using predefined cognitive heuristics, or thinking styles. These heuristics steer the LLM toward different perspectives on the problem, ensuring diversity among the candidate instructions.
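A minimal sketch of this step, assuming a generic chat model behind a placeholder call_llm() client; the thinking styles and meta-prompt wording are illustrative, not PromptWizard's internal templates:

```python
# Sketch of instruction-variant generation via "thinking styles".
THINKING_STYLES = [
    "How could you simplify the problem to make it easier to solve?",
    "What are the key assumptions underlying this problem?",
    "Explain it as if teaching the approach to a student.",
]

def call_llm(prompt: str) -> str:
    """Placeholder: wire this to your LLM provider of choice."""
    raise NotImplementedError

def generate_variants(problem_description: str, instruction: str) -> list[str]:
    """Produce one rewritten instruction per thinking style."""
    variants = []
    for style in THINKING_STYLES:
        meta_prompt = (
            f"Task: {problem_description}\n"
            f"Current instruction: {instruction}\n"
            f"Thinking style: {style}\n"
            "Rewrite the instruction in this thinking style. "
            "Return only the new instruction."
        )
        variants.append(call_llm(meta_prompt))
    return variants
```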
Performance Evaluation
Next, PromptWizard evaluates the generated variants with a scoring mechanism. Each variant is scored by its performance on a small batch of training examples; the score can come from a traditional metric such as F1, or from an LLM acting as the evaluator.
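A sketch of minibatch scoring under the simplest assumption, exact-match accuracy; an F1 metric or an LLM judge could be substituted:

```python
# Sketch of minibatch scoring with exact-match accuracy.
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder LLM client

def score_instruction(instruction: str, minibatch: list[tuple[str, str]]) -> float:
    """Fraction of (question, answer) pairs the instruction gets right."""
    correct = 0
    for question, answer in minibatch:
        prediction = call_llm(f"{instruction}\n\nQuestion: {question}\nAnswer:")
        correct += prediction.strip() == answer.strip()
    return correct / len(minibatch)

def select_best(instructions: list[str], minibatch: list[tuple[str, str]]) -> str:
    """Keep the variant that scores highest on the minibatch."""
    return max(instructions, key=lambda ins: score_instruction(ins, minibatch))
```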
Feedback and refinement
After selecting the best-performing variants, PromptWizard introduces a unique feedback mechanism through its Critique component, which reviews where the prompt succeeds and fails and provides targeted feedback to focus the next round of improvement.
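A sketch of the critique step, assuming failures are collected as (question, expected, predicted) triples; the review prompt is an assumption, not PromptWizard's actual template:

```python
# Sketch of the critique step over collected failure cases.
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder LLM client

def critique(instruction: str, failures: list[tuple[str, str, str]]) -> str:
    """Ask the LLM to explain why the instruction failed on these cases."""
    cases = "\n\n".join(
        f"Q: {q}\nExpected: {gold}\nGot: {pred}" for q, gold, pred in failures
    )
    return call_llm(
        f"Instruction under review: {instruction}\n"
        f"It failed on these examples:\n{cases}\n"
        "Explain why the instruction led to these mistakes and suggest "
        "targeted changes that would fix them."
    )
```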
Synthesis and Optimization
Finally, PromptWizard's synthesis component uses the critique's feedback to refine the best prompts, rephrasing and strengthening the instructions so the optimized prompt is more specific to the task.
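A matching sketch of the synthesis step, folding the critique back into a rewritten instruction; the wording is again illustrative:

```python
# Sketch of the synthesis step: feedback in, improved instruction out.
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder LLM client

def synthesize(instruction: str, feedback: str) -> str:
    """Rewrite the instruction so it addresses the critique."""
    return call_llm(
        f"Current instruction: {instruction}\n"
        f"Critique of its failures: {feedback}\n"
        "Rewrite the instruction to address the critique while staying "
        "concise. Return only the improved instruction."
    )
```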
Identifying diverse examples
PromptWizard then identifies a diverse set of candidate examples to strengthen the prompt: it draws candidates from the dataset, scores the current prompt against them, and classifies them into positive and negative examples.
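A sketch of that classification, keeping the wrong prediction alongside each negative example so the critique step can inspect it:

```python
# Sketch of sorting candidates into positives (the current prompt
# already handles them) and negatives (it does not).
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder LLM client

def split_examples(instruction: str, candidates: list[tuple[str, str]]):
    """Return (positive, negative) lists of (question, answer, prediction)."""
    positive, negative = [], []
    for question, answer in candidates:
        pred = call_llm(f"{instruction}\n\nQuestion: {question}\nAnswer:")
        bucket = positive if pred.strip() == answer.strip() else negative
        bucket.append((question, answer, pred))
    return positive, negative
```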
Sequential Optimization
Unlike most existing prompt optimization methods, PromptWizard optimizes prompt instructions and the few-shot examples together rather than in isolation, refining each in alternating sequence. Through the critique-and-synthesis process, it dynamically improves prompt quality and task performance.
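Composing the helper sketches above gives a rough picture of the loop. The round count and schedule are assumptions; refining the examples themselves would follow the same critique-and-synthesize pattern and is omitted for brevity:

```python
# Sketch of the alternating optimization loop, composed from the
# earlier sketches (generate_variants, select_best, split_examples,
# critique, synthesize).
def optimize(problem_description: str, instruction: str,
             examples: list[tuple[str, str]], rounds: int = 3) -> str:
    for _ in range(rounds):
        # Mutate the instruction and keep the best-scoring candidate.
        variants = generate_variants(problem_description, instruction)
        instruction = select_best(variants + [instruction], examples)
        # Critique the remaining failures and fold the feedback back in.
        _, negatives = split_examples(instruction, examples)
        if not negatives:
            break  # nothing left to fix on this minibatch
        feedback = critique(instruction, negatives)
        instruction = synthesize(instruction, feedback)
    return instruction
```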
Self-generated reasoning and verification
After optimizing the prompt and its few-shot examples, PromptWizard further improves performance by integrating Chain-of-Thought (CoT) reasoning: it automatically generates a detailed reasoning chain for each selected example and uses the LLM to check the chain's coherence and relevance.
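A sketch of that self-generate-then-verify step; the prompt wording and the yes/no check are assumptions:

```python
# Sketch of self-generated chain-of-thought with an LLM sanity check.
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder LLM client

def add_reasoning(question: str, answer: str) -> str:
    """Return a verified reasoning chain, or "" if the check fails."""
    chain = call_llm(
        f"Question: {question}\nFinal answer: {answer}\n"
        "Write the step-by-step reasoning that leads to this answer."
    )
    verdict = call_llm(
        f"Question: {question}\nReasoning: {chain}\nAnswer: {answer}\n"
        "Is this reasoning coherent and relevant to the question? "
        "Reply yes or no."
    )
    return chain if verdict.strip().lower().startswith("yes") else ""
```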
Integration of task intent and expert roles
To further improve task performance, PromptWizard integrates the task intent and an expert role into the prompt. This keeps the model grounded in the domain and guides it to apply the relevant methods.
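A sketch of how the optimized pieces might be assembled into the final prompt; the section layout is an assumption about how an expert role, task intent, instruction, and worked examples fit together:

```python
# Sketch of final prompt assembly from the optimized parts.
def build_prompt(expert_role: str, task_intent: str, instruction: str,
                 examples: list[tuple[str, str, str]]) -> str:
    """Examples are (question, reasoning, answer) triples."""
    shots = "\n\n".join(
        f"Question: {q}\nReasoning: {r}\nAnswer: {a}" for q, r, a in examples
    )
    return (
        f"{expert_role}\n\n"
        f"Task intent: {task_intent}\n\n"
        f"{instruction}\n\n"
        f"{shots}\n\n"
        # Leaves a literal {question} slot to fill at inference time.
        "Question: {question}\nReasoning:"
    )
```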
Three ways to improve prompt quality
1. No training data, and no contextual examples wanted in the prompt.
2. No training data, but contextual examples wanted in the prompt. This takes two steps (see the sketch after this list): first generate synthetic data, then optimize the prompt against that synthetic data.
3. Training data available, with contextual examples wanted in the prompt: the model generates, evaluates, and improves the prompt wording and the generated examples on its own, raising output quality through continuous feedback.
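For the second scenario, a sketch of the synthetic-data step, which then feeds the same optimization loop sketched earlier; all names are illustrative:

```python
# Sketch of synthesizing labeled examples when no training data exists.
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder LLM client

def generate_synthetic_data(problem_description: str, n: int = 5) -> list[tuple[str, str]]:
    """Have the LLM invent (question, answer) pairs for the task."""
    pairs = []
    for _ in range(n):
        question = call_llm(
            f"Task: {problem_description}\nWrite one new example question "
            "for this task. Return only the question."
        )
        answer = call_llm(f"Question: {question}\nGive only the final answer.")
        pairs.append((question, answer))
    return pairs

# Usage: synthetic examples feed the same optimize() loop as real data.
# synthetic = generate_synthetic_data(problem_description)
# best = optimize(problem_description, initial_instruction, synthetic)
```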