How a Y Combinator-backed AI company built production-level prompts and automated 46% of support tickets

How is AI driving the shift to automated ticket handling? A look at how Parahelp, a Y Combinator-backed company, gets high efficiency out of production-level prompts.
Core content:
1. How Parahelp helped the Captions platform automate 46% of its support tickets
2. The core challenge of production-level prompt design: dealing with uncertainty
3. How to handle complex business decisions through structured modeling
Most people's understanding of AI prompts is still at the level of chatting with a chatbot. Real commercial AI systems take a completely different approach.
Parahelp [1], a Y Combinator S24 company, is a typical example. They helped the AI video creation platform Captions build a customer service system that was deployed in 7 days and automated 46% of customer service tickets [2]. At the same time, customer satisfaction improved, response times shortened, and processing costs fell. Eli Winderbaum, head of customer experience at Captions, said that customers could hardly tell that an AI was replying.
The secret lies in Parahelp's prompts, which run to hundreds of lines [3]. The team spent hundreds of hours refining them. Several Y Combinator partners discussed Parahelp's prompt design method in a podcast [4], treating it as a representative case of AI engineering practice. Parahelp has served well-known companies such as Perplexity, Framer, Replit, and ElevenLabs. This practical experience gives us a chance to see the difference between toy prompts and production-level prompts.
Core Challenge: Remaining Reliable in the Face of Uncertainty
The first challenge in designing production-level prompts is incomplete information. In Parahelp's customer service scenario, a complete prompt contains about 1.5K tokens of dynamic information: message history, historical case experience, company policies, and so on. The model can access some relevant information, but rarely all of it.
This constraint means that prompts must handle uncertainty explicitly. Parahelp's prompts emphasize this repeatedly:
Make sure your description never assumes any information, variables, or tool call results, even if you have a good idea of what the tool call results will be. Make sure your plan never includes or guesses about information that is not explicitly stated in the policy document.
The prompts presented by the Parahelp team provide extremely detailed guidance for the AI’s planning process:
### How to Plan
- When planning the next step, make sure it is only the goal of the next step and not the overall goal of the ticket or user.
- Make sure the plan always follows the procedures and rules of the # Customer Service Agent Policy document
### How to Create Steps
- A step will always include the name of the action (tool call), a description of the action, and the parameters required for the action. It will also include the goal of the specific action.
These instructions are not simple suggestions, but strict operating rules, each of which targets specific error modes that AI may exhibit in real-world environments.
Structured Modeling of Complex Decisions
Real business scenarios involve complex conditional branches. Take refund processing as an example: the system must consider every path through purchase date, country, plan type, and so on, because the refund rules vary with these parameters.
Parahelp introduced the concept of "model RAM (working memory)": the number of paths that the model can reliably handle. When the decision branches exceed the model's capacity, the team decomposes the complexity through architectural design instead of forcing more complexity into the prompt.
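To make the idea of a path budget concrete, here is a minimal sketch in Python. It is only an illustration: the parameter values, the `MAX_RELIABLE_PATHS` threshold, and the helper names are assumptions, since the source gives no numbers for how many paths a model can handle and does not describe how Parahelp splits decisions.

```python
from itertools import product

# Hypothetical refund parameters; the real policy values are not given in the source.
REFUND_PARAMETERS = {
    "purchase_age": ["within_14_days", "older_than_14_days"],
    "country": ["US", "EU", "other"],
    "plan_type": ["monthly", "annual"],
}

# Assumed budget for how many paths a single prompt should cover (illustrative only).
MAX_RELIABLE_PATHS = 8


def enumerate_paths(parameters):
    """Return every combination of parameter values as a dict."""
    names = list(parameters)
    return [dict(zip(names, combo)) for combo in product(*parameters.values())]


def split_by(parameters, key):
    """Decompose one large decision into a smaller decision per value of `key`."""
    rest = {name: values for name, values in parameters.items() if name != key}
    return {value: enumerate_paths(rest) for value in parameters[key]}


paths = enumerate_paths(REFUND_PARAMETERS)
needs_decomposition = len(paths) > MAX_RELIABLE_PATHS

# One possible decomposition: route by country first, then plan within each country.
sub_decisions = split_by(REFUND_PARAMETERS, "country")
```

With these made-up values the full decision has 12 paths, which exceeds the assumed budget, while each per-country sub-decision has only 4.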
An important finding is that o1-medium (the team now uses o3-medium) was the first model to perform well in evaluations of this type of prompt. This shows that production-level prompts place special demands on model capability: not every model can handle this kind of complex conditional reasoning.
This kind of planning prompt faces two core difficulties:
1. The complete prompt contains about 1.5K tokens of dynamic information... It is difficult to make the model understand that it should not assume that it has complete information (or predict what data will be returned by the tool call).
2. The plan must include all potential paths based on what the tool call returns and the different outcome rules. For refund requests, the plan must consider all paths based on purchase date, country, plan type, etc.
These two challenges precisely capture the core dilemma facing production-grade AI systems: making complex, multi-path decisions with incomplete information.
To address this challenge, they use a variable reference system: <> indicates a tool call result, and {{}} represents a specific policy. This allows the model to plan across multiple tool calls without needing the tool output.
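As a rough illustration of how such references could be checked mechanically, the sketch below pulls both kinds of placeholder out of a step description. The regular expressions and the `extract_references` helper are assumptions made for this example, based only on the two placeholder forms quoted in this article; they are not Parahelp's implementation.

```python
import re

# Assumed syntax, following the examples quoted in this article:
# <name> refers to a tool call result, {{name}} refers to a policy.
TOOL_RESULT_PATTERN = re.compile(r"<\s*([A-Za-z_][A-Za-z0-9_]*)\s*>")
POLICY_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def extract_references(description):
    """Collect the placeholder names used in one step description."""
    return {
        "tool_results": TOOL_RESULT_PATTERN.findall(description),
        "policies": POLICY_PATTERN.findall(description),
    }


# Hypothetical step description that plans around data it does not yet have.
description = (
    "Reply to the user with the steps from <helpcenter_result>, "
    "following {{troubleshooting_info_name_from_policy_2}}."
)
refs = extract_references(description)
```

A check like this could flag a plan that mentions a tool result no earlier step produces, which is one way to catch a model assuming information it does not have.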
This system sets extremely detailed requirements for step creation. More importantly, the prompt strictly constrains the AI's thinking process:
- action_name should always be the name of a valid tool
- description should be a brief description of why the action is needed, a description of the action to be taken, and any variables that the action requires from other tools
- Make sure you always emphasize in your descriptions that you are answering questions/troubleshooting steps
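Rules like these lend themselves to automated checks. The following sketch models a step with the four fields the prompt names (action name, description, parameters, goal) and validates it against a tool list. The `Step` class, `validate_step`, and the tool names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical tool names; the real tool list is not given in the source.
VALID_TOOLS = {"search_helpcenter", "reply_to_user", "lookup_order"}


@dataclass
class Step:
    action_name: str
    description: str
    goal: str
    parameters: dict = field(default_factory=dict)


def validate_step(step, valid_tools=VALID_TOOLS):
    """Return a list of rule violations; an empty list means the step passes."""
    problems = []
    if step.action_name not in valid_tools:
        problems.append(f"unknown tool: {step.action_name}")
    if not step.description.strip():
        problems.append("missing description")
    if not step.goal.strip():
        problems.append("missing goal")
    return problems


good = Step(
    action_name="search_helpcenter",
    description="Search for export troubleshooting steps.",
    goal="Find relevant help articles.",
    parameters={"query": "export fails"},
)
bad = Step(action_name="guess_answer", description="", goal="Answer quickly.")
```

Here `validate_step(good)` returns an empty list, while `validate_step(bad)` reports the unknown tool and the missing description.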
Technical implementation: XML structure and conditional logic
Parahelp's solution makes extensive use of XML structured syntax. Moving to o1/o3 was the most important breakthrough, followed by the use of XML if blocks with conditions. This format makes the model stricter, yet it performs better because it draws on the programming logic the model acquired during pre-training.
A key design decision was to disallow else statements: the model may use only 'if' blocks, never 'else' blocks. This requires the model to define an explicit condition for every path, a design that significantly improved performance in evaluations.
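The "if only, no else" rule can be sketched as a small structural check. The tag and attribute names below (`plan`, `step`, `if_block`, `condition`) and the sample plan are assumptions for illustration; the article does not reproduce the exact schema here.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan; tag names, tool names, and conditions are illustrative only.
PLAN_XML = """
<plan>
  <step>
    <action_name>search_helpcenter</action_name>
    <description>Search for articles about the export error.</description>
  </step>
  <if_block condition="helpcenter_result found">
    <step>
      <action_name>reply_to_user</action_name>
      <description>Answer using helpcenter_result.</description>
    </step>
  </if_block>
  <if_block condition="helpcenter_result not found">
    <step>
      <action_name>escalate</action_name>
      <description>Hand the ticket to a human agent.</description>
    </step>
  </if_block>
</plan>
"""


def check_plan(xml_text):
    """Return violations of the 'explicit condition on every branch' rule."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        if element.tag.startswith("else"):
            problems.append(f"'{element.tag}' blocks are not allowed")
        if element.tag == "if_block" and not element.get("condition", "").strip():
            problems.append("if_block without an explicit condition")
    return problems


violations = check_plan(PLAN_XML)
```

Note that the "not found" path is written as a second `if_block` with its own condition rather than as an `else`, so every branch states exactly when it applies.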
The following is a complete plan example (see Appendix 1) that shows this structured approach in action:
This example demonstrates several key features of enterprise-level prompts: multi-layered nested conditional logic, clear tool calls, a variable reference system (<helpcenter_result> and {{troubleshooting_info_name_from_policy_2}}), and explicit conditional definitions for each branch.
It is worth noting that the design philosophy of this type of example emphasizes:
IMPORTANT: This example plan is just to give you an idea of how to structure your plan... it is not a hard and fast rule or how you should structure every plan - it uses variable names to give you an idea of how to structure your plan, think about possible paths and use
This description reveals an important feature of production-level prompts: they provide a framework for thinking rather than a rigid template. AI needs to reason flexibly within this framework rather than mechanically executing fixed steps.
Engineering development process
Unlike casually written prompts, production-grade prompts require a rigorous engineering process. Parahelp's experience shows that it is common to spend hundreds of hours optimizing a prompt of only a few hundred lines. Most of that time goes not into writing but into designing an evaluation system, running tests, discovering edge cases, verifying in real environments, and iterating on the results.
This rigorous approach is anchored to a clear success metric: in customer support, the percentage of tickets resolved in full. Every iteration must show improvement on this core metric.
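A minimal sketch of that metric, assuming each evaluated ticket is recorded as a dict with a boolean `resolved` field (a format invented for this example, as are the function names and the sample data):

```python
def resolution_rate(tickets):
    """Share of tickets resolved in full, as a fraction between 0 and 1."""
    if not tickets:
        return 0.0
    resolved = sum(1 for ticket in tickets if ticket["resolved"])
    return resolved / len(tickets)


def is_improvement(baseline_tickets, candidate_tickets):
    """True if the candidate prompt resolves a larger share of tickets."""
    return resolution_rate(candidate_tickets) > resolution_rate(baseline_tickets)


# Made-up evaluation results for two prompt versions on the same four tickets.
baseline = [
    {"id": 1, "resolved": True},
    {"id": 2, "resolved": False},
    {"id": 3, "resolved": False},
    {"id": 4, "resolved": False},
]
candidate = [
    {"id": 1, "resolved": True},
    {"id": 2, "resolved": True},
    {"id": 3, "resolved": False},
    {"id": 4, "resolved": False},
]

keep_candidate = is_improvement(baseline, candidate)
```

In practice the evaluation set would need to be far larger than four tickets for a difference in rates to be meaningful.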
Summary of design principles
The core principles of professional prompts can be extracted from Parahelp's practice:
- Clear order of thought: specify the processing steps of the model
- Structured format: use markdown and XML to organize information
- Role definition: assign clear roles (such as "Manager")
- Key instruction emphasis: use words such as "important" and "always" to highlight key requirements
The second Parahelp prompt (Manager prompt, see Appendix 2) further reinforces these design principles. It reiterates the same structural requirements and emphasizes some key constraints:
- Make sure you always emphasize in your description of answering questions/troubleshooting steps
This repeated emphasis reflects another characteristic of enterprise-level prompts: they use redundancy to ensure that key instructions are followed. In a real business environment, a small deviation in the AI's behavior can lead to significant differences in customer experience.
The transition from smart to reliable
Industrial-grade prompts represent a fundamental shift from "making AI behave smart" to "making AI behave reliably." Through precise conditional logic, variable reference systems, structured syntax, and rigorous iterative development, they turn complex real-world problems into decision-making frameworks that AI can handle consistently.
This engineering methodology not only ensures reliability in large-scale commercial applications, but also reveals a deeper insight: truly useful AI systems require not more "intelligence" but better "engineering." As we move from demonstration to production, design thinking must shift from pursuing stunning effects to ensuring stable performance.