Building Responsible AI Solutions (Part 2)

Written by
Clara Bennett
Updated on: June 24, 2025
Recommendation

An in-depth discussion of the harm mitigation techniques that keep generative AI solutions safe.

Core content:
1. The importance of data governance and quality inspection in AI solutions
2. Optimization and fine-tuning strategies at the model level
3. Configurations and features at the safety system level
4. Harm mitigation techniques at the meta-prompt and prompt engineering level
5. Design and documentation of the user interaction and experience layer



Note that we can only use the word "mitigate" here, not "eliminate" — can you guess why?

Mitigating potential hazards in generative AI solutions involves a layered approach, where mitigation techniques can be applied at each of five levels, as follows:

  1. Data

  2. Model

  3. Safety system

  4. Meta-prompt and prompt engineering

  5. User experience

1: Data

If the data fed into the application is dirty from the start — full of hate, pornography, bias, and similar problems — there is no point in building further on top of it; the only realistic option is to start over with a clean dataset. The next question is how to judge whether the hundreds of thousands to hundreds of millions of records in front of you are dirty. Treating this systematically is the discipline of data governance, but if you only need a rough judgment, combining human spot checks with assistance from one or more large language models is a relatively easy path, as sketched below.
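The following is a minimal sketch of LLM-assisted data quality inspection, assuming the official OpenAI Python SDK; the model name, labels, and sampling strategy are illustrative assumptions rather than part of the original article, and flagged records would still go to human reviewers.

```python
# A minimal sketch of LLM-assisted data quality inspection (illustrative only).
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
import random

from openai import OpenAI

client = OpenAI()

REVIEW_PROMPT = (
    "You are a data quality reviewer. Classify the following text record as "
    "CLEAN or DIRTY (hate, sexual content, bias, or other policy violations). "
    "Answer with a single word."
)


def flag_record(text: str) -> str:
    """Ask the model to label one record as CLEAN or DIRTY."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice; any capable model works
        messages=[
            {"role": "system", "content": REVIEW_PROMPT},
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper()


def sample_and_review(records: list[str], sample_size: int = 100) -> list[str]:
    """Spot-check a random sample and return records flagged for human review."""
    sample = random.sample(records, min(sample_size, len(records)))
    return [r for r in sample if flag_record(r) == "DIRTY"]
```

A second, different model can be used to cross-check the flags, so that no single model's blind spots decide what counts as clean data.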

2: Model level

The model level consists of the generative AI models themselves and is the core of the solution. For example, the solution may be built around models such as GPT-4, ChatGLM, or LLaMA.

Mitigations that can be applied at the model level include:

  • Choose a model that fits the purpose of your intended solution. For example, while GPT-4 may be a powerful and versatile model, in a solution that only needs to classify small, specific text inputs, a simpler model may provide the required functionality while reducing the risk of generating harmful content.

  • Fine-tune the base model using your own training data so that the responses it generates are more likely to be relevant to your solution scenario and stay within its scope (see the sketch after this list).
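As a concrete illustration of the second point, here is a minimal sketch of submitting a fine-tuning job with the OpenAI Python SDK; the training file name and base model are illustrative assumptions, not prescriptions from the article.

```python
# A minimal sketch of fine-tuning a base model on your own scenario data.
# Assumes the OpenAI Python SDK; file name and base model are placeholders.
from openai import OpenAI

client = OpenAI()

# Upload a JSONL file of chat-formatted training examples drawn from your
# solution's scenario (e.g. in-scope questions and the desired answers).
training_file = client.files.create(
    file=open("scenario_training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job; the resulting model is more likely to stay
# within the scope of your solution than the general-purpose base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # hypothetical base model choice
)
print(job.id, job.status)
```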

3: Safety system level

The safety system layer includes platform-level configurations and features that help mitigate harm. For example, the Azure OpenAI Service includes support for content filters that apply criteria to suppress prompts and responses, categorizing content into four potential harm categories (hate, sexual, violence, and self-harm) and four severity levels (safe, low, medium, and high).

Other safety-system-level mitigations may include abuse detection algorithms that determine whether the solution is being systematically abused (for example, through a large volume of automated requests from bots), and alert notifications that enable a rapid response to potential system abuse or harmful behavior. A minimal sketch of such a rate-based check follows.
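The sketch below illustrates one simple form of abuse detection — flagging clients that send an unusually high number of requests within a sliding window. The threshold and window length are illustrative assumptions; a production system would tune them and typically combine this with platform-level protections.

```python
# A minimal sketch of rate-based abuse detection (illustrative assumptions).
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 120  # tune to what human usage of your app looks like

_request_log: dict[str, deque] = defaultdict(deque)


def record_request(client_id: str) -> bool:
    """Record one request and return True if the client looks abusive."""
    now = time.monotonic()
    log = _request_log[client_id]
    log.append(now)
    # Drop entries that have fallen out of the sliding window.
    while log and now - log[0] > WINDOW_SECONDS:
        log.popleft()
    if len(log) > MAX_REQUESTS_PER_WINDOW:
        # Hook point: raise an alert notification and/or block the client.
        return True
    return False
```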

4: Meta-prompt and prompt engineering layer

A meta-prompt, also called a system prompt, is a block of instructions that is prepended to every request, regardless of what the user asks. The prompt engineering layer focuses on how the prompts submitted to the model are constructed. Harm mitigation techniques that can be applied at this layer include:

  • Specify a meta-prompt or system input that defines behavioral parameters for the model.

  • Apply prompt engineering to add grounding data to input prompts, maximizing the likelihood of relevant, non-harmful output.

  • Use a retrieval-augmented generation (RAG) approach to retrieve contextual data from trusted data sources and include it in prompts (see the sketch after this list).
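The following sketch combines all three techniques: a system prompt that constrains behavior, plus RAG-style grounding data included in the user prompt. It assumes the OpenAI Python SDK; the system prompt wording, the "Contoso" scenario, and the `retrieve_passages` helper are hypothetical stand-ins for whatever trusted retrieval back end your solution actually uses.

```python
# A minimal sketch of a meta-prompt plus RAG-style grounding (illustrative).
# `retrieve_passages` is a placeholder for your search index or vector store.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a customer support assistant for the Contoso product line. "
    "Answer only questions about Contoso products, using the provided context. "
    "If the answer is not in the context, say you don't know."
)


def retrieve_passages(question: str) -> list[str]:
    """Placeholder: fetch relevant passages from a trusted data source."""
    raise NotImplementedError("plug in your search index or vector store here")


def answer(question: str) -> str:
    """Build a grounded prompt and ask the model for an in-scope answer."""
    context = "\n\n".join(retrieve_passages(question))
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content
```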

5: User interaction and experience layer

The user experience layer includes the software application through which users interact with the generative AI model and the documentation or other user materials that describe the use of the solution to users and stakeholders.

Designing the application user interface to restrict input to specific subjects or types, or applying input and output validation, can reduce the risk of potentially harmful responses.
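Here is a minimal sketch of what such input validation might look like in the application layer; the length limit, allowed topics, and blocked terms are illustrative assumptions rather than a complete policy.

```python
# A minimal sketch of UI-level input validation (illustrative placeholders only).
MAX_INPUT_CHARS = 1000
ALLOWED_TOPICS = {"billing", "shipping", "returns"}   # restrict to known subjects
BLOCKED_TERMS = {"example-slur", "example-threat"}    # placeholder deny-list


def validate_user_input(text: str) -> tuple[bool, str]:
    """Return (is_valid, reason); reject input before it ever reaches the model."""
    if len(text) > MAX_INPUT_CHARS:
        return False, "Input is too long."
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return False, "Input contains disallowed content."
    if not any(topic in lowered for topic in ALLOWED_TOPICS):
        return False, "Please ask about billing, shipping, or returns."
    return True, "ok"
```

Output validation works the same way in reverse: the generated response is checked against similar rules (or a content filter) before it is shown to the user.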

The documentation and other descriptions of the generative AI solution should be appropriately transparent about the capabilities and limitations of the system, the models it is based on, and any potential harms that your mitigation measures may not always address. In practice, if the solution relies on large language models, it is nearly impossible to avoid harm 100% of the time at the user level; large language models are, after all, probabilistic by nature. This is why I said at the beginning that our measures are called "mitigation" rather than "elimination."

Once you have identified potential hazards, developed a method to measure their presence, and implemented mitigations in your solution, you are ready to release your solution. Before you release, there are some considerations to help you ensure a successful release and subsequent operations.

Complete pre-launch review

Before releasing a generative AI solution, identify the various compliance requirements for your organization and industry and ensure that the appropriate teams have a chance to review the system and its documentation. Common compliance reviews include:

  • Regulations

  • Privacy

  • Security

  • Accessibility

Release and operate solutions

A successful launch requires some planning and preparation. Follow these guidelines:

  • Design a phased delivery plan to release the solution to a limited set of users first. This approach allows you to gather feedback and identify issues before releasing it to a wider audience.

  • Create an incident response plan that includes an estimate of the time it will take to respond to an unexpected incident.

  • Create a rollback plan that defines the steps to restore the solution to a previous state if an incident occurs.

  • Implement functionality that blocks harmful system responses as soon as they are detected.

  • Implement the ability to block specific users, applications, or client IP addresses when system abuse occurs.

  • Implement a way for users to provide feedback and report issues. In particular, enable users to report generated content as "inaccurate," "incomplete," "harmful," "offensive," or otherwise problematic (a minimal feedback-handler sketch follows this list).

  • Track telemetry data to enable you to determine user satisfaction and identify functionality gaps or usability challenges. The telemetry data collected should comply with privacy laws as well as your own organization's policies and commitment to user privacy.
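To make the feedback guideline above concrete, here is a minimal sketch of a feedback-report handler. The categories come from the list above, while the data structure, function names, and storage call are hypothetical placeholders for whatever back end your solution uses.

```python
# A minimal sketch of a user feedback handler for the release guidelines above.
# The storage function is a placeholder; wire it to your own telemetry pipeline.
from dataclasses import dataclass, field
from datetime import datetime, timezone

FEEDBACK_CATEGORIES = {"inaccurate", "incomplete", "harmful", "offensive", "other"}


@dataclass
class FeedbackReport:
    user_id: str
    response_id: str          # identifies the generated content being reported
    category: str
    comment: str = ""
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def submit_feedback(report: FeedbackReport) -> None:
    """Validate a feedback report and hand it off for review and telemetry."""
    if report.category not in FEEDBACK_CATEGORIES:
        raise ValueError(f"Unknown category: {report.category}")
    store_report(report)  # placeholder: write to your database or telemetry store


def store_report(report: FeedbackReport) -> None:
    raise NotImplementedError("plug in your storage / telemetry back end here")
```

Collected reports feed directly into the telemetry described in the last guideline, helping you spot functionality gaps and recurring harmful outputs.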

In Microsoft Azure OpenAI, effective tools for mitigating harmful responses from generative AI models include content filters, blocklists, and related safety features.