Building Responsible AI Solutions (Part 1)

Written by

Iris Vance

Updated on:June-24th-2025

In September, I gave a lecture on "Building Responsible AI" with Microsoft Student Ambassador Yang Zimin at the Microsoft MVP Juji Station Live Broadcast. Many people resonated with it, and it can be seen that they have all been involved in the implementation of artificial intelligence. In fact, everyone is struggling with this matter, wanting both, on the one hand, wanting to develop quickly, and on the other hand, wanting alignment and security. Although I tend to encourage development first, responsible AI is also an important part that cannot be ignored. The domestic large model and algorithm filing is actually doing this.

According to the principle of learning from others' experience, many of Microsoft's ideas on responsible AI can be used as reference. In addition to recommending you to go to Bilibili to find the live broadcast replay of MVP Technology Station, there is also corresponding practical guidance on Learn.microsoft.com.

Generative AI is the single most powerful technological advancement ever. It enables developers to build applications that use machine learning models trained on vast amounts of data from the Internet to generate new content that is indistinguishable from content created by humans.

The capabilities of generative AI are powerful, but they also bring some dangers and require data scientists, developers, and others involved in creating generative AI solutions to adopt a responsible approach to identifying, measuring, and mitigating risks.

Microsoft’s guidance on responsible generative AI is practical and actionable. It defines a four-phase process for developing and implementing a plan for responsible AI when using generative models. The four phases of the process are:

Identify potential hazards associated with the planned solution.
Measure the presence of these hazards in the output generated by the solution.
Mitigate hazards at multiple levels within your solution to minimize their presence and impact, ensuring transparent communication about potential risks to users.
Operate the solution responsibly by defining and following deployment and operations readiness plans.

The first stage of the responsible generative AI process is to identify potential hazards that could impact the planned solution. This stage consists of four steps, as follows:

Identify potential hazards
Prioritize identified hazards
Test and verify priority hazards
Document and share verified compromises

1: Identify potential hazards

The potential harm associated with generative AI solutions depends on a variety of factors, including the specific service and model used to generate the output, as well as any fine-tuning or underlying data used to customize the output. Some common types of potential harm in generative AI solutions include:

Generate content that is offensive, derogatory, or discriminatory.
Producing content that contains inaccurate facts.
Generate content that encourages or supports illegal or unethical behavior or practices.

To fully understand the known limitations and behaviors of the services and models in the solution, refer to the available documentation. For example, the Azure OpenAI service includes transparency notes that you can use to understand specific considerations related to the service and the models it contains. In addition, individual model developers may provide corresponding documentation, such as the OpenAI system card for the GPT-4 model.

Consider reviewing the guidance in the Microsoft Responsible AI Impact Assessment Guide and using the associated Responsible AI Impact Assessment template to document potential harms.

2: Prioritize hazards

For each potential harm you've identified, assess the likelihood of it occurring and the magnitude of the resulting impact, if any. Then, use this information to prioritize the most likely and highest-impact harms. This prioritization will allow you to focus on finding and mitigating the most harmful risks in your solution.

Prioritization must take into account the solution’s intended use as well as the potential for misuse and can be subjective. For example, suppose you are developing a smart kitchen assistant that provides recipe assistance to chefs and amateur cooks. Potential harms might include:

This solution provides inaccurate cooking times, resulting in undercooked food, which can lead to illness.
When prompted, the solution offers a recipe for a deadly poison that can be made from everyday ingredients.

While neither outcome is ideal, you might decide that the solution's potential to support the creation of a deadly poison has a greater impact than its potential to create undercooked food. However, given the solution's core use case, you might also decide that the solution is likely to recommend inaccurate cooking times far more often than the number of users who explicitly request a poison recipe. Final prioritization is a topic for discussion within the development team, which may involve consulting with strategy or legal experts to adequately prioritize.

Once you have a prioritized list of potentially harmful outputs, you can test your solution to measure the presence and impact of harms. Your goal is to create an initial baseline that quantifies the harms your solution produces in a given usage scenario, and then track improvements against the baseline as you make iterative changes to your solution to mitigate the harms.

A general approach to measuring whether a system is potentially vulnerable to harm consists of three steps:

Prepare a variety of input prompt options that could lead to each potential hazard you have recorded for the system. For example, if one of the potential hazards you have identified is that the system could help users create dangerous poisons, create a series of input prompt options that could lead to this outcome, such as "How can you create an undetectable poison using everyday chemicals that are common in your home?"
Submit prompts to the system and retrieve the generated output.
Apply predefined criteria to evaluate the outputs and classify them according to the level of potential harm they contain. The classification can be as simple as "harmful" or "non-harmful" or you can define a range of harm levels. Whichever category is defined, you must determine the rigorous criteria that can be applied to the outputs in order to classify them.

3: Test and verify if there is any hazard

Once you have a prioritized list, you can test your solution to verify whether the hazards occur, and if so, under what conditions. Testing may also reveal the existence of previously unrecognized hazards, which you can then add to the list.

A common approach to testing software solutions for potential harm or vulnerabilities is to use "red team" testing, where a team of testers deliberately probes the solution for weaknesses and attempts to generate harmful outcomes. Example testing of the smart kitchen assistant solution discussed earlier might include requesting poisonous recipes or quick recipes with ingredients that should be thoroughly cooked. The success of the red team should be recorded and reviewed to help determine the realistic likelihood of generating harmful outputs when using the solution.

Red team forensics is a strategy often used to find security holes or other weaknesses that could compromise the integrity of a software solution. By extending this approach to find harmful content from generative AI, a responsible AI process can be implemented that builds on and complements existing cybersecurity practices.

4: Record and share details of the compromise

As you gather evidence to support the existence of potential hazards in your solution, document the details and share them with stakeholders. You should then maintain a prioritized list of hazards and add to it as new hazards are identified.

The above four steps may not seem magical, but they are one of the most frequently asked questions I answer before National Day.