From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Processes

Written by
Iris Vance
Updated on: June 24th, 2025

Explore a new tool for drug regulatory compliance and how the QA-RAG model streamlines industry processes with high accuracy.

Core content:
1. The application and advantages of the QA-RAG model in drug regulatory compliance
2. The combination and progress of generative AI and RAG methods in chatbots
3. QA-RAG model performance evaluation and its potential impact in the pharmaceutical industry



Summary

Regulatory compliance in the pharmaceutical industry demands substantial human resources to navigate complex and cumbersome guidelines. To address these challenges, our research introduces a chatbot model that leverages generative AI and a retrieval-augmented generation (RAG) approach. The chatbot is designed to search for guidance documents relevant to the user's query and provide answers based on the retrieved guidelines. Given the inherent need for high reliability in this field, we propose a Question-Answering Retrieval-Augmented Generation (QA-RAG) model. In comparative experiments, the QA-RAG model significantly outperformed all other baselines, including traditional RAG methods, in accuracy. This paper details the structure and performance evaluation of QA-RAG, emphasizing its potential for regulatory compliance in the pharmaceutical industry and beyond. We have made our work publicly available for further research and development.

https://huggingface.co/datasets/Jaymax/FDA_Pharmaceuticals_FAQ

https://arxiv.org/abs/2402.01717

Introduction

Advances in chatbots

Recent advances in generative AI have significantly enhanced the capabilities of chatbots. Applications of chatbots powered by generative AI are being explored across industries [Bahrini et al., 2023; Castelvecchi, 2023; Badini et al., 2023], with the pharmaceutical industry being a notable area of focus. Recent studies have shown that such chatbots can play an important role in advancing drug discovery [Wang et al., 2023; Savage, 2023; Bran et al., 2023]. These advances not only streamline the discovery process but also pave the way for chatbots to suggest new research ideas or approaches, enhancing the collaborative nature of research. In the healthcare sector, chatbots have proven particularly effective in providing personalized support, which can lead to better health outcomes and more effective treatment management [Ogilvie et al., 2022; Abbasian et al., 2023]. These chatbots can provide timely medication reminders, deliver information about potential side effects, and even assist in scheduling doctor consultations.

The need for chatbot guidance on drug regulation

Another key area in the pharmaceutical industry where generative AI can be fully utilized is in ensuring compliance with regulatory guidelines. For industry practitioners, navigating the complex and extensive guidelines provided by agencies such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA) is often a daunting and time-consuming task. The large number of guidelines, coupled with their intricate details, can make it difficult for companies to quickly find and apply relevant information. This often leads to increased costs as teams spend valuable time navigating through large repositories of guidelines. A recent study highlighted the financial impact of complying with regulatory guidelines [Crudeli, 2020]. The study found that compliance efforts can consume up to 25% of the budget of a medium or large pharmaceutical manufacturing operation. Given these challenges, the pharmaceutical industry needs a more efficient way to navigate and interpret regulatory guidelines. Large language models (LLMs) can help address this issue. However, despite their extensive pre-training, LLMs often face inherent limitations in acquiring knowledge that is not included in their initial training data. Particularly in the highly specialized and detailed field of pharmaceutical regulatory compliance, it is clear that this domain-specific knowledge is not fully contained in the training material. As a result, LLMs may not be sufficient to accurately answer questions in this area.

Retrieval-augmented generation (RAG) models stand out as a bridge across this gap. RAG not only leverages a model's intrinsic knowledge but also draws additional information from external sources to generate responses. The work of [Wen et al., 2023] and [Yang et al., 2023] shows how the RAG framework can cleverly combine rich background information with the answer to ensure a comprehensive and accurate response to the query. These studies highlight the versatility of RAG across a variety of applications, from complex story generation to theorem proving.

Furthermore, there is evidence that RAG models outperform typical sequence-to-sequence models and certain retrieval and extraction architectures, particularly in knowledge-intensive natural language processing tasks. Despite the advances in RAG, we recognize that the accuracy of traditional RAG approaches may be insufficient in the domain of regulatory compliance, which requires domain-specific, highly specialized information. Therefore, we introduce Question-Answering Retrieval-Augmented Generation (QA-RAG). Designed for highly specific domains that require specialized knowledge, the QA-RAG model precisely aligns regulatory guidelines with actual implementation, streamlining the compliance process in the pharmaceutical industry.

Core Overview

Background

  1. Research Questions
    : The problem addressed in this article is how to utilize generative AI and retrieval-augmented generation (RAG) methods in the pharmaceutical industry to improve the efficiency and accuracy of regulatory compliance.
  2. Research Difficulties
    : The research difficulties include the complexity and detailed nature of pharmaceutical regulatory guidelines, the limitations of traditional RAG methods in handling highly specialized information, and the challenge of improving retrieval efficiency while ensuring accuracy.
  3. Related Work
    : Related work includes the application of generative AI in drug discovery and healthcare, the application of RAG models in complex story generation and theorem proving, and their advantages in knowledge-intensive NLP tasks.

Research Methods

This paper proposes a QA-RAG model to address regulatory compliance issues in the pharmaceutical industry. Specifically,

  1. Overall structure : The QA-RAG model retrieves documents using both the hypothetical answer produced by a fine-tuned LLM agent and the original query: half of the documents are retrieved via the agent's answer and the other half via the query. The system then reranks the retrieved documents, keeping only those most relevant to the question.


  2. Document preprocessing and similarity search : Documents are converted to text using OCR and split into multiple chunks, which are embedded using an LLM embedder. A dense retrieval method, Facebook AI Similarity Search (FAISS), is used to retrieve documents.

  3. Dual-track retrieval : Combine the answers from the fine-tuned LLM agent and the original query for document retrieval. This approach not only expands the search scope but also captures a wider range of relevant information.

  4. Fine-tuning process : The FDA's official question-answering dataset was used for fine-tuning. ChatGPT-3.5-Turbo and Mistral-7B were selected as the base LLM models, and LoRA was used during fine-tuning to adjust the model parameters efficiently.

  5. Reranking : Rerank the retrieved documents using the BGE reranker, evaluate each document’s relevance to the query, and keep the most relevant documents.

  6. Final answer generation : Use the ChatGPT-3.5-Turbo model as the final answer agent and generate the final answer through the few-shot prompting technique.
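The retrieval flow in steps 1–3 and 5 can be sketched in Python. The scoring function, corpus, and helper names below are illustrative stand-ins: a real system would use the fine-tuned LLM agent to produce the hypothetical answer, FAISS for dense retrieval, and the BGE reranker for reranking.

```python
# Sketch of QA-RAG's dual-track retrieval plus reranking.
# toy_score is a crude stand-in for both dense retrieval and the
# BGE reranker; every name here is illustrative, not from the paper.

def toy_score(text: str, doc: str) -> int:
    """Crude relevance proxy: count of shared lowercase tokens."""
    return len(set(text.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Return the k highest-scoring documents for a query."""
    return sorted(corpus, key=lambda d: toy_score(query, d), reverse=True)[:k]

def qa_rag_retrieve(query: str, hypothetical_answer: str,
                    corpus: list[str], n_docs: int = 24,
                    top_k: int = 6) -> list[str]:
    # Track 1: half the candidates come from the hypothetical answer
    # produced by the fine-tuned LLM agent.
    from_answer = retrieve(hypothetical_answer, corpus, n_docs // 2)
    # Track 2: the other half come from the original user query.
    from_query = retrieve(query, corpus, n_docs // 2)
    # Deduplicate, then rerank the merged pool (the paper uses the
    # BGE reranker at this stage).
    pool = list(dict.fromkeys(from_answer + from_query))
    pool.sort(key=lambda d: toy_score(query + " " + hypothetical_answer, d),
              reverse=True)
    return pool[:top_k]

corpus = [
    "FDA guidance on stability testing of drug substances",
    "EMA guideline on process validation",
    "FDA Q&A on current good manufacturing practice",
    "unrelated document about cooking",
]
docs = qa_rag_retrieve(
    query="What does FDA guidance say about stability testing?",
    hypothetical_answer="FDA stability testing guidance covers drug substances",
    corpus=corpus,
)
```

The defaults of 24 retrieved documents and a top-6 cut mirror the experimental setup described later in this summary.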

Experimental design

  1. Dataset
    : Fine-tuning was performed using the FDA's official question-answering dataset, which collected 1,681 question-answer pairs. The dataset is divided into a training set (85%), a validation set (10%), and a test set (5%).
  2. Experimental setup
    : In the experiments, the number of documents retrieved each time is fixed at 24, and the top 6 most relevant documents are selected in the post-processing stage. The performance of different methods in context retrieval and answer generation is compared.
  3. Baseline selection
    : Baselines include a method using only the original query, multi-query approaches, and the HyDE method.
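The dataset split described above can be sketched as follows; the shuffle seed is an assumption, since the summary specifies only the 85/10/5 proportions over the 1,681 pairs.

```python
import random

# Sketch of the 85/10/5 train/validation/test split of the 1,681
# FDA question-answer pairs. The placeholder pairs and the seed are
# assumptions; only the proportions come from the summary.
pairs = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(1681)]
random.Random(0).shuffle(pairs)

n_train = int(0.85 * len(pairs))          # 85% for fine-tuning
n_val = int(0.10 * len(pairs))            # 10% for validation
train_set = pairs[:n_train]
val_set = pairs[n_train:n_train + n_val]
test_set = pairs[n_train + n_val:]        # remaining ~5% for testing
```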

Results and Analysis

  1. Comparison of Reranking and Scoring Agents
    : The re-ranker outperforms the scoring agent in almost all methods in terms of contextual precision and recall, indicating the advantage of the re-ranker in accurately identifying relevant documents.
  2. Contextual Retrieval Performance Evaluation
    : The QA-RAG model combines the answers from the fine-tuned LLM agent and the original query, achieving the highest contextual precision (0.717) and recall (0.328). The HyDE method performs second, while the method using only the original query performs the worst.
  3. Answer Generation Performance Evaluation
    : The QA-RAG model performs well in terms of precision (0.551), recall (0.645), and F1 score (0.591), close to the top three in contextual retrieval performance.
  4. Ablation studies
    : The method using only hypothetical answers scores slightly lower than the full model in contextual precision, but significantly higher than the method using only the original query. This shows the key role of hypothetical answers in improving accuracy.
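As a quick sanity check on the answer-generation numbers above, the F1 score should be (approximately) the harmonic mean of precision and recall; the small gap to the reported 0.591 suggests per-question F1 scores may have been averaged rather than F1 computed from the averaged precision and recall.

```python
# Consistency check: F1 as the harmonic mean of the reported
# precision and recall. A small deviation from the reported 0.591
# is expected if the paper averages per-question F1 scores.
precision, recall = 0.551, 0.645
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # prints 0.594, close to the reported 0.591
```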

Ethical Statement

In the development and application of the QA-RAG model, we emphasize its role as a complementary tool for professionals in the pharmaceutical field. While the model improves efficiency and accuracy in navigating complex guidelines, it is designed to augment, not replace, human expertise and judgment.

The datasets used to train and evaluate the models include publicly accessible documents from the U.S. Food and Drug Administration (FDA) and the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) and adhere to all applicable data privacy and security protocols.

Overall conclusion

The QA-RAG model proposed in this paper demonstrates its effectiveness in the field of regulatory compliance in the pharmaceutical industry. By combining generative AI and RAG methods, the QA-RAG model is able to efficiently retrieve relevant documents and generate accurate answers. This model not only improves the efficiency and accuracy of the compliance process, but also reduces the reliance on human experts, laying the foundation for future applications in the pharmaceutical industry and other fields. Future research should continue to evaluate and improve the model to cope with changing data and industry practices.

Paper Evaluation

Advantages and innovations

  1. Significantly improved accuracy
    : The QA-RAG model demonstrates significant accuracy improvement in comparative experiments, surpassing all other baseline methods, including traditional RAG methods.
  2. Combining Generative AI and RAG methods
    : The model cleverly combines generative AI with the retrieval-augmented generation (RAG) method, utilizing the powerful generation capabilities of generative AI and the retrieval capabilities of the RAG method.
  3. Highly customized for the field
    : Designed for the highly specialized areas of the pharmaceutical industry, the QA-RAG model precisely aligns regulatory guidance with actual implementation, streamlining the compliance process.
  4. Dual search mechanism
    : By combining user questions and hypothetical answers generated by a fine-tuned LLM for document retrieval, the search scope is expanded and a wider range of relevant information is captured.
  5. Fine-tuned LLM
    : Generate hypothetical answers using LLM fine-tuned on domain-specific data, significantly improving the precision and accuracy of retrieved documents.
  6. Multiple evaluation indicators
    : We use multiple evaluation indicators such as the Ragas framework and BertScore to comprehensively evaluate the accuracy of context retrieval and answer generation.
  7. Publicly available
    : The research team releases their work publicly to allow for further research and development.

Shortcomings and reflections

  1. Long-term impacts require ongoing assessment
    : Like any emerging technology, the long-term impact of the QA-RAG model across industries requires ongoing evaluation and improvement.
  2. Adaptability and robustness
    : The need to ensure that models remain adaptable and robust in the face of changes in data and industry practices.
  3. Improved model performance
    : Future development should continue to focus on improving the performance of the model to ensure it keeps pace with evolving generative AI techniques.
  4. Ethical Statement
    : When developing and applying the QA-RAG model, emphasis is placed on its role as a complementary tool for professionals, designed to augment rather than replace human expertise and judgment.

Key questions and answers

Question 1: How does the QA-RAG model leverage generative AI and RAG methods in the document retrieval process?

The QA-RAG model adopts a dual-track retrieval strategy that combines generative AI and RAG methods. The specific steps are as follows:

  1. Document preprocessing and similarity search
    : Dense retrieval with Facebook AI Similarity Search (FAISS) is used to retrieve documents. The documents are converted to text using OCR, split into multiple chunks, and embedded using an LLM embedder.
  2. Dual-track search
    : Combine the answers from the fine-tuned LLM agent and the original query for document retrieval. Half of the documents are retrieved through the answers provided by the fine-tuned LLM agent, and the other half through the original query. This approach not only expands the search scope, but also captures a wider range of relevant information.
  3. Reranking
    : The system reranks the retrieved documents using the BGE reranker, evaluating each document's relevance to the query and keeping only the documents most relevant to the question.
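The preprocessing and similarity-search step in the answer above can be sketched with a toy example. Bag-of-words cosine similarity stands in for the LLM embedder and the FAISS index, and the 40-character chunk size is an assumption (the summary does not specify one).

```python
from collections import Counter
import math

# Toy sketch of chunking plus dense retrieval. The "embedding" is a
# bag of lowercase tokens and the index is a plain list; a real
# system embeds chunks with an LLM embedder and indexes them in FAISS.

def chunk(text: str, size: int = 40) -> list[str]:
    """Split OCR-extracted document text into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase tokens."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(n * b[t] for t, n in a.items())
    norm = (math.sqrt(sum(n * n for n in a.values()))
            * math.sqrt(sum(n * n for n in b.values())))
    return dot / norm if norm else 0.0

doc_text = ("FDA guidance documents describe stability testing "
            "requirements for drug substances and drug products.")
chunks = chunk(doc_text)
index = [embed(c) for c in chunks]          # stands in for the FAISS index
query = embed("stability testing requirements")
best = max(zip(chunks, index), key=lambda ci: cosine(query, ci[1]))[0]
```

Note how fixed-size chunking can split words across chunk boundaries; production pipelines typically split on sentence or paragraph boundaries instead.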

Question 2: In the QA-RAG model, what is the role of the fine-tuned LLM agent in document retrieval and answer generation?

  1. Document Retrieval
    : The hypothetical answers generated by the fine-tuned LLM agent are used to retrieve documents. Specifically, half of the documents are retrieved through the answers provided by the fine-tuned LLM agent, and the other half are retrieved through the original query. This approach not only expands the search scope, but also captures a wider range of relevant information.
  2. Answer Generation
    : The final answer is generated through a few-shot prompting technique, using the ChatGPT-3.5-Turbo model as the final answer agent. The fine-tuned LLM agent provides information highly relevant to pharmaceutical regulatory guidelines when generating hypothetical answers, thereby guiding subsequent document retrieval and the generation of the final answer.
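A few-shot prompt for the final answer agent might be assembled as below. The example Q&A pair and the prompt wording are illustrative assumptions; the summary says only that ChatGPT-3.5-Turbo is prompted with worked examples plus the retrieved context.

```python
# Sketch of few-shot prompt construction for the final answer agent.
# The worked example and instruction text are illustrative, not
# taken from the paper.

FEW_SHOT_EXAMPLES = [
    ("What is required for an IND application?",
     "An IND application must include animal pharmacology and toxicology "
     "data, manufacturing information, and clinical protocols."),
]

def build_prompt(context: str, question: str) -> str:
    parts = ["Answer using only the retrieved guideline context.\n"]
    for q, a in FEW_SHOT_EXAMPLES:          # prepend worked examples
        parts.append(f"Question: {q}\nAnswer: {a}\n")
    parts.append(f"Context: {context}\nQuestion: {question}\nAnswer:")
    return "\n".join(parts)

prompt = build_prompt(
    context="Stability testing should cover at least three primary batches.",
    question="How many batches does stability testing require?",
)
```

The resulting string would be sent as the user message to the final answer agent along with the reranked documents.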

Question 3: How does the QA-RAG model perform in experiments? What are its advantages over other baseline methods?

  1. Contextual retrieval performance
    : The QA-RAG model combines the answers from the fine-tuned LLM agent and the original query, achieving the highest contextual precision (0.717) and recall (0.328). In comparison, the HyDE method performs second, while the method using only the original query performs the worst.
  2. Answer generation performance
    : The QA-RAG model performs well in terms of precision (0.551), recall (0.645), and F1 score (0.591), close to the top three in contextual retrieval performance.
  3. Comparison of Reranking and Scoring Agents
    : The re-ranker outperforms the scoring agent in almost all methods in terms of contextual precision and recall, indicating the advantage of the re-ranker in accurately identifying relevant documents.
  4. Ablation studies
    : The method using only hypothetical answers scores slightly lower than the full model in contextual precision, but significantly higher than the method using only the original query. This shows the key role of hypothetical answers in improving accuracy.

Overall, the QA-RAG model significantly improves the efficiency and accuracy of regulatory compliance in the pharmaceutical industry by combining generative AI and RAG methods, reducing dependence on human experts.