
This plot (using randomly generated data as a visual demonstration) represents high-dimensional embedding vectors of medical terms, interpreted as points, projected into 3D space using Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction. Each point represents a medical term, with colors indicating different categories: red for cancer, blue for heart disease, green for diabetes, purple for Alzheimer’s, and yellow for Parkinson’s. In a plot built from real embeddings, the spatial relationships between points would approximate semantic similarities in the original high-dimensional space. [Made with UMAP-Learn and Scikit-Learn]
A recent study underscores this trend, revealing that 85% of patients use social media to seek out health information. Recognizing this evolution, clinical research professionals are actively exploring new methods for gathering patient feedback. These efforts rely on emerging technologies like artificial intelligence (AI) and natural language processing (NLP) so that clinical trial stakeholders can more effectively extract and analyze patient-voiced concerns regarding drug safety and product benefits.
Tapping the ever-expanding potential of patient-generated data
A vast amount of valuable patient-generated information remains untapped. This data resides in a variety of sources, including patient support programs (PSPs), market research programs, website chatbots, call center phone calls and customer relationship management (CRM) systems. Social media, online forums and discussion groups add to this pool. Recognizing this potential, clinical research stakeholders are increasingly using automation and AI to mine and analyze data from these consumer- and customer-generated sources. To understand the impact of these technologies on data collection and analysis, it’s crucial to examine the current state of drug safety monitoring and the limitations of traditional data collection methods.
The challenges of traditional drug safety monitoring
The current pharmacovigilance (PV) system hinges on healthcare professionals reporting adverse events (AEs) directly, along with data from patient registries and regulatory databases. Yet this reliance on structured data capture often leads to underreporting or incomplete information. Moreover, traditional reporting methods can be cumbersome and time-consuming due to their manual nature and fragmented systems.
While current efforts are valuable, they might not fully capture the complexities of drug safety events. Recent data show that the life sciences industry has persistent gaps in its visibility into adverse events. The Food and Drug Administration (FDA) reports that the FDA Adverse Event Reporting System (FAERS) likely captures only a fraction of all adverse drug reactions (ADRs). Estimates suggest a capture rate between 1% and 10%, meaning the large majority of AEs go unreported. This vast underestimation underscores the need for more innovative PV strategies to improve our understanding of drug safety risks.
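To make the 1% to 10% capture estimate concrete, the short sketch below converts a count of received reports into the range of events that estimate would imply. The function name and the figure of 500 reports are hypothetical, chosen only for illustration; they do not come from FDA data.

```python
def implied_event_range(reported: int,
                        low_rate: float = 0.01,
                        high_rate: float = 0.10) -> tuple[int, int]:
    """Return (min_events, max_events) implied by a capture-rate range.

    `reported` is the number of reports actually received; the rates are
    the assumed fraction of real events that end up being reported.
    """
    if not 0 < low_rate <= high_rate <= 1:
        raise ValueError("rates must satisfy 0 < low_rate <= high_rate <= 1")
    # A higher capture rate implies fewer unseen events, so it gives the minimum.
    return round(reported / high_rate), round(reported / low_rate)


# Hypothetical example: 500 received reports.
low, high = implied_event_range(500)
print(f"500 reports could correspond to {low:,} to {high:,} actual events")
```

Under these assumptions, every report received would stand in for somewhere between 10 and 100 real events.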
Mining the digital landscape for patient safety signals
The evolving digital world demands new approaches to PV. Experts recognize the need for alternatives that integrate with this changing environment to ensure patient safety. Unlike traditional methods that rely on structured data from registries and databases, online channels offer a wealth of unstructured patient feedback. Social media platforms, online forums and discussion groups provide a rich source of real-world patient experiences containing valuable, authentic details about potential AEs.
Patient support programs often involve the collection of real-world data, which can complement traditional clinical trial data. This real-world evidence is valuable for assessing the safety and effectiveness of medications in diverse patient populations and under varied conditions. Combining healthcare professional reports with patient-reported data from support programs contributes to a more comprehensive safety profile for pharmaceutical products, and in turn to a more thorough understanding of the risk-benefit profile of medications.
Extracting patient safety insights from these online channels presents a distinct challenge: the data is unstructured. Unlike traditional databases with standardized formats, social media conversations often lack uniformity. Inconsistencies, errors or missing information can make the data harder to interpret and rely on for pharmacovigilance purposes. The use of emojis, slang and colloquialisms further complicates the process of identifying potential AEs. However, advances in technology empower PV teams to analyze and organize this vast amount of unstructured data, converting it into a valuable resource for patient safety monitoring.
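As a rough illustration of why emojis and slang complicate matters, the sketch below maps a handful of informal expressions onto plainer symptom terms before any further analysis. The lookup tables and the `normalize` function are hypothetical and deliberately tiny; a production system would need clinically curated vocabularies and far more robust language handling.

```python
import re

# Hypothetical lookup tables, for illustration only.
SLANG = {
    "puking": "vomiting",
    "threw up": "vomiting",
    "dizzy af": "dizziness",
    "cant sleep": "insomnia",
}
EMOJI = {
    "🤢": " nausea ",
    "🤮": " vomiting ",
    "😴": " drowsiness ",
}


def normalize(post: str) -> str:
    """Lower-case a post and replace known emojis and slang with plain terms."""
    text = post.lower()
    for emoji, term in EMOJI.items():
        text = text.replace(emoji, term)
    # Drop apostrophes so "can't" and "cant" are treated alike.
    text = text.replace("'", "").replace("’", "")
    for phrase, term in SLANG.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", term, text)
    # Collapse the extra whitespace introduced by the replacements.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("Started the new med and threw up twice 🤢"))
print(normalize("Can't sleep, dizzy AF"))
```

Normalizing first means later steps can search for a small set of standard terms rather than every informal variant.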
How automated intelligence can extract patient data
One approach to tackling this challenge is automation built on AI and NLP. These advanced analytical technologies can uncover potential safety signals that traditional methods may overlook. With these capabilities, organizations can develop algorithms, word banks and lists of specific terms or patterns to identify patient safety events. Conceptual models help bridge the gap between medical terminology and safety language. These models analyze patterns and word proximity within patient data, enabling the automated classification of AEs.
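The word-bank and word-proximity idea can be sketched in a few lines. The example below is a minimal illustration, not the method used in any of the studies cited here: the term lists, the `find_candidate_events` function and the six-token window are all assumptions made for demonstration.

```python
import re

# Hypothetical word banks, for illustration only.
DRUG_TERMS = {"metformin", "insulin"}
EVENT_TERMS = {"nausea", "rash", "dizziness", "vomiting"}


def find_candidate_events(text: str, window: int = 6):
    """Return (drug, event, distance) triples where a drug term and an
    event term occur within `window` tokens of each other."""
    tokens = re.findall(r"[a-z]+", text.lower())
    drugs = [(i, t) for i, t in enumerate(tokens) if t in DRUG_TERMS]
    events = [(i, t) for i, t in enumerate(tokens) if t in EVENT_TERMS]
    hits = []
    for drug_pos, drug in drugs:
        for event_pos, event in events:
            distance = abs(drug_pos - event_pos)
            if distance <= window:
                hits.append((drug, event, distance))
    return hits


print(find_candidate_events(
    "I started metformin last week and the nausea is awful"))
```

A proximity match only flags a candidate. A phrase such as "no nausea on metformin" would also match, which is one reason flagged items still need a classifier or a human reviewer downstream.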
Recent studies exploring the use of AI and NLP in PV have shown promising results. In one study with a chatbot database, 78% of adverse event (AE) data was successfully identified from over 292,000 virtual agent messages. Although the number of AEs identified may appear small, the capability to process a large volume of data and extract safety signals from a new source is highly significant.
These benefits extend to patient support programs, which pose challenges because their records come in varied formats, both structured and unstructured, and in multiple languages. Analyzing PSP data therefore requires a different automation approach than other data sources, typically a combination of techniques. AI, NLP and optical character recognition (OCR) technologies can effectively review tens of thousands of PSP records, significantly reducing the reliance on manual review. In one instance, this approach achieved 90% efficiency, requiring human review for only 10% of the 45,000 records processed monthly. Rapid analysis also allows for timely identification of potential safety signals, adverse events or emerging patterns, supporting proactive pharmacovigilance. That in turn can enable interventions such as updating safety guidelines, issuing warnings or adjusting treatment plans to enhance patient safety.
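One plausible way to arrive at a split like this is to send only low-confidence records to human reviewers. The sketch below assumes each record carries a model confidence score and uses a hypothetical threshold of 0.8; the article does not describe how the routing was actually done, so treat the names and numbers other than 45,000 and 10% as illustrative.

```python
from dataclasses import dataclass


@dataclass
class ScoredRecord:
    record_id: str
    confidence: float  # hypothetical model confidence in [0, 1]


def triage(records, threshold: float = 0.8):
    """Split records into (auto_processed, needs_human_review)."""
    auto, manual = [], []
    for rec in records:
        (auto if rec.confidence >= threshold else manual).append(rec)
    return auto, manual


def monthly_manual_load(total_records: int, manual_share: float) -> int:
    """Number of records per month that still need a human reviewer."""
    return round(total_records * manual_share)


batch = [ScoredRecord("a", 0.95), ScoredRecord("b", 0.50), ScoredRecord("c", 0.80)]
auto, manual = triage(batch)
print([r.record_id for r in auto], [r.record_id for r in manual])

# Figures from the text: 10% of 45,000 monthly records go to human review.
print(monthly_manual_load(45_000, 0.10))
```

At the reported figures, reviewers would handle about 4,500 records a month instead of 45,000.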
Similarly, in another recent case study, these technologies analyzed a substantial social media dataset encompassing 7.7 million posts across 300 sources, 91 languages and 38 countries. The analysis yielded more than 100,000 potentially relevant safety events, a volume well beyond the capacity of manual methods.
Benefits of AI and NLP in pharmacovigilance processes
These case studies demonstrate the value of AI and NLP in expediting safety signal detection from unstructured data. Implementing AI and NLP allows organizations to improve safety event detection strategies and reduce the time needed to gather patient safety information.

Deepanshu Saini
With the transformative potential of advanced technologies, PV is poised for a paradigm shift. By integrating AI and NLP, clinical trial stakeholders can unlock the vast potential of unstructured patient data gleaned from online platforms and social media. This data offers invaluable insights into potential ADRs that may elude traditional, structured data collection methods. The ability to analyze this data facilitates earlier detection of safety signals, enabling swifter intervention and improved patient outcomes. Additionally, the large amount of patient-generated data promotes a comprehensive understanding of drug safety, including diverse patient experiences. This approach shows great promise for the future of pharmacovigilance, leading to a more informed and comprehensive evaluation of drug safety.
Deepanshu Saini is the director of program management at IQVIA.