Real-world data (RWD) paired with artificial intelligence techniques can almost instantly analyze how well drugs work in diverse subpopulations once they hit the market. RWD can also help answer doctors’ questions about applying trial results to underrepresented patients while informing decisions on whether to run subsequent trials in specific populations based on real-world outcomes. In addition, RWD can help rapidly generate personalized evidence to inform treatment decisions for individual patients.
Tackling diversity in clinical trials: Enrollment and subpopulation-specific studies
This focus on diversity encompasses two main aspects. The first is increasing enrollment of diverse populations in clinical trials, ensuring trials are more representative of the broader population that will ultimately use the approved treatments. The FDA has been increasingly vocal in encouraging clinical trial diversity in recent years. As of 2024, the agency plans to require that sponsors of phase 3 or other pivotal trials submit Diversity Action Plans (DAPs) outlining enrollment goals for underrepresented groups, along with rationales and strategies to meet those goals.
The second dimension of clinical trial diversity relates to subpopulation-specific trials, a topic shadowed by the controversial legacy of BiDil, a heart failure drug approved by the FDA in 2005 specifically for African Americans. A combination of two older generic drugs, hydralazine and isosorbide dinitrate, already in use for heart failure, BiDil attracted criticism for not being truly new or innovative. Upon the drug’s approval, “American medicine managed to take a small step forward and a giant step backward at precisely the same time,” The New York Times quipped in a review of the book “Race in a Bottle: The Story of BiDil and Racialized Medicine in a Post-Genomic Age” by Jonathan Kahn.
BiDil’s tainted legacy
“The drug didn’t actually work that well” in the subpopulation it was indicated for, recalled Brigham Hyde, MD, CEO of Atropos Health. “And when it came time to roll it out, there were some side-effect concerns. That label ended up getting pulled.” This case remains a cautionary tale in the pursuit of targeted treatments for diverse subpopulations.
BiDil’s clinical trials enrolled only African Americans, so its efficacy in other populations was unknown. By contrast, RWD can help identify opportunities to enroll diverse populations by revealing where patients from different backgrounds receive care, allowing researchers to tailor recruitment strategies accordingly.
RWD can also enable comparisons of treatment safety and efficacy across demographic groups in the real world. By analyzing data from electronic health records, claims databases and patient registries that capture a broad patient population, researchers can assess whether a drug’s benefits and risks vary by race, ethnicity, age, gender, or other factors. RWD can thus potentially uncover disparities or unexpected benefits missed in initial trials.
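In practice, such subgroup comparisons are run on large de-identified datasets inside specialized platforms, but the core idea can be sketched in a few lines. The record layout, group labels, and event rates below are entirely hypothetical, chosen only to illustrate comparing an outcome across demographic groups:

```python
from collections import defaultdict

# Hypothetical de-identified real-world records: each row is one treated
# patient with a demographic group label and whether an adverse event occurred.
records = [
    {"group": "A", "adverse_event": True},
    {"group": "A", "adverse_event": False},
    {"group": "A", "adverse_event": False},
    {"group": "A", "adverse_event": False},
    {"group": "B", "adverse_event": True},
    {"group": "B", "adverse_event": True},
    {"group": "B", "adverse_event": False},
    {"group": "B", "adverse_event": False},
]

def event_rates_by_group(rows):
    """Crude per-group adverse-event rate: events / patients in each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [event count, total count]
    for row in rows:
        counts[row["group"]][0] += row["adverse_event"]
        counts[row["group"]][1] += 1
    return {g: events / total for g, (events, total) in counts.items()}

rates = event_rates_by_group(records)
print(rates)  # group A: 0.25, group B: 0.5 on this toy data
```

A real analysis would, of course, adjust for confounding (age, comorbidities, site of care) rather than compare raw rates, but the grouping step is the same.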
Real-world data can identify opportunities to enroll diverse populations
While increasing diversity in clinical trials is a critical goal, real-world data offers a complementary approach to generate evidence for underrepresented populations. By analyzing RWD, researchers can identify where patients from diverse backgrounds are receiving care (e.g., community clinics vs. academic centers) and tailor recruitment strategies accordingly. For instance, RWD might reveal a high prevalence of diabetes within a specific Hispanic population, leading researchers to prioritize those communities for a new diabetes medication trial.
“As a drug hits the market, those populations can be analyzed almost instantly to figure out how well it’s working in these subpopulations,” explained Hyde. “The old version of that was phase four trials, but that was still very site-centric. Now, RWD can provide insights into whether the drug works equally well across different racial groups, age demographics, or regions – potentially uncovering disparities or unexpected benefits that were missed in the initial trials.”
RWD can inform trials on specific patient populations
Real-world evidence (RWE) derived from RWD can replicate trials on specific patient populations, generating personalized insights to guide treatment decisions. “If [a therapy] performs dramatically differently in that population, it’s probably a good idea to run a subsequent trial on that population based on that evidence,” Hyde said. “As a treating physician, you can factor that into your decision-making, review process or screening process when you’re looking at care.”
Such goals are not exactly new. “The idea of using data to inform personalized care has been around for decades but was limited by the inability to rapidly analyze data to generate evidence relevant to a specific patient,” Hyde explained. “Now we can rapidly replicate trials on real patients and generate personalized evidence in under a day, enabling doctors to factor real-world data insights into screening and treatment decisions at the point of care.”
The promise and potential perils of large language models in healthcare
Large language models (LLMs) have exploded in the public consciousness in recent years. In healthcare, LLMs have shown promise for improving screening for undiagnosed diseases. That capability can assist in identifying patients with rare conditions that might otherwise go undetected for long periods. “If I can shorten the time to diagnose these patients and get them treatment, that’s a big win,” Hyde said.
He addressed a common point of confusion in healthcare, where diagnosis and treatment sometimes get lumped together. “Now I’ve diagnosed them, what is the evidence that we should treat them with A or B?” he asked. Determining the appropriate treatment requires considering the patient’s specific characteristics, such as demographics, background, and history. The bar for making an informed treatment decision can be higher than the initial diagnosis.
A high bar for medical LLMs
Thus, the bar for using LLMs in healthcare contexts is high. “I’m worried a little bit about people saying, ‘We’re gonna throw EMR data into an LLM and it’s going to work,’” Hyde said. “If you drop a 9% hallucination rate on that, I’m not so sure.”
Atropos’ strategy to use LLMs is “super constrained” and separate from its core data analysis technology, Hyde said. “All the data analysis is done by our core tech, which was developed at Stanford and has been published on.”
The potential for problems with LLMs could lead to “backlash” and “calls for standards and regulations around healthcare AI.” Hyde said that Atropos aims to avoid such outcomes by focusing on “quality, transparency, and accuracy to earn clinician trust for decision support.”
One key metric Atropos tracks is the net promoter score (NPS), which measures the likelihood of a clinician recommending their AI-powered insights to a colleague. “I think one of the reasons we focus heavily on user experience is that it’s measured by Net Promoter Score, which essentially asks, ‘Would you recommend this to a friend?'” Hyde explained.
Building trust with medical AI
Healthcare organizations using Atropos’s services report an average NPS in the 40s, Hyde said. This score significantly outperforms industry benchmarks within the healthcare sector, which typically sees scores in the teens.
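NPS itself is simple arithmetic over 0–10 survey responses: the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 through 6). A minimal sketch, with a hypothetical set of clinician ratings chosen to land in the range the article cites:

```python
def net_promoter_score(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical survey of 10 clinicians: 6 promoters, 2 passives (7-8),
# 2 detractors -> (6 - 2) / 10 * 100 = 40.
print(net_promoter_score([10, 10, 9, 9, 9, 9, 8, 7, 5, 3]))  # -> 40.0
```

Note that passives (7–8) count toward the denominator but neither add to nor subtract from the score, which is why an NPS in the 40s implies a strong skew toward promoters.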
Hyde also recommends a credit score–inspired system that rates the appropriateness of data used for specific research questions. Such a score can promote transparency and help healthcare professionals assess data suitability, even where the underlying data remains partially opaque. “One concern that often comes up with all this real-world data stuff is the idea of ‘garbage in, garbage out,’” Hyde said. “People wonder how they can know if the data is any good and what errors are being introduced.”
Companies applying AI to clinical and drug development contexts will need to build “trust and confidence — even among the most skeptical users,” Hyde said. This means providing greater transparency into the data and methods underlying AI-driven insights, going beyond the traditional peer-review process of standard publications. When reading some medical publications, “you’re sort of hoping the peer-review process caught any issues with the underlying data,” Hyde said. “We’re trying to shine a light on that by providing a ‘credit score’ that indicates the appropriateness of the data for the specific question being asked.”
Filed Under: clinical trials, Drug Discovery, machine learning and AI, Regulatory affairs