Drug Discovery and Development


Why scientific AI needs clear lines of sight — especially for fields like drug development

By Brian Buntz | December 3, 2024

[Image: abstract geometry. Adobe Stock]

Today’s large language models can be as unreliable as they are eloquent. Their tendency to fabricate facts and lose the thread makes them risky tools for scientific research, especially in highly regulated industries such as pharmaceuticals and chemistry. They also struggle to provide sources, fabricating bogus academic citations without batting an eye.

Speaking of journals, a growing number of papers are documenting the problem. A study published in the Journal of Medical Internet Research (2024), for instance, documents hallucination rates across major language models, ranging from 28.6% for GPT-4 to 91.3% for Google’s Bard (now Gemini). Meanwhile, an arXiv preprint systematically analyzed 333 unique mentions of AI hallucination across 14 academic databases, discovering significant inconsistencies in how researchers conceptualize and measure this phenomenon.

86% of researchers and clinicians express concern that AI could cause critical errors or mishaps. — Insights 2024: Attitudes toward AI

Reducing science friction

In any event, the practical risks for scientific applications are considerable, especially in fields like drug development where accuracy is paramount. Elsevier’s “Insights 2024: Attitudes toward AI” study quantifies this concern: 86% of researchers worry AI will “cause critical errors or mishaps,” and about seven in ten (71%) respondents expect generative AI tools’ results to be based on high-quality, trusted sources only.

Retrieval Augmented Generation (RAG) offers a path to greater trustworthiness and accuracy and a way to go beyond the black-box dynamic of using off-the-shelf chatbots. In essence, RAG grounds a generative AI model with an external data source (typically a vector database). The approach helps generative models to cite data sources, and thus helps humans understand the provenance of the data used to inform a generative AI model’s response. “You need explainability and the ability to validate hypotheses to avoid critical errors,” said Joe Mullen, director of data science and professional services for SciBite at Elsevier.
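The pattern is straightforward to sketch. The toy example below, which assumes an invented two-document corpus and uses bag-of-words vectors as a stand-in for real embeddings, shows the two steps RAG adds in front of generation: retrieve from an external store, then build a prompt that pins the model to citable sources. None of this reflects a real SciBite or Elsevier API.

```python
# Minimal RAG sketch: retrieve from a small "vector store" (toy bag-of-words
# vectors here), then build a prompt that forces the model to answer only
# from retrieved, citable sources. Docs and DOIs are invented illustrations.
from collections import Counter
import math

DOCS = {
    "doi:10.1000/demo1": "Compound X showed hepatotoxicity in phase I trials.",
    "doi:10.1000/demo2": "Compound X improved progression-free survival in mice.",
}

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(DOCS.items(), key=lambda kv: cosine(q, embed(kv[1])), reverse=True)
    return ranked[:k]

def grounded_prompt(query: str) -> str:
    """Ground the generation step: answer only from sources, with citations."""
    context = "\n".join(f"[{doi}] {text}" for doi, text in retrieve(query))
    return (
        "Answer ONLY from the sources below and cite the DOI you used.\n"
        f"Sources:\n{context}\nQuestion: {query}"
    )
```

Because every passage carries its identifier into the prompt, the model’s answer can point back to a specific source, which is the provenance property the article describes.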

Precision engineering for reliable AI

Implementing RAG effectively requires precise technical controls and human oversight. “You can’t just rely on efforts from commoditized providers,” Mullen said. The need for explainability and reliability aligns with the Elsevier survey finding, noted above, that 71% of respondents demand high-quality, trusted source validation.

96% of researchers and clinicians expect AI will change how education is delivered in universities and medical schools. — Insights 2024: Attitudes toward AI

A robust evaluation framework must incorporate quantitative metrics—like precision and recall in information retrieval—and qualitative assessments from subject matter experts. “You need to have humans in the loop to evaluate how performant these systems are against real scientific problems,” Mullen said. “We understand the importance of marrying the content, technology, and expertise.” The goal is to ensure proper expert oversight to fine-tune these AI systems. In drug development, this approach helps “improve the efficiency of the entire drug development process, whether that’s preclinical to clinical, and also post-marketing surveillance,” Mullen added.
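The quantitative half of such a framework can be sketched in a few lines. In the hedged example below, the “gold” relevance judgments stand in for the expert-in-the-loop assessments Mullen describes; the document IDs and system output are invented.

```python
# Sketch of retrieval evaluation: precision and recall of a system's
# retrieved set against expert-labeled relevance judgments (the human-in-the-
# loop part). All document IDs below are made-up illustrations.

def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    hits = len(retrieved & relevant)                      # true positives
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Subject-matter experts supply the gold set of relevant documents:
gold = {"doc1", "doc3", "doc4"}
# What the retrieval step actually returned for the same query:
system = {"doc1", "doc2", "doc3"}

p, r = precision_recall(system, gold)   # 2 of 3 retrieved are relevant;
                                        # 2 of 3 relevant were retrieved
```

Tracking both numbers per query type makes regressions visible when the retrieval layer or its underlying data changes.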

There’s a balancing act involved in integrating both internal and external information sources. Data architectures need to be flexible enough to segregate or combine information based on specific requirements. “There are scenarios where you want to keep data sets separate,” Mullen noted, describing how companies might need to query proprietary and public datasets independently. At the same time, the optimal application of AI in research relies on the integration of reliable scientific data, encompassing both high-quality external sources and internal datasets within secure computational frameworks.

SciBite, for instance, has the ability to handle data from different sources by treating them uniformly. “For example, if you enrich both data sources with the same ontologies or vocabularies, you can integrate them using our technology components,” Mullen said. “As long as you can ensure the data’s accuracy and treat data from different sources consistently—verifying it, for example—you can retrieve and use that data in the same manner, regardless of its origin.”
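The mechanism Mullen describes can be illustrated with a minimal sketch: enrich records from different sources with the same ontology identifiers, then query both through that shared layer. The tiny lexicon below uses the real ChEBI identifier for aspirin, but the records, field names, and matching logic are illustrative assumptions, not SciBite’s implementation.

```python
# Sketch of ontology-based integration: tag (enrich) records from different
# sources with shared ontology IDs, then query them uniformly by concept.
# CHEBI:15365 is aspirin's ChEBI ID; everything else here is invented.

ONTOLOGY = {"aspirin": "CHEBI:15365", "acetylsalicylic acid": "CHEBI:15365"}

def enrich(record: dict, text_field: str) -> dict:
    """Attach ontology IDs for any recognized terms in the given field."""
    text = record[text_field].lower()
    record["ontology_ids"] = sorted(
        {oid for term, oid in ONTOLOGY.items() if term in text}
    )
    return record

# One internal record and one public record, with different surface names:
internal = enrich({"source": "internal", "note": "Aspirin arm, 81 mg"}, "note")
public = enrich({"source": "pubmed",
                 "abstract": "acetylsalicylic acid reduced events"}, "abstract")

def query_by_concept(records: list[dict], oid: str) -> list[dict]:
    return [r for r in records if oid in r["ontology_ids"]]

# Both records now resolve to the same concept despite different vocabularies:
matches = query_by_concept([internal, public], "CHEBI:15365")
```

Because both datasets were enriched against the same vocabulary, they can stay physically separate yet still be queried as one, the flexibility the article describes.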

Reproducibility in the era of stochastic AI

The scientific community has long grappled with reproducibility issues. A landmark 2016 survey in Nature revealed that over 70% of scientists had failed to reproduce other researchers’ experiments, with more than half unable to replicate their own work. This challenge persists today and is compounded by the inherent variability in generative AI models. “By its nature, gen AI is stochastic,” Mullen notes. “Asking it the same question multiple times might yield slightly different responses.”
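The stochasticity Mullen points to can be made concrete with a toy next-token distribution (the probabilities below are invented): sampling varies run to run, while greedy decoding, or a fixed random seed, makes a run repeatable.

```python
# Toy illustration of generative-AI stochasticity: sampling from an output
# distribution can differ across runs, while greedy (argmax) decoding or a
# fixed seed is reproducible. The token distribution is an invented example.
import random

NEXT_TOKEN_PROBS = {"inhibitor": 0.5, "agonist": 0.3, "substrate": 0.2}

def sample_token(rng: random.Random) -> str:
    """Temperature-style sampling: draw a token weighted by probability."""
    tokens, probs = zip(*NEXT_TOKEN_PROBS.items())
    return rng.choices(tokens, weights=probs, k=1)[0]

def greedy_token() -> str:
    """Deterministic decoding: always take the most probable token."""
    return max(NEXT_TOKEN_PROBS, key=NEXT_TOKEN_PROBS.get)

# A fixed seed makes sampling repeatable across runs:
assert sample_token(random.Random(0)) == sample_token(random.Random(0))
# Greedy decoding gives the same answer every time, seed or no seed:
assert greedy_token() == "inhibitor"
```

Real systems expose the same levers (temperature, seeds, deterministic decoding), though logging the full input, retrieved context, and parameters is what ultimately makes a RAG run auditable.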

Joe Mullen
Director of Data Science and Professional Services, SciBite at Elsevier
“When undertaking complex tasks, traceability between input and output becomes critical—understanding why and how decisions were made during the process.”

This variability introduces an “interesting reproducibility issue” in both traditional experimental methods and AI-assisted research processes. To address this, precise engineering controls and guardrails are essential. “You need guardrails in place to ensure you’re applying it where it makes sense—for example, converting natural language to syntax for the information retrieval step of a RAG system, or at the end step where you say, ‘This is the context; provide an answer from the content I’ve returned.'”
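Both guardrails in that quote can be sketched as code, under invented names and an invented query grammar: first, the model’s natural-language-to-syntax output is validated before it ever reaches the retrieval layer; second, the generation prompt pins the answer to the returned context.

```python
# Sketch of the two guardrails described above. (1) Validate the model's
# generated query against an allowed grammar before executing it.
# (2) Constrain the final answer to the retrieved context. The field names
# and query syntax are illustrative assumptions, not a real product's API.
import re

ALLOWED_FIELDS = {"target", "indication", "phase"}
QUERY_RE = re.compile(r'^(\w+)\s*=\s*"([^"]+)"$')

def validate_query(generated: str) -> tuple[str, str]:
    """Guardrail 1: reject anything that is not a simple field="value" filter."""
    m = QUERY_RE.match(generated.strip())
    if not m or m.group(1) not in ALLOWED_FIELDS:
        raise ValueError(f"query failed guardrail: {generated!r}")
    return m.group(1), m.group(2)

def answer_prompt(context: str, question: str) -> str:
    """Guardrail 2: 'this is the context; provide an answer from it.'"""
    return (
        "Answer only from the context below. If the answer is not present, "
        "say so rather than guessing.\n"
        f"Context: {context}\nQuestion: {question}"
    )

field, value = validate_query('target = "EGFR"')   # passes the guardrail
```

A malformed or out-of-grammar query raises before touching any data, which is exactly where a hallucinated retrieval step should fail.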

Success depends on what Mullen describes as a three-part foundation: “quality data, the right technology, and an efficient means of interaction.” This foundation helps to minimize the stochasticity of generative AI, enhancing reproducibility in scientific experimentation using RAG and ensuring trustworthy AI systems.

And from a research perspective, RAG systems that scour the academic literature can help researchers understand the scientific parameters of similar studies and, in turn, design more rigorous experiments.

The rise of the research agent

The evolution of AI research tools is moving beyond basic information retrieval toward more sophisticated, autonomous capabilities—a shift Mullen refers to as “agentification.” This transition has significant implications for scientific research methodologies. It connects back to the earlier RAG implementation challenges of reliability and reproducibility, but gives the AI system more responsibility for ensuring the relevance and accuracy of the information it retrieves.

94% believe AI will rapidly increase the volume of scholarly and medical research in the next 2-5 years. — Insights 2024: Attitudes toward AI

In clinical research applications, the potential of well-architected agentic RAG systems extends far beyond basic information retrieval. The complexity of drug discovery and clinical trial workflows demands sophisticated, multi-component orchestration.

“The information retrieval step could involve a combination of ontology-based and vector-based methods,” Mullen explains. “This means recognizing that questions can go to multiple sources, determining optimal routing pathways, ranking returned information, and executing complex process chains.”
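That orchestration can be sketched as a simple router, under invented routing rules and stand-in scores: decide which retrieval backends a question should go to (ontology-based, vector-based, or both), then merge and rank what comes back before the generation step sees it.

```python
# Sketch of agentic-RAG orchestration: route a question to one or more
# retrieval methods, then merge and rank the results. The routing heuristic,
# backends, and scores below are all illustrative assumptions.

def route(question: str) -> list[str]:
    """Pick retrieval backends; a real router might itself be an LLM call."""
    backends = ["vector"]  # dense retrieval as the default path
    # All-caps tokens (e.g., gene or drug symbols) suit exact ontology lookup:
    if any(tok.isupper() and len(tok) > 2 for tok in question.split()):
        backends.append("ontology")
    return backends

def retrieve_all(question: str) -> list[tuple[str, float]]:
    """Fan out to the chosen backends, then rank the merged results."""
    backends = route(question)
    results: list[tuple[str, float]] = []
    if "vector" in backends:
        results += [("doc-vec-1", 0.82)]    # stand-in scored hits
    if "ontology" in backends:
        results += [("doc-onto-1", 0.91)]
    # Rank merged results before handing them to the generation step:
    return sorted(results, key=lambda r: r[1], reverse=True)

hits = retrieve_all("Which trials target EGFR?")
```

Each routing decision and ranking step can be logged, which is what keeps a multi-source pipeline traceable in the sense Mullen emphasizes.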

Clinical trial optimization represents a prime use case for these advanced capabilities. “When it comes to recruitment, outcome prediction, and trial design optimization—that’s where you’re going to see a lot more application of AI in the future,” Mullen notes. These applications could benefit from sophisticated agentic RAG systems that go beyond simple query-response patterns, instead orchestrating multiple data streams and analytical processes in a coordinated, traceable manner.

Toward a symbiotic partnership

As AI architectures become more sophisticated, the focus will be on augmenting rather than replacing human researchers, freeing them to reimagine their roles. “It’s important that humans remain at the center of decision-making—the interpretation of data and validation of results,” Mullen emphasizes. “An agentic system doesn’t necessarily remove human involvement,” he explains. “We can think of it as a workflow where at every step, you could have human input: ‘This is the step I’ve reached, this is the data returned, and this is where the system is.'” The level of engagement varies by task complexity—from continuous oversight in research-intensive processes to lighter-touch validation for operational tasks.

Looking ahead, success in AI-assisted research will depend on developing systems that prioritize accuracy, trustworthiness, and security. As the “Insights 2024” survey indicates, scientists expect generative AI tools to be based on high-quality, trusted sources. Organizations that achieve this balance while maintaining human centrality in scientific decision-making will distinguish themselves in the field. “When undertaking complex tasks, traceability between input and output becomes critical,” Mullen said. At the same time, the hype surrounding AI can lead to applications that are solutions in search of problems. “I think it’s really important to get your head around understanding what the problem is before you even start to think about what the solution is that you should be bringing to tackle that.”


Filed Under: clinical trials, Drug Discovery, machine learning and AI

About The Author

Brian Buntz

As the pharma and biotech editor at WTWH Media, Brian has almost two decades of experience in B2B media, with a focus on healthcare and technology. While he has long maintained a keen interest in AI, more recently Brian has made data analysis a central focus, exploring tools ranging from NLP and clustering to predictive analytics.

Throughout his 18-year tenure, Brian has covered an array of life science topics, including clinical trials, medical devices, and drug discovery and development. Prior to WTWH, he held the title of content director at Informa, where he focused on topics such as connected devices, cybersecurity, AI and Industry 4.0. A dedicated decade at UBM saw Brian providing in-depth coverage of the medical device sector. Engage with Brian on LinkedIn or drop him an email at bbuntz@wtwhmedia.com.

