Drug Discovery and Development

How stereo-correct data can de-risk AI-driven drug discovery

By Brian Buntz | October 15, 2025

Image: Wikimedia Commons (Public Domain)

In drug discovery, the difference between a breakthrough and a breakdown can be as small as the twist of a bond. Stereochemistry, the three-dimensional arrangement of atoms, governs how otherwise identical molecules behave in the body: whether they bind the intended target, trigger off-targets, or get cleared in minutes rather than hours.

Adam Sanford, Ph.D.

“Shape really does matter,” said Adam Sanford, Ph.D., senior product lead at scientific information firm CAS, a division of the American Chemical Society. “It could be an identical structure, but have a different handedness, so to speak. It might make the difference between it working at all and being extraordinarily effective.”

Yet records containing stereo information can be fragile or inconsistent: the data disappear in scanned PDFs, hand-offs between lab notebooks and databases introduce transcription errors, and OCR software routinely mangles the symbols that encode three-dimensional orientation. File-format conversions between systems can also lose configurational details. Whatever the cause of the error or data loss, the results are damaging. “If you lose that stereo information, or it gets jumbled… a lot of the data… is not really actionable… [and] may lead you down a completely incorrect path,” Sanford said.

Both systematic and random errors can ripple through computational workflows: errors in stereochemical representation can propagate into QSAR models, pharmacophore models, and docking experiments. The result can be misleading virtual screening output that hinders chemical design and drug discovery efforts.

The stakes are high for machine learning applications in drug discovery, where practitioners acknowledge that the predictive power of any ML approach depends on data quality. As a rule of thumb, the practice consists of at least 80% data processing and cleaning and only 20% algorithm application. Industry surveys have found that data preparation often dominates data-science effort, and reviews in drug discovery emphasize that high-quality curated data are a prerequisite for reliable ML. When training data lack accurate stereochemical information, the resulting noise and errors can compromise the reliability of AI models, affecting everything from virtual screening to property prediction. “In cases where we… train those algorithms on less sophisticated data, you see a direct impact in the overall efficacy of the… results,” Sanford said.

The stakes

The consequences of stereochemical oversights aren’t limited to prediction errors or wasted R&D dollars. The field learned that lesson with thalidomide, marketed as a sedative between 1957 and 1961, which caused severe birth defects in over 10,000 children. One enantiomer eased morning sickness; the other was teratogenic.

Orr Ravitz, Ph.D.

As Orr Ravitz, Ph.D., senior product manager on the CAS life sciences team, noted, “Administering the thalidomide in only… the unharmful form… doesn’t help, because in vivo, the two forms are inter-convertible… there’s no way to give this drug safely to pregnant women.”

This catastrophic failure forced the industry and regulators to confront the importance of stereochemistry. In the early 1990s, the FDA issued its Policy Statement for the Development of New Stereoisomeric Drugs, requiring that “the stereoisomeric composition of a drug with a chiral center should be known” and that sponsors demonstrate identity, strength, quality, and purity “from a stereochemical viewpoint.” As Ravitz put it, “If the drug candidate has a stereocenter, the FDA requires investigation of different stereoisomers.” The result: by the 2010s, the majority of new FDA approvals were single enantiomers rather than racemates, and companies developed robust analytical methods to characterize stereoisomers. Chiral drugs remain common in recent approvals: 20 of the 35 novel FDA approvals in 2020 were chiral, reinforcing why stereo-aware data matters operationally.

The new vulnerability

But the rise of computational and AI-driven discovery has created a new vulnerability across the industry. In many workflows, machine learning models now ingest thousands of structures automatically without human review, propagating any stereochemical inconsistencies directly into predictions. Organizations that maintain human validation and curation can catch these errors before they corrupt downstream analyses. As one study notes, stereochemistry affects drug-receptor binding, metabolism, and toxicity.

“All the time. Happens all the time,” Sanford said when asked if stereo errors occur in practice. “There are all these points where errors can be introduced.” Those points span every hand-off: electronic lab notebooks to PDFs, PDFs to databases, databases to training sets. The failure modes include missed wedges and dashes in figures, OCR damage, file-format conversions and manual transpositions. Each transition introduces opportunities for stereochemical information to degrade or disappear entirely.
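To make the hand-off problem concrete, here is a minimal sketch of a check that compares a structure before and after a transfer step. It assumes structures are exchanged as SMILES strings, which the article does not specify, and the function names are hypothetical. It only counts stereo markers in the text (`@`/`@@` for tetrahedral centers, `/` and `\` for double bonds), so it can flag markers that were dropped but not ones that were flipped; a production pipeline would more likely rely on a cheminformatics toolkit.

```python
import re


def stereo_marker_counts(smiles: str) -> dict:
    """Count stereo markers in a SMILES string (crude, text-level heuristic)."""
    return {
        # '@' or '@@' on a single atom counts as one tetrahedral tag
        "tetrahedral": len(re.findall(r"@+", smiles)),
        # '/' and '\' encode cis/trans geometry around double bonds
        "double_bond": smiles.count("/") + smiles.count("\\"),
    }


def stereo_lost(before: str, after: str) -> bool:
    """Return True if the 'after' string carries fewer stereo markers."""
    b = stereo_marker_counts(before)
    a = stereo_marker_counts(after)
    return any(a[key] < b[key] for key in b)


# Alanine with and without its chirality tag (assumed example input)
print(stereo_lost("N[C@@H](C)C(=O)O", "NC(C)C(=O)O"))  # True
```

Running a check like this at each transition (notebook export, database load, training-set build) would at least surface records whose stereo annotation silently vanished along the way.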

Recent research found that multiple deep-learning docking methods produced poses with physically implausible ligand configurations, including wrong stereochemistry, even when other accuracy metrics looked acceptable. The models appeared to work, until you examined the actual three-dimensional geometry they predicted.

On the flip side, when data is clean, new capabilities emerge. “Once we got data that was very comprehensive and reliable in terms of stereochemistry, suddenly the ability to derive stereospecific reaction rules became realistic,” Ravitz said.

The path forward

Given the risks involved, Sanford argues teams should treat chirality as an operational problem: establish standards, define specifications, implement tests. Feed design, synthesis and modeling with stereo-correct, identifier-harmonized data from the moment hits emerge.
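One way to read "define specifications, implement tests" is as an automated gate in front of a training set. The sketch below is illustrative only: the `Record` layout and the `declared_stereocenters` field are assumptions, standing in for whatever a curated source reports about each compound. The test is deliberately simple, checking that the number of chirality tags in the SMILES string matches the declared count.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    compound_id: str
    smiles: str
    declared_stereocenters: int  # hypothetical field from a curated source


def count_tetrahedral_tags(smiles: str) -> int:
    """Count '@'/'@@' chirality tags; each atom's tag is counted once."""
    return len(re.findall(r"@+", smiles))


def failing_records(records: list) -> list:
    """Return IDs of records whose SMILES disagrees with the declared count."""
    return [
        r.compound_id
        for r in records
        if count_tetrahedral_tags(r.smiles) != r.declared_stereocenters
    ]


batch = [
    Record("cmpd-1", "N[C@@H](C)C(=O)O", 1),  # tag present, passes
    Record("cmpd-2", "NC(C)C(=O)O", 1),       # tag missing, fails
    Record("cmpd-3", "CCO", 0),               # achiral, passes
]
print(failing_records(batch))  # ['cmpd-2']
```

A gate of this kind does not prove the stereochemistry is right, only that it is present where the specification says it should be, which is the cheapest failure to catch before a model ingests the data.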

“One of the things that’s really important for the early-stage drug discovery scientist is, what’s going to happen when I introduce that therapeutic into the body?” Sanford said. “In many cases, the active part of the therapeutic isn’t what you introduce; it’s what your body does to it.” Stereochemistry governs all of it: how compounds are metabolized, how long they persist, which pathways they activate.

As one review notes, “the availability of high-quality, accurate and curated data in large quantities” remains a fundamental challenge in applying machine learning to drug discovery. For AI-driven discovery to deliver on its promise, that challenge starts with getting the stereochemistry right.



About The Author

Brian Buntz

As the pharma and biotech editor at WTWH Media, Brian has almost two decades of experience in B2B media, with a focus on healthcare and technology. While he has long maintained a keen interest in AI, more recently Brian has made data analysis a central focus, and is exploring tools ranging from NLP and clustering to predictive analytics.

Throughout his 18-year tenure, Brian has covered an array of life science topics, including clinical trials, medical devices, and drug discovery and development. Prior to WTWH, he held the title of content director at Informa, where he focused on topics such as connected devices, cybersecurity, AI and Industry 4.0. A dedicated decade at UBM saw Brian providing in-depth coverage of the medical device sector. Engage with Brian on LinkedIn or drop him an email at [email protected].

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media