Drug Discovery and Development

Contrastive learning-based model ConPLex elevates drug-protein interaction predictions

By Brian Buntz | June 12, 2023

[Generative AI image from Tahsin/Adobe Stock]

Drug discovery, traditionally a labor-intensive process, often involves extensive computational work during experimental screening. Advances in AI, however, promise to streamline this process. To that end, a team from MIT and Tufts has introduced ConPLex, a computational model that uses large language model techniques, similar to those behind ChatGPT. The model analyzes vast amounts of text data to discern patterns and relationships among amino acids. The technique matches potential drug molecules to their target proteins without requiring complex molecular structure computation. The system’s efficiency allows it to sift through an array of more than 100 million compounds in a single day.

Bonnie Berger, head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and one of the senior authors of the new study, explained over email how the ConPLex model could be adapted for a wider range of interaction predictions. Berger, joined by her research team, noted, “The current form of ConPLex relies on co-representing the protein and small molecule in a shared high-dimensional (embedding) space, with our model learning a representation scheme that places interacting proteins and drugs close together in this embedding space.”

In the context of machine learning, embeddings are vector representations of a particular data type. They map objects, such as words, proteins or drugs, into vectors of real numbers. Converting such objects into embeddings enables models such as ConPLex to manipulate these objects mathematically.
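
As a rough illustration of what "manipulating objects mathematically" can mean, the sketch below compares toy embeddings with cosine similarity. The four-dimensional vectors and the names protein_A, drug_X and drug_Y are made up for this example; real protein and drug embeddings have far more dimensions and come from trained models.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: invented numbers, not output from any real model.
embeddings = {
    "protein_A": [0.9, 0.1, 0.0, 0.4],
    "drug_X": [0.8, 0.2, 0.1, 0.5],
    "drug_Y": [-0.3, 0.9, 0.7, -0.1],
}

sim_ax = cosine_similarity(embeddings["protein_A"], embeddings["drug_X"])
sim_ay = cosine_similarity(embeddings["protein_A"], embeddings["drug_Y"])

# drug_X points in nearly the same direction as protein_A; drug_Y does not.
print(f"protein_A vs drug_X: {sim_ax:.3f}")
print(f"protein_A vs drug_Y: {sim_ay:.3f}")
```

Once objects are vectors, questions such as "which drug is most like this protein's known binders?" reduce to arithmetic of this kind.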

From patterns to predictions with ConPLex

A schematic of ConPLex published on GitHub

The researchers designed the system to locate interacting proteins and drugs near one another in this shared, high-dimensional space. With the accurate integration of additional molecules into this space — consider an antibody modeled by AbMAP or a peptide represented by a protein language model — ConPLex could potentially be repurposed for a broader range of interaction predictions.
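
The idea of a shared space can be sketched in a few lines. In this simplified illustration, protein features and drug features start with different dimensions, and two linear maps project them into the same two-dimensional space, where closeness is measured with cosine similarity. The projection weights, feature vectors and function names are hypothetical; in a trained model such projections would be learned from data rather than written by hand.

```python
import math

def project(matrix, vector):
    """Apply a linear map: one output value per matrix row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical weights. Protein features are 3-dimensional, drug features
# are 2-dimensional, and both are mapped into a shared 2-dimensional space.
PROTEIN_PROJ = [[1.0, 0.0, 0.5],
                [0.0, 1.0, -0.5]]
DRUG_PROJ = [[0.5, 1.0],
             [1.0, -0.5]]

def interaction_score(protein_features, drug_features):
    """Score a pair by how closely aligned the two projected vectors are."""
    p = project(PROTEIN_PROJ, protein_features)
    d = project(DRUG_PROJ, drug_features)
    return cosine_similarity(p, d)

protein = [1.0, 0.0, 2.0]      # projects to [2.0, -1.0]
close_drug = [0.0, 2.0]        # projects to [2.0, -1.0], same direction
unrelated_drug = [2.0, 0.0]    # projects to [1.0, 2.0], orthogonal

print(interaction_score(protein, close_drug))
print(interaction_score(protein, unrelated_drug))
```

Because scoring only needs the projected vectors, any molecule type that can be embedded and projected into the same space could in principle be scored the same way, which is the extension the researchers describe.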

ConPLex overcomes limitations of previous computational models, which often struggled to distinguish between actual drug compounds and decoys — compounds resembling the drug but not interacting effectively with the target.

In addition, traditional models were prone to false positives, predicting interactions where none exist. To address that problem, ConPLex incorporates contrastive learning to differentiate between genuine drugs and imposters.

Contrastive learning is a machine learning technique that trains models to pull similar data points together and push dissimilar ones apart. The technique enables ConPLex to improve accuracy and efficiency in predicting protein-drug interactions.
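
One common way to express a contrastive objective is a triplet margin loss, sketched below with the target protein as the anchor, a true drug as the positive example and a decoy as the negative. The article does not spell out the exact loss ConPLex uses, so treat this as a generic illustration; the coordinates and the margin value are invented.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the
    anchor than the positive is; otherwise the size of the shortfall."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

# Hypothetical 2-dimensional embeddings.
target = [0.0, 0.0]
true_drug = [0.3, 0.4]     # distance 0.5 from the target
far_decoy = [3.0, 4.0]     # distance 5.0: already well separated
near_decoy = [0.6, 0.8]    # distance 1.0: too close to the target

loss_far = triplet_margin_loss(target, true_drug, far_decoy)
loss_near = triplet_margin_loss(target, true_drug, near_decoy)
print(loss_far, loss_near)
```

Training would adjust the embeddings to drive this loss down, so a decoy sitting close to the target, which produces a nonzero loss, is pushed away while the true drug is kept nearby.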

Protein interaction prediction capacity

The ConPLex model draws from a database of more than 20,000 proteins, converting their amino-acid sequences into meaningful numerical representations. This encoding captures the correlation between sequence and structure, improving prediction accuracy.

Another unique aspect of ConPLex is its ability to account for the dynamic nature of proteins and drug molecules, a vital feature for accurately predicting interactions. Berger’s team elaborated, “Rather than explicitly representing the protein 3D structure at an atomic resolution, ConPLex uses an implicit representation of proteins in a high dimensional space using the protein language model (and likewise for the small molecule).”

This representation, they believe, captures not only the standard protein structure but also its conformational flexibility.

“By working with this implicit representation, our machine learning model can thus effectively marginalize over several conformations of the molecule, accounting for conformational flexibility and dynamics,” the team continued. “However, we think there is exciting future work to be done explicitly representing conformational flexibility which could further improve model performance!”

In tests, ConPLex ran on multiple CPUs and one GPU

Berger’s team also shed light on the screening capabilities of ConPLex and its hardware requirements. “We ran our predictions on an Intel server with multiple CPUs, but using only a single NVIDIA A100 GPU.”

NVIDIA designed the data-center grade A100 Tensor Core GPU for deep learning workloads. Based on the Ampere GA100 GPU, the A100 is part of the NVIDIA data center platform and accelerates over 700 HPC applications and every major deep learning framework. Such hardware is common in academic computer science labs and is also available in cloud-computing platforms from vendors like AWS or Azure.

The MIT and Tufts researchers further tested ConPLex by screening a library of about 4,700 candidate drug molecules for their binding ability to a set of 51 protein kinases, a type of enzyme. From the top hits, they selected 19 drug-protein pairs for experimental tests. A total of 12 pairs had strong binding affinity. Four of these pairs exhibited extremely high affinity, suggesting that the drug concentration needed to inhibit the protein would be in the sub-nanomolar range. Such results underscore ConPLex’s potential for large-scale screenings and the identification of strong drug candidates.
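
The validation figures above translate into hit rates with simple arithmetic. The counts come from the article; the variable names are chosen for this example only.

```python
# Counts reported in the article.
pairs_tested = 19         # drug-protein pairs chosen for experimental tests
strong_binders = 12       # pairs with strong binding affinity
very_high_affinity = 4    # pairs in the sub-nanomolar range

strong_rate = strong_binders / pairs_tested
very_high_rate = very_high_affinity / pairs_tested

print(f"Strong binders: {strong_rate:.1%}")          # 63.2%
print(f"Very high affinity: {very_high_rate:.1%}")   # 21.1%
```

Roughly 63% of the tested predictions bound strongly, and about one in five fell in the extremely high affinity group.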

Ensuring accessibility for the scientific community

To select test cases for experimental validation, the team conducted an unbiased all-vs-all scan of 51 kinases against 4,715 small molecule drugs. From this set, they selected top predicted kinase/drug pairs based on specific criteria. “We focused on kinases that were predicted to interact with several drugs, selecting five such kinases,” the team noted. “Note that we did this selection without peeking at the labels of the kinases or drugs, and so were completely agnostic to their biological function or prior known interactions.”
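
The shape of such an all-vs-all scan can be sketched as follows. The kinase and drug counts come from the article, but the scores here are random placeholders standing in for model predictions, and the 0.99 cutoff and the "most predicted hits" ranking are assumptions, since the team's exact selection criteria are not given. Kinases and drugs are referred to by index only, echoing the label-blind selection the team describes.

```python
import random
from collections import Counter

N_KINASES = 51        # from the article
N_DRUGS = 4715        # from the article
N_SELECTED = 5        # the team selected five kinases
THRESHOLD = 0.99      # hypothetical cutoff for a predicted interaction

rng = random.Random(0)  # placeholder scores, not real predictions

# Score every kinase against every drug (all-vs-all).
scores = {
    (kinase, drug): rng.random()
    for kinase in range(N_KINASES)
    for drug in range(N_DRUGS)
}
print(f"Pairs scored: {len(scores):,}")  # 51 * 4,715 = 240,465

# Keep pairs whose score clears the cutoff.
hits = [pair for pair, score in scores.items() if score >= THRESHOLD]

# Count predicted hits per kinase and keep the kinases with the most.
hits_per_kinase = Counter(kinase for kinase, _ in hits)
selected_kinases = [k for k, _ in hits_per_kinase.most_common(N_SELECTED)]
print("Selected kinase indices:", selected_kinases)
```

Even this modest panel produces about a quarter of a million pairs to score, which is why a fast scoring method matters for screening.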

To date, interest in the ConPLex model has been considerable. “We have been invited by some industry folks to speak about our algorithm,” the team noted. “We have also embarked on a promising collaboration with Dr. Eytan Ruppin of the NIH on exploring drug targets for cancer.”

The researchers have made ConPLex freely accessible to the scientific community. Researchers can install the software from a terminal with the command “pip install conplex-dti.” It is also available on GitHub. ConPLex v0.1.0 is now in its pre-release stage and remains in active development.


Filed Under: Data science, Drug Discovery and Development, machine learning and AI
Tagged With: AI in Medicine, computational biology, Drug Screening, Large-Scale Drug Discovery, machine learning, MIT and Tufts Research, Protein Modeling
 

About The Author

Brian Buntz

As the pharma and biotech editor at WTWH Media, Brian has almost two decades of experience in B2B media, with a focus on healthcare and technology. While he has long maintained a keen interest in AI, more recently Brian has made data analysis a central focus, and is exploring tools ranging from NLP and clustering to predictive analytics.

Throughout his 18-year tenure, Brian has covered an array of life science topics, including clinical trials, medical devices, and drug discovery and development. Prior to WTWH, he held the title of content director at Informa, where he focused on topics such as connected devices, cybersecurity, AI and Industry 4.0. A dedicated decade at UBM saw Brian providing in-depth coverage of the medical device sector. Engage with Brian on LinkedIn or drop him an email at bbuntz@wtwhmedia.com.

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media