After ChatGPT debuted in late 2022 and GPT-4 arrived in March of the following year, many pharma executives sat up and took note, interested in new tech that could accelerate the stubbornly slow drug development process. Translating that vision into reality, however, is not straightforward. A significant number of pharma companies have moved to prohibit the use of ChatGPT, but the majority are still exploring GenAI, and Boston Consulting Group projects that the GenAI market in healthcare will grow at a compound annual rate of 85%.
While bespoke generative AI systems can help accelerate an array of drug discovery and development tasks, the capabilities of off-the-shelf large language models in the domain are, well, limited. “If you ask ChatGPT to design a drug, it will sort of think hard and try to maybe get something, but it doesn’t really have access to specialized software that could design drugs,” said Garegin Papoian, Ph.D., co-founder and CSO of Deep Origin. “So most likely, it will produce something that’s quite disappointing to a professional.” Papoian is also a professor at the University of Maryland, with a joint appointment in chemistry and the Institute for Physical Science and Technology.
Balto, can you suggest modifications to this lead compound that improve solubility without impacting binding affinity?
Recognizing the need for a more robust approach, Papoian and his team at Deep Origin developed Balto, an AI assistant specifically designed for drug discovery. The assistant tackles a range of queries, from basic property predictions like “What’s the logP of this molecule?” to complex design challenges like “Suggest modifications to improve solubility while maintaining binding affinity to this protein.” A user might also ask Balto to sift through scores of molecules to identify potential drug candidates for a specific protein target.
Deep Origin notes that Balto outperforms other models on docking accuracy, as benchmarked on the PDBbind core dataset.
“Every time you ask [Balto a question about a drug candidate], it’s calling an API or some program that is highly specialized, the state of the art in that area, and making a query, getting back an answer, and telling you that answer,” Papoian explained.
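The pattern Papoian describes, a chat front end that forwards each question to a specialized program and relays the answer, can be sketched in a few lines. This is only an illustration of the general idea, not Balto's implementation: the tool functions, their names, and the keyword-based routing are all hypothetical stand-ins, and a real assistant would presumably let a language model pick the tool and would call actual modeling software.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for specialized back-end tools. In a real system each
# of these would call a dedicated API or program; here they return placeholders.
def predict_logp(smiles: str) -> str:
    return f"[placeholder] logP prediction requested for {smiles}"


def dock_ligand(smiles: str) -> str:
    return f"[placeholder] docking requested for {smiles}"


# Registry mapping a trigger keyword to the tool that handles it (assumed design).
TOOLS: Dict[str, Callable[[str], str]] = {
    "logp": predict_logp,
    "dock": dock_ligand,
}


def answer(question: str, smiles: str) -> str:
    """Route a question to the first tool whose keyword appears in it."""
    lowered = question.lower()
    for keyword, tool in TOOLS.items():
        if keyword in lowered:
            return tool(smiles)
    return "No specialized tool matched this question."


if __name__ == "__main__":
    print(answer("What's the logP of this molecule?", "CCO"))
```

The point of the design is that the chat layer never computes chemistry itself; it only decides which specialist to ask and passes the result back.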
Balto blends Deep Origin’s proprietary tools, including BiosimDock for binding pose and affinity prediction, BiosimVS for ultra-large-scale virtual screening, and BiosimProps for ADMET property prediction, with open-source software for docking, pocket location, and other analyses. A blog post highlights BiosimDock’s performance in predicting ligand poses and ranking affinities compared to popular docking tools like AutoDock Vina and Glide. BiosimVS showcases impressive hit enrichment, identifying true binders from a pool of 100,000 molecules for challenging targets like KRAS G12D and DPP4. Meanwhile, BiosimProps outperforms well-known models in predicting key properties like solubility (logS), partition coefficients (logP), and distribution coefficients (logD).
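For readers unfamiliar with hit enrichment, the sketch below shows one common way to quantify it: the enrichment factor, which compares the share of true binders in the top-ranked slice of a screen with their share in the whole library. The article does not say which metric Deep Origin used, so this is an assumption for illustration; the function name and the toy labels are hypothetical.

```python
from typing import Sequence


def enrichment_factor(ranked_labels: Sequence[int], top_fraction: float = 0.01) -> float:
    """Enrichment factor for a ranked screening list.

    ranked_labels: 1 for a true binder, 0 otherwise, ordered best score first.
    top_fraction: fraction of the library treated as the "top" slice.
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one true binder")
    # Size of the top slice, at least one molecule.
    n_top = max(1, int(n_total * top_fraction))
    hits_in_top = sum(ranked_labels[:n_top])
    # Hit rate in the top slice divided by the hit rate in the whole library.
    return (hits_in_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy example: 10 molecules, 2 binders, both ranked at the very top.
    toy_ranking = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(toy_ranking, top_fraction=0.2))
```

A value of 1 means the ranking does no better than picking molecules at random; larger values mean true binders are concentrated near the top of the list.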
Altogether, Balto weaves together dozens of specialized tools that are accessible via a chat interface. “We’re not aware of any other program right now that has that scope or breadth of capabilities that Balto has,” Papoian said.
Democratizing drug discovery with AI
While there are currently no FDA-approved drugs that were discovered solely via machine learning (ML), it’s only a matter of time before that changes. The technology, though it has decades of historical use in the field, is growing in adoption and power.
Ultimately, ML-based tools could also level the playing field, empowering smaller companies and academic labs with the right skills, equipment and data access to make more significant contributions to drug development. As tools like Balto become both more intelligent and more powerful, smaller research teams could have access to resources comparable to those of larger teams, Papoian said, “as long as they know what biology they want to work on.”
This shift towards AI-driven drug discovery, however, requires more than just sophisticated algorithms. It demands a new breed of scientists — those comfortable straddling the worlds of biology, chemistry, and computer science. Papoian sees this firsthand at Deep Origin: “We have people coming with computer science backgrounds that have never had probably university chemistry or biology classes,” he notes. “And then in a couple of years, they become deep experts in…protein structure modeling or small molecule modeling and AI applied to cheminformatics.”
A new breed of researchers
Bridging the gap between tech and research science will lead to the emergence of more researchers who can not only ask the right biological questions but also wield the computational tools needed to uncover the answers. The groundwork for this shift has been in place for decades, Papoian says. “Definitely in U.S. universities,” he observes, “there has been focus in the last 10 to 20 years on interdisciplinary programs where they teach you both computer science but also bioinformatics or biology or some other related background.”
The trend has gained momentum in recent years. “As younger people graduate from universities, especially more recently, there’s that expectation that even if they are biologists, especially bioinformaticians and so on, that they’d have experience in coding and other computer science skills,” said Papoian.
Papoian says Deep Origin has seen several computer scientists develop deep expertise in areas like protein modeling and cheminformatics, though he notes that this path is rarer than the reverse. “I think the other direction, computer scientists entering biology and chemistry… in Deep Origin and in BioSim AI, we definitely had that, have seen that conversion very successfully, and almost at scale,” he shares, adding, “So it’s definitely doable, and I’d like to see more of that happening in the industry.”
The need for proofs and transparency
Despite the potential benefits, there remains a degree of skepticism towards AI in the pharmaceutical industry. Papoian emphasizes the need for “proofs” to convince veteran researchers of the technology’s value. “I think many of these folks probably have valid concerns and they want to see proofs,” he stated. “And that hasn’t been the case, by the way, that there have been overwhelming cases of AI producing amazing drugs that go all the way through clinical trials and get FDA approved.” Overcoming this reluctance will require not only demonstrable results, but also a shift in mindset and a willingness to embrace new approaches.
Papoian’s concerns about the lack of transparency in AI tools are particularly acute when considering the release of AlphaFold 3. While the tool itself represents a significant advance in protein folding prediction, Papoian argues that its limited accessibility, particularly compared to its predecessor, AlphaFold 2, presents an obstacle to scientific progress. “[With AlphaFold 3] it’s almost like if OpenAI released ChatGPT but doesn’t really let you talk to it… It’s concerning that major publications like Nature now allow publications like that,” he stated.
Disruptive, in a good way
The demands for scientific rigor are well-founded, and a healthy amount of skepticism is warranted. It makes sense that drug developers “want to see proofs” about how AI can help. In the coming years, though, Papoian expects early adopters to wield a competitive advantage over those who hold back. “That could convince other folks to use [AI tools],” he said. “I think in the next few years, it will be quite disruptive, and in a good way, I think, because it will democratize access to these tools that are very sophisticated, but at the same time, I think the guardrails of transparency, of actually testing what the system can do, are important for us not to get carried away.”