AI in drug discovery is a topic that gets an outsized amount of attention, observes Carl Hansen, CEO of AbCellera, a company specializing in antibody drug discovery. “It’s as if people are saying, ‘AI is here, it’s going to save us. Thank God, we’re finally gonna be able to create drugs,” he said. “To me, that’s implicitly saying, ‘We don’t know what the heck we’re doing.’”
But human researchers have been tirelessly working behind the scenes, making incremental advances that lay the foundation for these so-called AI “breakthroughs,” according to Hansen, who holds a Ph.D. in applied physics/biotechnology from Caltech. “I’m much longer on human intelligence than artificial intelligence,” he added.
Human ingenuity hides behind one of biotech’s most celebrated AI advances.
AlphaFold, the protein structure prediction software that launched from DeepMind in 2018, made headlines two years later when it won the CASP14 competition, achieving a level of precision in protein structure prediction similar to experimental techniques like X-ray crystallography. “There was a big step change in our ability to predict protein structure from sequence,” Hansen said. The protein structure prediction challenge had vexed researchers for more than 50 years.
But the media narrative surrounding the breakthrough was overly simplistic. “The interpretation of the layperson in the media is that now AI has reached the point where it can solve this impossibly hard problem.”
In reality, AlphaFold was only possible thanks to decades of work by thousands of human scientists across the globe who created a vast database of experimentally determined 3-D protein structures. There was a “massive, massive experimental push to get that data,” noted Hansen.
The results of this monumental research effort are housed in the Protein Data Bank (PDB), a repository that served as a primary source of training data for AlphaFold. The structures in the PDB are as accurate as they are vast, showcasing angstrom- or sub-angstrom-level accuracy. “The nature of that data is perfectly organized, because if a structure is real, there’s no wiggle room in where the molecule is,” Hansen said. While advances in computation and models have undeniably played a role, it’s the combination of volume and precision that made it a hand-in-glove fit for computational protein folding prediction. “If you have a perfectly organized giant dataset, then machine learning can do remarkable things. It’s the power of the dataset,” Hansen said.
Spreadsheet chaos poses roadblock for AI in antibody drug discovery
Antibody-based drug discovery is fundamentally “a search problem,” according to Hansen. “Data generation is intrinsic to what we are trying to do as a company,” he said. AbCellera collaborates with numerous partners and manages tens of drug discovery programs per year, resulting in gargantuan data complexity.
While AI holds potential in streamlining antibody drug discovery processes, many organizations simply lack the organized datasets required to yield meaningful insights from the technology.
The antibody drug discovery process is necessarily a team sport. “You begin with hundreds or thousands of antibodies. No one person can do that,” Hansen said.
Antibody drug discovery involves a continuous transfer of information and samples across specialized teams in various labs. Each step, from generating antibody diversity to characterizing their biophysical properties, adds a layer of complexity. “If you were to run around and follow the experiment with a spreadsheet, you would have to update information at every single step: What was done? What was the plate? Did that experiment work? Who touched it?” according to Hansen.
Many organizations end up using scores of spreadsheets to track such processes, Hansen said. “You could have 300 people emailing 2,000 different spreadsheets to each other — and they don’t match up.”
Raising the roof on technology and processes
Eleven years ago, when Carl Hansen co-founded AbCellera, he foresaw a looming challenge in antibody drug discovery. Despite the industry’s explosive growth, transforming from niche beginnings to a market worth hundreds of billions of dollars, there was a stagnation in productivity. A failure to continuously keep up with technology and processes are to blame. As Hansen observed, the initial boom was a result of a privileged few accessing the first-generation technologies, picking the “low hanging fruit” of easy targets and applications. But as these were exhausted, the industry plateaued. “There are only high-hanging or low-hanging fruit relative to the capabilities of your technology. If you don’t continue raising the roof on technology, you quickly get diminishing returns,” he said.
The following visualization illustrates some of the steps involved in monoclonal antibody development, based on information from “What are therapeutic monoclonal antibodies?” from Thermo Fisher Scientific, “Monoclonal Antibody Development” from Sino Biological and the article “Production Processes for Monoclonal Antibodies” on IntechOpen.
Hansen emphasizes the significance of high-quality, organized datasets for successful AI applications. “We are investing in high-throughput experimental technologies to generate immune diversity,” he said. This means making antibody responses, search through millions of cells, finding many thousands of antibodies, sequencing them, and characterizing their properties. “We run many programs annually, each running programs through a whole series of experimental systems, each of which is producing insanely complex and often high-throughput data sets,” he said. Tackling the complexity requires systematic tracking of experiments and results. “In essence, our business revolves around generating data,” Hansen concluded.
Why having AI alone is not enough in biotech
While generative AI engines such as ChatGPT may have fueled more public interest in the capabilities of artificial intelligence, Hansen said the approach to training the large language model contrasts sharply with biotech requirements. “ChatGPT has basically read every single word on the internet,” he said. And consequently, the large language model can generate textual responses based on its training and iteratively refine its approach based on new data or feedback, going through waves of iterations to optimize its accuracy.
To do something similar in biotech would require a huge amount of data to start. “It would have to be harmonized, which basically means you have to do it internally. It can’t just be vacuumed up from the world,” Hansen said.
The notion that AI alone can revolutionize biotech is misguided, according to Hansen: “If you rank order all of the hard things, the AI model part is the easiest part.” The challenges relate more to data generation. “It takes years and years to build that capability,” he said.
Sifting through the hype of AI-based antibody drug discovery
Pointing to claims that generative AI models are spitting out de novo sequences of antibodies that recognize the target, without any experimental technique, Hansen expressed skepticism. “For one, I think that’s beyond what the technology can do today,” he said. “Two, I think the problem is ill-posed.” Pointing to the complex nature of antibody drug development, Hansen noted that when presented with a drug target, the ideal binding region for an antibody isn’t necessarily clear immediately.
Traditionally, researchers would immunize animals to produce antibodies that bind to multiple potential regions of a target molecule. They then experimentally test the antibodies to identify the most effective. “That’s one of the most important steps in developing a drug,” Hansen said. “But the way these AI companies frame it is, ‘Here’s the molecule, and I will make an antibody that binds on this region.'” The result is a dismissal of an array of potential binding options, potentially overlooking more innovative drugs. Emphasizing the stakes in drug development, Hansen underscored the importance of innovation. “In drug development, you don’t get a ribbon for second place. It’s all about innovation and novelty,” Hansen said.
In antibody drug discovery, genuine scientific advancements matter more than advances in computational linguistics. “ChatGPT isn’t about finding drugs. It’s about stringing together words to make convincing-sounding arguments,” he said. But coherent-sounding arguments aren’t necessarily true. The crux of groundbreaking antibody drug discovery rests on rigorous scientific principles, not just in artfully crafted narrative. “You could maybe use ChatGPT to draft press releases if you wanted,” Hansen said.
Filed Under: Data science, Drug Discovery, Genomics/Proteomics, Industry 4.0, machine learning and AI