A recent collaboration between the San Diego–headquartered clinico-genomics company Helix and the Salt Lake City–based biotech Recursion Pharmaceuticals aims to go even further. “We could surpass [UK Biobank] probably by the end of 2025,” says Hylton Kalvaria of Helix.
The partnership will combine Helix’s growing clinico-genomic dataset with more than 25 petabytes of Recursion’s proprietary biological and chemical data.
The world is awash in data. To cite one example, the latest CommonCrawl dataset, encompassing 17 years of website data, totals 377 tebibytes. That’s equivalent to the storage capacity of more than 3,000 128-gigabyte USB flash drives.
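The flash-drive comparison can be checked with a few lines of arithmetic. The sketch below assumes a tebibyte is 2^40 bytes and that drive capacity is labeled in decimal gigabytes (10^9 bytes); the function name is illustrative only.

```python
TEBIBYTE = 2**40   # bytes in one tebibyte (binary unit)
GIGABYTE = 10**9   # bytes in one gigabyte (decimal unit, an assumption about drive labeling)


def drives_needed(dataset_tib: int, drive_gb: int) -> int:
    """Return how many drives of `drive_gb` gigabytes hold `dataset_tib` tebibytes."""
    dataset_bytes = dataset_tib * TEBIBYTE
    drive_bytes = drive_gb * GIGABYTE
    # Ceiling division: a partially filled drive still counts as one drive.
    return -(-dataset_bytes // drive_bytes)


if __name__ == "__main__":
    print(drives_needed(377, 128))
```

Under these assumptions the result is on the order of a few thousand drives. Treating the drives as 128 gibibytes instead changes the count only slightly.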
“There are larger data sources out there,” Kalvaria acknowledged. “But [the combined Helix-Recursion dataset] is still a heck of a lot of data and, importantly, it adds to itself over time.”
The “sequence once, query often” model: Unlocking a lifetime of genetic insights
Helix isn’t focused on building a massive genomic database alone. The company follows a “sequence once, query often” approach to genomic data management. “Your genetics are the one thing about your health that never changes over time,” Kalvaria said. Having that information available upfront can offer a range of advantages, whether in routine care or disease diagnostics. “There are many reasons why you might come back to that [genomic] information over time,” Kalvaria added.
For patients, a single round of DNA sequencing can provide insights for years to come, eliminating the need for repeated genetic tests. For physicians, access to a patient’s genomic information can support quicker, more informed decisions. “If a patient has already been sequenced, results can be obtained in minutes rather than weeks,” Kalvaria noted. In the longer run, doctors could use such data to confirm or rule out genetic conditions during an office visit. As scientists learn more about how genes relate to diseases, such readily available genetic data could help identify patients who might benefit from specific treatments or preventive measures.
The advantages also extend to healthcare systems: For instance, reducing duplicate testing can lead to cost savings and more efficient use of resources. The potential impact of this method goes beyond just saving time and money.
BioHive-2: The supercomputer informing drug hunting
While the potential of mining genomic data to improve patient care and enable more targeted drug discovery is not new, decoding such data is not necessarily straightforward. To help unearth the hidden connections between genes, diseases, and potential treatments, Helix is turning to Recursion Pharmaceuticals and its secret weapon: BioHive-2.
This isn’t your run-of-the-mill supercomputer. Powered by 504 of NVIDIA’s latest H100 Tensor Core GPUs, BioHive-2 is purpose-built for AI-driven drug discovery. Recursion estimates that the computer, which is four times faster than its predecessor, is currently the fastest wholly owned supercomputer in the pharmaceutical industry globally.
Over the past decade, Recursion has also amassed one of the world’s largest collections of biological and chemical data. Now that its supercomputer, BioHive-2, is operational, it plans to use the horsepower to create more advanced AI models that can help streamline the drug discovery process.
“We’re really excited to work with [Recursion] and make those new discoveries that will bring medicines to patients faster,” Kalvaria said.
Potential drug development lifecycle advantages extend from lab to market
While it is difficult to quantify how the partnership will impact drug development workflows, the efficiency gains could be significant. “Ultimately, that’s how we see our role in this drug development lifecycle,” Kalvaria noted. “Help people to screen through more things, to interrogate more drug targets upfront.”
The technological partnership could also inform the commercialization of drugs once they hit the market. “We have this footprint with health systems where we can actually serve patients who could be good candidates for a new drug that comes onto the market,” Kalvaria said.