Athos Therapeutics is a clinical-stage biotech focused primarily on developing precision small-molecule therapeutics for autoimmune and chronic inflammatory diseases. Like a growing number of biotechs, Athos is giving machine learning a prominent role in its drug discovery approach. Its AI/ML platform integrates patient samples and data from global hospital systems to identify novel drug targets, incorporating a spectrum of data types including transcriptomics, proteomics, genomics, and other omics data.
Dealing with massive datasets
The company sources multi-omics data from patient samples obtained through partnerships with premier global hospitals, including the Cleveland Clinic, Lahey Hospital & Medical Center and the University of Ioannina Medical School. Its multi-omics analysis draws on data from more than 25,000 human patients. It has also partnered with Caltech to perform mass spectrometry-based proteomics using cultured cells and biopsies from archived, de-identified patient samples.
The datasets involved can be substantial, with some files exceeding five gigabytes. “Quality control is the first crucial step,” said June Guo, the company’s vice president of artificial intelligence and machine learning. “Traditionally in computational biology, pipelines perform pre-processing, processing, and post-processing all at once, which can lead to wasted time if the data is contaminated. It can take days or even months to process all these datasets when we talk about thousands of them.”
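As an illustration of the staged approach Guo describes, the Python sketch below gates every file through a cheap quality-control check before any expensive processing begins. The thresholds, stage names, and file layout here are hypothetical, not Athos’s actual pipeline:

```python
from pathlib import Path

# Hypothetical threshold for illustration; real QC would apply domain
# checks such as read-quality metrics, checksums, and contamination screens.
MIN_FILE_BYTES = 1_000_000

def passes_qc(path: Path) -> bool:
    """Cheap, standalone QC gate run before any expensive processing."""
    if not path.exists() or path.stat().st_size < MIN_FILE_BYTES:
        return False
    # ...additional checks (schema validation, contamination screens)...
    return True

def preprocess(path: Path) -> Path:
    """Placeholder pre-processing stage (e.g., normalization, trimming)."""
    return path

def process(path: Path) -> Path:
    """Placeholder main processing stage (e.g., alignment, quantification)."""
    return path

def postprocess(path: Path) -> None:
    """Placeholder post-processing stage (e.g., aggregation, reporting)."""

def run_pipeline(raw_files: list[Path]) -> None:
    # Stage 1: QC everything first, so a contaminated or truncated file
    # is rejected in minutes instead of surfacing days into processing.
    clean = [f for f in raw_files if passes_qc(f)]
    print(f"QC passed {len(clean)} of {len(raw_files)} files")

    # Stage 2: only QC-passing files reach the expensive stages.
    for f in clean:
        postprocess(process(preprocess(f)))

if __name__ == "__main__":
    run_pipeline(list(Path("raw_data").glob("*.tsv")))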
For more on Guo’s background and Athos’s unique approach to AI in healthcare, read “From self-driving cars to an autonomous AI/ML analytical platform for drug discovery.”
Dealing with continual data growth
These lengthy processing times, coupled with the ever-increasing volume of data, meant Athos’s cloud computing costs were becoming unsustainable. “Our data continues to grow every year, and we couldn’t handle it locally,” Guo said. Sending all of the data to the largest cloud vendors was prohibitively expensive.
The search for a new strategy led Athos to an unexpected partner: Vultr, which bills itself as the largest privately held cloud computing platform. Athos learned of Vultr during discussions with Dell about high-performance computing infrastructure. “We didn’t even know who Vultr really was,” Guo said.
The company will use Vultr Cloud GPU for its AI model training, tuning, and inference. The arrangement will give Athos access to NVIDIA HGX H100 GPUs running on Dell Technologies’ PowerEdge XE9680 servers.
NVIDIA HGX H100 GPUs offer 80GB of RAM per GPU
While there are other GPU options, NVIDIA hardware is not only powerful but also easy to work with, simplifying model training, deployment, and inference acceleration. NVIDIA launched CUDA (Compute Unified Device Architecture) in 2006, streamlining the use of GPUs for general-purpose computing tasks such as scientific computing, simulations, and AI, beyond their traditional role in graphics rendering.
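As a toy example of what CUDA enables from high-level code, the sketch below uses CuPy, a NumPy-compatible Python library that executes array operations on CUDA GPUs. CuPy is an assumption for illustration; the article does not say which software stack Athos uses.

```python
# Minimal sketch of general-purpose GPU computing from Python via CuPy,
# which dispatches NumPy-style operations to CUDA kernels. Requires a
# CUDA-capable GPU and the cupy package; illustrative only.
import cupy as cp

# Allocate matrices directly in GPU memory.
a = cp.random.rand(4096, 4096).astype(cp.float32)
b = cp.random.rand(4096, 4096).astype(cp.float32)

c = a @ b                # matrix multiply runs as a CUDA kernel on the GPU
total = float(c.sum())   # reduction on the GPU; float() copies one scalar back

print(f"sum of product matrix: {total:.3e}")
```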
The NVIDIA HGX H100 GPUs, based on the company’s Hopper architecture, are built for AI and HPC applications. “The hardware side is really powerful,” Guo said. “Per GPU, it has 80 gigs of RAM. In order to have models, whether generated from auto or the one we’re currently training on patient subtyping, you need large RAM for holding all the data.” The hardware lets the company load its data fully onto the GPUs, train models there, and then test patient subtype predictions on the fly. “Also, the network bandwidth allows us to transmit data from the CPU to the GPU,” he said. “When the data is [on the GPU], you don’t have to move it around.” In turn, this accelerates the whole computing process. “You’re not only optimizing the prediction, but also the training part. You’re optimizing the whole system — data comes in, and when it’s done, goes out.”
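The GPU-resident pattern Guo describes, loading the data once and keeping both training and evaluation on-device, can be sketched in a few lines of PyTorch. The framework, model, and dimensions below are hypothetical placeholders, not Athos’s actual subtyping model:

```python
# Minimal PyTorch sketch: keep the full dataset resident in GPU memory
# so training steps never pay per-batch CPU-to-GPU transfer costs.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical dimensions: 25,000 samples x 2,000 omics features, 4 subtypes.
X = torch.randn(25_000, 2_000, device=device)   # resident on the GPU
y = torch.randint(0, 4, (25_000,), device=device)

model = torch.nn.Sequential(
    torch.nn.Linear(2_000, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 4),
).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(10):
    # Shuffling and slicing happen on-device, so each step is
    # compute-bound rather than transfer-bound.
    perm = torch.randperm(X.size(0), device=device)
    for i in range(0, X.size(0), 1024):
        idx = perm[i : i + 1024]
        opt.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])
        loss.backward()
        opt.step()
```

In practice, data loaded from disk would be copied to the device once with `.to(device)`; with 80 GB of memory per H100, a dataset of this scale can stay resident for the entire run.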