Following the recent expansion of Amazon Omics, as covered in our previous article “Ready, set, analyze: Amazon Omics unveils new Ready2Run workflows”, we wanted to explore the role of third-party providers in shaping the landscape of genomic analysis and bioinformatics. NVIDIA is one of these providers. The GPU company has worked to integrate its Parabricks Ready2Run workflows into Amazon Omics, adding new capabilities to the platform.
To get a better sense of NVIDIA’s role and perspective in this domain, we reached out to Jason Fenwick, who works in genomics business development at the company. In the following interview, Fenwick touches on topics ranging from the unique aspects of NVIDIA’s Parabricks Ready2Run workflows to the acceleration of genome analysis over CPU-based tools. He also discusses the company’s collaboration with the GATK team at the Broad Institute and its NVIDIA AI Enterprise offering.
What distinguishes the NVIDIA Parabricks Ready2Run workflows from other genomic analysis applications?
Fenwick: NVIDIA Parabricks offers pre-configured and tested “Ready2Run” workflows in Amazon Omics for germline and somatic whole genome analysis, as well as a re-alignment workflow for updating the reference genome.
Could you elaborate on how Parabricks accelerates genome analysis over CPU-based tools?
Fenwick: NVIDIA Parabricks provides GPU-accelerated versions of industry-standard tools that are used by computational biologists and bioinformaticians, enabling significantly faster runtimes, workflow scalability, and lower compute costs. Parabricks also includes an accelerated version of the latest DeepVariant 1.5 for accurate, deep-learning-based variant calling.
What are the main benefits of the 13 Parabricks germline and somatic workflows that are now offered in Amazon Omics as Ready2Run workflows?
Fenwick: The Parabricks Ready2Run workflows support alignment, germline variant calling, somatic variant calling and re-alignment to new reference genomes. Runtimes and costs for each workflow are transparent and predictable. Additionally, every workflow is pre-configured and tested, so no additional setup is needed to get started.
Can you describe the validation process undertaken by NVIDIA and the GATK team at the Broad Institute to ensure accuracy in the Parabricks GATK workflows?
Fenwick: The NVIDIA team collaborated with the GATK team at the Broad Institute to evaluate the accuracy of the germline workflows. Through this rigorous process, they verified that the Parabricks workflows produce results that are functionally equivalent to the CPU-native GATK versions, as originally defined here.
As a specific example, the GATK team compared the results of the Parabricks germline workflow with the equivalent commands in GATK’s Whole Genome Germline Single Sample workflow, and found that the results were more than 99.99% equivalent in both precision and recall.
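For readers unfamiliar with the metrics, here is a minimal Python sketch of how precision and recall can be computed when one set of variant calls is compared against another. It is an illustration only, not the GATK team's actual evaluation method: the function name `precision_recall`, the tuple format for a variant, and the toy call sets are all assumptions made for this example, and real comparisons involve normalization and matching steps that are omitted here.

```python
def precision_recall(baseline, candidate):
    """Compare two collections of variant calls, treating `baseline` as the reference.

    Hypothetical helper for illustration; variants are matched by exact equality.
    """
    baseline, candidate = set(baseline), set(candidate)
    shared = len(baseline & candidate)  # calls present in both sets
    # Precision: fraction of candidate calls that also appear in the baseline.
    precision = shared / len(candidate) if candidate else 0.0
    # Recall: fraction of baseline calls that the candidate recovered.
    recall = shared / len(baseline) if baseline else 0.0
    return precision, recall


# Toy data (invented): each variant is (chromosome, position, ref allele, alt allele).
cpu_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "T", "C"),
}
gpu_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 90, "T", "G"),
}

precision, recall = precision_recall(cpu_calls, gpu_calls)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In this toy case three of the four calls agree in each direction, so both metrics come out at 75%. A result above 99.99% on both, as reported for the Parabricks germline workflow, would mean that fewer than one call in 10,000 differs in either direction.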
Can you provide a brief overview of the types of benefits organizations could receive with NVIDIA AI Enterprise?
Fenwick: For enterprise and other customers who need support for Parabricks in clinical workflows, large sequencing projects, high-throughput platforms, or other workflows that require immediate assistance, NVIDIA offers NVIDIA AI Enterprise. Organizations receive full access to NVIDIA Enterprise Support, which provides guaranteed response times, priority security notifications, and access to Parabricks experts to troubleshoot and optimize genomics workflows. NVIDIA AI Enterprise is designed to accelerate and streamline development and deployment of Parabricks.
NVIDIA AI Enterprise offers support for various frameworks, including MONAI for medical imaging and RAPIDS for GPU-accelerated data science libraries. It aims to streamline the development and deployment of production AI, covering generative AI, computer vision and speech AI. With numerous frameworks, pre-trained models and development tools, NVIDIA AI Enterprise is focused on advancing AI integration for enterprises.