Opting to make Google Cloud its primary cloud services provider, Ginkgo plans to develop new large language models for biological engineering applications based on Google’s Vertex AI platform. Debuting in 2021 as a framework for streamlining the machine learning lifecycle, Vertex AI has since evolved to incorporate more generative AI capabilities.
Further solidifying the partnership, Google Cloud will also help fund Ginkgo’s development of foundation models and fine-tuned applications. Additionally, Ginkgo will work with Google Cloud to mine insights from the extensive data its platform has accumulated over the years.
Assembling the data jigsaw puzzle
“We have a massive amount of data, but counterintuitively it is not high-density data like images, which are very storage intensive,” said Anna Marie Wagner, Ginkgo’s senior vice president and head of corporate development, in an interview at Google Cloud Next. The genetic sequence data the company stores is rich in information but compact in size. “And if you delete it, that cell will die,” she said.
The specific function of individual nucleotide sequences remains unclear, but untangling the mystery is more of a querying matter than a data storage problem, she said. “There are still many open research questions in this space,” Wagner said. “There will be joint learnings both in technology and in monetization.”
Ginkgo Bioworks has amassed 2 billion protein sequences, 5 million enzyme designs, 720,000 CAR-T data points, 1 million strains via EncapS technology and more than 100,000 AAV capsids for gene therapy. It also performs 100 million genome edits annually and synthesizes more than 100,000 genes per year. Data visualized logarithmically, drawn from Ginkgo’s website.
A mutual partnership
Shweta Maniar, Google Cloud’s director of healthcare and life sciences solutions for biopharma/biotech, provided insights into the partnership. “One area we’ll be working on is behind the scenes with our teams on the ground,” she said, touching on the collaboration’s rationale. “It’s about bringing the life science ecosystem to Google Cloud, rather than working one by one.” She noted conversations across Google that can only happen through this partnership, with Google Cloud as the underlying data fabric across the organization. “To leverage innovations across Google, it all starts with Google Cloud as the foundation of how we work together. That’s key to these partnerships evolving,” Maniar emphasized.
Partnerships and the power of data
The swift advance of AI in recent years has caught the attention of industry leaders, who now have a burgeoning collection of AI tools to explore. “It’s not just about building a model; the algorithms and architectures are continuously evolving,” Wagner said. “What remains constant, in our view, is the value of data.”
Maniar underscored the importance of partnerships as AI technologies emerge. “We’re going to need partners like Ginkgo in order to make that next iteration,” she said. Google’s mission isn’t just about amassing resources or technologies; it’s fundamentally about disseminating information. “Google’s overall mandate has always been about bringing more information to people,” Maniar said.
That can be complicated considering the pace of AI development. “We’re aware that the landscape changes every six months,” Maniar noted. And partnerships with companies like Ginkgo enable Google Cloud to keep track of emerging technological needs.
In a similar vein, “generative AI” has transitioned from an obscure concept in September 2022 to a leading technology topic by June 2023, according to Google Trends data.
On foundation models and their promise
Foundation models, a type of generative AI, are at the heart of the trend. Highly flexible, they are typically large-scale models trained on a vast corpus of unlabeled data to support an array of applications.
“In human language, GPT-4 is a foundation model,” Wagner said. “And then on top of that foundation model, you build apps trained to solve very specific tasks.”
Wagner described proteins as “low-hanging fruit” given Ginkgo’s extensive proprietary data.
The Ginkgo foundation model will draw from all available DNA. “I believe we’re unique in having both acquired and constructed more DNA than what’s available in public databases,” she said.
Wagner pondered the future possibilities, asking when researchers might unravel genomic functions for targeted therapies and explore the roles of specific sequences, like promoter regions — a type of gene transcription initiator — within genomes. “That to me is the next layer and the next foundational model that we can train.”