In training, BigRNA learned from thousands of genome-matched datasets, enabling it to predict tissue-specific RNA expression, splicing patterns, microRNA sites, and RNA binding protein specificity — all from raw DNA sequences.
Promising early results
In early tests, BigRNA showed strong capabilities. “We trained BigRNA using RNA-Seq datasets and genomic sequences,” said Brendan Frey, Ph.D., founder and chief innovation officer of Deep Genomics. “BigRNA had never trained on oligonucleotide therapeutics.” Even so, the model accurately predicted the effects of steric blocking oligonucleotides (SBOs) on increasing the expression of all four genes tested, and correctly predicted splicing outcomes for 18 out of 18 exons across 14 genes, including those linked to Wilson disease and spinal muscular atrophy, as the bioRxiv paper noted.
BigRNA’s unexpected proficiency in designing oligonucleotide therapeutics highlights a notable advantage of foundation models: their ability to develop capabilities beyond their initial training. “The big breakthrough has been foundation models,” Frey said. “We can now build one model that can do it all, and even do things we hadn’t imagined.”
“It’s just one model with one data set. And so it’s scalable,” Frey said. “You can increase the size of the neural network and get a better model.”
Previously, each of Deep Genomics’ models was trained separately, and thus did not account for interdependencies. “The most obvious thing is there’s just no way to scale up 14 different machine learning models,” Frey said. “Just imagine that each of those models had its own carefully constructed data set; you know, the researchers had to spend a whole bunch of time optimizing the training parameters, then doing validation and testing 40 different times.”
The company has also developed a platform known as the “AI Workbench,” which can analyze genomic data to find therapeutic targets and design RNA-based drug candidates for genetically-defined diseases.
The expanding universe of BigRNA
The size and scale of BigRNA are constantly evolving. “The model released a year ago had about a billion parameters, similar to GPT-2,” Frey said. “Our current internal version has about 100 billion parameters. We expect BigRNA to have upwards of a trillion parameters by the end of the year.”
This exponential growth in size is driven by the gigantic amount of data required to train the model. “In the early days of machine learning, we used to talk about thousands of data points,” Frey said. “Around 2005, we were excited about a million data points. Now, we’re not even talking about billions, but trillions of data points.”
Frey draws a comparison to OpenAI’s GPT-4, a trailblazing large language model rumored to have well over a trillion parameters. BigRNA is on track to reach a similar scale. “Biology is more complex than human discourse for several reasons: We don’t innately know the ‘language’ of biology; we’ve invented terminology and hypotheses about how it works,” Frey said. “Much of what’s happening in biology isn’t directly related to health and medicine.”
Unraveling such complexity requires significant computing resources. “We use GPU servers, and we have relationships with Google and Nvidia to get access to the hardware we need,” Frey said. “It’s all about GPUs and TPUs for this kind of work. It requires a huge investment of money to train these kinds of models. They’re not cheap to build.”
‘Totally different from what we saw a few years ago’
While bigger isn’t always better in machine learning, increasingly large models are pointing the way forward as more biotech companies embrace systems with hundreds of billions or even trillions of parameters, yielding surprising results. The ability of such large models to develop unexpected capabilities while avoiding overfitting, where a model excels on training data but fails to generalize to new data, defies traditional expectations.
“In the last couple of years, everything’s changed,” Frey said. “It turns out you can over-train the model, and yet it still generalizes really, really well,” he added, describing the phenomenon as “shocking.”
The rise of large foundation models challenges traditional assumptions in machine learning and statistics. These assumptions suggested that highly parameterized models would struggle to balance accurately capturing underlying patterns (low bias) with avoiding oversensitivity to random fluctuations (low variance).
To illustrate this concept, imagine a game of darts where machine learning models are analogous to players trying to hit the bullseye. A model with high bias would consistently miss in the same direction – like always throwing too far to the left. A model with high variance would have its darts scattered all over the board — sometimes close, sometimes far off. The goal was to find a sweet spot: throws that cluster tightly around the bullseye.
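The darts analogy can be made concrete with a short simulation (an illustration, not code from the article). Each “player” is modeled as a 2-D Gaussian thrower, and the mean squared distance from the bullseye splits exactly into a bias-squared term (systematic offset) plus a variance term (scatter); the player parameters below are made up for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
n_throws = 100_000
bullseye = np.array([0.0, 0.0])

def score(throws):
    """Decompose mean squared distance from the bullseye
    into bias^2 (systematic offset) and variance (scatter)."""
    mean_throw = throws.mean(axis=0)
    bias_sq = np.sum((mean_throw - bullseye) ** 2)
    variance = np.mean(np.sum((throws - mean_throw) ** 2, axis=1))
    return bias_sq, variance

# High-bias player: tight cluster, but always pulled 3 units to the left.
high_bias = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(n_throws, 2))
# High-variance player: centered on the bullseye, widely scattered.
high_var = rng.normal(loc=[0.0, 0.0], scale=3.0, size=(n_throws, 2))

for name, throws in [("high bias", high_bias), ("high variance", high_var)]:
    b2, v = score(throws)
    print(f"{name}: bias^2 = {b2:.2f}, variance = {v:.2f}, total error = {b2 + v:.2f}")
```

The high-bias player’s error is dominated by the squared offset, the high-variance player’s by scatter; the classical “sweet spot” is whatever trade-off minimizes their sum.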
Now, picture a dart throw that initially veers off target before curving back to the bullseye, reminiscent of David Beckham’s arcing free kicks. Giant deep learning models seem to “bend” the rules of traditional machine learning in a similar way. They appear to be over-trained and initially head off-target, but somehow still manage to “hook” back in and generalize well to new data after more training.
“We haven’t fully figured it out, but this pertains to foundation models,” Frey said. “The key is you have to have a really big model for this to happen. If you have a small model, like just training linear regression, you’ll never need to worry about overfitting.”
Some pundits attribute this phenomenon to emergent intelligence. “It’s actually figuring out relationships that are truly an order or two of magnitude beyond what you would get using a traditional way of thinking about statistics and machine learning,” Frey said, while underscoring the mystery of the phenomenon, which is known as double descent.
“Whatever it is that’s happening, we are seeing a qualitative change in how these machine learning methods work,” Frey said. “It’s totally different from what we saw a few years ago.”
Filed Under: machine learning and AI