Data is good. The right data is better. But finding the right data is no easy task. Inaccessible data sources, the growing complexity of search terms required to attain appropriate results, and the multitude of databases available mean that finding data – and then applying it to inform research – is taking up more and more valuable researcher hours. As a result, an estimated 80% of researcher time is dedicated to acquiring and reformatting data, time that could be much better spent on analysis and developing scientific insights.
Finding that data is, however, essential. What’s needed are methods to accelerate the search and retrieval of large volumes of data, because the insights contained within can lead to critical breakthroughs. By standing on the shoulders of the researchers who came before us, we can see previously unidentifiable patterns, saving time and costs and propelling us towards new discoveries. The pace of scientific advancement has never been more apparent than in the past two years, with vaccines and therapies for SARS-CoV-2 developed at an unprecedented rate – and often by applying the learnings from earlier research.
Similarities in science
All living organisms are composed of the same basic building blocks. With a set – albeit staggering – number of ways these building blocks can interact, there are only so many combinations that work. Combine this with the nature of evolution, and it is not surprising that at a cellular level, a number of biological processes are highly similar in different species. For years, researchers have conducted in-depth analyses of the genomes, structures, and biology of thousands of species of animals, bacteria, fungi, and viruses. Those scientists have produced exhaustive datasets on even the rarest of organisms. But how can the scientific community exploit this knowledge and use it to become more efficient?
Researchers need to extract and harmonize only the relevant data, allowing for efficient analysis and easier development of predictive models. To achieve this, tools must be designed by those who understand scientific literature, the intricacies of taxonomies, and specialist users’ requirements. Ultimately, tools must be usable, accessible, and beneficial to the scientific community.
Similarities and SARS-CoV-2
Life science researchers are already harnessing existing data’s potential to quickly answer modern problems. For example, in March 2020, at the beginning of the global pandemic, teams leveraged the genetic and phenotypic similarities between SARS-CoV-2 and other previously studied viruses.
One similarity identified early on was in the 3C-like protease, an enzyme known to exist in coronaviruses and to drive an essential stage of their life cycle. Coronavirus and enterovirus proteases often share similarities in both structure and function, meaning drugs that target the active site of these enzymes have the potential to target both virus groups. For example, one essential protease produced by SARS-CoV-2 has a markedly similar structure to enterovirus 3C protease – an enzyme that has been well studied in the past. With the spread of COVID-19 escalating, and mounting pressures to identify treatments, this finding was significant.
Following this discovery, what was already known about enterovirus proteases could be leveraged and directly applied to fast-track treatments targeting SARS-CoV-2. With substantial bioinformatics and structural data on these proteases and the compounds that interact with them already available, a list of compounds that interact with enterovirus 3C protease could be produced quickly using existing data sources, such as the Reaxys Medicinal Chemistry database. With substance-target affinity data, pharmacokinetic, efficacy and safety profiles, and bioactivity data all held and normalized in a single repository, appropriate substances can be identified and extracted quickly.
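As a rough illustration of that kind of lookup, the sketch below filters a small table of normalized substance-target affinity records down to a shortlist for one target. It is only a minimal sketch under stated assumptions: the record layout, the `shortlist` function, the compound labels, and every IC50 value are hypothetical placeholders invented for this example, and none of it reflects the actual schema, content, or interface of the Reaxys Medicinal Chemistry database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffinityRecord:
    """One substance-target measurement (hypothetical layout)."""
    substance: str   # placeholder compound label
    target: str      # target name as stored in the source
    ic50_nm: float   # IC50 in nanomolar; lower means more potent


# Made-up placeholder rows for illustration only, not real measurements.
RECORDS = [
    AffinityRecord("compound-A", "enterovirus 3C protease", 40.0),
    AffinityRecord("compound-B", "enterovirus 3C protease", 5200.0),
    AffinityRecord("compound-C", "HIV-1 protease", 12.0),
    AffinityRecord("compound-D", "Enterovirus 3C Protease", 310.0),
    AffinityRecord("compound-A", "enterovirus 3C protease", 25.0),
]


def shortlist(records, target, max_ic50_nm):
    """Return (substance, best IC50) pairs for one target, most potent first."""
    wanted = target.strip().lower()
    best = {}
    for rec in records:
        # Compare target names case-insensitively so spelling variants match.
        if rec.target.strip().lower() != wanted:
            continue
        # Drop measurements weaker than the chosen potency cutoff.
        if rec.ic50_nm > max_ic50_nm:
            continue
        # Keep only the lowest IC50 seen for each substance.
        if rec.substance not in best or rec.ic50_nm < best[rec.substance]:
            best[rec.substance] = rec.ic50_nm
    return sorted(best.items(), key=lambda item: item[1])


candidates = shortlist(RECORDS, "enterovirus 3C protease", max_ic50_nm=1000.0)
for name, ic50 in candidates:
    print(f"{name}: {ic50} nM")
```

The point of the sketch is that once the records sit in one normalized table, the shortlist is a single filter-and-sort pass rather than a manual trawl through separate sources. The 1000 nM cutoff is an arbitrary choice for the example, not a recommended threshold.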
Exploiting existing data to identify substances with pharmaceutical potential jump-started research into potential treatments for COVID-19. This accelerated the pre-clinical research stage and empowered us to efficiently isolate compounds suitable for further investigation. The findings from this research have since been reinforced, with Pfizer publishing results of its clinical trials of potential COVID-19 treatments in late 2021. Successful therapies have similar structures to those extracted from the database in 2020.
So much pharmaceutical research data, so little time
When setting out on a pharmaceutical research project, the first step is always a literature search: what has been done, who has done it, what have they found, and what is still unknown? These reviews will only continue to become more arduous as the pool of scientific information grows. And it is not only data published in academic journals that must be considered – experiments that were abandoned, yielded no results, or were not regarded as novel enough to publish should still be used to inform future scientific endeavors.
Data should directly inform our hypotheses, research methods, and experiments – empowering us to answer research questions more efficiently. But having so much data at our fingertips comes with the additional challenge of finding and extracting the data that is relevant to our research question. Leveraging what is already known is essential in streamlining the scientific process, helping researchers to find the right answer faster. Tools able to sift through data and extract relevant information at speed help researchers avoid futile detours and allow scientific research to become more efficient and productive, even in critical situations.
Dr. Paul Dockerty has a pharmacist degree from the University of Rouen, France, an M.S. degree in chemistry from the University Paris Descartes, France, and a Ph.D. in Chemical Biology from the University of Groningen, the Netherlands, where he focused on the development of chemical probes based on an enol-carbamate scaffold. He now works as a customer engagement manager in the professional services group at Elsevier and is responsible for supporting pharmaceutical customers in their digitalization journey.