Introduction
The study of genetics dates to the mid-19th century and the work of Gregor Mendel, but the field made its greatest strides only with the technological advances of the late 20th century and the completion of the Human Genome Project in 2003. In light of these advances, more than 4,000 genes associated with genetic diseases have been discovered and more than 75,000 genetic tests have become available to the public.
Advances in technology such as next-generation sequencing (NGS) have allowed scientists to perform experiments at a rate that was never possible before. NGS is a DNA sequencing technology that allows the whole genome of an individual to be sequenced within one day, producing a large amount of clinical health data. Unfortunately, maintaining small sets of data in a centralized location for analysis (Genomics 1.0) cannot keep up with the size and complexity of today’s datasets, which can require terabytes of storage.
Enter Genomics 2.0. In the new era of biomedical data accessibility, Genomics 2.0 uses a technology-driven, federated data approach that allows researchers to access, explore, collaborate on and analyze distributed datasets without moving them.
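To make the federated idea concrete, here is a minimal Python sketch of one possible pattern: each site computes a summary of its own data, and only those summaries travel to a coordinator. The names `LocalSummary`, `summarize_locally` and `combine`, and the sample values, are hypothetical and chosen for illustration; they do not describe any particular platform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Aggregate statistics a site releases; no row-level data is included."""
    n: int
    total: float


def summarize_locally(values: list[float]) -> LocalSummary:
    # Runs inside each organization: only a count and a sum leave the site.
    return LocalSummary(n=len(values), total=sum(values))


def combine(summaries: list[LocalSummary]) -> float:
    # Runs at the coordinator, which only ever sees the aggregates.
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations reported by any site")
    return sum(s.total for s in summaries) / n


# Hypothetical measurements held at two separate organizations.
site_a = [0.25, 0.5, 0.75]
site_b = [0.5, 1.0]

pooled_mean = combine([summarize_locally(site_a), summarize_locally(site_b)])
```

The coordinator obtains the same mean it would have computed on the pooled data, yet the individual measurements never leave their home sites. A real deployment would also need safeguards this sketch omits, such as minimum group sizes before a summary can be released.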
Precision medicine was born out of the realization that individual genomic data was key to identifying special treatments and therapies best suited for one’s unique genetic constitution. As the scale of precision medicine has expanded for research, however, so has the volume of clinical datasets.
The Problem?
It is estimated that by 2025, more than 500 million human genomes will be sequenced in a clinical environment.
Traditional data-sharing methods involve downloading large amounts of clinical data onto one’s own computer for analysis. While this sounds like a simple approach, once data leaves the organization, there is no control over what is done with it, and this can compromise patient confidentiality and security.
Collaboration is also limited under traditional methods because of strict regulatory and data privacy rules that vary from country to country. Because these regulations preclude data from leaving the country where it was gathered, the data ends up siloed and unusable. By some estimates, 80-90% of essential datasets are unavailable to the research community because of these restrictions.
With NGS now in widespread use, and genomic data growing exponentially in size and complexity, current technology for health data management is no longer enough.
As this shift in technology begins to manifest in the life science industry, a new approach to health data management is being applied: trusted research environments (TREs).
The Safe and Secure Solution
TREs are becoming the architectural structure for health data within the research field, especially genomics. A TRE is a centralized computing database that securely holds clinical health data and protects patient confidentiality by never letting the data leave the organization where it is stored.
Researchers have to be appropriately trained and approved before they receive credentials to access the clinical datasets within a given TRE. This requirement limits the possibility of patients being re-identified or of unauthorized users gaining access.
While user accessibility is an essential factor in TREs, so are the quality and type of data used. Before researchers and scientists access data, the clinical datasets are cleaned, transformed to a common format (or Common Data Model, e.g. OMOP) and verified. TREs have built-in auditing to ensure compliance and to verify that the information is used in ways that benefit public health. Researchers can bring in their own tools to analyze the data, making the platform user-friendly.
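The "transformed to a common format" step can be pictured as a per-source field mapping followed by a verification check. The Python sketch below assumes two hypothetical hospitals with different column names; the function `to_common_record`, the mapping tables and the target field names are all illustrative and are not a claim about the actual OMOP schema.

```python
from __future__ import annotations


def to_common_record(raw: dict, field_map: dict[str, str]) -> dict:
    """Rename source fields to the shared schema and verify none are missing."""
    missing = [src for src in field_map if src not in raw]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return {target: raw[src] for src, target in field_map.items()}


# Hypothetical mappings: each source keeps its own names, the target is shared.
HOSPITAL_A_MAP = {"pt_id": "person_id", "yob": "year_of_birth"}
HOSPITAL_B_MAP = {"patient": "person_id", "birth_year": "year_of_birth"}

a = to_common_record({"pt_id": "A-1", "yob": 1980}, HOSPITAL_A_MAP)
b = to_common_record({"patient": "B-7", "birth_year": 1975}, HOSPITAL_B_MAP)
```

Once both records share the same keys, a single analysis script can run against either source without modification, which is the practical payoff of a common data model.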
TREs ensure safe settings through barriers (or “airlocks”) at which activity and transactions are tracked from both sides, so that everything moving in or out is secure and approved.
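One way to picture an airlock is as a gate that records every request and releases nothing until a second person has reviewed it. The Python sketch below is a simplified illustration under that assumption; the `Airlock` class, its method names and the user names are hypothetical, and production TREs implement considerably richer controls.

```python
class Airlock:
    """Toy export gate: approved users request, a separate reviewer decides."""

    def __init__(self, approved_users):
        self.approved_users = set(approved_users)
        self.pending = {}    # item -> user who requested the export
        self.audit_log = []  # (actor, action, item) tuples, in order

    def _record(self, actor, action, item):
        self.audit_log.append((actor, action, item))

    def request_export(self, user, item):
        # Only trained and approved researchers may ask for an export.
        if user not in self.approved_users:
            self._record(user, "export_denied", item)
            raise PermissionError(f"{user} is not an approved researcher")
        self.pending[item] = user
        self._record(user, "export_requested", item)

    def review(self, reviewer, item, approve):
        # The other side of the airlock: someone else must sign off.
        if item not in self.pending:
            raise KeyError(f"no pending export for {item!r}")
        if reviewer == self.pending[item]:
            raise PermissionError("requesters cannot review their own exports")
        del self.pending[item]
        action = "export_approved" if approve else "export_rejected"
        self._record(reviewer, action, item)
        return approve


airlock = Airlock({"alice"})
airlock.request_export("alice", "summary_table.csv")
released = airlock.review("data_steward", "summary_table.csv", approve=True)
```

Because both the request and the decision are written to `audit_log`, the history of what crossed the boundary, and who approved it, can be reconstructed later.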
Global Impact
One step in the right direction for life-changing discoveries can come from one human mind, but what if you were able to incorporate ten, fifty, or even one hundred great minds? Global collaboration among scientists and researchers would create a seismic shift within the pharma and biotech industries for the greater good, and TREs are enabling this global collaboration.
Breaking down the barriers of siloed data allows scientists to analyze and review findings of colleagues from other organizations across the globe. Increased access to global clinical data can reduce time spent in the lab, increase the speed of diagnosis, and increase the development of new hypotheses in the research community.
Conclusion
As genomic health data continues to increase in diversity, scale and complexity, storage, management, analysis, and collaboration all become more challenging. TREs help tackle these challenges by giving organizations and research communities a platform that is safe, secure, and collaborative. That platform can enable a seismic shift in precision medicine research and development and bring innovation to patient health as the field moves toward 500 million sequenced genomes.
Dr. Pablo Prieto Barja is the co-founder and CTO of Lifebit.