Synthetic data's role in modern oncology R&D

[Image: abstract 3D rendering of dots and lines. Credit: MDSHAFIQUL/Adobe Stock]

Synthetic data is changing how oncology researchers and drug developers approach real-world evidence. They often need this evidence to test hypotheses, predict outcomes and develop algorithms, but privacy constraints and limited access to patient data can create delays and lengthen project timelines. Oncology drug researchers and developers have recently begun using synthetic data to get around those obstacles.

Conceptually, synthetic data in oncology starts from private patient information and lets researchers work with what that information shows without compromising privacy, which makes it a significant tool for current oncology research.

Traditional vs. new data approaches in oncology research

Traditionally, oncology researchers and drug developers have relied on the information in patients’ electronic medical records (EMRs) to provide evidence for studies. However, there are several drawbacks to this approach. Some datasets are limited to the structured variables in the oncology EMR system, and structured data provides only a small fraction of the detail required. To overcome these limitations, providers employ clinical extractors who manually comb through medical records to find information based on either a proprietary data specification or a client’s requirements. The end product can then be combined with claims data to tell the story of a patient’s care journey, answer specific research questions or provide other data to fill in gaps.

In addition to being lengthy and expensive for researchers, this approach inherently leaves knowledge gaps because the scope of the available patient data is limited. The protocol development process must meticulously add necessary clinical characteristics and data points to ensure the ethical use of the dataset and to protect patient privacy. Further, the use of patient data in real-world studies requires patient consent and approval from an institutional review board, which can be a lengthy, onerous process, particularly if additional parameters and protocol amendments are required.

The advantages of synthetic data

Generating synthetic data, or a synthetic data lake, produces a dataset that possesses the same statistical properties as the original data but contains none of the original patients, and therefore does not compromise privacy. Despite the absence of identifying patient information, synthetic data retains the full utility of the original: researchers can explore it freely, generate hypotheses and answer research questions quickly. This allows go/no-go decisions on studies, protocol design, and patient recruitment to be made rapidly.

Synthetic data offers the potential to mimic the characteristics of a real dataset without sensitive patient information, making it a good option for analyzing large but sensitive samples of real individual-level patient data. Synthetic data differs from de-identified data in that it is built from scratch, as opposed to being based on individual patient records, which means synthetic data cannot be de-anonymized.
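The idea of building records "from scratch" while keeping a dataset's statistical shape can be illustrated with a deliberately small sketch. It fits a two-variable normal distribution to a made-up source cohort and then samples brand-new records from the fitted model. Everything here is an assumption for illustration: the column meanings, the function names `fit_gaussian` and `sample_synthetic`, and the choice of a Gaussian model. It is not how any particular synthetic data product works, and a toy like this does not by itself carry a privacy guarantee.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Estimate means, standard deviations and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Sample covariance, computed by hand to stay compatible with older Pythons.
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_synthetic(params, n, rng):
    """Draw n new (x, y) records from the fitted bivariate normal distribution."""
    mx, my, sx, sy, rho = params
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mixing z1 and z2 this way reproduces the fitted correlation rho.
        y = my + sy * (rho * z1 + math.sqrt(1 - rho * rho) * z2)
        records.append((x, y))
    return records


rng = random.Random(0)

# Hypothetical stand-in for a source cohort: (age in years, some lab value).
# These numbers are generated, not taken from any real patients.
source = []
for _ in range(2000):
    age = rng.gauss(65, 10)
    lab = 0.5 * age + rng.gauss(0, 5)
    source.append((age, lab))

params = fit_gaussian(source)
synthetic = sample_synthetic(params, 2000, rng)
synthetic_params = fit_gaussian(synthetic)

print("source    mean age, correlation:", round(params[0], 1), round(params[4], 2))
print("synthetic mean age, correlation:", round(synthetic_params[0], 1), round(synthetic_params[4], 2))
```

No synthetic row corresponds to a source row, yet the means and the correlation between the two columns come out close. Real clinical data mixes categorical, temporal and free-text fields, so a production approach would need a far richer model than this.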

Researchers can use synthetic data to improve many of the processes associated with drug development, such as building control arms for investigations, researching appropriate sites to host clinical trials, hypothesis testing, and training AI and machine learning models.

Using synthetic data to find patterns of stroke in cancer patients

The Ottawa Hospital is one of the largest academic and research hospitals in Canada, consisting of three hospital sites, 1,200 beds, and more than 12,000 employees and support staff. A group of researchers at the hospital used synthetic data to study risk factors associated with ischemic stroke in cancer patients.

About 10% of patients with ischemic stroke, a condition in which the brain’s blood supply is reduced, also have cancer. Ischemic stroke is a leading cause of disability and the second-leading cause of death worldwide, so the ability to identify treatment strategies to prevent its occurrence in cancer patients is critical.

Using synthetic data derived from the records of 10,875 patients in The Ottawa Hospital Data Warehouse from 2000 to 2019, the research team compared two patient groups: first, all patients who had an ischemic stroke within a 2-year period after a cancer diagnosis, and second, all ischemic stroke patients without cancer.
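The two-group comparison described here can be sketched in a few lines. This is an illustration under stated assumptions, not the study's actual code: the `Record` fields, the group labels, the 730-day approximation of the 2-year window, and the decision to exclude cancer patients whose stroke falls outside that window are all hypothetical, and the six toy records are invented purely to exercise the logic.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: the "2-year period" is approximated as 730 days.
WINDOW = timedelta(days=730)


@dataclass
class Record:
    stroke_date: date
    cancer_dx_date: Optional[date]  # None means no cancer diagnosis on record
    prior_stroke: bool


def assign_group(rec):
    """Place an ischemic stroke record into one of the two comparison groups."""
    if rec.cancer_dx_date is None:
        return "stroke_without_cancer"
    delta = rec.stroke_date - rec.cancer_dx_date
    if timedelta(0) <= delta <= WINDOW:
        return "cancer_with_stroke"
    # Assumption: strokes before diagnosis or after the window are excluded.
    return None


def prevalence(records, group, attr):
    """Share of records in `group` for which the boolean field `attr` is true."""
    members = [r for r in records if assign_group(r) == group]
    if not members:
        return None
    return sum(getattr(r, attr) for r in members) / len(members)


# Invented toy records; they do not reflect the study's data or results.
records = [
    Record(date(2015, 6, 1), date(2014, 9, 1), True),
    Record(date(2016, 3, 1), date(2015, 1, 15), False),
    Record(date(2012, 5, 1), None, False),
    Record(date(2013, 7, 1), None, True),
    Record(date(2018, 1, 1), None, False),
    Record(date(2019, 1, 1), date(2010, 1, 1), True),  # outside the window
]

for group in ("cancer_with_stroke", "stroke_without_cancer"):
    print(group, prevalence(records, group, "prior_stroke"))
```

Because the cohort rules live in one small function, a researcher exploring synthetic data could adjust the window or the exclusion rule and rerun the comparison without requesting a new extract.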

The broader implications of stroke and cancer research

At the study’s conclusion, researchers found that cancer patients with ischemic stroke have a higher prevalence of chronic obstructive pulmonary disease, previous ischemic stroke, and venous thromboembolism, and that previous ischemic stroke is an important predictor of recurrent stroke in cancer patients. The results highlight the importance of identifying optimal secondary prevention treatments in cancer patients, according to the researchers.

Achieving goals such as the “cancer moonshot” of preventing more than four million cancer deaths by 2047 will require new approaches to oncology therapeutics research and development. Synthetic data in oncology, owing to its ability to provide deep clinical insights without compromising patient privacy, is poised to play a substantial role in many future discoveries.

Eric McDavid, life sciences business development lead at MDClone, is a life science professional with more than 20 years of achievement in developing, growing, diversifying and expanding business.
