
[Natasha/Adobe Stock]
“We want to understand fundamentally the biology of every human disease so that we can develop new treatments for every patient and faster,” said Iya Khalil, Ph.D., VP of Data, AI, and Genome Sciences at Merck & Co. Khalil is helping build a data science team at Merck that “does a lot of what machine learning teams at Big Tech companies do, but for biology,” she said. Merck's data efforts include the Merck Digital Sciences Studio, which supports biomedical startups, while internal teams focus on AI-powered drug development and patient engagement.
Connecting multimodal dots (and decades of patient data)

Iya Khalil, Ph.D.
Merck partnered with Culmination Bio to access more than 40 years of medical data from Intermountain Health, including more than 5 million biospecimens from millions of patients. Culmination adds more than 300,000 biospecimens annually. “To have clinical, genomic, and biospecimen data, all linked in a continuous fashion over decades, and from a single source, is unusual,” said Lincoln Nadauld, M.D., Ph.D., founder and CEO of Culmination Bio.
Both organizations aim to deepen the understanding of disease biology by identifying the root causes of illnesses and developing personalized therapies. Culmination Bio specializes in converting raw data into a usable format. By cleaning, de-identifying, and organizing information from Intermountain Health, Culmination helps researchers identify patient groups with specific traits, a step that is essential for addressing Merck’s research questions.
Beyond just genes to a holistic view of patients

Dr. Lincoln Nadauld
Culmination Bio combines clinical records, genomic data, and biospecimen information to create a comprehensive view of each patient. Genetic and omics data pose their own challenges, but clinical data can be even harder to work with. “People talk a lot about genetics, omics, multimodal data [as being challenging to work with], but then you have the clinical side. How do you leverage clinically those lab values, those clinical notes?” Khalil asked. “We now have standard pipelines for sequencing data. You can pipeline that, you can even turn it into a machine essentially.”
Merck’s expanding network of data partnerships
The pact with Culmination is the latest in a string of partnerships. In 2023, Merck entered into an agreement with Natera to use its real-world database in oncology research. Merck & Co. has also teamed up with Illumina, Nashville Biosciences, and fellow Big Pharma firms such as AbbVie, Amgen, AstraZeneca, and Bayer in the Alliance for Genomic Discovery (AGD). The multiyear agreement aims to accelerate the development of therapeutics through large-scale genomics and the establishment of a preeminent clinical genomic resource. As part of the alliance, the founding members will co-fund the whole-genome sequencing of 250,000 samples from Vanderbilt University Medical Center’s biobank and have access to the resulting data for use in drug discovery and therapeutic development.
Year | Select data-driven initiatives at Merck
---|---
2017 | Data-driven collaboration with Corning and Pfizer to modernize pharmaceutical glass packaging. The collaboration involved the use of data science, including advanced analytics.
2019 | Partnership with Harvard to discover new immuno-oncology targets. The project included extensive data collection and analysis.
2021 | Merck worked with Amazon Web Services (AWS) to develop the Change Assessment Knowledge Engine (CAKE) to help chemistry, manufacturing, and control scientists at Merck better assess change proposals.
2022 | Launch of Merck Digital Sciences Studio (MDSS) to support digital health and biotech startups.
2022 | Announced ML-based partnership with Saama Technologies.
2023 | $10M investment in Culmination Bio to develop a comprehensive data lake for pharmaceutical research.
2023 | Merck announced that it will modernize its IT infrastructure and enhance drug discovery and clinical trial development in a partnership with AWS and Accenture that will make use of AWS services like Amazon SageMaker, AWS HealthOmics, and high-performance computing (HPC) capabilities.
2024 | R&D collaboration with Culmination Bio focusing on autoimmune diseases using de-identified multimodal datasets.
2024 | Variational AI announced an alliance with Merck to evaluate its Enki generative AI platform for designing novel, selective small molecules.
Ongoing | Workshops and data science symposiums with Corning as well as “Datathons,” which offer $25,000 in cash prizes and job opportunities at Merck.
Ongoing | Internal data science teams support scientific and manufacturing divisions. The company continues to build its data science team.
Ongoing | Collaboration with the Bill and Melinda Gates Foundation and CDC on data-driven public health initiatives.
Discovering new biology with an assist from machine learning
Machine learning can help identify genetic markers that predict treatment response. Merck has already used this approach with immunotherapies in oncology and now wants to build on that success. “We were able to identify and learn biomarkers around who would predict response to immunotherapies,” Khalil explained.
Now, with Culmination Bio’s vast datasets, Merck aims to replicate this success across a range of disease areas, focusing first on autoimmune disease. By analyzing those who respond well to treatment – and those who don’t – the company hopes to pinpoint the underlying biological reasons and develop more precise therapies for everyone. “We get to span the spectrum of human disease,” Nadauld agreed.
The power of asking the right questions
High-quality data is powerful only when paired with the right research questions. Although Culmination has extensive datasets and strong analysis tools, “we don’t know all of the right questions to ask,” Nadauld said. “We don’t have a corner on that, and our partners at Merck have really good insights as to what questions to pose of this data,” he added. “Our goal at Culmination is to discover better health.”
Making precision medicine a reality
To make sure they are asking the right questions, Merck and Culmination Bio are fostering a collaborative environment. Teams from both organizations regularly come together, bringing their unique expertise to bear on specific disease areas and patient populations.
“We’ve met in one disease state and defined a question in a cohort of patients that would work well to answer that. Then shortly thereafter, we met with a related but different team on a totally different disease state,” Nadauld said.
A multidisciplinary data foundation to target the root causes of illness
This collaborative approach is essential for tackling complex diseases. “Just to give an example, when you look at something like atopic disease, which is a disease that affects many people in this country, we have great new treatments now but there are still a lot of patients that do not respond to therapy,” Khalil said. By analyzing cohorts of responders and non-responders at a genetic and molecular level, and tapping machine learning to understand the underlying disease biology, Merck hopes to identify potential treatments for non-responders and develop more precise therapeutic approaches.
Khalil also stressed the importance of multidisciplinary collaboration in discovering new biology with clinical impact. “The key thing for us is that we have a strategy around how we’re going to discover biology, which is super-important for Merck,” she said. “We not only want to treat the patients we have today but also the patients of tomorrow and the future. We need to identify new biology that will have clinical impact and impact on medicine.”