
“If I ask ChatGPT to plan a vacation for me and it spits out something, I might sit there and second-guess it. ‘Is that restaurant really in that town?’” said Dr. Mitesh Rao, CEO of OMNY Health, a company founded in 2017 to synthesize and contextualize healthcare data. In the high-stakes world of healthcare, a “hallucination could mean someone gets hurt,” Rao said. Or worse. “If you can’t trace [AI output] back to the fundamental truth, as a provider or a clinician, you have to question that.”
OMNY Health works with life science clients, including pharma and medical device companies, that have their own versions of healthcare data headaches. “I basically started OMNY Health out of frustration,” Rao said. While working in senior roles related to patient safety and health services at prominent hospitals, Rao saw firsthand how data silos hindered research and potentially affected the quality of care. “Pharma, med device and analytics companies kept coming to us wanting data, and every time we were doing it, it was piecemeal, like reinventing the wheel each time,” Rao said. Although he was involved in a range of research collaborations, each one felt “bespoke.” “My goal was, how do we build relationships at scale? To do that, you really need infrastructure. The EMRs weren’t going to provide that,” Rao said. “You need something that’s not tied to the underlying IT infrastructure, something that can actually connect and serve as those pipes.”
The multifaceted challenges and impacts of healthcare data quality

The problems with data quality in healthcare are multifaceted. Healthcare data is often stuck in a patchwork of disparate systems spanning everything from electronic medical records (EMRs) and lab systems to pharmacy databases and insurance claims. Add data from genomics, proteomics, clinical trials, research papers and beyond, and the complexity ramps up. Each data type has its own vernacular, which makes it difficult to assemble even a unified view of a single patient’s health journey. “The data is kind of everywhere,” Rao said. “There’s so much heterogeneity. It takes a lot to actually take that clinical data and transform it into usable, research-grade, regulatory-grade data and evidence,” he added.
In machine learning, merging and concatenating disparate datasets can shed light on how different variables interrelate. But in healthcare, the array of data formats, terminologies and coding practices can be a stumbling block. Even data within a single medical institution can be siloed across separate systems, sometimes requiring doctors to switch computers to access complete patient information.
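To make the stumbling block concrete, here is a minimal Python sketch of what harmonizing two sources can involve before they are merged. It is an illustration only and not a description of OMNY Health's pipeline. The two extracts, their field names, the date formats and the `LABEL_TO_CODE` lookup are all hypothetical; the diagnosis codes follow the ICD-10 style.

```python
from collections import defaultdict

# Hypothetical extracts from two systems. Field names, ID casing,
# date formats and diagnosis labels are invented for illustration.
emr_rows = [
    {"patient_id": "P001", "dx": "E11.9", "visit_date": "2023-04-02"},
    {"patient_id": "P002", "dx": "I10", "visit_date": "2023-05-17"},
]
claims_rows = [
    {"member": "p001", "diagnosis": "Type 2 diabetes", "service_dt": "04/02/2023"},
    {"member": "P003", "diagnosis": "Hypertension", "service_dt": "06/01/2023"},
]

# Assumed lookup from free-text labels to a shared code. A real project
# would rely on a maintained terminology mapping, not a hand-written dict.
LABEL_TO_CODE = {"type 2 diabetes": "E11.9", "hypertension": "I10"}


def normalize_emr(row):
    """Map an EMR row onto the shared schema."""
    return {
        "patient_id": row["patient_id"].upper(),
        "code": row["dx"],
        "date": row["visit_date"],
        "source": "emr",
    }


def normalize_claim(row):
    """Map a claims row onto the shared schema (MM/DD/YYYY -> YYYY-MM-DD)."""
    month, day, year = row["service_dt"].split("/")
    return {
        "patient_id": row["member"].upper(),
        "code": LABEL_TO_CODE.get(row["diagnosis"].lower()),  # None if unmapped
        "date": f"{year}-{month}-{day}",
        "source": "claims",
    }


def merge_by_patient(emr, claims):
    """Group normalized records by patient and order them by date."""
    timeline = defaultdict(list)
    records = [normalize_emr(r) for r in emr] + [normalize_claim(r) for r in claims]
    for rec in records:
        timeline[rec["patient_id"]].append(rec)
    for recs in timeline.values():
        recs.sort(key=lambda r: (r["date"], r["source"]))
    return dict(timeline)


patients = merge_by_patient(emr_rows, claims_rows)
```

Each record keeps a `source` field, so a merged patient timeline still shows which system a given entry came from. Labels with no mapping come through with a code of `None` instead of being silently dropped, which leaves the gap visible for later review.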
Given the chaotic and sometimes hurried nature of clinical care, data quality is not always consistent, with missing values, errors and inconsistencies sometimes complicating analysis. “If you’ve seen the data coming out of one EMR within one hospital system, you know it’s often pretty messy and not very useful,” Rao said.
On top of that, healthcare data is inherently sensitive, and protecting patient privacy is a moral and legal priority. Companies working with this data must adhere to regulations such as HIPAA in the U.S. and GDPR in Europe. “There has to be a very strong moral and ethical compass in how the data is handled,” Rao said.
While data volume, along with ever-more powerful compute and algorithms, has helped fuel the current AI boom, simply having more data isn’t always better. Raw information, no matter how voluminous, is of limited value without careful curation and contextualization. As Rao put it, “People will start to realize that not all data is equal, not all data has come from the same source — quality, depth, timeliness, comprehensiveness, those are the important pieces.”
Implications for biopharma and the high stakes in healthcare
For biopharma companies, these data challenges have direct implications for drug development and clinical trials. Identifying promising drug targets, recruiting the right patients, and demonstrating efficacy all rely on accurate, comprehensive data. Fragmented or unreliable information can lead to problems with everything from target selection to clinical trial design.
For example, in recent decades, several major pharmaceutical companies have faced setbacks or warnings from regulators as a result of data snags, including data documentation problems and concerns about the accuracy and completeness of clinical trial data.
The increasing use of AI and machine learning in drug discovery and clinical trials further raises the stakes for data quality. AI hallucinations or faulty outputs stemming from poor data inputs could lead biopharma companies down costly dead ends or even put patient safety at risk.
In healthcare, where lives are at stake, the consequences of poor data can be particularly severe. “That’s the thing when healthcare uses large language models — you need to provide data and you need provenance,” Rao said. “You need to be able to know that this output ties to a specific episode of care.”
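One way to picture that requirement is a record structure in which every generated statement carries the identifiers of the episodes of care that support it, and anything without such a trail is rejected. The Python sketch below is a simplified illustration under that assumption. The `Episode` and `Statement` classes, the `trace` function and the identifiers are hypothetical, and they are not drawn from OMNY Health's systems or from any FDA specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """A hypothetical episode of care held in a source system."""
    episode_id: str
    patient_id: str
    source_system: str


@dataclass(frozen=True)
class Statement:
    """A generated statement plus the episode IDs it claims as support."""
    text: str
    episode_ids: tuple = ()


def trace(statement, episodes_by_id):
    """Return the episodes behind a statement, or raise if the trail is broken."""
    if not statement.episode_ids:
        raise ValueError(f"No provenance for: {statement.text!r}")
    missing = [e for e in statement.episode_ids if e not in episodes_by_id]
    if missing:
        raise ValueError(f"Unknown episode(s) {missing} for: {statement.text!r}")
    return [episodes_by_id[e] for e in statement.episode_ids]


# Illustrative data only.
episodes = {"EP-1001": Episode("EP-1001", "P001", "emr")}
supported = Statement("Patient P001 has a recorded diabetes diagnosis.", ("EP-1001",))
unsupported = Statement("Patient P001 was prescribed a new medication.")

backing = trace(supported, episodes)  # resolves to the EP-1001 episode
# trace(unsupported, episodes) would raise ValueError: no provenance attached.
```

The design choice is to fail loudly. A statement that cannot be tied back to a known episode raises an error and is not passed along to a clinician as if it were verified.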
OMNY Health’s growth and alignment with FDA priorities
OMNY Health was founded on the principle that better data can lead to better healthcare outcomes. Fast forward to today, and OMNY holds records on more than 78 million patients across the U.S. The company has forged partnerships with companies such as Atropos Health and Datavant, focusing on real-world data for applications spanning clinical trial research and drug development. The company is on track to surpass 100 million patients by the end of the year.
OMNY’s focus on data provenance and traceability aligns with the FDA’s increasing demands in this area. “Look at the way the FDA is going now around data — they want to see provenance,” Rao said. “They want to know the source of the data. They want proof.”
The explosive popularity of generative AI, along with the possibilities of AI hallucinations and the misuse of AI for misinformation, is set to “raise the bar” for data quality. Rao noted that “people will start to realize that not all data is equal.”
Ultimately, this push for higher data standards comes back to the core values of medicine itself. “It’s funny — physicians at our core, we’re data-driven. It’s how we’re trained,” Rao said. “Data often speaks to us in a way that is convincing. From early in our training, we’re taught that the literature, publications, research-grade output that has been vetted and peer-reviewed — that’s sort of the Bible that we can follow.” In an era where algorithms increasingly inform medical decisions, that same commitment to relying on trusted, verifiable data is no longer aspirational — it’s essential.