Distributed data. Inadequate foundational structures. Lack of provenance. All these scenarios add a layer of inefficiency to data management that can undermine business operations and compromise data integrity. For global organizations, the consequences can be devastating.
In 2016, academics developed the FAIR principles for scientific data management and stewardship to improve the findability, accessibility, interoperability and reusability of digital assets. The principles were originally designed to harness proprietary data and improve its shareability. Pharma companies today are implementing FAIR principles to reduce R&D timelines and costs, improve operational efficiency, and accelerate time to market. But rethinking data management takes an organization-wide commitment to change.
Drug developers and sponsors considering revamping data management processes must have a solution that integrates seamlessly into existing research pipelines. More than anything, researchers and scientists using ML models need real-time access to large volumes of high-quality data. Integrated tools for collaboration, single sources of information and the capability to facilitate multi-site, global partnerships are also vital.
When an organization’s current solution cannot achieve this level of integration, transitioning to a more robust one becomes a question of ‘when’ not ‘if.’
Understanding AI challenges in data management
Despite the enormous potential for AI and ML in drug development, the industry must approach this promise with a discerning eye. Some inherent limitations can impact the effectiveness of these technologies, undermine business operations and jeopardize development timelines. Here is a shortlist of the most common limitations:
- Over-enthusiasm can supplant accuracy. AI and ML rightfully garner excitement, but relying on them too heavily without careful data curation is dangerous.
- Inability to verify data accuracy or model it properly. For the foreseeable future, AI-generated solutions will only be as good as the input they receive from their human counterparts. Models built on poorly curated or poorly labeled data can generate garbage output, wasting resources and producing erroneous outcomes.
- Lack of data diversity. A model is only effective if its training data is diverse. AI platforms must be trained on datasets representing the organization’s various target populations. Otherwise, these platforms can adopt inherent biases, misread adjacent attributes and deliver skewed results.
- Data governance and privacy concerns. These will persist for the foreseeable future, but AI-based platforms that incorporate cloud-based encryption, institutional compliance and provenance tracking, data anonymization and access limitations can go a long way toward alleviating them. A minimal sketch of such data checks in code follows this list.
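What these safeguards can look like in practice is straightforward to sketch. The example below assumes a pandas DataFrame with hypothetical column names ("patient_id", "label", "age_group") and an illustrative salt; it is not any particular platform’s schema:

```python
# A minimal sketch of pre-training data checks. Column names and the
# salt are illustrative assumptions, not a specific platform's schema.
import hashlib

import pandas as pd

def audit_training_data(df: pd.DataFrame,
                        expected_shares: dict[str, float],
                        tolerance: float = 0.10,
                        salt: str = "example-salt") -> pd.DataFrame:
    """Flag curation gaps and demographic skew, then pseudonymize IDs."""
    # Curation check: unlabeled rows are unusable for supervised training.
    unlabeled = int(df["label"].isna().sum())
    if unlabeled:
        print(f"{unlabeled} rows lack labels; review or drop before training")

    # Diversity check: compare each group's share of the dataset against
    # its expected share in the target population.
    observed = df["age_group"].value_counts(normalize=True)
    for group, expected in expected_shares.items():
        gap = float(observed.get(group, 0.0)) - expected
        if abs(gap) > tolerance:
            print(f"age_group {group!r} deviates {gap:+.0%} from target")

    # Governance check: replace direct identifiers with salted hashes so
    # downstream users never handle raw patient IDs. In practice the salt
    # would be a secret managed outside the code.
    hashed = [hashlib.sha256((salt + str(p)).encode()).hexdigest()[:16]
              for p in df["patient_id"]]
    return df.assign(patient_id=hashed)
```

Checks like these belong at the front of the training pipeline, where a flagged gap costs hours of review rather than a wasted modeling cycle.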
Choosing the right data infrastructure is not just a technological decision; it’s a business imperative. A robust and integrated data management platform helps organizations harness the true potential of AI and ML and safeguards against the pitfalls listed above.
The human factor
The human factor is an often-overlooked challenge during a data management revamp or transition. Change, in any form, can be difficult to embrace. But misalignment and resistance to adoption can become significant impediments to the progress and agility these solutions can deliver. These challenges are especially far-reaching when working with patient data, which requires special handling and multidisciplinary teams to accomplish organizational goals.
The crux of the human factor boils down to alignment in decision-making. Technical obstacles can often be addressed with the right solutions, but aligning varied human perspectives and values can be a significant hurdle. Protecting against these inherent human challenges calls for a multifaceted approach:
- Leadership commitment: A successful revamp of your data solution or transition to a new one requires leaders who believe strongly in the shift and are willing to invest time, anticipate surprises and show persistence when facing initial resistance.
- Understanding resistance: Often, opposition comes from a place of discomfort. Employees might be wary of the learning curves associated with new tools or miss the familiarity of old systems.
- Inclusive vision: As companies scale and transition from smaller homegrown solutions to enterprise-level systems, their vision should resonate with those at the grassroots level and those with a bird’s-eye view of the company’s trajectory.
- Feedback and collaboration: A one-size-fits-all approach rarely works in complex organizational shifts. Organizations must embrace tools that encourage feedback, promote iterative improvement and foster collaboration.
- Shared investment in solutions: Successful transition is a shared journey. The solution provider and the organization should be invested in understanding each other’s needs and working collaboratively.
Moving an organization toward a new data management solution is an intricate dance of leadership, understanding and collaboration. It’s not just about adopting a tool but about forming a partnership where everyone evolves and grows together.
The importance of scalability in your data management solution
There is no question that the demands on data management in the pharmaceutical industry are vast and evolving. Ensuring data consistency and utility becomes paramount as businesses scale and diversify.
R&D requires a consistent approach that addresses data needs, provides tools for collaboration and creates a single source of information. A FAIR-friendly data framework does that. It allows data to be evaluated, labeled and contextualized into a reservoir of reliable information. It also alleviates a problem some organizations face: data standardization rules that change year over year or between business units, hampering data utility and creating inefficiencies.
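As an illustration, "evaluated, labeled and contextualized" can be made concrete with a registry entry that refuses incomplete metadata. The sketch below uses hypothetical field names mapped loosely to the four FAIR letters; it is not a real platform schema:

```python
# A minimal sketch of a FAIR-style registry entry. Field names are
# illustrative assumptions, not a real platform's schema.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DatasetRecord:
    dataset_id: str        # findable: a stable, unique identifier
    title: str
    access_url: str        # accessible: where authorized users retrieve it
    data_format: str       # interoperable: e.g. "DICOM", "CSV"
    license: str           # reusable: the terms under which it may be reused
    provenance: str        # who produced it, from what inputs, and how
    keywords: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject blank required fields so the central reservoir never
        # accumulates unfindable or unusable entries.
        for name in ("dataset_id", "access_url", "license", "provenance"):
            if not getattr(self, name).strip():
                raise ValueError(f"required FAIR field {name!r} is empty")
```

Because incomplete records are rejected at construction time, standardization cannot quietly drift as new datasets are registered across business units.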
Establishing a consistent data framework is like creating a research pipeline tailored to an organization’s specific needs. These efforts ensure that data quality remains constant, eliminating the need to start over or redefine protocols with every new project.
The emphasis on quality and consistency in data management is ultimately about reproducibility. Organizations analyze past data to discern patterns and identify trends. This foundational data then becomes the bedrock upon which predictive models are built. When a new dataset emerges, these pre-existing models can be applied, offering insights into probable outcomes based on historical precedent.
And achieving reproducibility isn’t just about replicating results. It’s about preserving the entire journey that led to those results—the minutiae of methods, data curation and analysis. This complete record allows for deeper explorations. A researcher could conceivably follow the original techniques to achieve the same result but then derive a different outcome by altering a specific part of the process. Such deviations can unearth new insights but also require understanding discrepancies and determining which result holds greater value. This process is far more manageable when all foundational information is housed in a central platform, safeguarding continuity and fostering innovation.
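A minimal sketch of preserving that journey, assuming inputs live on disk and the analysis code sits in a git checkout (both assumptions for illustration, not a prescribed setup), is to write a provenance record beside every result:

```python
# A hypothetical provenance record written beside every analysis output.
# Assumes the input file is on disk and the code is in a git repository.
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def write_provenance(input_path: str, params: dict, output_path: str) -> None:
    """Capture enough context to re-run, or deliberately vary, an analysis."""
    record = {
        # Fingerprint of the exact input data used.
        "input_sha256": hashlib.sha256(Path(input_path).read_bytes()).hexdigest(),
        # Version of the analysis code (assumes a git checkout).
        "code_version": subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True,
        ).stdout.strip(),
        # Every tunable that shaped the result.
        "parameters": params,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
    # e.g. results/scan_042.csv -> results/scan_042.provenance.json
    Path(output_path).with_suffix(".provenance.json").write_text(
        json.dumps(record, indent=2)
    )
```

With the input fingerprint, code version and parameters preserved, a researcher can re-run the original analysis exactly, or alter one step and know precisely where a discrepancy entered.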
A final word on data management
The data management landscape is changing. No doubt. The days of ad-hoc, piecemeal data management solutions are dwindling fast. More and more organizations are realizing that building and managing infrastructure for large-scale datasets simply isn’t their core strength.
End-to-end managed platform solutions are the future, but organizations should not embark on that journey alone. Especially when experienced partners—i.e., those whose primary business is data management, collaboration and AI development—are available and ready to help. It is truly counterproductive for drug developers and sponsors to squander valuable time and resources attempting to match that level of expertise. Those who choose to ignore the data management realities in front of them and go it alone will likely be left behind.
Yvete Toivola is a field application scientist at Flywheel, a medical imaging AI development platform.